realcode4you

Python Programming Help | Python Tutorial



In this blog we will cover the Python topics that are important for everyone. If you have no previous knowledge of Python, or of programming in general, the official Python tutorial is a good place to start.


Boolean Operations

Frequently, one wants to combine or modify boolean values. Python has several operations for just this purpose:

  • not a: returns the opposite value of a.

  • a and b: returns true if and only if both a and b are true.

  • a or b: returns true if either a or b is true, or both.

Like mathematical expressions, boolean expressions can be nested using parentheses.


var1 = 5
var2 = 6
var3 = 7 

print(var1 + var2 == 11 and var2 + var3 == 13)
print(var1 + var2 == 12 and var2 + var3 == 13)
print(var1 + var2 == 12 or var2 + var3 == 13)

Output

True
False
True
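
The example above uses only and and or. Here is a small sketch (reusing the variable names from above; first_part and result are hypothetical names) that combines not with parentheses to control grouping:

```python
var1, var2, var3 = 5, 6, 7

# not flips the result of the parenthesized comparison (5 > 6 is False)
first_part = not (var1 > var2)

# parentheses make the or happen before the and
result = first_part and (var2 < var3 or var1 == 0)
print(first_part, result)           # True True

# without parentheses, and binds more tightly than or
print(True or False and False)      # True, parsed as True or (False and False)
print((True or False) and False)    # False
```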



String Formatting:


Often one wants to embed other information into strings, sometimes with special formatting constraints. In python, one may insert special formatting characters into strings that convey what type of data should be inserted and where, and how the "stringified" form should be formatted. For instance, one may wish to insert an integer into a string:


message = 2
print("To be or not %d be" % message)

Output

To be or not 2 be


Note the %d formatting (or conversion) specifier in the string. It states that you wish to insert an integer value (more on these conversion specifiers below). The value you wish to insert is then given after a % character placed after the string. If you wish to insert more than one value into the string, place them in a comma-separated list surrounded by parentheses (a tuple) after the %:


print("%d be or not %d be" % (2, 2))

Output

2 be or not 2 be


In detail, a conversion specifier contains two or more characters, and its components must occur in the following order:

  • The % character, which marks the start of the specifier

  • Optional conversion flags, such as 0 (pad numbers with zeros) or - (left-justify the value)

  • An optional minimum field width. The converted value is padded to be at least this wide

  • An optional precision, given as a "." followed by the number of digits of precision

  • A conversion type character, listed below.

For a more detailed treatment of string formatting options, see the "printf-style String Formatting" section of the official Python documentation.

Some common conversion flag characters are:

  • d: Signed integer decimal.

  • i: Signed integer decimal.

  • e: Floating point exponential format (lowercase).

  • E: Floating point exponential format (uppercase).

  • f: Floating point decimal format.

  • c: Single character (accepts integer or single character string).

  • r: String (converts any python object using repr()).

  • s: String (converts any python object using str()).


print("%d %s or not %02.3f %c" % (2, "be", 10.0/3, 'b'))

Output

2 be or not 3.333 b
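
To see the flags, width, and precision from the list above in action, here is a short sketch (the variable names are only for illustration) that formats the same value several ways:

```python
value = 10.0 / 3

padded = "[%8.3f]" % value          # width 8, precision 3, right-aligned by default
left = "[%-8.3f]" % value           # the - flag left-justifies
zeros = "[%08.3f]" % value          # the 0 flag pads with zeros
exponent = "%e" % value             # exponential format, 6 digits after the point
quoted = "%r vs %s" % ("hi", "hi")  # repr() keeps the quotes, str() does not

for text in (padded, left, zeros, exponent, quoted):
    print(text)
```

This prints [   3.333], [3.333   ], [0003.333], 3.333333e+00 and 'hi' vs hi, one per line.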


Exercise

  • Divide 10 by 3 and print the result with two decimal places

print("%s is %02.2f" % ("The Result upto 2 decimal places", 10.0/3))

Output

The Result upto 2 decimal places is 3.33



Python Data Structures


We have covered much of the basics of Python's primitive data types in detail. It's now useful to consider how these basic types can be collected in ways that are meaningful and useful for a variety of tasks. Data structures are a fundamental component of programming: each is a collection of data elements that adheres to certain properties, depending on its type. In these notes, we'll present four basic data structures: the list, the set, the tuple, and the dictionary. Python's data structures are very rich, and a full treatment is beyond the scope of this simple primer. Please see the documentation for a more complete view.


List:

A list, sometimes called an array or a vector, is an ordered collection of values. The value of a particular element in a list is retrieved by querying for a specific index into the list. Lists allow duplicate values, but indices are unique. In Python, as in most programming languages, list indices start at 0; that is, to get the first element in a list, request the element at index 0. Lists provide very fast access to elements at specific positions, but are inefficient at "membership queries" (determining whether an element is in the list), because the list may have to be scanned element by element.

In python, lists are specified by square brackets, [ ], containing zero or more values, separated by commas. Lists are the most common data structure, and are often generated as a result of other functions, for instance, a_string.split(" ").

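To query a specific value from a list, put the requested index in square brackets after the name of the list. Negative indices traverse the list from the right, so -1 is the last element. A range of elements (a slice) is requested with start:stop, which includes the element at start but excludes the element at stop; either bound may be omitted.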


a_list = [1, 2, 3, 0, 5, 10, 11]
a_list[0:2]

Output

[1, 2]


print(a_list[-1]) # indexing from the right
print(a_list[2:])

Output

11
[3, 0, 5, 10, 11]
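
A few more indexing and slicing patterns on the same a_list, as a quick sketch. The optional third slice component (a step) goes slightly beyond the text above; words is a hypothetical name used to show that split() produces a list:

```python
a_list = [1, 2, 3, 0, 5, 10, 11]

print(a_list[-2])    # 10: second element from the right
print(a_list[1:4])   # [2, 3, 0]: indices 1, 2 and 3; index 4 is excluded
print(a_list[-3:])   # [5, 10, 11]: the last three elements
print(a_list[::2])   # [1, 3, 5, 11]: every second element
print(a_list[::-1])  # [11, 10, 5, 0, 3, 2, 1]: a reversed copy

# split() on a string also produces a list
words = "to be or not to be".split(" ")
print(words[0], words[-1], len(words))  # to be 6
```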



another_list = ["a", "b", "c"]
empty_list = []
mixed_list = [1, "a"]

print(another_list[1])

print(empty_list)

Output

b
[]



Some common functionality of lists:

  • list.append(x): add an element to the end of a list

  • list_1.extend(list_2): add all elements in the second list to the end of the first list

  • list.insert(index, x): insert element x into the list at the specified index. Elements from this index onward are shifted one position to the right

  • list.pop(index): remove and return the element at the specified position (the last element if no index is given)

  • list.index(x): look through the list for the specified element, returning the position of its first occurrence if found, else raising a ValueError

  • list.count(x): count the number of occurrences of the input element

  • list.sort(key=None, reverse=False): sort the list in place. key is an optional function used to compute a sort key from each element, and reverse=True sorts in descending order (the default is False)

  • list.reverse(): reverses the order of the list


a = ["Python", "is", "one"]
a.sort(key=len, reverse = True)
a

Output

['Python', 'one', 'is']
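
The sort example above uses only one of the listed methods. This sketch (nums, removed and last are hypothetical names) exercises several of the others on a small list, with the list's contents noted after each step:

```python
nums = [3, 1, 2]

nums.extend([4, 5])      # [3, 1, 2, 4, 5]
nums.insert(1, 9)        # [3, 9, 1, 2, 4, 5]: 9 goes in at index 1
removed = nums.pop(0)    # removes and returns 3
last = nums.pop()        # no index: removes and returns the last element, 5
print(removed, last, nums)   # 3 5 [9, 1, 2, 4]

print(nums.count(2))     # 1
nums.reverse()           # reverses in place: [4, 2, 1, 9]
print(nums.index(1))     # 2

try:
    nums.index(100)      # 100 is not in the list
except ValueError:
    print("100 is not in nums")
```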



Exercise

  • Add the letter "d" to another_list and print the result

  • Add the letter "c" to another_list and print the result

  • If you search for "c" in another_list using the list.index(x) command, what is the result?

  • Sort another_list and print the result


#Add the letter "d" in another_list and print the result
another_list.append("d")
print(another_list)

Output

['a', 'b', 'c', 'd']



#Add the letter "c" in another_list and print the result
another_list.append("c")
print(another_list)

Output

['a', 'b', 'c', 'd', 'c']



#If you search for "c" in another_list using the list.index(x) command, what is the result?
another_list.index("c")

Output

2



#Sort another_list and print the result
another_list.sort()

print(another_list)

Output

['a', 'b', 'c', 'c', 'd']



Set

A set is a data structure where all elements are unique. Sets are unordered. In fact, the order of the elements observed when printing a set might change at different points during a program's execution, depending on the state of Python's internal representation of the set. Sets are ideal for membership queries; for instance, is a user amongst those users who have received a promotion?

Sets are specified by curly braces, { }, containing one or more comma-separated values. To create an empty set, use set(); empty curly braces, {}, create an empty dictionary instead.


some_set = {1, 2, 3, 4}
another_set = {4, 5, 6}
empty_set = set()

print(some_set)
print(another_set)
print(empty_set)

len(some_set)

Output

{1, 2, 3, 4}

{4, 5, 6}

set()

4
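
A quick check of the empty-set rule, plus a reminder that duplicates collapse when a set is built (the variable names here are only for illustration):

```python
empty_braces = {}
empty_set = set()
print(type(empty_braces))  # <class 'dict'>: {} is an empty dictionary
print(type(empty_set))     # <class 'set'>

# repeated values are kept only once
duplicates = {1, 1, 2, 2, 3}
print(len(duplicates))     # 3
```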


We can also create a set from a list:

my_list = [1, 2, 3, 0, 5, 10, 11, 1, 5]
your_list = [1, 2, 3, 0, 11, 10]
my_set = set(my_list)
your_set = set(your_list)
print(my_set)
print(your_set)

Output

{0, 1, 2, 3, 5, 10, 11}

{0, 1, 2, 3, 10, 11}


type(my_set)

Output:

set


print(my_set.issubset(your_set))
print(type(my_set))
print(len(my_set))

Output:

False
<class 'set'>
7


The easiest way to check for membership in a set is to use the `in` keyword, checking if a needle is "`in`" the haystack set.


my_set = {1, 2, 3}
print("The value 1 appears in the variable my_set:", 1 in my_set)
print("The value 0 appears in the variable my_set:", 0 in my_set)
print(1 in my_set)
print(4 not in my_set)

Output:

The value 1 appears in the variable my_set: True
The value 0 appears in the variable my_set: False
True
True



We also have the "not in" operator

some_set = {1, 2, 3, 4}
val = 5
print("Check that the value %d does not appear in the variable some_set:" % val, (val not in some_set))
val = 1
print("Check that the value %d does not appear in the variable some_set:" % val, (val not in some_set))

Output:

Check that the value 5 does not appear in the variable some_set: True
Check that the value 1 does not appear in the variable some_set: False



Some other common set functionality:

  • set_a.add(x): add an element to a set

  • set_a.remove(x): remove an element from a set

  • set_a - set_b: elements in a but not in b. Equivalent to set_a.difference(set_b)

  • set_a | set_b: elements in a or b. Equivalent to set_a.union(set_b)

  • set_a & set_b: elements in both a and b. Equivalent to set_a.intersection(set_b)

  • set_a ^ set_b: elements in a or b but not both. Equivalent to set_a.symmetric_difference(set_b)

  • set_a <= set_b: tests whether every element in set_a is in set_b. Equivalent to set_a.issubset(set_b)
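
The exercise below uses the method forms; this sketch shows the operator forms from the list on two hypothetical sets, set_a and set_b. Each operator gives the same result as its method counterpart (for example, set_a ^ set_b equals set_a.symmetric_difference(set_b)):

```python
set_a = {1, 2, 3, 4}
set_b = {4, 5, 6}

print(set_a - set_b)    # {1, 2, 3}: in set_a but not set_b
print(set_a | set_b)    # {1, 2, 3, 4, 5, 6}: in either
print(set_a & set_b)    # {4}: in both
print(set_a ^ set_b)    # {1, 2, 3, 5, 6}: in exactly one
print({1, 2} <= set_a)  # True: every element of {1, 2} is in set_a
print(set_a <= set_b)   # False
```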


some_set = {1, 2, 3, 4}
another_set = {4, 5, 6}


Exercise

Try the above yourself using the some_set and another_set variables from above

  • find the set of elements in some_set but not in another_set

  • find the set of elements in either some_set or in another_set

  • find the set of elements in both some_set and another_set

  • remove the element 4 from some_set

  • add an element 7 to another_set


#find the set of elements in some_set but not in another_set
some_set.difference(another_set)

Output:

{1, 2, 3}


#find the set of elements in either some_set or in another_set
some_set.union(another_set)

Output:

{1, 2, 3, 4, 5, 6}



#find the set of elements in both some_set and another_set
some_set.intersection(another_set)

Output

{4}


#remove the element 4 from some_set
some_set.remove(4)
some_set

Output

{1, 2, 3}


#add an element 7 to another_set
another_set.add(7)
another_set

Output

{4, 5, 6, 7}



Tuples

A tuple is an immutable sequence of Python objects. Tuples are sequences, just like lists. The differences between tuples and lists are that tuples cannot be changed, unlike lists, and that tuples are written with parentheses, whereas lists use square brackets. The parentheses can often be omitted, as in the first example below.
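
To make "cannot be changed" concrete, here is a minimal sketch: assigning to a position in a tuple raises a TypeError, while building a new tuple from pieces of the old one is fine (t_updated and message are hypothetical names):

```python
t = (12345, 54321, 'hello!')

try:
    t[0] = 99  # tuples do not support item assignment
except TypeError as err:
    message = str(err)
    print("TypeError:", message)  # TypeError: 'tuple' object does not support item assignment

# creating a new tuple is allowed; t itself is unchanged
t_updated = (99,) + t[1:]
print(t, t_updated)  # (12345, 54321, 'hello!') (99, 54321, 'hello!')
```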


t = 12345, 54321, 'hello!'
print(t)
type(t)

Output:

(12345, 54321, 'hello!')
tuple


print(t[2])

Output

hello!


len(t)

Output:

3


# Empty tuple
t_2=()
print(t_2)
type(t_2)

Output:

()
tuple


t_3 = (1,) # the trailing comma makes this a one-element tuple; (1) alone is just the integer 1
type(t_3)
len(t_3)

Output:

1


#Concatenation
print((1, 2, 3) + (4, 5, 6))

Output:

(1, 2, 3, 4, 5, 6)


#Repetition
print(('Hi!',) * 4) # the trailing comma makes ('Hi!',) a tuple; ('Hi!') is just a string

Output:

('Hi!', 'Hi!', 'Hi!', 'Hi!')


#Membership
print(3 in (1, 2, 3))

Output:

True


#Iteration
for x in (1, 2, 3): 
    print(x)

Output:

1
2
3



print("Two elements. The first one: %s and the second one %s:" % ("UCI", "merage"))

Output:

Two elements. The first one: UCI and the second one merage:



No Enclosing Delimiters

Any set of multiple objects, comma-separated, written without identifying symbols, i.e., brackets for lists, parentheses for tuples, etc., default to tuples, as indicated below:


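print('abc', -4.24e93, 18+6.6j, 'xyz') # four separate arguments to print
x, y = 1, 2 # the right-hand side 1, 2 is a tuple, unpacked into x and y
print("Value of x , y : ", x,y)
print(x)
print(y)
t = 'abc', -4.24e93, 18+6.6j, 'xyz' # no brackets or parentheses
type(t)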

Output:

abc -4.24e+93 (18+6.6j) xyz

Value of x , y : 1 2

1

2

tuple
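
The same rule is what makes tuple packing and unpacking work. A brief sketch (packed, first, second and third are hypothetical names), including the common swap idiom:

```python
packed = 1, 2, 3               # packing: no parentheses needed
first, second, third = packed  # unpacking into three names
print(type(packed), first, third)  # <class 'tuple'> 1 3

x, y = 10, 20
x, y = y, x                    # swap without a temporary variable
print(x, y)                    # 20 10
```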


Tuple assignment lets us update several variables at once. Below, a Fibonacci script is written first with a temporary variable t, and then rewritten with tuples to avoid it:

# Fibonacci series:
# the sum of two elements defines the next
a = 0
b = 1
while b < 20: 
    print(b)
    t = a # temporary variable
    a = b
    b = t+b

Output:

1
1
2
3
5
8
13


# Fibonacci series:
# the sum of two elements defines the next
a = 0
b = 1
while b < 20:
    print(b)
    (a, b) = (b, a+b) # the right-hand tuple is evaluated before a and b are reassigned


Output:

1
1
2
3
5
8
13



Dictionaries:

Dictionaries, sometimes called dicts, maps, or, rarely, hashes, are data structures containing key-value pairs. Dictionaries have a set of unique keys and are used to retrieve the value information associated with these keys. For instance, a dictionary might be used to store, for each user, that user's location, or, for a product id, the description associated with that product. Looking up a key in a dictionary is very efficient, which is one reason dictionaries are so frequently used and encountered in practice.


Dictionaries are specified by curly braces, { }, containing zero or more comma separated key-value pairs, where the keys and values are separated by a colon, :. Like a list, values for a particular key are retrieved by passing the query key into square brackets.


a_dict = {1:["python", "is", 2], "b":2, "c":3, "d": 4}
another_dict = {"c":5, "d":6}
empty_dict = {}
print(a_dict["c"])
len(a_dict)
type(a_dict)

Output:

3
dict


a_dict.keys()

Output:

dict_keys([1, 'b', 'c', 'd'])


a_dict.values()

Output:

dict_values([['python', 'is', 2], 2, 3, 4])



Like the set, the easiest way to check if a particular key is in a map is through the in keyword:

print("a" in a_dict)
print("b" in another_dict)

Output:

False
False



Some common operations on dictionaries:

  • dict.keys(): returns a view of the keys in a dictionary (wrap it in list() to get a list)

  • dict.values(): returns a view of the values in a dictionary

  • dict.pop(x): removes the key x and its associated value from the dictionary, returning the value
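
A short sketch of these operations, plus adding and overwriting keys by assignment, on a hypothetical prices dictionary. The get() method, not in the list above, returns a default instead of raising KeyError for a missing key:

```python
prices = {"apple": 3, "pear": 2}

prices["plum"] = 4            # assigning to a new key adds it
prices["apple"] = 5           # assigning to an existing key overwrites its value
removed = prices.pop("pear")  # removes "pear" and returns its value
print(removed)                # 2
print(prices)                 # {'apple': 5, 'plum': 4}
print(list(prices.keys()))    # ['apple', 'plum']

try:
    prices["kiwi"]            # a missing key raises KeyError
except KeyError:
    print("no kiwi")
print(prices.get("kiwi", 0))  # 0: get() returns the default instead
```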


data = {
        "Ben": {
                  "Job": "Professor",
                  "YOB": "1976",
                  "Children":["Gregory","Anna"]
                  },
        "Joe": {
                "Job": "Data Scientist",
                  "YOB": "1981"}
}
print("Job" in data["Ben"])
print("1981" in data["Joe"])
print("1981" == data["Joe"]["YOB"])
print("1981" in data["Joe"].keys())

Output:

True
False
True
False



Exercise

  • Find the length of a_dict

  • Find the common keys in a_dict and another_dict

  • Find the common values in a_dict and another_dict

  • Check whether "1981" is in data


#Find the length of a_dict
len(a_dict)

Output:

4


#Find the common keys in a_dict and another_dict

#first convert it into set and then find the intersection
a_dict_set = set(a_dict)
another_dict_set = set(another_dict)

for common_keys in a_dict_set.intersection(another_dict_set):
    print(common_keys)

Output (the two keys may be printed in either order):

c
d


#Find the common values in a_dict and another_dict

# the value ["python", "is", 2] is a list, which is unhashable, so
# set(a_dict.values()) would raise a TypeError; compare with `in` instead
common_values = []
for value in a_dict.values():
    if value in another_dict.values() and value not in common_values:
        common_values.append(value)
print(common_values)

Output:

[]

#Check whether "1981" is in data
data = {
        "Ben": {
                  "Job": "Professor",
                  "YOB": "1976",
                  "Children":["Gregory","Anna"]
                  },
        "Joe": {
                "Job": "Data Scientist",
                  "YOB": "1981"}
}

for data_name, data_info in data.items():
    
    for key in data_info:
        if data_info[key] == '1981':
            print(key + ':', data_info[key])

Output:

YOB: 1981



Combining (Nesting) Data Structures:

There are many opportunities to combine data types in Python. Lists can be populated by arbitrary data structures. Similarly, you can use any type as the value in a dictionary. However, the elements of sets and the keys of dictionaries must be hashable, which lets the data structure compute where to store each element. Immutable built-in types such as numbers, strings, and tuples of these are hashable; lists, sets, and dictionaries are not.
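
A small sketch of the hashability rule (coordinates, message and ok are hypothetical names): tuples work as dictionary keys, lists do not, and converting a list to a tuple is a common workaround. The exact wording of the TypeError can differ between Python versions, but it mentions the unhashable list:

```python
# tuples of numbers are hashable, so they can be dictionary keys or set elements
coordinates = {(0, 0): "origin", (1, 2): "a point"}
print(coordinates[(1, 2)])     # a point

# lists are mutable and unhashable, so they cannot be
try:
    bad = {[1, 2]: "a list key"}
except TypeError as err:
    message = str(err)
    print("TypeError:", message)

# convert the list to a tuple first
ok = {tuple([1, 2]): "a tuple key"}
print(ok)                      # {(1, 2): 'a tuple key'}
```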


print("lists of lists")
lol = [[1, 2, 3], [4, 5, 6, 7]]
lol_2 = [[4, 5, 6], [7, 8, 9]]
print(lol)
print(lol[1][1])
print(len(lol[0]))

Output:

lists of lists
[[1, 2, 3], [4, 5, 6, 7]]
5
3


print("lists of lists of lists")
lolol = [lol, lol_2]
print(lolol)
lolol[0][0][0]

Output:

lists of lists of lists
[[[1, 2, 3], [4, 5, 6, 7]], [[4, 5, 6], [7, 8, 9]]]

1


print("retrieving data from this data structure")
print(lolol[0])
print(lolol[0][0])
print(lolol[0][0][0])

Output:

retrieving data from this data structure
[[1, 2, 3], [4, 5, 6, 7]]
[1, 2, 3]
1



print("data structures as values in a dictionary")
dlol = {"lol":lol, "lol_2":lol_2}
print(dlol)
print(dlol["lol"][0][0])

Output:

data structures as values in a dictionary
{'lol': [[1, 2, 3], [4, 5, 6, 7]], 'lol_2': [[4, 5, 6], [7, 8, 9]]}
1


print("retrieving data from this dictionary")
print(dlol["lol"])
print(dlol["lol"][0])
print(dlol["lol"][0][0])

Output:

retrieving data from this dictionary
[[1, 2, 3], [4, 5, 6, 7]]
[1, 2, 3]
1



Control Structures

We've spent some time going into detail about some of the data types and structures available in Python. It's now time to talk about how to navigate through this data and use it to make decisions. Traversing over data and making decisions based upon data are common aspects of every programming language, known as control flow. Python provides rich control flow, with a lot of conveniences for power users. Here, we're just going to talk about the basics; to learn more, please consult the documentation. A common theme throughout this discussion of control structures is the notion of a "block of code." A block of code is a run of lines sharing the same level of indentation, introduced by a control structure line that ends with a colon, :. We'll see examples below. Finally, note that control structures can be nested arbitrarily, depending on the tasks you're trying to accomplish.

if Statements:

If statements are perhaps the most widely used of all control structures. An if statement consists of an argument (a condition) and a code block. The if statement evaluates the boolean value of its argument, executing the code block only if that argument is true.

if False:
    print("duh")
print("bye")

Output:

bye


if 1+1 == 2:
    print("easy")

Output:

easy


items = {1, 2, 3}
if 2 in items:
    print("found it! I found the element 2")

Output:

found it! I found the element 2


Each argument in the above if statements is a boolean expression. Often you want to have alternatives, blocks of code that get evaluated in the event that the argument to an if statement is false. This is where elif (else if) and else come in.


An elif is evaluated only if all preceding if and elif arguments have evaluated to false. The else statement is the last resort: its code is executed if no if or elif above it is true. Both are optional; an if statement can be followed by any number of elif clauses and at most one else, which must come last. At most one code block in the whole chain is executed.
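
A minimal grading sketch (score and grade are hypothetical names) showing that only the first true branch runs, even when a later condition would also be true:

```python
score = 75

if score >= 90:
    grade = "excellent"
elif score >= 70:
    grade = "good"             # 75 >= 70 is the first true test, so this runs
elif score >= 50:
    grade = "pass"             # also true for 75, but never reached
else:
    grade = "keep practicing"  # runs only if every test above is false

print(grade)  # good
```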


if 1+2 == 2:
    print("whoa")
    x = 5+1
    print("done")
elif 1+1 == 3:
    print("that explains it")
elif 5+5 == 10:
    print("something")
    if "something":
        print("hi")
    else:
        print("what I expected")
# else:
#     print("omg")

Output:

something
hi
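
The nested if "something": above runs because a non-empty string counts as true. This sketch uses bool(), which applies the same truth test that if uses, to show which common values count as true; in general, empty containers, 0 and None count as false:

```python
values = ["something", "", [1], [], 3, 0, {}, None]

for value in values:
    # bool() reports how an if statement would treat the value
    print(repr(value), "->", bool(value))
```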


x = {1,2,3}
if 5 in x:
    print("found it")
else:
    print("didn't find it")
    x.add(5)

print(x)

Output:

didn't find it
{1, 2, 3, 5}



for Statements:

for statements are a convenient way to iterate through the values contained in a data structure. The elements of the data structure are assigned, one at a time, to a loop variable, and the code block associated with the for statement (or for loop) is evaluated with each value in turn.


my_set = {1, 2, 3, 4}
for foobar in my_set:
    print(foobar, " squared is:", foobar*foobar)

Output:

1  squared is: 1
2  squared is: 4
3  squared is: 9
4  squared is: 16



print("a more complex block")
for num in my_set:
    if num >= 3:
        print(num+5)

Output:

a more complex block
8
9



print("this also works for lists")
my_list = [1,2,3]
for num in my_list:
    if num >= 2:
        print(num+5)

Output:

this also works for lists
7
8



print("dictionaries let you iterate through keys, values, or both")
my_dict = {"a":1, "b":2}

for k in my_dict.keys():
    value = my_dict[k]
    print(k)
    print(value)
    
#my_dict.keys()
#my_dict['a']

Output:

dictionaries let you iterate through keys, values, or both
a
1
b
2



for v in my_dict.values():
    print(v)

Output:

1

2



#dict.items(): returns a view of the dictionary’s (key, value) pairs.

#(In Python 2, dict.items() returned a list and dict.iteritems() returned an iterator; iteritems() no longer exists in Python 3.)
for k,v in my_dict.items(): 
    print(k, v)
    if v == 1:
        print(k)

Output:

a 1
a
b 2



print("dictionaries let you iterate through keys, values, or both")
my_dict = {"a":1, "b":2}

for k,v in my_dict.items(): #Returns an iterator over the dictionary’s (key, value) pairs
    if v == my_dict[k]:
        print("whew! the value %05.1f" % v, " is in the dictionary, with a key %s" % k)

Output:

dictionaries let you iterate through keys, values, or both
whew! the value 001.0  is in the dictionary, with a key a
whew! the value 002.0  is in the dictionary, with a key b



print("dictionaries let you iterate through keys, values, or both")
my_dict = {"a":1, "b":2, "c": 3, "d":4, "e": 4, "f": 4}

for k,v in my_dict.items(): #Returns an iterator over the dictionary’s (key, value) pairs
    if v == 4:
        print("The key %s has the value 4" % k)

Output:

dictionaries let you iterate through keys, values, or both
The key d has the value 4
The key e has the value 4
The key f has the value 4



Break and Continue:

These two statements are used to modify the iteration of loops. break exits the innermost loop in which it appears. continue skips the rest of the current pass through the loop and goes on to the next iteration.


x = [1,3,4,5]
for y in x:
    print(y)
    for num in x:
        if num > 2:
            break
        print(num)
         

Output:

1
1
3
1
4
1
5
1



y = ["a", "b", "c", "d"]
for letter in y:
    if letter == "b":
        continue
    print(letter)

Output:

a
c
d



y = ["a", "b", "c", "d"]
for letter in y:
    if letter == "b":
        break
    print(letter)

Output:

a



Ranges of Integers:

Often it is convenient to define (and iterate through) ranges of integers. Python has a convenient range function that allows you to do just this.


print(range(-5, 5, 2))

Output:

range(-5, 5, 2)



print(range(3)) # start at zero, < the specified ceiling value
print(range(-5, 5)) #from the left value, < right value
print(range(-5, 5, 2)) #from the left value, to the middle value, incrementing by the right value

for x in range(-5, 5):
    if x > 0:
        print("%d is positive" % x)

Output:

range(0, 3)
range(-5, 5)
range(-5, 5, 2)
1 is positive
2 is positive
3 is positive
4 is positive
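
Printing a range shows only its bounds, not its contents. Wrapping it in list() reveals the integers it produces; the last line adds a negative step, which counts down:

```python
print(list(range(3)))         # [0, 1, 2]
print(list(range(-5, 5)))     # [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4]
print(list(range(-5, 5, 2)))  # [-5, -3, -1, 1, 3]
print(list(range(5, 0, -1)))  # [5, 4, 3, 2, 1]
```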



User Defined Functions

Functions assign a name to a block of code the way variables assign names to bits of data. This seemingly benign naming of things is incredibly powerful, allowing one to reuse common functionality over and over. Well-tested functions form building blocks for large, complex systems. As you progress through Python, you'll find yourself using powerful functions defined in some of Python's vast libraries of code.


Function definitions begin with the def keyword, followed by the name you wish to assign to the function. Following this name are parentheses, ( ), containing zero or more parameter names, which receive the values passed into the function. There is then a colon, followed by an indented code block defining the actions of the function:



def print_hi():
    print("hi!")
    print("hihihi")
    
print_hi()

Output:

hi!
hihihi



def hi_you(name):
    print("hi %s!" % name.upper())

hi_you("Ben")
hi_you("Dan")
hi_you("Adam")

Output:

hi BEN!
hi DAN!
hi ADAM!



def square(num):
    squared = num*num
    return squared

print(square(5))

Output:

25


for i in range(15):
    print("The square of %2d" %i, "is %3d" % square(i)) #Remember that 3d means the integer is of a minimum width of 3

Output:

The square of  0 is   0
The square of  1 is   1
The square of  2 is   4
The square of  3 is   9
The square of  4 is  16
The square of  5 is  25
The square of  6 is  36
The square of  7 is  49
The square of  8 is  64
The square of  9 is  81
The square of 10 is 100
The square of 11 is 121
The square of 12 is 144
The square of 13 is 169
The square of 14 is 196



for i in range(15):
    print("The square of %d" %i, "is %d" % square(i))

Output:

The square of 0 is 0
The square of 1 is 1
The square of 2 is 4
The square of 3 is 9
The square of 4 is 16
The square of 5 is 25
The square of 6 is 36
The square of 7 is 49
The square of 8 is 64
The square of 9 is 81
The square of 10 is 100
The square of 11 is 121
The square of 12 is 144
The square of 13 is 169
The square of 14 is 196


Note that the function square uses the special keyword return. The value given to return is passed back to whatever piece of code called the function; in this case, the square of the number that was input.
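
One consequence worth seeing: a function that never reaches a return statement still hands back a value, namely None. A small sketch with two hypothetical functions, add and greet:

```python
def add(a, b):
    return a + b          # hands the sum back to the caller

def greet(name):
    print("hi", name)     # prints, but has no return statement

total = add(2, 3)
nothing = greet("Ben")    # prints: hi Ben
print(total)              # 5
print(nothing)            # None
```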


Variables set inside of functions are said to be scoped to those functions: any new variables created in the function block are only accessible while in that block (with some exceptions). If a function assigns to a name that also exists outside it, Python creates a new local variable with that name; the outside variable is not modified (unless the name is declared with the global statement).


Similarly, reassigning one of a function's arguments inside the function is not reflected once the function returns; the caller's variable continues to point to the original object. However, if that object is mutable (a list, set, or dictionary, for example), the function can modify it in place, and the caller will see the change.



# inside a function's context, changes to a variable defined outside that
# context aren't reflected once the context is returned

name = "Ben"
def do_something():
    print("We are now in the function!")
    name = "not Ben"
    print(name)
    print("something! ... and we are out")
    
do_something()

Output:

We are now in the function!
not Ben
something! ... and we are out


print(name)

Output:

Ben


# but outside variables can be read!
def do_something_else():
    print(name)
do_something_else()

Output:

Ben


def do_something_new(some_name):
    some_name = "nothing"
    print(some_name)
do_something_new(name)

Output:

nothing



# mutable objects can be modified
a_list = [1,2,3]
def add_sum(some_list):
    s = sum(some_list)
    some_list.append(s) # numbers, strings and tuples are immutable, while dictionaries, sets and lists are mutable.
    return s

tot = add_sum(a_list)
print(tot)
print(a_list)

Output:

6
[1, 2, 3, 6]



# try again!
tot = add_sum(a_list)
print(tot)
print(a_list)

Output:

12
[1, 2, 3, 6, 12]



# variables created in a function aren't accessible 
# outside that function's context
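def do_something_new():
    thing = "123" # a local variable, created inside the function
    print("Hi!")
do_something_new()
try:
    print(thing) # thing does not exist outside the function
except NameError as err:
    print("NameError:", err)

Output:

Hi!
NameError: name 'thing' is not defined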



def times_two(my_input):
    my_input = 2*my_input
    return my_input

a = 4
print(times_two(a))
print(a)

Output:

8

4
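
None of the examples above reassigns an outer variable from inside a function. This sketch (counter, set_local and increment_global are hypothetical names) contrasts a plain assignment, which creates a local variable, with the global statement, which makes the function rebind the module-level name instead:

```python
counter = 0

def set_local():
    counter = 100          # creates a new local name; the outer counter is untouched

def increment_global():
    global counter         # refer to the module-level counter instead
    counter = counter + 1

set_local()
print(counter)             # 0
increment_global()
print(counter)             # 1
```

Mutating shared state through global is usually best avoided; returning a value, as times_two does above, tends to be easier to follow.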



Files and Printing

You'll often be reading data from a file, or writing the output of your Python scripts back into a file. Python makes this very easy. You open the file in the appropriate mode using the open function, then read or write to accomplish your task. The open function is usually called with two arguments: the name of the file and the mode. The mode is a short string, usually a single letter, that specifies whether you're going to be reading from a file, writing to a file, or appending to the end of an existing file; if it is omitted, the file is opened for reading. The function returns a file object that performs the various tasks you'll be performing: a_file = open(filename, mode). The common modes are:

  • 'r': open a file for reading

  • 'w': open a file for writing. Caution: this will overwrite any previously existing file

  • 'a': append. Write to the end of a file.

When reading, you typically want to iterate through the lines in a file using a for loop, as in the examples below. Some other common methods for dealing with files are:

  • file.read(): read the entire contents of a file into a string

  • file.readline(): read one line of a file

  • file.write(some_string): writes to the file. Note that this doesn't automatically add newline characters. Also note that writes are sometimes buffered: Python may wait until several writes are pending and perform them all at once

  • file.flush(): write out any buffered writes

  • file.close(): close the open file. This writes out any pending buffered writes and frees the computer resources used to keep the file open.

  • file.seek(position): moves to a specific position within a file. For files opened in binary mode, position is specified in bytes; for text-mode files, only 0 or a value previously returned by file.tell() should be used.

Here is an example using files:


my_file = open("temp.txt", "w")
my_list = ["a", "b", "c", "d"]
my_set = {1, 2, 3, 4}

for x in my_list:
    my_file.write("letter: %s\n" % x)
    print("letter: %s\n" % x)
for n in my_set:
    my_file.write("number: %d\n" % n)
    print("number: %d\n" % n)
my_file.flush()
my_file.close()

Output:

letter: a

letter: b

letter: c

letter: d

number: 1

number: 2

number: 3

number: 4



file_2 = open("temp.txt", "r")
for line in file_2:
    print(line) # note that this doesn't strip off the newlines
file_2.close()

Output:

letter: a

letter: b

letter: c

letter: d

number: 1

number: 2

number: 3

number: 4


file_3 = open("temp.txt", "r")
content = file_3.read()
print(content)
file_3.close()

Output:

letter: a
letter: b
letter: c
letter: d
number: 1
number: 2
number: 3
number: 4



# filter rows
file_4 = open("temp.txt", "r")
for line in file_4:
    if line.count("m") > 0:
        break
    print(line.strip()) # remove the extra newline.
file_4.close()

Output:

letter: a
letter: b
letter: c
letter: d



# filter columns
file_5 = open("temp.txt", "r")
for line in file_5:
    columns = line.strip().split(": ") # create a list by splitting the line on the ": " separator
    print("_".join(columns)) # prints the columns as a string, using the "_" char as a separator
    if columns[1] != "b": # if the second element of the list is NOT b, 
        print(columns) # then print the list
    
file_5.close()

Output:

letter_a
['letter', 'a']
letter_b
letter_c
['letter', 'c']
letter_d
['letter', 'd']
number_1
['number', '1']
number_2
['number', '2']
number_3
['number', '3']
number_4
['number', '4']
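
The examples above call close() explicitly. A common alternative is a with block, which closes the file automatically when the block ends, even if an error occurs. A minimal sketch, assuming a scratch file in the system temporary directory (the file name with_example.txt is hypothetical), that also exercises the "a" (append) mode:

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "with_example.txt")

with open(path, "w") as out:   # "w" creates the file, or empties an existing one
    out.write("letter: a\n")
with open(path, "a") as out:   # "a" adds to the end of the existing file
    out.write("letter: b\n")

with open(path, "r") as inp:
    lines = [line.rstrip("\n") for line in inp]  # strip each trailing newline

print(lines)       # ['letter: a', 'letter: b']
print(out.closed)  # True: leaving the with block closed the file
os.remove(path)    # clean up the scratch file
```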



Importing Libraries

One of the greatest strengths of the Python programming language is its rich set of libraries: pre-written code that implements a variety of functionality. For the data scientist, Python's libraries (also called "modules") are particularly valuable. With a little bit of research into the details of Python's libraries, a lot of common data tasks are little more than a function call away. Libraries exist for data cleaning, analysis, visualization, machine learning and statistics.

This XKCD cartoon pretty much summarizes what Python libraries can do...

In order to have access to a library's functionality in a block of code, you must first import it. Importing a library tells Python that, while executing your code, it should consider not only the code and functions that you have written, but also the code and functions in the libraries that you have imported.

There are several ways to import modules in Python, and some have better properties than others. Below we see the preferred general way to import modules. In documentation, you may see other ways to import libraries (from a_library import foo). It is generally fine to copy such a pattern from the documentation if it is known to work.

Imagine I want to import a library called some_python_library. This can be done using the import commands. All code below that import statement has access to the library contents.

  • import some_python_library: imports the module some_python_library, and creates a reference to that module in the current namespace. Or in other words, after you’ve run this statement, you can use some_python_library.name to refer to things defined in module some_python_library.

  • import some_python_library as plib: imports the module some_python_library and sets an alias for that library that may be easier to refer to. To refer to a thing defined in the library some_python_library, use plib.name.

In practice you'll see the second pattern used very frequently; pandas referred to as pd, numpy referred to as np, etc.


import math as m
number = 2
print(m.sqrt(number))

Output:

1.4142135623730951


import math as m
print(m.log(8))

Output:

2.0794415416798357


Example: Matplotlib

Matplotlib is one of the first python libraries a budding data scientist is likely to encounter. Matplotlib is a feature-rich plotting framework, capable of producing most of the plots you'll likely need. The interface to the matplotlib module mimics the plotting functionality in Matlab, another language and environment for scientific computing. If you have used Matlab plots, matplotlib will seem very familiar. Even the plots look almost identical.


Here, we'll cover some basic functionality of matplotlib: line plots, bar plots and histograms. As with most content covered in this course, this is just scratching the surface. For more info, including many great examples, please consult the official matplotlib documentation. A typical pattern for me when plotting things in python is to find an example that closely mirrors what I'm trying to do, copy it, and tweak it until I get things right.


Note: to get plots to appear inline in IPython/Jupyter notebooks, invoke the "magic function" %matplotlib inline. In a stand-alone python script, call plt.show() to display the plot in a new window.


In most cases, the inputs to matplotlib plotting functions are arrays (or lists) of numbers, either floats or integers.



# used to embed plots inside an ipython notebook
%matplotlib inline 
import matplotlib.pyplot as plt

# really simple example:
y = [1,2,3,4,5,4,3,2,1]
x = [1,2,3,4,5,6,7,8,9]
plt.plot(x, y)

Output:

[plot image not reproduced: line plot of y against x]

import numpy as np

X = np.linspace(0, 10, 101) #create values from 0 to 10, and use 101 values
print(X)

Output:

[ 0. 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2. 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3. 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 4. 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5. 5.1 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9 6. 6.1 6.2 6.3 6.4 6.5 6.6 6.7 6.8 6.9 7. 7.1 7.2 7.3 7.4 7.5 7.6 7.7 7.8 7.9 8. 8.1 8.2 8.3 8.4 8.5 8.6 8.7 8.8 8.9 9. 9.1 9.2 9.3 9.4 9.5 9.6 9.7 9.8 9.9 10. ]
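Here is a quick check of what np.linspace produced. It is a sketch. retstep is a standard linspace parameter that makes it also return the spacing between values.

```python
import numpy as np

# linspace(start, stop, num): num evenly spaced values, with stop included
X, step = np.linspace(0, 10, 101, retstep=True)

print(len(X), X[0], X[-1])  # 101 0.0 10.0
print(step)                 # 0.1, i.e. (10 - 0) / (101 - 1)
```

Asking for 101 values rather than 100 is what gives a clean step of 0.1. The 101 points span 100 gaps, because both endpoints are included.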



import math as m
import matplotlib.pyplot as plt
import numpy as np

X = np.linspace(0, 10, 101) #create values from 0 to 10, and use 101 values
Y = []

for x in X:
    y = m.sin(x)
    Y.append(y)
    
plt.plot(X, Y, 'mx')  # 'mx' = magenta x markers
plt.title('The Sine Wave')
plt.xlabel('X')
plt.ylabel('sin(X)')

Output:

[plot image not reproduced: "The Sine Wave", sin(X) against X drawn as magenta x markers]

Notice that most of the functionality in matplotlib that we're using is in the sub-module matplotlib.pyplot.
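The loop above calls m.sin once per value. X is a NumPy array, so a common alternative is to apply np.sin to the whole array at once. This is a sketch, not the original author's code. It gives the same values without the loop:

```python
import math as m
import numpy as np

X = np.linspace(0, 10, 101)

# loop version, as in the example above
Y_loop = [m.sin(x) for x in X]

# vectorized version: np.sin works element-wise on the whole array
Y_vec = np.sin(X)

print(np.allclose(Y_loop, Y_vec))  # True
```

Either list or array can be passed to plt.plot. The vectorized form is shorter and, for large arrays, usually much faster.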


The third argument to plt.plot is a format string. In plt.plot(X, Y, 'mx') above it is 'mx' (magenta x markers). In plt.plot(X, Y, 'r-.') it would be 'r-.' (a red dash-dot line). The string combines a color character with optional marker and line-style characters. Color characters:

  • b: blue

  • k: black

  • r: red

  • c: cyan

  • m: magenta

  • y: yellow

  • g: green

  • w: white

Some line/marker formatting specifiers:

  • -: solid line style

  • --: dashed line style

  • -.: dash-dot line style

  • :: dotted line style

  • .: point marker

  • ,: pixel marker

  • o: circle marker

  • +: plus marker

  • x: x marker

There are many other options for plots that can be specified. See the matplotlib documentation for more info. We will also revisit this topic in the Visualization lectures.
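The sketch below shows how the format string maps onto plot styling. It draws the same data with 'r-.' and with the equivalent color= and linestyle= keyword arguments, then inspects the returned Line2D objects. The data values are arbitrary. The Agg backend line is only there so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt

X = [0, 1, 2, 3]
Y = [0, 1, 4, 9]

# 'r-.' = red (r) + dash-dot line (-.)
(line1,) = plt.plot(X, Y, 'r-.')
# the same style spelled out with keyword arguments
(line2,) = plt.plot(X, Y, color='r', linestyle='-.')
# 'mx' = magenta (m) + x markers (x), with no connecting line
(line3,) = plt.plot(X, Y, 'mx')

print(line1.get_linestyle(), line2.get_linestyle())  # -. -.
print(line3.get_marker())                            # x
# compare colors as RGBA tuples so 'r' and (1, 0, 0, 1) count as equal
print(mcolors.to_rgba(line1.get_color()) == mcolors.to_rgba(line2.get_color()))  # True
```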


It is possible to draw multiple plots on the same axes. One way is to pass the Y data as a list of lists. The outer list must have the same length as the X data, and each inner list holds one value per curve, so each "column" becomes a separate line:


Y = []
for x in X:
    y = [m.sin(x), m.cos(x), 0.1*x]
    Y.append(y)

plt.plot(X, Y)
plt.legend(['sin(x)', 'cos(x)', 'x/10'])

Output:

[plot image not reproduced: sin(x), cos(x) and x/10 against X, with a legend]
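The sketch below shows how matplotlib reads the list of lists, using the same X as above. Converting Y to a NumPy array gives one row per x value and one column per curve, and plt.plot returns one line per column. The Agg backend line only lets it run without a display.

```python
import math as m

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np

X = np.linspace(0, 10, 101)
# one inner list per x value, one entry per curve
Y = [[m.sin(x), m.cos(x), 0.1 * x] for x in X]

print(np.array(Y).shape)  # (101, 3): rows line up with X, columns are curves

lines = plt.plot(X, Y)
print(len(lines))  # 3, one Line2D per column

# the first line holds the first column (the sine values)
print(np.allclose(lines[0].get_ydata(), np.sin(X)))  # True
```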

It is also possible to plot Y data without corresponding X values. In this case, each value's index in the array (0, 1, 2, ...) is used as its X value.


plt.plot(Y)
plt.xlabel('index')
plt.ylabel('f(x)')
plt.legend(['sin(x)', 'cos(x)', 'x/10'])

Output:

[plot image not reproduced: the same three curves plotted against their index]
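Here is a small check of the default X values, using made-up data. It is a sketch. The Agg backend line just lets it run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np

Y = [3, 1, 4, 1, 5]  # arbitrary example values
(line,) = plt.plot(Y)

# with no X given, matplotlib uses 0, 1, ..., len(Y) - 1
print(np.array_equal(line.get_xdata(), np.arange(len(Y))))  # True
```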

Alternatively, multiple calls to plot can be made with differing data. Each call overlays its line on the existing plot, creating the same effect.

Y = []
Z = []
for x in X:
    Y.append(m.sin(x))
    Z.append(m.cos(x))
    
plt.plot(X, Y, 'b-.')
plt.plot(X, Z, 'r--')
plt.legend(['sin(x)', 'cos(x)'])

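Output:

[plot image not reproduced: sin(x) as a blue dash-dot line and cos(x) as a red dashed line, with a legend]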

Bar plots are often a good way to compare data across categories. This is an easy matter with matplotlib. The interface is almost identical to the one used for line plots.


vals = [7, 6.2, 3, 5, 9]
xval = [1, 2, 3, 4, 5]
plt.bar(xval, vals)

Output:

[plot image not reproduced: bar chart of vals at x positions 1 to 5]
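Bars are often easier to read with category names under them. The sketch below uses bar's tick_label parameter. The labels 'A' to 'E' are placeholders and are not part of the original data. The Agg backend line only lets it run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt

vals = [7, 6.2, 3, 5, 9]
xval = [1, 2, 3, 4, 5]
labels = ['A', 'B', 'C', 'D', 'E']  # hypothetical category names

# bar() returns a container holding one Rectangle per bar
bars = plt.bar(xval, vals, tick_label=labels)

print(len(bars))             # 5
print(bars[1].get_height())  # 6.2
```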

Histograms are extremely useful for analyzing data. Histograms partition numerical data into a discrete number of buckets (called bins), and return the number of values within each bucket. Typically this is displayed as a bar plot.


Y = []
for x in range(0,100000):
    Y.append(np.random.randn())
    
plt.hist(Y, 50)

Output (the numbers differ from run to run, because the data are random):

(array([1.000e+00, 8.000e+00, 9.000e+00, 2.000e+01, 2.200e+01, 4.600e+01, 7.300e+01, 1.380e+02, 1.860e+02, 3.010e+02, 4.270e+02, 6.280e+02, 9.120e+02, 1.216e+03, 1.541e+03, 2.144e+03, 2.687e+03, 3.241e+03, 3.893e+03, 4.587e+03, 5.142e+03, 5.641e+03, 6.043e+03, 6.395e+03, 6.632e+03, 6.435e+03, 6.308e+03, 5.834e+03, 5.296e+03, 4.779e+03, 4.142e+03, 3.570e+03, 2.904e+03, 2.311e+03, 1.890e+03, 1.384e+03, 9.950e+02, 7.610e+02, 4.800e+02, 3.680e+02, 2.380e+02, 1.570e+02, 9.100e+01, 5.500e+01, 2.600e+01, 1.800e+01, 1.300e+01, 5.000e+00, 3.000e+00, 4.000e+00]), array([-4.02772307, -3.86460184, -3.7014806 , -3.53835937, -3.37523814, -3.2121169 , -3.04899567, -2.88587444, -2.72275321, -2.55963197, -2.39651074, -2.23338951, -2.07026827, -1.90714704, -1.74402581, -1.58090457, -1.41778334, -1.25466211, -1.09154087, -0.92841964, -0.76529841, -0.60217717, -0.43905594, -0.27593471, -0.11281348, 0.05030776, 0.21342899, 0.37655022, 0.53967146, 0.70279269, 0.86591392, 1.02903516, 1.19215639, 1.35527762, 1.51839886, 1.68152009, 1.84464132, 2.00776256, 2.17088379, 2.33400502, 2.49712626, 2.66024749, 2.82336872, 2.98648995, 3.14961119, 3.31273242, 3.47585365, 3.63897489, 3.80209612, 3.96521735, 4.12833859]), <a list of 50 Patch objects>)


Output:

[plot image not reproduced: histogram of the 100,000 samples in 50 bins]
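The tuple printed by plt.hist above is (counts per bin, bin edges, drawn bar patches), which gives 50 counts and 51 edges. To get just the counts without drawing anything, NumPy's np.histogram does the same kind of binning. The sketch below uses seeded random data so the run is repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)         # seeded so the run is repeatable
data = rng.standard_normal(100_000)

# 50 equal-width bins spanning the smallest to the largest value
counts, edges = np.histogram(data, bins=50)

print(len(counts), len(edges))  # 50 51: n bins need n + 1 edges
print(counts.sum())             # 100000: every value lands in exactly one bin
```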

Class Objects

Python is what is known as an object-oriented programming language. This means python allows a programmer to define special custom data structures called classes that not only can contain their own data elements, but special functions called methods that can potentially alter a class instance's internal state.


Classes are defined through the keyword class, followed by the name of the class, which, by convention, is capitalized. This is followed by a code block that specifies the methods that define a class. Note that classes are a rich and complex topic in python. However, much of the functionality a data scientist may wish to use, in particular, python's machine learning libraries, will be accessed through class objects. Please see the official documentation for more info.


# defining a class
class TestClass:
    def im_a_class(self):
        print("hi! i'm a class!")

    def hello(self, name):
        print("hello %s!" % name)
obj = TestClass()
obj.im_a_class()
obj.hello("naveen")

Output:

hi! i'm a class!

hello naveen!


Note that the method functions inside the class definition take a special extra first parameter, self. When a method is called, self refers to the particular example of the class that the method was called on. The method therefore potentially operates on that example, but not on other examples of that class.


Concrete examples of classes are called instances (this tutorial also refers to them as class objects). They are created using a special function called a class constructor. In python, unless the programmer specifies otherwise, every class gets a default constructor that takes no arguments and does nothing beyond creating a new instance. A constructor is invoked by calling the class name as if it were a function, that is, writing the class name followed by parentheses.


Methods associated with a class are invoked with the dot operator (.). You take a class object, a concrete example of a class that is often assigned to a variable, then write a dot (.), then the method you wish to call. The method operates only on the class object to the left of the dot. When calling a method you do not pass self yourself (self is a naming convention, not a keyword). Python supplies the object to the left of the dot automatically, so you pass only the arguments for the parameters after self, if any.
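The sketch below calls the same method in two ways to show that self is filled in automatically. It uses a variant of TestClass whose hello returns the greeting instead of printing it, so the results are easy to compare.

```python
class TestClass:
    # variant of the class above: hello returns its text instead of printing it
    def hello(self, name):
        return "hello %s!" % name

obj = TestClass()

# the dot form passes obj as self for you...
print(obj.hello("naveen"))             # hello naveen!
# ...which is the same as calling the function on the class with obj explicitly
print(TestClass.hello(obj, "naveen"))  # hello naveen!
```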


As mentioned above, every class has a special function called a constructor that is used to build concrete instances of the class. A programmer can define their own constructor, named __init__, which specifies any actions performed when building a new class object and the data used internally within it. Like all functions, constructors can take arguments that can be used during their execution. Like other methods in a class, the constructor definition takes the special self parameter as the leftmost argument in its definition. This allows you to modify the internal state of the class object being constructed. Here is an example class with a custom constructor.


Note that internal variables or methods can be accessed through the dot operator on self.


class Person:
    
    def __init__(self, first_name, last_name):
        # this constructor sets the values of "member variables"
        # in the concrete class object being constructed
        self.first = first_name
        self.last = last_name
        self.screams = 0

    def scream(self):
        # modify this object's internal state (its own scream counter)
        self.screams = self.screams + 1
        print("%s has screamed %d times" % (self.first, self.screams))
        
ben = Person("Ben", "Chan")
ben.scream()
ben.scream()
# accessing the value of a member variable
print("First Name:", ben.first)
print("Last Name:", ben.last)
print("Num of Screams:", ben.screams)

Output:

Ben has screamed 1 times

Ben has screamed 2 times

First Name: Ben

Last Name: Chan

Num of Screams: 2
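Each object keeps its own member variables. The sketch below uses a trimmed copy of Person that keeps only the counter logic, without the print. It adds a second person whose last name is made up for illustration.

```python
class Person:
    # trimmed copy of the Person class above (no print in scream)
    def __init__(self, first_name, last_name):
        self.first = first_name
        self.last = last_name
        self.screams = 0

    def scream(self):
        self.screams = self.screams + 1

ben = Person("Ben", "Chan")
joe = Person("Joe", "Smith")  # hypothetical second person

ben.scream()
ben.scream()
joe.scream()

# each object updated only its own counter
print(ben.screams, joe.screams)  # 2 1
```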



List Comprehensions

The practical data scientist often faces situations where one list must be transformed into another: transforming the values in the input list, filtering out certain undesired values, etc. List comprehensions are a natural, flexible way to perform these transformations on the elements of a list.


The syntax of list comprehensions is based on the way mathematicians define sets and lists, a syntax that leaves it clear what the contents should be:

  • S = {x² : x in {0 ... 9}}

  • V = (1, 2, 4, 8, ..., 2¹²)

  • M = {x | x in S and x even}

Python's list comprehensions give a very natural way to write statements just like these. You can write math-like expressions without needing too much special syntax.



import math
S = [math.pow(x, 2) for x in range(0,10)]
print(S)

Output:

[0.0, 1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0, 64.0, 81.0]



S = []
for x in range(0,10):
    S.append(math.pow(x,2))
print(S)

Output:

[0.0, 1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0, 64.0, 81.0]



import math
S = [math.pow(x, 2) for x in range(0,10)]
V = [math.pow(2, x) for x in range(0, 13)]
M = [x for x in S if x%2 == 0]

# the same filter written as an explicit loop (M_loop ends up equal to M)
M_loop = []
for x in S:
    if x%2 == 0:
        M_loop.append(x)

print(M)

Output:

[0.0, 4.0, 16.0, 36.0, 64.0]


Note that the list comprehension for deriving M uses an if clause to filter out the values that aren't of interest, keeping only the even perfect squares.
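A comprehension can also filter and transform in one step. A square is even exactly when the number being squared is even, so the same M can be built by filtering x before squaring. The sketch below does this and also shows that x ** 2 on integers keeps integer results, whereas math.pow always returns floats:

```python
import math

# filter on x, then square: same values as M above
M = [math.pow(x, 2) for x in range(0, 10) if x % 2 == 0]
print(M)  # [0.0, 4.0, 16.0, 36.0, 64.0]

# ** on integers keeps the results as integers
M_int = [x ** 2 for x in range(0, 10) if x % 2 == 0]
print(M_int)  # [0, 4, 16, 36, 64]
```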


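These are simple examples using numerical computation. In the following, more complex operation we transform a string into a list of values, one [uppercase, lowercase, length] list per word:


words = 'The quick brown fox jumps over the lazy dog'
print(words.split())
print([[w.upper(), w.lower(), len(w)] for w in words.split()])

Output:

['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
[['THE', 'the', 3], ['QUICK', 'quick', 5], ['BROWN', 'brown', 5], ['FOX', 'fox', 3], ['JUMPS', 'jumps', 5], ['OVER', 'over', 4], ['THE', 'the', 3], ['LAZY', 'lazy', 4], ['DOG', 'dog', 3]]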


data = {
    "Ben": {
        "Job": "Professor",
        "YOB": "1976",
        "Children": ["Gregory", "Anna"]
    },
    "Joe": {
        "Job": "Data Scientist",
        "YOB": "1981"
    }
}
data

Output:

{'Ben': {'Job': 'Professor', 'YOB': '1976', 'Children': ['Gregory', 'Anna']}, 'Joe': {'Job': 'Data Scientist', 'YOB': '1981'}}
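Comprehensions work on dictionaries too. The sketch below reads the nested data dictionary above in three ways. Chained [] lookups reach inner values, and a dict comprehension builds a new mapping. .get() supplies a default for Joe, who has no "Children" entry.

```python
data = {
    "Ben": {"Job": "Professor", "YOB": "1976", "Children": ["Gregory", "Anna"]},
    "Joe": {"Job": "Data Scientist", "YOB": "1981"},
}

# chain the [] lookups to reach nested values
print(data["Ben"]["Children"][1])  # Anna

# a dict comprehension: name -> job
jobs = {name: info["Job"] for name, info in data.items()}
print(jobs)  # {'Ben': 'Professor', 'Joe': 'Data Scientist'}

# .get() returns a default ([] here) when a key is missing
n_children = {name: len(info.get("Children", [])) for name, info in data.items()}
print(n_children)  # {'Ben': 2, 'Joe': 0}
```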






