Python II¶

In this class, we will watch the second of four lectures by Dr. Mike Gelbart, option co-director of the UBC-Vancouver MDS program.

Lecture Outline¶

  • Comments (0 min)

  • Why Python? (0 min)

  • Loops (15 min)

  • Comprehensions (5 min)

  • Functions intro (10 min)

  • DRY principle (15 min)

  • Break (5 min)

  • Keyword arguments (5 min)

  • Docstrings (10 min)

  • Unit tests, corner cases (10 min)

  • Multiple return values (5 min)

Attribution¶

Comments in python (0 min)¶

x = 1 # this is a comment
"""
this is a string, which does nothing
and can be used as a comment
"""

7


x = 1

Why Python? (0 min)¶

  • Why did we choose Python in the MDS program?

    • Extremely popular in DS (and beyond!)

    • Relatively easy to learn

    • Good documentation

    • Huge user community

      • Lots of Stack Overflow and other forums

      • Lots of useful packages (more on this next week)

Loops (10 min)¶

  • Loops allow us to execute a block of code multiple times.

  • We will focus on for loops

for n in [2, 7, -1, 5]:
    print("The number is", n, "its square is", n**2)
    # this is inside the loop
# this is outside the loop
The number is 2 its square is 4
The number is 7 its square is 49
The number is -1 its square is 1
The number is 5 its square is 25

The main points to notice:

  • Keyword for begins the loop

  • Colon : ends the first line of the loop

  • We can iterate over any kind of iterable: list, tuple, range, string. In this case, we are iterating over the values in a list

  • The indented block of code is executed once for each value in the list (hence the name “for” loops, sometimes also called “for each” loops)

  • The loop ends after the variable n has taken all the values in the list

"abc" + "def"
'abcdef'
word = "Python"
for letter in word:
    print("Gimme a " + letter + "!")

print("What's that spell?!! " + word + "!")
Gimme a P!
Gimme a y!
Gimme a t!
Gimme a h!
Gimme a o!
Gimme a n!
What's that spell?!! Python!
  • A very common pattern is to use for with range.

  • range gives you a sequence of integers up to, but not including, some value.

for i in range(10):
    print(i)
0
1
2
3
4
5
6
7
8
9

We can also specify a start value and a step (skip-by) value with range:

for i in range(1,101,10):
    print(i)
1
11
21
31
41
51
61
71
81
91
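Wrapping a range in list() is a quick way to check exactly which integers it produces without writing a loop. The short sketch below uses only built-ins. It shows that the stop value is never included and that a negative step counts down:

```python
# range(stop): starts at 0 and stops *before* stop
print(list(range(5)))           # [0, 1, 2, 3, 4]

# range(start, stop, step): the same sequence as the loop above
print(list(range(1, 101, 10)))  # [1, 11, 21, 31, 41, 51, 61, 71, 81, 91]

# A negative step counts down; the stop value (0) is still excluded
print(list(range(10, 0, -3)))   # [10, 7, 4, 1]
```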

We can write a loop inside another loop to iterate over multiple dimensions of data. Consider the following loop as enumerating the coordinates in a 3 by 3 grid of points.

for x in [1,2,3]:
    for y in ["a","b","c"]:
        print((x,y))
(1, 'a')
(1, 'b')
(1, 'c')
(2, 'a')
(2, 'b')
(2, 'c')
(3, 'a')
(3, 'b')
(3, 'c')
list_1 = [1,2,3]
list_2 = ["a","b","c"]
for i in range(3):
    print(list_1[i], list_2[i])
1 a
2 b
3 c
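Indexing with range(3) works here, but it hard-codes the length 3. A common, arguably more idiomatic alternative is the built-in zip, which pairs up corresponding elements of several iterables. A minimal sketch (the two lists are redefined so the snippet runs on its own):

```python
list_1 = [1, 2, 3]
list_2 = ["a", "b", "c"]

# zip yields the tuples (1, 'a'), (2, 'b'), (3, 'c'), which we unpack into two names
for number, letter in zip(list_1, list_2):
    print(number, letter)

# zip stops at the shortest input, so mismatched lengths do not raise an IndexError
pairs = list(zip([1, 2, 3], ["a", "b"]))
print(pairs)  # [(1, 'a'), (2, 'b')]
```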

We can loop through key-value pairs of a dictionary using .items():

courses = {521 : "awesome",
           551 : "riveting",
           511 : "naptime!"}

for course_num, description in courses.items():
    print("DSCI", course_num, "is", description)
DSCI 521 is awesome
DSCI 551 is riveting
DSCI 511 is naptime!
for course_num in courses:
    print(course_num, courses[course_num])
521 awesome
551 riveting
511 naptime!

Above: the general syntax is for key, value in dictionary.items():

while loops¶

  • We can also use a while loop to execute a block of code several times.

  • In reality, I rarely use these.

  • Beware! If the conditional expression is always True, then you’ve got an infinite loop!

    • (Use the “Stop” button in the toolbar above, or Ctrl-C in the terminal, to kill the program if you get an infinite loop.)

n = 10
while n > 0:
    print(n)
    n = n - 1

print("Blast off!")
10
9
8
7
6
5
4
3
2
1
Blast off!

Comprehensions (5 min)¶

Comprehensions allow us to build lists/tuples/sets/dictionaries in one convenient, compact line of code.

words = ["hello", "goodbye", "the", "antidisestablishmentarianism"]

y = [word[-1] for word in words]  # list comprehension
y
['o', 'e', 'e', 'm']
y = list()
for word in words:
    y.append(word[-1])
y
['o', 'e', 'e', 'm']
y = (word[-1] for word in words)  # this is NOT a tuple comprehension - more on generators later
print(y)
<generator object <genexpr> at 0x7f8b2c6dd950>
y = {word[-1] for word in words}  # set comprehension
print(y)
{'o', 'e', 'm'}
word_lengths = {word : len(word) for word in words} # dictionary comprehension
word_lengths
{'hello': 5, 'goodbye': 7, 'the': 3, 'antidisestablishmentarianism': 28}
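Since parentheses produce a generator expression rather than a tuple comprehension, one way to build a tuple in a single line is to hand the generator expression to tuple(). A minimal sketch with the same words list:

```python
words = ["hello", "goodbye", "the", "antidisestablishmentarianism"]

# tuple() consumes the generator expression and stores every result;
# the generator's own parentheses can be dropped when it is the only argument
last_letters = tuple(word[-1] for word in words)
print(last_letters)  # ('o', 'e', 'e', 'm')
```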

Functions intro (5 min)¶

  • Define a function to re-use a block of code with different input parameters, also known as arguments.

  • For example, define a function called square which takes one input parameter n and returns the square n**2.

def square(n):
    n_squared = n**2
    return n_squared
square(2)
4
square(100)
10000
square(12345)
152399025
  • Begins with def keyword, function name, input parameters and then colon (:)

  • Function block defined by indentation

  • Output or “return” value of the function is given by the return keyword

Side effects¶

  • If a function changes the variables passed into it, then it is said to have side effects

  • Example:

def silly_sum(sri):
    sri.append(0)
    return sum(sri)
    
silly_sum([1,2,3,4])
10

Looks good, like it sums the numbers? But wait…

lst = [1,2,3,4]
silly_sum(lst)
10
lst
[1, 2, 3, 4, 0]
  • If your function has side effects like this, you must mention it in the documentation (later today).

  • More on how this works in Tuesday’s class.
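One way to keep the same result without the side effect is to build a new list rather than appending to the argument. A sketch, where the name sum_with_zero is hypothetical and chosen only for this illustration:

```python
def sum_with_zero(numbers):
    """Return the sum of numbers with an extra 0 appended, without modifying numbers."""
    # numbers + [0] builds a NEW list, so the caller's list is left untouched
    extended = numbers + [0]
    return sum(extended)

lst = [1, 2, 3, 4]
print(sum_with_zero(lst))  # 10
print(lst)                 # [1, 2, 3, 4]  (unchanged)
```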

Null return type¶

If a function finishes without reaching a return statement, or executes a bare return, it returns None:

def f(x):
    x + 1 # no return!
    if x == 999:
        return
print(f(0))
None

DRY principle, designing good functions (15 min)¶

  • DRY: Don’t Repeat Yourself

  • See Wikipedia article

  • Consider the task of turning each element of a list into a palindrome

    • e.g. “mike” –> “mikeekim”

names = ["milad", "rodolfo", "tiffany"]
name = "mike"
name[::-1]
'ekim'
names_backwards = list()

names_backwards.append(names[0] + names[0][::-1])
names_backwards.append(names[1] + names[1][::-1])
names_backwards.append(names[2] + names[2][::-1])
names_backwards
['miladdalim', 'rodolfooflodor', 'tiffanyynaffit']
  • Above: this is gross, terrible, yucky code

    1. It only works for a list with 3 elements

    2. It only works for a list named names

    3. If we want to change its functionality, we need to change 3 similar lines of code (Don’t Repeat Yourself!!)

    4. It is hard to understand what it does just by looking at it

names_backwards = list()

for name in names:
    names_backwards.append(name + name[::-1])
    
names_backwards
['miladdalim', 'rodolfooflodor', 'tiffanyynaffit']

Above: this is slightly better. We have solved problems (1) and (3).

def make_palindromes(names):
    names_backwards = list()
    
    for name in names:
        names_backwards.append(name + name[::-1])
    
    return names_backwards

make_palindromes(names)
['miladdalim', 'rodolfooflodor', 'tiffanyynaffit']
  • Above: this is even better. We have now also solved problem (2), because you can call the function with any list, not just names.

  • For example, what if we had multiple lists:

names1 = ["milad", "rodolfo", "tiffany"]
names2 = ["Trudeau", "Scheer", "Singh", "Blanchet", "May"]
names3 = ["apple", "orange", "banana"]
names_backwards_1 = list()

for name in names1:
    names_backwards_1.append(name + name[::-1])
    
names_backwards_1
['miladdalim', 'rodolfooflodor', 'tiffanyynaffit']
names_backwards_2 = list()

for name in names2:
    names_backwards_2.append(name + name[::-1])
    
names_backwards_2
['TrudeauuaedurT', 'ScheerreehcS', 'SinghhgniS', 'BlanchettehcnalB', 'MayyaM']
names_backwards_3 = list()

for name in names3:
    names_backwards_3.append(name + name[::-1])
    
names_backwards_3
['appleelppa', 'orangeegnaro', 'bananaananab']

Above: this is also very bad (and imagine if it were 20 lines of code instead of 2). This was problem (2). Our function makes it much better:

make_palindromes(names1)
['miladdalim', 'rodolfooflodor', 'tiffanyynaffit']
make_palindromes(names2)
['TrudeauuaedurT', 'ScheerreehcS', 'SinghhgniS', 'BlanchettehcnalB', 'MayyaM']
make_palindromes(names3)
['appleelppa', 'orangeegnaro', 'bananaananab']
  • You could get even more fancy, and put the lists of names into a list (so you have a list of lists).

  • Then you could loop over the list and call the function each time:

for list_of_names in [names1, names2, names3]:
    print(make_palindromes(list_of_names))
['miladdalim', 'rodolfooflodor', 'tiffanyynaffit']
['TrudeauuaedurT', 'ScheerreehcS', 'SinghhgniS', 'BlanchettehcnalB', 'MayyaM']
['appleelppa', 'orangeegnaro', 'bananaananab']

Designing good functions¶

  • How far you go with this is sort of a matter of personal style, and how you choose to apply the DRY principle: DON’T REPEAT YOURSELF!

  • These decisions are often ambiguous. For example:

    • Should make_palindromes be a function if I’m only ever doing it once? Twice?

    • Should the loop be inside the function, or outside?

    • Or should there be TWO functions, one that loops over the other??

  • In my personal opinion, make_palindromes does a bit too much to be understandable.

  • I prefer this:

def make_palindrome(name):
    return name + name[::-1]

make_palindrome("milad")
'miladdalim'
  • From here, we want to “apply make_palindrome to every element of a list”

  • It turns out this is an extremely common desire, so Python has built-in functions for it.

  • One of these is map, which we’ll cover later. But for now, just a comprehension will do:

[make_palindrome(name) for name in names]
['miladdalim', 'rodolfooflodor', 'tiffanyynaffit']
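For comparison, here is a sketch of the same operation with the built-in map, which applies a function to every element of an iterable. In Python 3, map returns a lazy iterator, so the result is wrapped in list() to display it:

```python
def make_palindrome(name):
    return name + name[::-1]

names = ["milad", "rodolfo", "tiffany"]

# map(function, iterable) calls make_palindrome on each name, one at a time
palindromes = list(map(make_palindrome, names))
print(palindromes)  # ['miladdalim', 'rodolfooflodor', 'tiffanyynaffit']
```

Both forms give the same list here; which one reads better is largely a matter of taste.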

Other function design considerations:

  • Should we print output or produce plots inside or outside functions?

    • I would usually say outside, because this is a “side effect” of sorts

  • Should the function do one thing or many things?

    • This is a tough one, hard to answer in general

Break (5 min)¶

Optional & keyword arguments (5 min)¶

  • Sometimes it is convenient to have default values for some arguments in a function.

  • Because they have default values, these arguments are optional, hence “optional arguments”

  • Example:

def repeat_string(s, n=2):
    return s*n
repeat_string("mds", 2)
'mdsmds'
repeat_string("mds", 5)
'mdsmdsmdsmdsmds'
repeat_string("mds") # do not specify `n`; it is optional
'mdsmds'

Sane defaults:

  • Ideally, the default should be carefully chosen.

  • Here, the idea of “repeating” something makes me think of having 2 copies, so n=2 feels like a sane default.

Syntax:

  • You can have any number of arguments and any number of optional arguments

  • All the optional arguments must come after the regular arguments

  • The regular arguments are mapped by the order they appear

  • The optional arguments can be specified out of order

def example(a, b, c="DEFAULT", d="DEFAULT"):
    print(a,b,c,d)
    
example(1,2,3,4)
1 2 3 4

Using the defaults for c and d:

example(1,2)
1 2 DEFAULT DEFAULT

Specifying c and d as keyword arguments (i.e. by name):

example(1,2,c=3,d=4)
1 2 3 4

Specifying only one of the optional arguments, by keyword:

example(1,2,c=3)
1 2 3 DEFAULT

Or the other:

example(1,2,d=4)
1 2 DEFAULT 4

Specifying all the arguments as keyword arguments, even though only c and d are optional:

example(a=1,b=2,c=3,d=4)
1 2 3 4

Specifying c by the fact that it comes 3rd (I do not recommend this because I find it is confusing):

example(1,2,3) 
1 2 3 DEFAULT

Specifying the optional arguments by keyword, but in the wrong order (this is also somewhat confusing, but not so terrible - I am OK with it):

example(1,2,d=4,c=3) 
1 2 3 4

Specifying the non-optional arguments by keyword (I am fine with this):

example(a=1,b=2)
1 2 DEFAULT DEFAULT

Specifying the non-optional arguments by keyword, but in the wrong order (not recommended, I find it confusing):

example(b=2,a=1)
1 2 DEFAULT DEFAULT

Specifying keyword arguments before non-keyword arguments (this throws an error):

example(a=2,1)
  File "<ipython-input-56-0d6831baee6b>", line 1
    example(a=2,1)
               ^
SyntaxError: positional argument follows keyword argument
  • In general, I am used to calling non-optional arguments by order, and optional arguments by keyword.

  • The language allows us to deviate from this, but it can be unnecessarily confusing sometimes.

Advanced stuff (optional):¶

  • You can also call/define functions with *args and **kwargs; see, e.g. here

  • Do not use mutable objects (such as lists or dictionaries) as default argument values; see here under “Mutable Default Arguments”

def example(a, b=[]): # don't do this!
    return 0
def example(a, b=None): # instead, do this
    if b is None:
        b = []
    return 0
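The reason for this rule is that a default value is evaluated once, when the def statement runs. The same object is then shared by every call that relies on the default. The hypothetical functions append_bad and append_good below (not from the lecture) make the difference visible:

```python
def append_bad(item, items=[]):  # this [] is created once, at definition time
    items.append(item)
    return items

print(append_bad(1))  # [1]
print(append_bad(2))  # [1, 2]  <- the same list was reused between calls!

def append_good(item, items=None):
    if items is None:  # a fresh list is created on every call that needs one
        items = []
    items.append(item)
    return items

print(append_good(1))  # [1]
print(append_good(2))  # [2]
```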

Docstrings (10 min)¶

  • We got pretty far above, but we never solved problem (4): It is hard to understand what it does just by looking at it

  • Enter the idea of function documentation (and in particular docstrings)

  • The docstring goes right after the def line.

def make_palindrome(string):
    """Turns the string into a palindrome by concatenating itself with a reversed version of itself."""
    
    return string + string[::-1]

In IPython/Jupyter, we can use ? to view the documentation string of any function in our environment.

make_palindrome?
print?
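Outside IPython/Jupyter, for example in a plain Python script or the standard interpreter, the same information is available through the built-in help() function, or directly from the function's __doc__ attribute, where the docstring is stored:

```python
def make_palindrome(string):
    """Turns the string into a palindrome by concatenating itself with a reversed version of itself."""
    return string + string[::-1]

# The docstring is stored as a plain string on the function object
print(make_palindrome.__doc__)

# help() prints a formatted page that includes the signature and the docstring
help(make_palindrome)
```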

Docstring structure¶

  1. Single-line: If it’s short, then just a single line describing the function will do (as above).

  2. PEP-8 style: a multi-line description plus a list of arguments; see here. (The docstring conventions themselves are described in PEP 257.)

  3. Scipy style: The most elaborate & informative; see here and here.

The PEP-8 style:

def make_palindrome(string):
    """
    Turns the string into a palindrome by concatenating itself 
    with a reversed version of itself.
    
    Arguments:
    string - (str) the string to turn into a palindrome
    """
    return string + string[::-1]
make_palindrome?

The scipy style:

def make_palindrome(string):
    """
    Turn a string into a palindrome.
    
    Turns the string into a palindrome by concatenating itself 
    with a reversed version of itself, so that the returned
    string is twice as long as the original.
    
    Parameters
    ----------
    string : str
        The string to turn into a palindrome.
        
    Returns
    -------
    str
        The new palindrome string. 
        
    Examples
    --------
    >>> make_palindrome("abc")
    'abccba'
    """
    return string + string[::-1]
make_palindrome(# press shift-tab HERE to get docstring!!
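The Examples section is more than decoration: the standard-library doctest module can run the >>> lines and compare the result with the text underneath. A minimal sketch follows. Note that doctest compares against the repr of the result, so a string result must be written with single quotes, as in 'abccba', for the check to pass:

```python
import doctest

def make_palindrome(string):
    """
    Turn a string into a palindrome.

    Examples
    --------
    >>> make_palindrome("abc")
    'abccba'
    """
    return string + string[::-1]

# Collect the examples in this function's docstring and run them
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner()
for test in finder.find(make_palindrome, globs={"make_palindrome": make_palindrome}):
    runner.run(test)  # any failing example is reported on stdout

# summarize() returns the counts of failed and attempted examples
failed, attempted = runner.summarize(verbose=False)
print(f"{attempted} example(s) run, {failed} failed")
```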

Below is the general form of the scipy docstring (reproduced from the scipy/numpy docs):

def function_name(param1,param2,param3):
    """First line is a short description of the function.
    
    A paragraph describing in a bit more detail what the
    function does and what algorithms it uses and common
    use cases.
    
    Parameters
    ----------
    param1 : datatype
        A description of param1.
    param2 : datatype
        A description of param2.
    param3 : datatype
        A longer description because maybe this requires
        more explanation and we can use several lines.
    
    Returns
    -------
    datatype
        A description of the output, datatypes and behaviours.
        Describe special cases and anything the user needs to
        know to use the function.
    
    Examples
    --------
    >>> function_name(3,8,-5)
    2.0
    """

Docstrings in your labs¶

In MDS we will accept:

  • One-line docstrings for very simple functions.

  • Either the PEP-8 or scipy style for bigger functions.

    • But we think the scipy style is more common in the wild, so you may want to get into the habit of using it.

    • Personally, I like that it explicitly gives the datatype of the return value.

Docstrings with optional arguments¶

When specifying the parameters, we specify the defaults for optional arguments:

# PEP-8 style
def repeat_string(s, n=2):
    """
    Repeat the string s, n times.
    
    Arguments:
    s -- (str) the string
    n -- (int) the number of times (default 2)
    """
    return s*n
# scipy style
def repeat_string(s, n=2):
    """
    Repeat the string s, n times.
    
    Parameters
    ----------
    s : str 
        the string
    n : int, optional (default = 2)
        the number of times
        
    Returns
    -------
    str
        the repeated string
        
    Examples
    --------
    >>> repeat_string("Blah", 3)
    'BlahBlahBlah'
    """
    return s*n

Automatically generated documentation¶

  • By following the docstring conventions, we can automatically generate documentation using libraries like sphinx, pydoc or Doxygen.

    • For example: compare this documentation with this code.

    • Notice the similarities? The webpage was automatically generated because the authors used standard conventions for docstrings!

What makes good documentation?¶

  • What do you think about this?

################################
#
# NOT RECOMMENDED TO DO THIS!!!
#
################################

def make_palindrome(string):
    """
    Turns the string into a palindrome by concatenating itself 
    with a reversed version of itself. To do this, it uses the
    Python syntax of `[::-1]` to flip the string, and stores
    this in a variable called string_reversed. It then uses `+`
    to concatenate the two strings and return them to the caller.
    
    Arguments:
    string - (str) the string to turn into a palindrome
    
    Other variables:
    string_reversed - (str) the reversed string
    """
    
    string_reversed = string[::-1]
    return string + string_reversed



  • This is poor documentation! More is not necessarily better!

Why?¶

Unit tests, corner cases (10 min)¶

assert statements¶

  • assert statements cause your program to fail (by raising an AssertionError) if the condition is False.

  • They can be used as sanity checks for your program.

  • There are more sophisticated ways to “test” your programs, which we’ll discuss in DSCI 524.

  • The syntax is:

assert expression , "Error message if expression is False or raises an error."
assert 1 == 2 , "1 is not equal to 2."
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-87-6a6bcee9d8ba> in <module>
----> 1 assert 1 == 2 , "1 is not equal to 2."

AssertionError: 1 is not equal to 2.

Systematic Program Design¶

A systematic approach to program design is a general set of steps to follow when writing programs. Our approach includes:

  1. Write a stub: a function that does nothing but accept all input parameters and return the correct datatype.

  2. Write tests to satisfy the design specifications.

  3. Outline the program with pseudo-code.

  4. Write code and test frequently.

  5. Write documentation.

The key point: write tests BEFORE you write code.

  • You do not have to do this in MDS, but you may find it surprisingly helpful.

  • Often writing tests helps you think through what you are trying to accomplish.

  • It’s best to have that clear before you write the actual code.
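As a rough illustration of steps 1, 2 and 4, here is a sketch built around the make_palindrome function from earlier. The test function name test_make_palindrome is our own choice, not something the lecture prescribes. First comes a stub that only returns the right type, then tests written from the specification, then the real code:

```python
# Step 1: a stub that accepts the input and returns the correct datatype (str)
def make_palindrome(string):
    """Turn a string into a palindrome."""
    return ""

# Step 2: tests written from the specification, before the real code exists
def test_make_palindrome():
    assert make_palindrome("abc") == "abccba"
    assert make_palindrome("a") == "aa"
    assert make_palindrome("") == ""  # corner case: the empty string

# Against the stub the tests fail, which is expected at this stage
try:
    test_make_palindrome()
except AssertionError:
    print("stub fails the tests (expected)")

# Step 4: write the real code (after outlining it in step 3), then re-run the tests
def make_palindrome(string):
    """Turn a string into a palindrome by appending its reverse."""
    return string + string[::-1]

test_make_palindrome()
print("all tests pass")
```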

Testing woes - false positives¶

  • Just because all your tests pass, this does not mean your program is correct!!

  • This happens all the time. How to deal with it?

    • Write a lot of tests!

    • Don’t be overconfident, even after writing a lot of tests!

def sample_median(x):
    """Finds the median of a list of numbers."""
    x_sorted = sorted(x)
    return x_sorted[len(x_sorted)//2]

assert sample_median([1,2,3,4,5]) == 3
assert sample_median([0,0,0,0]) == 0

Looks good? … ?






assert sample_median([1,2,3,4]) == 2.5
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-89-8e10deb531f1> in <module>
----> 1 assert sample_median([1,2,3,4]) == 2.5

AssertionError: 






assert sample_median([1,3,2]) == 2  # passes: the function sorts its input before indexing

Testing woes - false negatives¶

  • It can also happen, though more rarely, that your tests fail but your program is correct.

  • This means there is something wrong with your test.

  • For example, in the autograding for lab1 this happened to some people, because of tiny roundoff errors.
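Roundoff errors like these are easy to reproduce: many decimal fractions cannot be represented exactly in binary floating point, so an exact == comparison can fail even when the computation is correct. A common remedy is to compare with a tolerance, for example with the standard library's math.isclose:

```python
import math

total = 0.1 + 0.2
print(total)         # 0.30000000000000004
print(total == 0.3)  # False: the exact comparison trips on roundoff

# math.isclose compares with a relative tolerance (rel_tol=1e-09 by default)
assert math.isclose(total, 0.3)
print("the tolerance-based test passes")
```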

Corner cases¶

  • A corner case is an input that is reasonable but a bit unusual, and may trip up your code.

  • For example, taking the median of an empty list, or a list with only one element.

  • Often it is desirable to add test cases to address corner cases.

assert sample_median([1]) == 1
  • In this case the code worked with no extra effort, but sometimes we need if statements to handle the weird cases.

  • Sometimes we want the code to throw an error (e.g. median of an empty list); more on this later.
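For completeness, here is one possible fix for sample_median. It is a sketch rather than the lecture's official solution: sort, average the two middle values when the length is even, and raise an error for the empty-list corner case. Raising ValueError is our own choice; Python's own statistics.median raises statistics.StatisticsError (a subclass of ValueError) in that case:

```python
import statistics

def sample_median(x):
    """Find the median of a non-empty list of numbers."""
    if len(x) == 0:
        # corner case: the median of an empty list is undefined
        raise ValueError("sample_median() arg is an empty list")
    x_sorted = sorted(x)
    n = len(x_sorted)
    mid = n // 2
    if n % 2 == 1:
        return x_sorted[mid]  # odd length: the single middle element
    # even length: average the two middle elements
    return (x_sorted[mid - 1] + x_sorted[mid]) / 2

# The earlier tests, including the even-length case that exposed the bug
assert sample_median([1, 2, 3, 4, 5]) == 3
assert sample_median([0, 0, 0, 0]) == 0
assert sample_median([1, 2, 3, 4]) == 2.5
assert sample_median([1, 3, 2]) == 2
assert sample_median([1]) == 1

# The standard library's statistics.median agrees on these inputs
for data in ([1, 2, 3, 4, 5], [1, 2, 3, 4], [1, 3, 2], [1]):
    assert sample_median(data) == statistics.median(data)

# The median of an empty list now fails loudly instead of raising an IndexError
try:
    sample_median([])
except ValueError as err:
    print("ValueError:", err)
```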

Multiple return values (0 min)¶

  • In most (all?) programming languages I’ve seen, functions can only return one thing.

  • That is technically true in Python, but there is a “workaround”, which is to return a tuple.

# not good from a design perspective!
def sum_and_product(x, y):
    return (x+y, x*y)
sum_and_product(5,6)
(11, 30)

In some cases in Python, the parentheses can be omitted:

def sum_and_product(x, y):
    return x+y, x*y
sum_and_product(5,6)
(11, 30)

It is common to store these in separate variables, so it really feels like the function is returning multiple values:

s, p = sum_and_product(5, 6)
s
11
p
30
  • Question: is this good function design?

  • Answer: usually not, but sometimes.

  • You will encounter this in some Python packages.

Advanced / optional: you can ignore return values you don’t need with _:

s, _ = sum_and_product(5, 6)
s
11

Fun with tuples¶

In general, you can do some weird stuff with tuples:

a, b = 5, 6
a, b = (5, 6)
a, b = b, a # in other languages this requires a "temp" variable
a
6
b
5