DSCI 531 Lecture 3#

In this class, we will watch the third of four lectures by Dr. Mike Gelbart, option co-director of the UBC-Vancouver MDS program.

from random import random
import pdb

Lecture Outline:#

  • Functions as a data type (5 min)

  • Anonymous functions (5 min)

  • Exceptions, try/except (15 min)

Defer to Lecture 5#

  • Style guides and coding style (15 min)

  • Python debugger (pdb) (5 min)

  • Break (5 min)

  • Numpy arrays (10 min)

  • Numpy array shapes (10 min)

  • Numpy indexing and slicing (10 min)

Attribution#

Functions as a data type (5 min)#

  • In Python, functions are a data type just like anything else.

  • We often say functions are “first-class objects”.

def do_nothing(x):
    return x
type(do_nothing)
function
print(do_nothing)
<function do_nothing at 0x7f57e4bb3280>
#do_nothing = 5
do_nothing(10)
10

This means you can pass functions as arguments into other functions.

def square(y):
    return y**2

def evaluate_function_on_x_plus_1(fun, x):
    return fun(x+1)
square(5)
25
evaluate_function_on_x_plus_1(square, 5)
36
  • Above: what happened here?

    • fun(x+1) becomes square(5+1)

    • square(6) becomes 36

  • (optional) You can also write functions that return functions, or define functions inside of other functions.

    • I don’t do these often.

    • But they are important ideas in software engineering.
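As a minimal sketch of the optional idea above, here is a function that defines another function inside itself and returns it. The names make_multiplier and multiply are made up for this illustration and do not come from the lecture.

```python
def make_multiplier(factor):
    """Return a new function that multiplies its argument by factor."""

    def multiply(x):
        # factor is remembered from the enclosing call (this is a closure)
        return x * factor

    return multiply


double = make_multiplier(2)
triple = make_multiplier(3)

print(double(5))  # 10
print(triple(5))  # 15
```

Each call to make_multiplier produces a separate function object, which is only possible because functions are ordinary values.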

You can end up with pretty weird stuff:

do_nothing(do_nothing)
<function __main__.do_nothing(x)>
do_nothing(do_nothing)(5)
5

Above:

  • First we call do_nothing(do_nothing), which returns the function do_nothing

  • Then we call do_nothing(5) which returns 5.

do_nothing(do_nothing(5))
5

Above:

  • First we call do_nothing(5), which returns 5.

  • Then we again call do_nothing(5), which returns 5.

Anonymous functions (5 min)#

There are two ways to define functions in Python:

def add_one(x):
    return x+1
add_one(7.2)
8.2
add_one = lambda x: x+1 
type(add_one)
function
add_one(7.2)
8.2

The two approaches above produce equivalent functions. The one with lambda is called an anonymous function.

Some differences:

  • anonymous functions are limited to a single expression (in practice, one line of code), so they aren’t appropriate in most cases.

  • anonymous functions evaluate to a function (remember, functions are first-class objects) immediately, so we can do weird stuff with them.

(lambda x,y: x+y)(6,7)
13
a = (lambda x,y: x*y)(5,5)
b = (lambda x,y: x+y*2)(2,2)
print(a,b)
25 6
evaluate_function_on_x_plus_1(lambda x: x**2, 5)
36

Above:

  • First, lambda x: x**2 evaluates to a value of type function

    • Notice that this function is never given a name - hence “anonymous functions” !

  • Then, the function and the integer 5 are passed into evaluate_function_on_x_plus_1

  • At which point the anonymous function is evaluated on 5+1, and we get 36.
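A common place where anonymous functions show up in practice is as a short argument to built-ins such as sorted and map. The sketch below is an illustration under that assumption; the list of words is made up.

```python
words = ["banana", "kiwi", "apple", "fig"]

# sorted() accepts a function through its key argument;
# here the anonymous function maps each word to its length.
by_length = sorted(words, key=lambda w: len(w))
print(by_length)  # ['fig', 'kiwi', 'apple', 'banana']

# map() applies a function to every element of an iterable.
squares = list(map(lambda x: x**2, [1, 2, 3]))
print(squares)  # [1, 4, 9]
```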

Exceptions, try/except (10 min)#

Above: the Blue Screen of Death. Some amusing examples here!

  • If something goes wrong, we don’t want the code to crash - we want it to fail gracefully.

  • In Python, this can be accomplished using try/except:

  • Here is a basic example:

for i in range(10):
    
    print(i)
    if i == 4:
        this_variable_does_not_exist
0
1
2
3
4
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Input In [20], in <cell line: 1>()
      3 print(i)
      4 if i == 4:
----> 5     this_variable_does_not_exist

NameError: name 'this_variable_does_not_exist' is not defined
for i in range(10):
    
    print(i)
    if i == 4:
        try:
            this_variable_does_not_exist
        except:
            # pass
            print("\tYou did something bad!")
  • Python tries to execute the code in the try block.

  • If an error is encountered, we “catch” this in the except block (also called try/catch in other languages).

  • There are many different error types, or exceptions - we saw NameError above.

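5/0  # raises ZeroDivisionError
my_list = [1,2,3]
my_list[5]  # raises IndexError
# (note: this is also valid syntax, just very confusing)
[1,2,3][5]  # raises IndexError
my_tuple = (1,2,3)
my_tuple[0] = 0  # raises TypeError (tuples are immutable)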
  • Ok, so there are apparently a bunch of different errors one could run into.

  • With try/except you can also catch the exception itself:

for i in range(10):
    
    print(i)
    if i == 4:
        try:
            this_variable_does_not_exist
        except Exception as ex:
            print("You did something bad!")
            print(type(ex),ex)
  • In the above, we caught the exception and assigned it to the variable ex so that we could print it out.

  • This is useful because you can see what the error message would have been, without crashing your program.

  • You can also catch specific exception types, like so:

try:
    #1/0
    this_variable_does_not_exist
except TypeError:
    print("You made a type error!")
except NameError:
    print("You made a name error!")
except ZeroDivisionError:
    print('You made a division error!')
except:
    print("You made some other sort of error")
  • The final except would trigger if the error is none of the above types, so this sort of has an if/elif/else feel to it.

  • There are some extra features, in particular an else and finally block; if you are interested, see e.g., here.
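As a rough sketch of the else and finally blocks mentioned above: else runs only when the try block succeeds, and finally runs in every case. The function safe_divide and its log list are hypothetical names used only to make the order of execution visible.

```python
def safe_divide(a, b):
    """Divide a by b, returning None instead of crashing on division by zero."""
    log = []
    try:
        result = a / b
    except ZeroDivisionError:
        log.append("except")   # runs only if the try block raised this error
        result = None
    else:
        log.append("else")     # runs only if the try block raised no error
    finally:
        log.append("finally")  # always runs, error or not
    return result, log


print(safe_divide(6, 3))  # (2.0, ['else', 'finally'])
print(safe_divide(6, 0))  # (None, ['except', 'finally'])
```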

try:
    5/0
except TypeError:
    print("You made a type error!")
except NameError:
    print("You made a name error!")
except Exception as ex:
    print("You made some other sort of error")
  • Ideally, make your try/except blocks specific (catch only the exceptions you expect), and avoid putting code inside the except block that can itself raise an error:

try:
    this_variable_does_not_exist
except:
    5/0
  • This is a bit much, but it does happen sometimes :(
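By contrast, a more specific try/except might look like the sketch below. The function parse_age is a made-up example, not something from the lecture; it keeps the try block small and catches only the one exception it expects.

```python
def parse_age(text):
    """Convert text to an int, or return None if it is not a valid integer."""
    try:
        # Keep the try block as small as possible: only the risky call.
        age = int(text)
    except ValueError:
        # Catch only the error we expect; anything else still surfaces.
        return None
    return age


print(parse_age("42"))    # 42
print(parse_age("blah"))  # None
```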

Using raise#

  • You can also write code that raises an exception on purpose, using raise

def add_one(x):
    return x+1
add_one("blah")
def add_one(x):
    if not isinstance(x, (float, int)):
        raise Exception("Sorry, x must be numeric")
        
    return x+1
add_one("blah")
  • This is useful when your function is complicated and would fail in a complicated way, with a weird error message.

  • You can make the cause of the error much clearer to the caller of the function.

  • Thus, your function is more usable this way.

  • If you do this, you should ideally describe these exceptions in the function documentation, so a user knows what to expect if they call your function.

  • You can also raise other types of exceptions, or even define your own exception types, as in lab 2.

  • You can also use raise by itself to raise whatever exception was going on:

try:
    this_variable_does_not_exist
except:
    print("You did something bad!")
    raise
  • Here, the original exception is raised after we ran some other code.
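Returning to the earlier points about documenting exceptions and defining your own exception types, one possible sketch is shown below. The class name NotNumericError is hypothetical, and the docstring layout is just one common convention rather than a requirement.

```python
class NotNumericError(TypeError):
    """Hypothetical custom exception: raised when an argument is not a number."""


def add_one(x):
    """
    Return x + 1.

    Raises
    ------
    NotNumericError
        If x is not an int or a float.
    """
    if not isinstance(x, (int, float)):
        raise NotNumericError(f"Sorry, x must be numeric, got {type(x).__name__}")
    return x + 1


try:
    add_one("blah")
except NotNumericError as ex:
    print(type(ex).__name__, "-", ex)
```

Because NotNumericError inherits from TypeError, a caller who writes except TypeError will still catch it.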

Style guides and coding style (15 min)#

  • It is incorrect to think that if code works then you are done.

  • Code has two “users” - the computer (which turns it into machine instructions) and humans, who will likely read and/or modify the code in the future.

  • This section is about how to make your code suitable to that second audience, humans.

What is style?#

  • Style encompasses many things.

  • We already talked about the DRY principle, which could be considered under this umbrella, since it affects humans rather than the machines.

Today we will talk about:

  • variable names

  • magic numbers

  • comments

  • whitespace

Style guides#

  • It is common for style conventions to be brought together into a style guide.

  • If everyone follows the same style guide, it makes it easier to read code written by others.

    • “Code is read much more often than it is written.”

  • For Python, we will follow the PEP 8 style guide.

  • It is worth skimming through PEP 8, but here are some highlights:

    • Indent using 4 spaces

    • Have whitespace around operators, e.g. x = 1 not x=1

    • But avoid extra whitespace, e.g. f(1) not f (1)

    • Single and double quotes are both fine for strings, but only use """triple double quotes""", not '''triple single quotes'''

    • Variable and function names use underscores_between_words

      • thisVariable (Java, camelCase) --> this_variable (Python)

    • And much more…
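To make a few of these highlights concrete, here is a small before/after sketch. Both functions are invented for illustration; they behave identically and differ only in style.

```python
# Not PEP 8: camelCase names, no spaces around operators, space before "("
def computeTotal (itemPrice,numItems):
    totalCost=itemPrice*numItems
    return totalCost


# PEP 8: underscores between words, spaces around operators, no extra whitespace
def compute_total(item_price, num_items):
    """Return the total cost of num_items items at item_price each."""
    total_cost = item_price * num_items
    return total_cost


print(computeTotal (2.5, 4) == compute_total(2.5, 4))  # True
```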

Automatic style checking#

This is not required, but I found it handy to install an automatic PEP 8 formatter. These commands should work; see instructions here.

pip install autopep8
jupyter labextension install @ryantam626/jupyterlab_code_formatter
pip install jupyterlab_code_formatter
jupyter serverextension enable --py jupyterlab_code_formatter
blah = [5, 3, 4, 5, 4]
blah2 = 5
# This code is so great

Guidelines that cannot be checked automatically#

  • Variable names should use underscores (PEP 8), but also need to make sense.

    • e.g. spin_times is a reasonable variable name

    • my_list_of_thingies adheres to PEP 8 but is NOT a reasonable variable name

    • same for lst - fine for explaining a concept, but not as part of a script that will be reused

  • DRY (we talked about this last week)

  • Magic numbers

  • Comments

Magic numbers#

# NOT RECOMMENDED BECAUSE "4" IS A MAGIC NUMBER

def num_labs(num_weeks):
    """Compute the number of labs an MDS student attends in num_weeks weeks."""
    return num_weeks * 4
# BETTER

def num_labs(num_weeks, labs_per_week=4):
    """Compute the number of labs an MDS student attends in num_weeks weeks."""
    return num_weeks * labs_per_week
# ALSO FINE

LABS_PER_WEEK = 4

def num_labs(num_weeks):
    """Compute the number of labs an MDS student attends in num_weeks weeks."""
    return num_weeks * LABS_PER_WEEK
  • In the above, LABS_PER_WEEK is being set as a global constant.

  • More on this next class.

So, why avoid magic numbers?

  1. They make the code hard to read. Once you give the number a name, the code is much clearer.

  2. You may need to use them in multiple places, in which case you’d be violating DRY.

The worst situation:

def num_labs(num_weeks):
    """Compute the number of labs an MDS student attends in num_weeks weeks."""
    return num_weeks * 4

def num_wheels(num_cars):
    """Compute the number of wheels in a collection of num_cars cars."""
    return num_cars * 4

And then one day MDS students take 3 labs per week so you, or someone else, goes and changes the code to

def num_labs(num_weeks):
    """Compute the number of labs an MDS student attends in num_weeks weeks."""
    return num_weeks * 3

def num_wheels(num_cars):
    """Compute the number of wheels in a collection of num_cars cars."""
    return num_cars * 3

And that is bad!
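One way to avoid this mistake, sketched below, is to give each 4 its own name so that the two meanings can never be confused. WHEELS_PER_CAR is a name introduced here for illustration.

```python
LABS_PER_WEEK = 4   # labs an MDS student attends each week
WHEELS_PER_CAR = 4  # hypothetical constant, introduced for this sketch


def num_labs(num_weeks):
    """Compute the number of labs an MDS student attends in num_weeks weeks."""
    return num_weeks * LABS_PER_WEEK


def num_wheels(num_cars):
    """Compute the number of wheels in a collection of num_cars cars."""
    return num_cars * WHEELS_PER_CAR


print(num_labs(5), num_wheels(5))  # 20 20
```

If the number of labs per week changes, only the line defining LABS_PER_WEEK needs editing, and num_wheels is unaffected.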

Comments#

  • Comments are important for understanding your code.

  • While docstrings cover what a function does, your comments will help document how your code achieves its goal.

  • There are PEP 8 guidelines on the length, spacing, and capitalization of comments.

  • But, like variable names, this is not sufficient for a good comment.

Here is an example of a reasonable comment:

def random_walker(T):
    x = 0
    y = 0

    for i in range(T): 
        
        # Generate a random number between 0 and 1.
        # Then, go right, left, up or down if the number
        # is in the interval [0,0.25), [0.25,0.5),
        # [0.5,0.75) or [0.75,1) respectively.
        
        r = random() 
        if r < 0.25:
            x += 1      # Go right
        elif r < 0.5:
            x -= 1      # Go left
        elif r < 0.75:
            y += 1      # Go up
        else:
            y -= 1      # Go down

        print((x,y))

    return x**2 + y**2

Here are some BAD EXAMPLES of comments:

def random_walker(T):
    # intalize cooords
    x = 0
    y = 0

    for i in range(T):  # loop T times
        r = random() 
        if r < 0.25:
            x += 1 # go right
        elif r < 0.5:
            x -= 1 # go left
        elif r < 0.75:
            y += 1 # go up
        else:
            y -= 1

        # Print the location
        print((x,y))

    # In Python, the ** operator means exponentiation.
    return x**2 + y**2

Python debugger (pdb) (5 min)#

  • My Python code doesn’t work: what do I do?

  • Example: random_walker from lab:

# Write a function that takes in an integer , 
# checks to see if it is bigger than 50, and if it is, print "Good job!"
# if it is over 100, print "Excellent job"
# if it is under 50, print "Try again"

def check2(num):
    """ doc string """
    if num > 100:
        print("excellent job")
    elif num1 > 50:
        print("good job")
    else:
        print("Please try again")

def score_checker(num):
    """ doc string """
    
    try:
        check2(num)
    except NameError:
        print('hello')
        raise
    
# THIS CODE SHOULD BE FIXED BY THE USER!!
score_checker(45)
%debug
def random_walker(T):
    """
    Simulates T steps of a 2D random walk, and prints the result of each step.
    Returns the squared distance from the origin.
    
    Arguments:
    T -- (int) the number of steps to take
    """

    x = 0
    y = 0

    for i in range(T):
        r = random()
#         print(r)
        pdb.set_trace()
        if r < 0.25:
#             print("I'm going right!")
            x += 1
        if r < 0.5:
#             print("I'm going left!")
            x -= 1
        if r < 0.75:
            y += 1
        else:
            y -= 1

        print((x,y))

    return x**2 + y**2

random_walker(10)
  • Looks good, right?

  • But wait, why does it always go left?

  • Let’s add some print statements inside the if blocks to see what’s going on.

  • Alternative: pdb

import pdb
# pdb.set_trace()

See the pdb docs here.
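To see without an interactive session why the debugged random_walker above behaves strangely, the sketch below copies its chain of if statements and records which branches run for a given draw. The helper moves_for is hypothetical and exists only for this illustration.

```python
def moves_for(r):
    """Return the moves the if-chain in the debugged random_walker makes for a draw r."""
    moves = []
    if r < 0.25:
        moves.append("right")
    if r < 0.5:    # plain "if": also runs when r < 0.25
        moves.append("left")
    if r < 0.75:   # plain "if": also runs when r < 0.5
        moves.append("up")
    else:
        moves.append("down")
    return moves


for r in (0.1, 0.3, 0.6, 0.9):
    print(r, moves_for(r))
# 0.1 ['right', 'left', 'up']
# 0.3 ['left', 'up']
# 0.6 ['up']
# 0.9 ['down']
```

Every step to the right is immediately cancelled by a step to the left, so the walker never ends up moving right. Using elif for the second and third conditions, as in the random_walker from the Comments section, gives exactly one move per draw.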

Break (10 min) – See you at 11:50#

Numpy arrays (10 min)#

import numpy as np

Numpy array shapes#

A numpy array is sort of like a list:

my_list = [1,2,3,4,5]
my_list
[1, 2, 3, 4, 5]
my_array = np.array((1,2,3,4,5))
my_array
array([1, 2, 3, 4, 5])
type(my_array)
numpy.ndarray

However, unlike a list, it can only hold a single type (usually numbers):

my_list = [True,"hi"]
my_array = np.array(my_list)
my_array
array(['True', 'hi'], dtype='<U5')

Above: it converted the boolean True into the string 'True' (just avoid this!).
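As a small sketch of this single-type rule, numpy picks one type that can represent every element it is given. The example checks dtype.kind rather than the full dtype name, because the exact integer width can differ between platforms.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed_numbers = np.array([1, 2.5, 3])
mixed_strings = np.array([True, "hi"])

print(ints.dtype.kind)           # i  (integer)
print(mixed_numbers.dtype)       # float64: the integers became floats
print(mixed_strings.dtype.kind)  # U  (unicode string)
```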

Creating arrays#

Several ways to create numpy arrays:

x = np.zeros(10) # an array of zeros with size 10
x
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
x = np.empty(10) # an uninitialized array of size 10 (contents are arbitrary; here they happen to be zeros)
x
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
x = np.empty(10) + np.nan # an uninitialized array of size 10, turned into all nan
x
array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan])
x = np.ones(4) # an array of ones with size 4
x
array([1., 1., 1., 1.])
x = np.arange(1,5) # from 1 inclusive to 5 exclusive
x
array([1, 2, 3, 4])
x = np.arange(1,5,0.5) # step by 0.5
x
array([1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5])
x = np.linspace(1,5,17) # 17 equally spaced points between 1 and 5 (both endpoints included)
x
array([1.  , 1.25, 1.5 , 1.75, 2.  , 2.25, 2.5 , 2.75, 3.  , 3.25, 3.5 ,
       3.75, 4.  , 4.25, 4.5 , 4.75, 5.  ])
x = np.random.rand(5) # random numbers uniformly distributed from 0 to 1
x
array([0.09497144, 0.18529892, 0.22979085, 0.21822965, 0.22039454])

Elementwise operations#

x = np.ones(4)
x
array([1., 1., 1., 1.])
y = x + 1
y
array([2., 2., 2., 2.])
x - y
array([-1., -1., -1., -1.])
x == y
array([False, False, False, False])
x * y
array([2., 2., 2., 2.])
x ** y
array([1., 1., 1., 1.])
x / y
array([0.5, 0.5, 0.5, 0.5])
np.array_equal(x,y)
False
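For contrast, the sketch below shows how the same operators behave on plain Python lists, which is one reason to prefer numpy arrays for numerical work. The variable names are made up for illustration.

```python
import numpy as np

a_list = [1, 2, 3]
an_array = np.array([1, 2, 3])

print(a_list + a_list)               # [1, 2, 3, 1, 2, 3]  (concatenation)
print((an_array + an_array).tolist())  # [2, 4, 6]           (elementwise sum)

print(a_list * 2)                    # [1, 2, 3, 1, 2, 3]  (repetition)
print((an_array * 2).tolist())       # [2, 4, 6]           (elementwise product)
```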

Array shapes (10 min)#

The above are 1-D arrays:

x.shape
(4,)

Aside: tuples with 1 element

[1]
[1]
(1)
1
t = (1,) # tuple with 1 element
t
(1,)
type(t)
tuple
len(x)
4

Just like a list of lists

x = [[1,2],[3,4],[5,6]]
x
[[1, 2], [3, 4], [5, 6]]

You can have 2-D numpy arrays:

x = np.zeros((3,6))
x
array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])
x.T # transpose
array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])
x.shape
(3, 6)
x.size # total number of elements
18
x.ndim # len(x.shape)
2

Other things:

np.random.rand(3,4)
array([[0.28839397, 0.83044657, 0.04429282, 0.96213051],
       [0.55121095, 0.64555915, 0.94032305, 0.92898158],
       [0.07047969, 0.47134103, 0.23814152, 0.41881999]])

Other types:

np.zeros(6, dtype=int)
array([0, 0, 0, 0, 0, 0])
np.zeros(6).astype(int)
array([0, 0, 0, 0, 0, 0])

“dimension” and “length”#

  • The word dimension has 2 meanings (not my fault!)

    • We refer to the length of a vector as its dimension, because we think of it as a point in \(d\)-dimensional space

    • But in terms of being a container holding numbers, it’s a 1-dimensional container regardless of its length

    • Make sure you understand this! (and see below)

random_walker_location = np.zeros(2)
random_walker_location
array([0., 0.])
random_walker_location.ndim
1
x = np.random.rand(5)
x
array([0.81324098, 0.51559719, 0.45306103, 0.0985571 , 0.46671815])
len(x)
5
  • Above: in linear algebra terms, we call this a 5-dimensional vector because it’s a point in 5-dimensional space.

    • But in numpy it’s a 1-dimensional array.

    • We could say it’s a vector of length 5, but that wouldn’t be much better; “length” is also a broken word.

    • It could mean len(x) or it could mean \(\sqrt{\sum_i x_i^2}\), which is the Euclidean “length” of a vector from linear algebra.

  • There is no perfect solution here - just try to be very clear about what you mean and what other people mean.
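The different meanings can be checked side by side on a small vector. This is a minimal sketch; the vector v is chosen so that its Euclidean length is a round number.

```python
import numpy as np

v = np.array([3.0, 4.0])

print(v.ndim)                 # 1: a 1-dimensional container
print(len(v))                 # 2: the number of elements
print(np.sqrt(np.sum(v**2)))  # 5.0: the Euclidean length, from the formula above
print(np.linalg.norm(v))      # 5.0: the same quantity via numpy's norm function
```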

x = np.random.rand(2,3,4) # a 3-D array
x.shape
(2, 3, 4)
x.size
24
x
array([[[0.31617195, 0.54825555, 0.97780162, 0.91529363],
        [0.14058837, 0.43750167, 0.69022148, 0.53284238],
        [0.36582001, 0.20541847, 0.82440265, 0.63558701]],

       [[0.93324446, 0.35679956, 0.21077102, 0.33825529],
        [0.40313579, 0.43067365, 0.74433005, 0.71426563],
        [0.04141178, 0.31516926, 0.38836496, 0.14649587]]])

One of the most confusing things about numpy: what I call a “1-D array” can have 3 possible shapes:

x = np.ones(5)
print(x)
print("size:", x.size)
print("ndim:", x.ndim)
print("shape:",x.shape)
[1. 1. 1. 1. 1.]
size: 5
ndim: 1
shape: (5,)
y = np.ones((1,5))
print(y)
print("size:", y.size)
print("ndim:", y.ndim)
print("shape:",y.shape)
[[1. 1. 1. 1. 1.]]
size: 5
ndim: 2
shape: (1, 5)
z = np.ones((5,1))
print(z)
print("size:", z.size)
print("ndim:", z.ndim)
print("shape:",z.shape)
[[1.]
 [1.]
 [1.]
 [1.]
 [1.]]
size: 5
ndim: 2
shape: (5, 1)
np.array_equal(x,y)
False
np.array_equal(x,z)
False
np.array_equal(y,z)
False
x + y # makes sense
array([[2., 2., 2., 2., 2.]])
y + z # wait, what????
array([[2., 2., 2., 2., 2.],
       [2., 2., 2., 2., 2.],
       [2., 2., 2., 2., 2.],
       [2., 2., 2., 2., 2.],
       [2., 2., 2., 2., 2.]])

Above: this is called “broadcasting” and will be discussed in the next course (DSCI 523).
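As a brief preview, and only a sketch of the basic behaviour, dimensions of size 1 are stretched to match the other array. That is why adding a (1, 5) array to a (5, 1) array gives a (5, 5) result.

```python
import numpy as np

y = np.ones((1, 5))  # one row, five columns
z = np.ones((5, 1))  # five rows, one column

# The size-1 dimensions are stretched, so (1, 5) + (5, 1) gives (5, 5).
print((y + z).shape)  # (5, 5)

# Giving both arrays the same shape first yields the ordinary elementwise sum.
print((y + z.T).shape)              # (1, 5)
print((y + z.reshape(1, 5)).shape)  # (1, 5)
```

When the elementwise sum is what you want, it is safest to check that the two shapes match before adding.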

Indexing and slicing (10 min)#

x = np.arange(10)
x
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
x[3]
3
x[2:]
array([2, 3, 4, 5, 6, 7, 8, 9])
x[:4]
array([0, 1, 2, 3])
x[2:5]
array([2, 3, 4])
x[2:3]
array([2])
x[-1]
9
x[-2]
8
x[5:0:-1]
array([5, 4, 3, 2, 1])

For 2D arrays:

x = np.random.randint(10,size=(4,6))
x
array([[1, 5, 6, 1, 5, 0],
       [0, 6, 5, 7, 0, 5],
       [2, 7, 3, 8, 7, 0],
       [6, 2, 0, 9, 1, 9]])
# row, then column
x[0,0]
1
x[3,4] # do this
1
x[3][4] # i do not like this as much
1
x[3]
array([6, 2, 0, 9, 1, 9])
x
array([[1, 5, 6, 1, 5, 0],
       [0, 6, 5, 7, 0, 5],
       [2, 7, 3, 8, 7, 0],
       [6, 2, 0, 9, 1, 9]])
len(x) # generally, just confusing
4
x.shape
(4, 6)
x[:,2] # column number 2
array([6, 5, 3, 0])
x[2:,:3]
array([[2, 7, 3],
       [6, 2, 0]])
x.T
array([[1, 0, 2, 6],
       [5, 6, 7, 2],
       [6, 5, 3, 0],
       [1, 7, 8, 9],
       [5, 0, 7, 1],
       [0, 5, 0, 9]])
x
array([[1, 5, 6, 1, 5, 0],
       [0, 6, 5, 7, 0, 5],
       [2, 7, 3, 8, 7, 0],
       [6, 2, 0, 9, 1, 9]])
x[1,1] = 555555
x
array([[     1,      5,      6,      1,      5,      0],
       [     0, 555555,      5,      7,      0,      5],
       [     2,      7,      3,      8,      7,      0],
       [     6,      2,      0,      9,      1,      9]])
z = np.zeros(5)
z
array([0., 0., 0., 0., 0.])
z[0] = 5
z
array([5., 0., 0., 0., 0.])

Boolean indexing#

x = np.random.rand(10)
x
array([0.49980207, 0.26116012, 0.02009277, 0.41642499, 0.51661019,
       0.01805233, 0.70602944, 0.53841982, 0.45866609, 0.27672178])
x + 1
array([1.49980207, 1.26116012, 1.02009277, 1.41642499, 1.51661019,
       1.01805233, 1.70602944, 1.53841982, 1.45866609, 1.27672178])
x_thresh = x > 0.5
x_thresh
array([False, False, False, False,  True, False,  True,  True, False,
       False])
x[x_thresh]
array([0.51661019, 0.70602944, 0.53841982])
x[x_thresh] = 100 # set all elements  > 0.5 to be equal to 100
x
array([4.99802069e-01, 2.61160121e-01, 2.00927704e-02, 4.16424989e-01,
       1.00000000e+02, 1.80523348e-02, 1.00000000e+02, 1.00000000e+02,
       4.58666090e-01, 2.76721778e-01])
x = np.random.rand(10)
x
array([0.97937487, 0.81041273, 0.0042411 , 0.86982107, 0.42289727,
       0.46260313, 0.97150723, 0.69490841, 0.89918386, 0.4716509 ])
x[x > 0.5] = 0.5
x
array([0.5       , 0.5       , 0.0042411 , 0.5       , 0.42289727,
       0.46260313, 0.5       , 0.5       , 0.5       , 0.4716509 ])
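Boolean masks can also be combined. The sketch below uses a small fixed array instead of random numbers so the results are reproducible; np.minimum is shown as one possible alternative to the in-place assignment x[x > 0.5] = 0.5.

```python
import numpy as np

x = np.array([0.1, 0.9, 0.4, 0.7, 0.55])

# Combine conditions with & (and) or | (or); the parentheses are required.
middle = (x > 0.3) & (x < 0.8)
print(middle.tolist())     # [False, False, True, True, True]
print(x[middle].tolist())  # [0.4, 0.7, 0.55]

# np.minimum caps values at 0.5 and returns a new array, leaving x unchanged.
capped = np.minimum(x, 0.5)
print(capped.tolist())     # [0.1, 0.5, 0.4, 0.5, 0.5]
```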
new_list = [1,1.0,2.0,2]
new_list
[1, 1.0, 2.0, 2]
np.array(new_list, dtype=int)
array([1, 1, 2, 2])