DSCI 531 Lecture 3#

In this class, we will watch the third of four lectures by Dr. Mike Gelbart, option co-director of the UBC-Vancouver MDS program.

from random import random
import pdb

Lecture Outline:#

Functions as a data type (5 min)
Anonymous functions (5 min)
Exceptions, try/except (15 min)

Defer to Lecture 5#

Style guides and coding style (15 min)
Python debugger (pdb) (5 min)
Break (5 min)
Numpy arrays (10 min)
Numpy array shapes (10 min)
Numpy indexing and slicing (10 min)

Attribution#

The original version of these Python lectures was written by Patrick Walls.
These lectures were delivered by Mike Gelbart and are available publicly here.

Functions as a data type (5 min)#

In Python, functions are a data type just like anything else.
We often say functions are “first-class objects”.

def do_nothing(x):
    return x

type(do_nothing)

function

print(do_nothing)

<function do_nothing at 0x7f57e4bb3280>

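# do_nothing = 5  # uncomment to rebind the name; the call below then fails (TypeError: 'int' object is not callable)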
do_nothing(10)

This means you can pass functions as arguments into other functions.

def square(y):
    return y**2

def evaluate_function_on_x_plus_1(fun, x):
    return fun(x+1)

square(5)

evaluate_function_on_x_plus_1(square, 5)

Above: what happened here?
- fun(x+1) becomes square(5+1)
- square(6) becomes 36

(optional) You can also write functions that return functions, or define functions inside of other functions.
- I don’t do these often.
- But they are important ideas in software engineering.
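Here is a minimal sketch of both ideas at once: a function that defines an inner function and returns it. The names `make_multiplier`, `double` and `triple` are hypothetical, chosen only for illustration.

```python
def make_multiplier(factor):
    """Return a new function that multiplies its input by `factor`."""

    def multiply(x):
        # `factor` is remembered from the enclosing call (this is called a "closure")
        return x * factor

    return multiply


double = make_multiplier(2)  # double is now a function
triple = make_multiplier(3)

print(double(5))  # 10
print(triple(5))  # 15
```

Each call to `make_multiplier` creates a separate inner function, so `double` and `triple` do not interfere with each other.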

You can end up with pretty weird stuff:

do_nothing(do_nothing)

<function __main__.do_nothing(x)>

do_nothing(do_nothing)(5)

Above:

First we call do_nothing(do_nothing), which returns the function do_nothing
Then we call do_nothing(5) which returns 5.

do_nothing(do_nothing(5))

Above:

First we call do_nothing(5), which returns 5.
Then we again call do_nothing(5), which returns 5.

Anonymous functions (5 min)#

There are two ways to define functions in Python:

def add_one(x):
    return x+1

add_one(7.2)

8.2

add_one = lambda x: x+1 

type(add_one)

function

add_one(7.2)

8.2

The two approaches above are equivalent. The one with lambda is called an anonymous function.

Some differences:

anonymous functions can only contain a single expression (so they fit on one line), which makes them inappropriate in most cases.
anonymous functions immediately evaluate to a function (remember, functions are first-class objects), so we can do weird stuff with them.

(lambda x,y: x+y)(6,7)

a = (lambda x,y: x*y)(5,5)
b = (lambda x,y: x+y*2)(2,2)
print(a,b)

25 6

evaluate_function_on_x_plus_1(lambda x: x**2, 5)

Above:

First, lambda x: x**2 evaluates to a value of type function.
- Notice that this function is never given a name, hence “anonymous function”!
Then, the function and the integer 5 are passed into evaluate_function_on_x_plus_1
At which point the anonymous function is evaluated on 5+1, and we get 36.
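A common practical use of this pattern is passing a short lambda as the `key` argument of Python's built-in `sorted()`, which calls the function on each element to decide the order. A small sketch with made-up data (the `scores` list is hypothetical):

```python
# Hypothetical data: (name, score) pairs
scores = [("alice", 72), ("bob", 95), ("carol", 88)]

# Sort by the second element of each tuple (the score), highest first.
# The lambda is created, passed into sorted(), and never given a name.
by_score = sorted(scores, key=lambda pair: pair[1], reverse=True)
print(by_score)  # [('bob', 95), ('carol', 88), ('alice', 72)]
```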

Exceptions, `try`/`except` (10 min)#

Above: the Blue Screen of Death. Some amusing examples here!

If something goes wrong, we don’t want the code to crash - we want it to fail gracefully.
In Python, this can be accomplished using try/except.
First, here is a basic example without it, which crashes when i reaches 4:

for i in range(10):
    
    print(i)
    if i == 4:
        this_variable_does_not_exist

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Input In [20], in <cell line: 1>()
      3 print(i)
      4 if i == 4:
----> 5     this_variable_does_not_exist

NameError: name 'this_variable_does_not_exist' is not defined

for i in range(10):
    
    print(i)
    if i == 4:
        try:
            this_variable_does_not_exist
        except:
        #     pass
            print("\tYou did something bad!")

Python tries to execute the code in the try block.
If an error is encountered, we “catch” this in the except block (also called try/catch in other languages).
There are many different error types, or exceptions - we saw NameError above.

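5/0  # ZeroDivisionError

my_list = [1,2,3]
my_list[5]  # IndexError: list index out of range

# (note: this is also valid syntax, just very confusing)
[1,2,3][5]  # IndexError again

my_tuple = (1,2,3)
my_tuple[0] = 0  # TypeError: tuples cannot be modified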

Ok, so there are apparently a bunch of different errors one could run into.
With try/except you can also catch the exception itself:

for i in range(10):
    
    print(i)
    if i == 4:
        try:
            this_variable_does_not_exist
        except Exception as ex:
            print("You did something bad!")
            print(type(ex),ex)

In the above, we caught the exception and assigned it to the variable ex so that we could print it out.
This is useful because you can see what the error message would have been, without crashing your program.

You can also catch specific exception types, like so:

try:
    #1/0
    this_variable_does_not_exist
except TypeError:
    print("You made a type error!")
except NameError:
    print("You made a name error!")
except ZeroDivisionError:
    print('You made a division error!')
except:
    print("You made some other sort of error")

The final except would trigger if the error is none of the above types, so this sort of has an if/elif/else feel to it.
There are some extra features, in particular an else and finally block; if you are interested, see e.g., here.
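As a rough sketch of those two extra blocks: `else` runs only when the `try` block raised nothing, and `finally` runs no matter what. The function name `safe_divide` is hypothetical.

```python
def safe_divide(a, b):
    """Divide a by b, printing which blocks of the try statement run."""
    try:
        result = a / b
    except ZeroDivisionError:
        print("You made a division error!")
        result = None
    else:
        # Runs only if the try block raised no exception
        print("Division succeeded")
    finally:
        # Always runs, whether or not an exception occurred
        print("Done trying to divide")
    return result


safe_divide(6, 3)  # prints "Division succeeded", then "Done trying to divide"
safe_divide(6, 0)  # prints "You made a division error!", then "Done trying to divide"
```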

try:
    5/0
except TypeError:
    print("You made a type error!")
except NameError:
    print("You made a name error!")
except Exception as ex:
    print("You made some other sort of error")

Ideally, try to make your try/except blocks specific, and try not to put more errors inside the except…

try:
    this_variable_does_not_exist
except:
    5/0

This is a bit much, but it does happen sometimes :(

Using `raise`#

You can also write code that raises an exception on purpose, using raise:

def add_one(x):
    return x+1

add_one("blah")

def add_one(x):
    if not isinstance(x, float) and not isinstance(x, int):
        raise Exception("Sorry, x must be numeric")
        
    return x+1

add_one("blah")

This is useful when your function is complicated and would fail in a complicated way, with a weird error message.
You can make the cause of the error much clearer to the caller of the function.
Thus, your function is more usable this way.
If you do this, you should ideally describe these exceptions in the function documentation, so a user knows what to expect if they call your function.
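A sketch of what that could look like for `add_one`: the docstring documents the exception (the "Raises:" layout is one common convention, not something the lecture prescribes), and the function raises the more specific built-in `TypeError` rather than a generic `Exception`. Passing a tuple of types to `isinstance` checks against any of them.

```python
def add_one(x):
    """Return x + 1.

    Arguments:
    x -- (int or float) the number to increment

    Raises:
    TypeError -- if x is not an int or a float
    """
    # Note: bool is a subclass of int, so add_one(True) is allowed
    if not isinstance(x, (int, float)):
        raise TypeError(f"Sorry, x must be numeric, got {type(x).__name__}")
    return x + 1


try:
    add_one("blah")
except TypeError as ex:
    print("Caught:", ex)  # Caught: Sorry, x must be numeric, got str
```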

You can also raise other types of exceptions, or even define your own exception types, as in lab 2.
You can also use raise by itself to raise whatever exception was going on:

try:
    this_variable_does_not_exist
except:
    print("You did something bad!")
    raise

Here, the original exception is raised after we ran some other code.
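On defining your own exception types (mentioned above): independently of whatever lab 2 does, a minimal sketch is a class that inherits from `Exception` or one of its subclasses. Here it inherits from `ValueError`, so code that catches `ValueError` also catches it. The names `NegativeStepsError` and `check_steps` are hypothetical.

```python
class NegativeStepsError(ValueError):
    """Hypothetical error for a negative number of steps."""
    pass


def check_steps(T):
    """Return T, or raise NegativeStepsError if T is negative."""
    if T < 0:
        raise NegativeStepsError(f"T must be non-negative, got {T}")
    return T


try:
    check_steps(-3)
except NegativeStepsError as ex:
    print(type(ex).__name__, ex)  # NegativeStepsError T must be non-negative, got -3

try:
    check_steps(-1)
except ValueError:
    print("Also caught as a ValueError")
```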

Style guides and coding style (15 min)#

It is incorrect to think that if code works then you are done.
Code has two “users” - the computer (which turns it into machine instructions) and humans, who will likely read and/or modify the code in the future.
This section is about how to make your code suitable to that second audience, humans.

What is style?#

Style encompasses many things.
We already talked about the DRY principle, which could be considered under this umbrella, since it affects humans rather than the machines.

Today we will talk about:

variable names
magic numbers
comments
whitespace

Style guides#

It is common for style conventions to be brought together into a style guide.
If everyone follows the same style guide, it makes it easier to read code written by others.
- “Code is read much more often than it is written.”
For Python, we will follow the PEP 8 style guide.
It is worth skimming through PEP 8, but here are some highlights:
- Indent using 4 spaces
- Have whitespace around operators, e.g. x = 1 not x=1
- But avoid extra whitespace, e.g. f(1) not f (1)
- Single and double quotes are both fine for strings, but for triple-quoted strings only use """triple double quotes""", not '''triple single quotes'''
- Variable and function names use underscores_between_words
  - thisVariable (Java, camelCase) -> this_variable (Python)
- And much more…
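To make a few of these highlights concrete, here is a small before/after sketch. Both functions run and give the same answer; only the style differs. The names `computeArea` and `compute_area` are made up for this illustration.

```python
# Before: valid Python, but it ignores the conventions listed above
def computeArea(Width,Height):  # camelCase names, no space after the comma
    result=Width*Height  # no spaces around = and *
    return result


# After: the same logic following PEP 8
def compute_area(width, height):
    """Return the area of a width-by-height rectangle."""
    result = width * height
    return result


print(computeArea (3,4))  # extra space before "(" in the call
print(compute_area(3, 4))  # 12
```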

Automatic style checking#

This is not required, but I found it handy to install an automatic PEP 8 formatter. These commands should work; see instructions here.

pip install autopep8
jupyter labextension install @ryantam626/jupyterlab_code_formatter
pip install jupyterlab_code_formatter
jupyter serverextension enable --py jupyterlab_code_formatter

blah = [5, 3, 4, 5, 4]
blah2 = 5
# This code is so great

Guidelines that cannot be checked automatically#

Variable names should use underscores (PEP 8), but also need to make sense.
- e.g. spin_times is a reasonable variable name
- my_list_of_thingies adheres to PEP 8 but is NOT a reasonable variable name
- same for lst - fine for explaining a concept, but not as part of a script that will be reused
DRY (we talked about this last week)
Magic numbers
Comments

Magic numbers#

# NOT RECOMMENDED BECAUSE "4" IS A MAGIC NUMBER

def num_labs(num_weeks):
    """Compute the number of labs an MDS student attends in num_weeks weeks."""
    return num_weeks * 4

# BETTER

def num_labs(num_weeks, labs_per_week=4):
    """Compute the number of labs an MDS student attends in num_weeks weeks."""
    return num_weeks * labs_per_week

# ALSO FINE

LABS_PER_WEEK = 4 

def num_labs(num_weeks):
    """Compute the number of labs an MDS student attends in num_weeks weeks."""
    return num_weeks * LABS_PER_WEEK

In the above, LABS_PER_WEEK is being set as a global constant.
More on this next class.

So, why avoid magic numbers?

They make the code hard to read. Once you give the number a name, the code is much clearer.
You may need to use them in multiple places, in which case you’d be violating DRY.

The worst situation:

def num_labs(num_weeks):
    """Compute the number of labs an MDS student attends in num_weeks weeks."""
    return num_weeks * 4

def num_wheels(num_cars):
    """Compute the number of wheels in a collection of num_cars cars."""
    return num_cars * 4

And then one day MDS students start taking 3 labs per week, so you (or someone else) go through the code and change every 4 to a 3:

def num_labs(num_weeks):
    """Compute the number of labs an MDS student attends in num_weeks weeks."""
    return num_weeks * 3

def num_wheels(num_cars):
    """Compute the number of wheels in a collection of num_cars cars."""
    return num_cars * 3

And that is bad!
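The search-and-replace silently broke `num_wheels`: cars now have 3 wheels. Giving each 4 its own name avoids this, because two numbers that happen to be equal are now clearly different quantities. A sketch reusing the lecture's functions (`WHEELS_PER_CAR` is a name introduced here):

```python
LABS_PER_WEEK = 4
WHEELS_PER_CAR = 4


def num_labs(num_weeks):
    """Compute the number of labs an MDS student attends in num_weeks weeks."""
    return num_weeks * LABS_PER_WEEK


def num_wheels(num_cars):
    """Compute the number of wheels in a collection of num_cars cars."""
    return num_cars * WHEELS_PER_CAR


# If the lab schedule changes, only the LABS_PER_WEEK line needs editing,
# and num_wheels is unaffected.
print(num_labs(2), num_wheels(2))  # 8 8
```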

Comments#

Comments are important for understanding your code.
While docstrings cover what a function does, your comments will help document how your code achieves its goal.
There are PEP 8 guidelines on the length, spacing, and capitalization of comments.
But, as with variable names, following these guidelines is not sufficient for a good comment.

Below is an example of reasonable comments:

def random_walker(T):
    x = 0
    y = 0

    for i in range(T): 
        
        # Generate a random number between 0 and 1.
        # Then, go right, left, up or down if the number
        # is in the interval [0,0.25), [0.25,0.5),
        # [0.5,0.75) or [0.75,1) respectively.
        
        r = random() 
        if r < 0.25:
            x += 1      # Go right
        elif r < 0.5:
            x -= 1      # Go left
        elif r < 0.75:
            y += 1      # Go up
        else:
            y -= 1      # Go down

        print((x,y))

    return x**2 + y**2

Here are some BAD EXAMPLES of comments:

def random_walker(T):
    # intalize cooords
    x = 0
    y = 0

    for i in range(T):  # loop T times
        r = random() 
        if r < 0.25:
            x += 1 # go right
        elif r < 0.5:
            x -= 1 # go left
        elif r < 0.75:
            y += 1 # go up
        else:
            y -= 1

        # Print the location
        print((x,y))

    # In Python, the ** operator means exponentiation.
    return x**2 + y**2

Python debugger (`pdb`) (5 min)#

My Python code doesn’t work: what do I do?
Examples: first a short buggy function, then random_walker from lab:

# Write a function that takes in an integer,
# checks to see if it is bigger than 50, and if it is, print "Good job!"
# if it is over 100, print "Excellent job"
# if it is under 50, print "Try again"

def check2(num):
    """ doc string """
    if num > 100:
        print("excellent job")
    elif num1 > 50:
        print("good job")
    else:
        print("Please try again")

def score_checker(num):
    """ doc string """
    
    try:
        check2(num)
    except NameError:
        print('hello')
        raise
    
# THIS CODE SHOULD BE FIXED BY THE USER!!

score_checker(45)

%debug

def random_walker(T):
    """
    Simulates T steps of a 2D random walk, and prints the result of each step.
    Returns the squared distance from the origin.
    
    Arguments:
    T -- (int) the number of steps to take
    """

    x = 0
    y = 0

    for i in range(T):
        r = random()
#         print(r)
        pdb.set_trace()
        if r < 0.25:
#             print("I'm going right!")
            x += 1
        if r < 0.5:
#             print("I'm going left!")
            x -= 1
        if r < 0.75:
            y += 1
        else:
            y -= 1

        print((x,y))

    return x**2 + y**2

random_walker(10)

Looks good, right?
But wait, why does it always go left?
Let’s add some print statements inside the if blocks to see what’s going on.
Alternative: pdb

import pdb
# pdb.set_trace()

See the pdb docs here.

Break (10 min) – See you at 11:50#

Numpy arrays (10 min)#

import numpy as np

Numpy array shapes#

A numpy array is sort of like a list:

my_list = [1,2,3,4,5]
my_list

[1, 2, 3, 4, 5]

my_array = np.array((1,2,3,4,5))
my_array

array([1, 2, 3, 4, 5])

type(my_array)

numpy.ndarray

However, unlike a list, it can only hold a single type (usually numbers):

my_list = [True,"hi"]

my_array = np.array(my_list)

my_array

array(['True', 'hi'], dtype='<U5')

Above: it converted the boolean True into the string 'True' (just avoid this!).

Creating arrays#

Several ways to create numpy arrays:

x = np.zeros(10) # an array of zeros with size 10
x

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

x = np.empty(10) # an uninitialized array of size 10: the values are arbitrary (they happened to be 0 here)
x

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

x = np.empty(10) + np.nan # an array of "empty" with size 10, turn it all into nan
x

array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan])

x = np.ones(4) # an array of ones with size 4
x

array([1., 1., 1., 1.])

x = np.arange(1,5) # from 1 inclusive to 5 exclusive
x

array([1, 2, 3, 4])

x = np.arange(1,5,0.5) # step by 0.5
x

array([1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5])

x = np.linspace(1,5,17) # 17 equally spaced points from 1 to 5, both ends included
x

array([1.  , 1.25, 1.5 , 1.75, 2.  , 2.25, 2.5 , 2.75, 3.  , 3.25, 3.5 ,
       3.75, 4.  , 4.25, 4.5 , 4.75, 5.  ])

x = np.random.rand(5) # random numbers uniformly distributed on [0, 1)
x

array([0.09497144, 0.18529892, 0.22979085, 0.21822965, 0.22039454])

Elementwise operations#

x = np.ones(4)
x

array([1., 1., 1., 1.])

y = x + 1
y

array([2., 2., 2., 2.])

x - y

array([-1., -1., -1., -1.])

x == y

array([False, False, False, False])

x * y

array([2., 2., 2., 2.])

x ** y

array([1., 1., 1., 1.])

x / y

array([0.5, 0.5, 0.5, 0.5])

np.array_equal(x,y)

False
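These operators are where arrays and lists differ most: on lists, `+` concatenates and `*` repeats, while on numpy arrays both act elementwise. A quick sketch:

```python
import numpy as np

my_list = [1, 2, 3]
my_array = np.array([1, 2, 3])

print(my_list + my_list)    # [1, 2, 3, 1, 2, 3]  (lists concatenate)
print(my_array + my_array)  # [2 4 6]             (arrays add elementwise)
print(my_list * 2)          # [1, 2, 3, 1, 2, 3]  (lists repeat)
print(my_array * 2)         # [2 4 6]             (arrays multiply elementwise)
```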

Array shapes (10 min)#

The above are 1-D arrays:

x.shape

(4,)

Aside: tuples with 1 element

[1]

[1]

(1) # just the integer 1: parentheses alone do not make a tuple

t = (1,) # tuple with 1 element
t

(1,)

type(t)

tuple

len(x)

Just like a list of lists

x = [[1,2],[3,4],[5,6]]
x

[[1, 2], [3, 4], [5, 6]]

You can have 2-D numpy arrays:

x = np.zeros((3,6))
x

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

x.T # transpose

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

x.shape

(3, 6)

x.size # total number of elements

x.ndim # len(x.shape)

Other things:

np.random.rand(3,4)

array([[0.28839397, 0.83044657, 0.04429282, 0.96213051],
       [0.55121095, 0.64555915, 0.94032305, 0.92898158],
       [0.07047969, 0.47134103, 0.23814152, 0.41881999]])

Other types:

np.zeros(6, dtype=int)

array([0, 0, 0, 0, 0, 0])

np.zeros(6).astype(int)

array([0, 0, 0, 0, 0, 0])
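One thing to watch when converting floats with `astype(int)`: it truncates toward zero rather than rounding. A short sketch (the values are arbitrary):

```python
import numpy as np

floats = np.array([1.7, -1.7, 2.5])

# astype(int) drops the fractional part (truncates toward zero); it does not round
print(floats.astype(int).tolist())            # [1, -1, 2]

# Round first if rounding is what you want.
# np.round rounds halves to the nearest even number, so 2.5 becomes 2.0
print(np.round(floats).astype(int).tolist())  # [2, -2, 2]
```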

“dimension” and “length”#

The word dimension has 2 meanings (not my fault!)
- We refer to the length of a vector as its dimension, because we think of it as a point in \(d\)-dimensional space
- But in terms of being a container holding numbers, it’s a 1-dimensional container regardless of its length
- Make sure you understand this! (and see below)

random_walker_location = np.zeros(2)
random_walker_location

array([0., 0.])

random_walker_location.ndim

x = np.random.rand(5)
x

array([0.81324098, 0.51559719, 0.45306103, 0.0985571 , 0.46671815])

len(x)

Above: in linear algebra terms, we call this a 5-dimensional vector because it’s a point in 5-dimensional space.
- But in numpy it’s a 1-dimensional array.
- We could say it’s a vector of length 5, but that wouldn’t be much better; “length” is also a broken word.
- It could mean len(x) or it could mean \(\sqrt{\sum_i x_i^2}\), which is the Euclidean “length” of a vector from linear algebra.
There is no perfect solution here - just try to be very clear about what you mean and what other people mean.
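In code, the two meanings of "length" are easy to tell apart: `len` counts elements, while `np.linalg.norm` computes the Euclidean length \(\sqrt{\sum_i x_i^2}\). A sketch with a vector chosen so the numbers come out clean:

```python
import numpy as np

v = np.array([3.0, 4.0])

print(len(v))                   # 2: the number of elements
print(np.linalg.norm(v))        # 5.0: Euclidean length, sqrt(3**2 + 4**2)
print(np.sqrt(np.sum(v ** 2)))  # 5.0: the same formula written out
```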

x = np.random.rand(2,3,4) # a 3-D array
x.shape

(2, 3, 4)

x.size # total number of elements: 2 * 3 * 4 = 24

x

array([[[0.31617195, 0.54825555, 0.97780162, 0.91529363],
        [0.14058837, 0.43750167, 0.69022148, 0.53284238],
        [0.36582001, 0.20541847, 0.82440265, 0.63558701]],

       [[0.93324446, 0.35679956, 0.21077102, 0.33825529],
        [0.40313579, 0.43067365, 0.74433005, 0.71426563],
        [0.04141178, 0.31516926, 0.38836496, 0.14649587]]])

One of the most confusing things about numpy: what I call a “1-D array” can have 3 possible shapes:

x = np.ones(5)
print(x)
print("size:", x.size)
print("ndim:", x.ndim)
print("shape:",x.shape)

[1. 1. 1. 1. 1.]
size: 5
ndim: 1
shape: (5,)

y = np.ones((1,5))
print(y)
print("size:", y.size)
print("ndim:", y.ndim)
print("shape:",y.shape)

[[1. 1. 1. 1. 1.]]
size: 5
ndim: 2
shape: (1, 5)

z = np.ones((5,1))
print(z)
print("size:", z.size)
print("ndim:", z.ndim)
print("shape:",z.shape)

[[1.]
 [1.]
 [1.]
 [1.]
 [1.]]
size: 5
ndim: 2
shape: (5, 1)

np.array_equal(x,y)

False

np.array_equal(x,z)

False

np.array_equal(y,z)

False

x + y # makes sense

array([[2., 2., 2., 2., 2.]])

y + z # wait, what????

array([[2., 2., 2., 2., 2.],
       [2., 2., 2., 2., 2.],
       [2., 2., 2., 2., 2.],
       [2., 2., 2., 2., 2.],
       [2., 2., 2., 2., 2.]])

Above: this is called “broadcasting” and will be discussed in the next course (DSCI 523).
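Setting broadcasting aside, you can move between the three shapes explicitly with `reshape`. A minimal sketch:

```python
import numpy as np

x = np.ones(5)          # shape (5,)

row = x.reshape(1, 5)   # shape (1, 5): one row, five columns
col = x.reshape(5, 1)   # shape (5, 1): five rows, one column
flat = row.reshape(-1)  # -1 tells numpy to infer the size: back to shape (5,)

print(row.shape, col.shape, flat.shape)  # (1, 5) (5, 1) (5,)
print(np.array_equal(flat, x))           # True
```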

Indexing and slicing (10 min)#

x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

x[3]

x[2:]

array([2, 3, 4, 5, 6, 7, 8, 9])

x[:4]

array([0, 1, 2, 3])

x[2:5]

array([2, 3, 4])

x[2:3]

array([2])

x[-1]

x[-2]

x[5:0:-1]

array([5, 4, 3, 2, 1])

For 2D arrays:

x = np.random.randint(10,size=(4,6))
x

array([[1, 5, 6, 1, 5, 0],
       [0, 6, 5, 7, 0, 5],
       [2, 7, 3, 8, 7, 0],
       [6, 2, 0, 9, 1, 9]])

# row, then column
x[0,0]

x[3,4] # do this

x[3][4] # I do not like this as much: it first pulls out the row x[3], then indexes into it

x[3]

array([6, 2, 0, 9, 1, 9])

x

array([[1, 5, 6, 1, 5, 0],
       [0, 6, 5, 7, 0, 5],
       [2, 7, 3, 8, 7, 0],
       [6, 2, 0, 9, 1, 9]])

len(x) # 4, the number of rows (x.shape[0]); generally just confusing

x.shape

(4, 6)

x[:,2] # column number 2

array([6, 5, 3, 0])

x[2:,:3]

array([[2, 7, 3],
       [6, 2, 0]])

x.T

array([[1, 0, 2, 6],
       [5, 6, 7, 2],
       [6, 5, 3, 0],
       [1, 7, 8, 9],
       [5, 0, 7, 1],
       [0, 5, 0, 9]])

x

array([[1, 5, 6, 1, 5, 0],
       [0, 6, 5, 7, 0, 5],
       [2, 7, 3, 8, 7, 0],
       [6, 2, 0, 9, 1, 9]])

x[1,1] = 555555
x

array([[     1,      5,      6,      1,      5,      0],
       [     0, 555555,      5,      7,      0,      5],
       [     2,      7,      3,      8,      7,      0],
       [     6,      2,      0,      9,      1,      9]])

z = np.zeros(5)
z

array([0., 0., 0., 0., 0.])

z[0] = 5
z

array([5., 0., 0., 0., 0.])
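One consequence worth knowing when combining slicing with assignment: a basic slice of a numpy array is a view that shares memory with the original, whereas slicing a Python list makes a new list. A short sketch:

```python
import numpy as np

a = np.arange(5)
s = a[1:3]          # slicing a numpy array gives a view that shares a's data
s[0] = 99           # so writing through the slice changes a too
print(a.tolist())   # [0, 99, 2, 3, 4]

b = np.arange(5)
c = b[1:3].copy()   # .copy() makes an independent array
c[0] = 99
print(b.tolist())   # [0, 1, 2, 3, 4]

lst = [0, 1, 2, 3, 4]
part = lst[1:3]     # slicing a Python list, by contrast, makes a new list
part[0] = 99
print(lst)          # [0, 1, 2, 3, 4]
```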

Boolean indexing#

x = np.random.rand(10)
x

array([0.49980207, 0.26116012, 0.02009277, 0.41642499, 0.51661019,
       0.01805233, 0.70602944, 0.53841982, 0.45866609, 0.27672178])

x + 1

array([1.49980207, 1.26116012, 1.02009277, 1.41642499, 1.51661019,
       1.01805233, 1.70602944, 1.53841982, 1.45866609, 1.27672178])

x_thresh = x > 0.5
x_thresh

array([False, False, False, False,  True, False,  True,  True, False,
       False])

x[x_thresh]

array([0.51661019, 0.70602944, 0.53841982])

x[x_thresh] = 100 # set all elements > 0.5 to 100
x

array([4.99802069e-01, 2.61160121e-01, 2.00927704e-02, 4.16424989e-01,
       1.00000000e+02, 1.80523348e-02, 1.00000000e+02, 1.00000000e+02,
       4.58666090e-01, 2.76721778e-01])

x = np.random.rand(10)
x

array([0.97937487, 0.81041273, 0.0042411 , 0.86982107, 0.42289727,
       0.46260313, 0.97150723, 0.69490841, 0.89918386, 0.4716509 ])

x[x > 0.5] = 0.5
x

array([0.5       , 0.5       , 0.0042411 , 0.5       , 0.42289727,
       0.46260313, 0.5       , 0.5       , 0.5       , 0.4716509 ])
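Conditions can be combined with `&` (and) and `|` (or). Python's `and`/`or` keywords do not work elementwise on arrays (they raise a `ValueError` about an ambiguous truth value). A sketch with fixed values so the result is predictable:

```python
import numpy as np

x = np.array([0.1, 0.4, 0.6, 0.9])

# The parentheses are required, because & and | bind more tightly than > and <
in_middle = (x > 0.3) & (x < 0.7)
print(x[in_middle].tolist())  # [0.4, 0.6]

outside = (x < 0.3) | (x > 0.7)
print(x[outside].tolist())    # [0.1, 0.9]

# True counts as 1, so summing a boolean array counts the matches
print(int(in_middle.sum()))   # 2
```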

new_list = [1,1.0,2.0,2]
new_list

[1, 1.0, 2.0, 2]

np.array(new_list, dtype=int)

array([1, 1, 2, 2])
