DSCI 531 Lecture 3#
In this class, we will watch the third of four lectures by Dr. Mike Gelbart, option co-director of the UBC-Vancouver MDS program.
from random import random
import pdb
Lecture Outline:#
Functions as a data type (5 min)
Anonymous functions (5 min)
Exceptions, try/except (15 min)
Defer to Lecture 5#
Style guides and coding style (15 min)
Python debugger (pdb) (5 min)
Break (5 min)
Numpy arrays (10 min)
Numpy array shapes (10 min)
Numpy indexing and slicing (10 min)
Attribution#
The original version of these Python lectures was by Patrick Walls.
These lectures were delivered by Mike Gelbart and are available publicly here.
Functions as a data type (5 min)#
In Python, functions are a data type just like anything else.
We often say functions are “first-class objects”.
def do_nothing(x):
    return x
type(do_nothing)
function
print(do_nothing)
<function do_nothing at 0x7f57e4bb3280>
#do_nothing = 5
do_nothing(10)
10
This means you can pass functions as arguments into other functions.
def square(y):
    return y**2
def evaluate_function_on_x_plus_1(fun, x):
    return fun(x+1)
square(5)
25
evaluate_function_on_x_plus_1(square, 5)
36
Above: what happened here?
fun(x+1) becomes square(5+1)
square(6) becomes 36
(optional) You can also write functions that return functions, or define functions inside of other functions.
I don’t do these often.
But they are important ideas in software engineering.
You can end up with pretty weird stuff:
do_nothing(do_nothing)
<function __main__.do_nothing(x)>
do_nothing(do_nothing)(5)
5
Above:
First we call do_nothing(do_nothing), which returns the function do_nothing.
Then we call do_nothing(5), which returns 5.
do_nothing(do_nothing(5))
5
Above:
First we call do_nothing(5), which returns 5.
Then we again call do_nothing(5), which returns 5.
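The optional point above about functions that return functions can be sketched as follows. This is only a minimal illustration; the names make_adder and adder are made up for this example and do not come from the lecture.

```python
def make_adder(n):
    """Return a new function that adds n to its argument."""
    def adder(x):
        # adder is defined inside make_adder and "remembers" the value of n
        return x + n
    return adder


add_three = make_adder(3)   # add_three is itself a function
print(add_three(10))        # 13
print(make_adder(1)(5))     # 6, same call-the-result pattern as do_nothing(do_nothing)(5)
```

Here make_adder(3) evaluates to a function, which is why it can be stored in a variable or called immediately.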
Anonymous functions (5 min)#
There are two ways to define functions in Python:
def add_one(x):
    return x+1
add_one(7.2)
8.2
add_one = lambda x: x+1
type(add_one)
function
add_one(7.2)
8.2
The two approaches above are equivalent. The one with lambda is called an anonymous function.
Some differences:
anonymous functions can only contain a single expression, so they aren’t appropriate in most cases.
anonymous functions evaluate to a function (remember, functions are first-class objects) immediately, so we can do weird stuff with them.
(lambda x,y: x+y)(6,7)
13
a = (lambda x,y: x*y)(5,5)
b = (lambda x,y: x+y*2)(2,2)
print(a,b)
25 6
evaluate_function_on_x_plus_1(lambda x: x**2, 5)
36
Above:
First, lambda x: x**2 evaluates to a value of type function.
Notice that this function is never given a name - hence “anonymous functions”!
Then, the function and the integer 5 are passed into evaluate_function_on_x_plus_1.
At which point the anonymous function is evaluated on 5+1, and we get 36.
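One place where anonymous functions are commonly used in practice is as a short "key" argument to a built-in such as sorted. The sketch below is an illustration only; the list of words is made up for this example.

```python
words = ["banana", "kiwi", "cherry", "fig"]

# The lambda is passed directly as an argument and never given a name.
by_length = sorted(words, key=lambda w: len(w))
print(by_length)  # ['fig', 'kiwi', 'banana', 'cherry']
```

Because sorted is stable, words with the same length ("banana" and "cherry") keep their original relative order.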
Exceptions, try/except (10 min)#
Above: the Blue Screen of Death. Some amusing examples here!
If something goes wrong, we don’t want the code to crash - we want it to fail gracefully.
In Python, this can be accomplished using try/except:
Here is a basic example:
for i in range(10):
    print(i)
    if i == 4:
        this_variable_does_not_exist
0
1
2
3
4
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Input In [20], in <cell line: 1>()
3 print(i)
4 if i == 4:
----> 5 this_variable_does_not_exist
NameError: name 'this_variable_does_not_exist' is not defined
for i in range(10):
    print(i)
    if i == 4:
        try:
            this_variable_does_not_exist
        except:
            # pass
            print("\tYou did something bad!")
Python tries to execute the code in the try block.
If an error is encountered, we “catch” this in the except block (also called try/catch in other languages).
There are many different error types, or exceptions - we saw NameError above.
5/0
my_list = [1,2,3]
my_list[5]
# (note: this is also valid syntax, just very confusing)
[1,2,3][5]
my_tuple = (1,2,3)
my_tuple[0] = 0
Ok, so there are apparently a bunch of different errors one could run into.
With try/except you can also catch the exception itself:
for i in range(10):
    print(i)
    if i == 4:
        try:
            this_variable_does_not_exist
        except Exception as ex:
            print("You did something bad!")
            print(type(ex),ex)
In the above, we caught the exception and assigned it to the variable ex so that we could print it out.
This is useful because you can see what the error message would have been, without crashing your program.
You can also catch specific exception types, like so:
try:
    #1/0
    this_variable_does_not_exist
except TypeError:
    print("You made a type error!")
except NameError:
    print("You made a name error!")
except ZeroDivisionError:
    print('You made a division error!')
except:
    print("You made some other sort of error")
The final except would trigger if the error is none of the above types, so this sort of has an if/elif/else feel to it.
There are some extra features, in particular an else and finally block; if you are interested, see e.g., here.
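As a rough sketch of the else and finally blocks mentioned above: else runs only when the try block raised nothing, and finally runs in every case. The function safe_divide and the events list are made up for this illustration.

```python
def safe_divide(a, b):
    """Divide a by b, recording which blocks were executed."""
    events = []
    try:
        result = a / b
    except ZeroDivisionError:
        events.append("except")   # runs only if the try block raised this error
        result = None
    else:
        events.append("else")     # runs only if the try block raised nothing
    finally:
        events.append("finally")  # always runs, error or not
    return result, events


print(safe_divide(6, 3))  # (2.0, ['else', 'finally'])
print(safe_divide(6, 0))  # (None, ['except', 'finally'])
```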
try:
    5/0
except TypeError:
    print("You made a type error!")
except NameError:
    print("You made a name error!")
except Exception as ex:
    print("You made some other sort of error")
Ideally, try to make your try/except blocks specific, and try not to put more errors inside the except…
try:
    this_variable_does_not_exist
except:
    5/0
This is a bit much, but it does happen sometimes :(
Using raise#
You can also write code that raises an exception on purpose, using raise:
def add_one(x):
    return x+1
add_one("blah")
def add_one(x):
    if not isinstance(x, float) and not isinstance(x, int):
        raise Exception("Sorry, x must be numeric")
    return x+1
add_one("blah")
This is useful when your function is complicated and would fail in a complicated way, with a weird error message.
You can make the cause of the error much clearer to the caller of the function.
Thus, your function is more usable this way.
If you do this, you should ideally describe these exceptions in the function documentation, so a user knows what to expect if they call your function.
You can also raise other types of exceptions, or even define your own exception types, as in lab 2.
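To illustrate raising a more specific exception type and documenting it, here is a minimal sketch. The class name NotNumericError is hypothetical (it is not the exception from lab 2), and the docstring layout is just one possible convention.

```python
class NotNumericError(TypeError):
    """Hypothetical custom exception for non-numeric input."""


def add_one(x):
    """Return x + 1.

    Raises
    ------
    NotNumericError
        If x is not an int or a float.
    """
    if not isinstance(x, (int, float)):
        raise NotNumericError(f"x must be numeric, got {type(x).__name__}")
    return x + 1


try:
    add_one("blah")
except NotNumericError as ex:
    message = str(ex)
    print(f"{type(ex).__name__}: {message}")  # NotNumericError: x must be numeric, got str
```

Because NotNumericError subclasses TypeError, a caller who writes except TypeError would still catch it.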
You can also use raise by itself to raise whatever exception was going on:
try:
    this_variable_does_not_exist
except:
    print("You did something bad!")
    raise
Here, the original exception is raised after we ran some other code.
Style guides and coding style (15 min)#
It is incorrect to think that if code works then you are done.
Code has two “users” - the computer (which turns it into machine instructions) and humans, who will likely read and/or modify the code in the future.
This section is about how to make your code suitable to that second audience, humans.
What is style?#
Style encompasses many things.
We already talked about the DRY principle, which could be considered under this umbrella, since it affects humans rather than the machines.
Today we will talk about:
variable names
magic numbers
comments
whitespace
Style guides#
It is common for style conventions to be brought together into a style guide.
If everyone follows the same style guide, it makes it easier to read code written by others.
“Code is read much more often than it is written.”
For Python, we will follow the PEP 8 style guide.
It is worth skimming through PEP 8, but here are some highlights:
Indent using 4 spaces
Have whitespace around operators, e.g. x = 1 not x=1
But avoid extra whitespace, e.g. f(1) not f (1)
Single and double quotes both fine for strings, but only use """triple double quotes""", not '''triple single quotes'''
Variable and function names use underscores_between_words
thisVariable (Java, camelCase) -> this_variable (Python)
And much more…
Automatic style checking#
This is not required, but I found it handy to install an automatic PEP 8 formatter. These commands should work; see instructions here.
pip install autopep8
jupyter labextension install @ryantam626/jupyterlab_code_formatter
pip install jupyterlab_code_formatter
jupyter serverextension enable --py jupyterlab_code_formatter
blah = [5, 3, 4, 5, 4]
blah2 = 5
# This code is so great
Guidelines that cannot be checked automatically#
Variable names should use underscores (PEP 8), but also need to make sense.
e.g. spin_times is a reasonable variable name
my_list_of_thingies adheres to PEP 8 but is NOT a reasonable variable name
same for lst - fine for explaining a concept, but not as part of a script that will be reused
DRY (we talked about this last week)
Magic numbers
Comments
Magic numbers#
# NOT RECOMMENDED BECAUSE "4" IS A MAGIC NUMBER
def num_labs(num_weeks):
    """Compute the number of labs an MDS student attends in num_weeks weeks."""
    return num_weeks * 4
# BETTER
def num_labs(num_weeks, labs_per_week=4):
    """Compute the number of labs an MDS student attends in num_weeks weeks."""
    return num_weeks * labs_per_week
# ALSO FINE
LABS_PER_WEEK = 4
def num_labs(num_weeks):
    """Compute the number of labs an MDS student attends in num_weeks weeks."""
    return num_weeks * LABS_PER_WEEK
In the above, LABS_PER_WEEK is being set as a global constant.
More on this next class.
So, why avoid magic numbers?
They make the code hard to read. Once you give the number a name, the code is much clearer.
You may need to use them in multiple places, in which case you’d be violating DRY.
The worst situation:
def num_labs(num_weeks):
    """Compute the number of labs an MDS student attends in num_weeks weeks."""
    return num_weeks * 4
def num_wheels(num_cars):
    """Compute the number of wheels in a collection of num_cars cars."""
    return num_cars * 4
And then one day MDS students take 3 labs per week so you, or someone else, goes and changes the code to
def num_labs(num_weeks):
    """Compute the number of labs an MDS student attends in num_weeks weeks."""
    return num_weeks * 3
def num_wheels(num_cars):
    """Compute the number of wheels in a collection of num_cars cars."""
    return num_cars * 3
And that is bad!
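A minimal sketch of how named constants avoid this accident: each 4 gets its own name, so changing one cannot silently change the other. The name WHEELS_PER_CAR is introduced here for illustration.

```python
LABS_PER_WEEK = 4    # change this if the program's schedule changes
WHEELS_PER_CAR = 4   # unrelated constant that happens to have the same value


def num_labs(num_weeks):
    """Compute the number of labs an MDS student attends in num_weeks weeks."""
    return num_weeks * LABS_PER_WEEK


def num_wheels(num_cars):
    """Compute the number of wheels in a collection of num_cars cars."""
    return num_cars * WHEELS_PER_CAR


print(num_labs(2))    # 8
print(num_wheels(3))  # 12
```

If students later take 3 labs per week, only LABS_PER_WEEK needs to change, and a search-and-replace on the literal 4 is never required.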
Python debugger (pdb) (5 min)#
My Python code doesn’t work: what do I do?
Example: random_walker from lab:
# Write a function that takes in an integer,
# checks to see if it is bigger than 50, and if it is, print "Good job!"
# if it is over 100, print "Excellent job"
# if it is under 50, print "Try again"
def check2(num):
    """ doc string """
    if num > 100:
        print("excellent job")
    elif num1 > 50:
        print("good job")
    else:
        print("Please try again")
def score_checker(num):
    """ doc string """
    try:
        check2(num)
    except NameError:
        print('hello')
        raise
# THIS CODE SHOULD BE FIXED BY THE USER!!
score_checker(45)
%debug
def random_walker(T):
    """
    Simulates T steps of a 2D random walk, and prints the result of each step.
    Returns the squared distance from the origin.
    Arguments:
    T -- (int) the number of steps to take
    """
    x = 0
    y = 0
    for i in range(T):
        r = random()
        # print(r)
        pdb.set_trace()
        if r < 0.25:
            # print("I'm going right!")
            x += 1
        if r < 0.5:
            # print("I'm going left!")
            x -= 1
        if r < 0.75:
            y += 1
        else:
            y -= 1
        print((x,y))
    return x**2 + y**2
random_walker(10)
Looks good, right?
But wait, why does it always go left?
Let’s add some print statements inside the if blocks to see what’s going on.
Alternative: pdb
import pdb
# pdb.set_trace()
See the pdb docs here.
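Stepping through random_walker in the debugger shows that the separate if statements all get checked on every step, so a draw below 0.25 moves right and then immediately left again. One possible fix, sketched here as an assumption about the intended behaviour (one move per step), is to chain the branches with elif. The name random_walker_fixed and the choice to return the path are made up for this illustration.

```python
from random import random, seed


def random_walker_fixed(T):
    """Simulate T steps of a 2D random walk; return the list of positions visited."""
    x, y = 0, 0
    path = []
    for _ in range(T):
        r = random()
        # elif guarantees that exactly one branch runs per step
        if r < 0.25:
            x += 1   # right
        elif r < 0.5:
            x -= 1   # left
        elif r < 0.75:
            y += 1   # up
        else:
            y -= 1   # down
        path.append((x, y))
    return path


seed(0)  # fixed seed so the run is repeatable
path = random_walker_fixed(10)
print(path)
```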
Break (10 min) – See you at 11:50#
Numpy arrays (10 min)#
import numpy as np
Numpy array shapes#
A numpy array is sort of like a list:
my_list = [1,2,3,4,5]
my_list
[1, 2, 3, 4, 5]
my_array = np.array((1,2,3,4,5))
my_array
array([1, 2, 3, 4, 5])
type(my_array)
numpy.ndarray
However, unlike a list, it can only hold a single type (usually numbers):
my_list = [True,"hi"]
my_array = np.array(my_list)
my_array
array(['True', 'hi'], dtype='<U5')
Above: it converted the boolean True into the string 'True' (just avoid this!).
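A quick way to see which single type numpy settled on is the dtype attribute. The sketch below assumes only that numpy is installed; the variable names are made up for this example.

```python
import numpy as np

ints_and_floats = np.array([1, 2.5])      # the int is upcast to a float
bools_and_strs = np.array([True, "hi"])   # everything becomes a string

print(ints_and_floats.dtype)       # float64
print(bools_and_strs.dtype.kind)   # U (a unicode string type)
```

Checking dtype right after creating an array is a cheap way to notice an unintended conversion early.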
Creating arrays#
Several ways to create numpy arrays:
x = np.zeros(10) # an array of zeros with size 10
x
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
x = np.empty(10) # an uninitialized array with size 10 (contents are arbitrary, not guaranteed to be zeros)
x
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
x = np.empty(10) + np.nan # an array of "empty" with size 10, turn it all into nan
x
array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan])
x = np.ones(4) # an array of ones with size 4
x
array([1., 1., 1., 1.])
x = np.arange(1,5) # from 1 inclusive to 5 exclusive
x
array([1, 2, 3, 4])
x = np.arange(1,5,0.5) # step by 0.5
x
array([1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5])
x = np.linspace(1,5,17) # 17 equally spaced points between 1 and 5
x
array([1. , 1.25, 1.5 , 1.75, 2. , 2.25, 2.5 , 2.75, 3. , 3.25, 3.5 ,
3.75, 4. , 4.25, 4.5 , 4.75, 5. ])
x = np.random.rand(5) # random numbers uniformly distributed from 0 to 1
x
array([0.09497144, 0.18529892, 0.22979085, 0.21822965, 0.22039454])
Elementwise operations#
x = np.ones(4)
x
array([1., 1., 1., 1.])
y = x + 1
y
array([2., 2., 2., 2.])
x - y
array([-1., -1., -1., -1.])
x == y
array([False, False, False, False])
x * y
array([2., 2., 2., 2.])
x ** y
array([1., 1., 1., 1.])
x / y
array([0.5, 0.5, 0.5, 0.5])
np.array_equal(x,y)
False
Array shapes (10 min)#
The above are 1-D arrays:
x.shape
(4,)
Aside: tuples with 1 element
[1]
[1]
(1)
1
t = (1,) # tuple with 1 element
t
(1,)
type(t)
tuple
len(x)
4
Just like a list of lists
x = [[1,2],[3,4],[5,6]]
x
[[1, 2], [3, 4], [5, 6]]
You can have 2-D numpy arrays:
x = np.zeros((3,6))
x
array([[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.]])
x.T # transpose
array([[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]])
x.shape
(3, 6)
x.size # total number of elements
18
x.ndim # len(x.shape)
2
Other things:
np.random.rand(3,4)
array([[0.28839397, 0.83044657, 0.04429282, 0.96213051],
[0.55121095, 0.64555915, 0.94032305, 0.92898158],
[0.07047969, 0.47134103, 0.23814152, 0.41881999]])
Other types:
np.zeros(6, dtype=int)
array([0, 0, 0, 0, 0, 0])
np.zeros(6).astype(int)
array([0, 0, 0, 0, 0, 0])
“dimension” and “length”#
The word dimension has 2 meanings (not my fault!)
We refer to the length of a vector as its dimension, because we think of it as a point in \(d\)-dimensional space
But in terms of being a container holding numbers, it’s a 1-dimensional container regardless of its length
Make sure you understand this! (and see below)
random_walker_location = np.zeros(2)
random_walker_location
array([0., 0.])
random_walker_location.ndim
1
x = np.random.rand(5)
x
array([0.81324098, 0.51559719, 0.45306103, 0.0985571 , 0.46671815])
len(x)
5
Above: in linear algebra terms, we call this a 5-dimensional vector because it’s a point in 5-dimensional space.
But in numpy it’s a 1-dimensional array.
We could say it’s a vector of length 5, but that wouldn’t be much better; “length” is also a broken word.
It could mean
len(x)
or it could mean \(\sqrt{\sum_i x_i^2}\), which is the Euclidean “length” of a vector from linear algebra.
There is no perfect solution here - just try to be very clear about what you mean and what other people mean.
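The different meanings of "dimension" and "length" can be put side by side in code. This is a small sketch; the vector v is made up so that the Euclidean length comes out to a round number.

```python
import numpy as np

v = np.array([3.0, 4.0])

print(len(v))                   # 2    number of elements ("2-dimensional vector")
print(v.ndim)                   # 1    number of array axes ("1-D array")
print(np.sqrt(np.sum(v ** 2)))  # 5.0  Euclidean length, written out
print(np.linalg.norm(v))        # 5.0  the same quantity via numpy
```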
x = np.random.rand(2,3,4) # a 3-D array
x.shape
(2, 3, 4)
x.size
24
x
array([[[0.31617195, 0.54825555, 0.97780162, 0.91529363],
[0.14058837, 0.43750167, 0.69022148, 0.53284238],
[0.36582001, 0.20541847, 0.82440265, 0.63558701]],
[[0.93324446, 0.35679956, 0.21077102, 0.33825529],
[0.40313579, 0.43067365, 0.74433005, 0.71426563],
[0.04141178, 0.31516926, 0.38836496, 0.14649587]]])
One of the most confusing things about numpy: what I call a “1-D array” can have 3 possible shapes:
x = np.ones(5)
print(x)
print("size:", x.size)
print("ndim:", x.ndim)
print("shape:",x.shape)
[1. 1. 1. 1. 1.]
size: 5
ndim: 1
shape: (5,)
y = np.ones((1,5))
print(y)
print("size:", y.size)
print("ndim:", y.ndim)
print("shape:",y.shape)
[[1. 1. 1. 1. 1.]]
size: 5
ndim: 2
shape: (1, 5)
z = np.ones((5,1))
print(z)
print("size:", z.size)
print("ndim:", z.ndim)
print("shape:",z.shape)
[[1.]
[1.]
[1.]
[1.]
[1.]]
size: 5
ndim: 2
shape: (5, 1)
np.array_equal(x,y)
False
np.array_equal(x,z)
False
np.array_equal(y,z)
False
x + y # makes sense
array([[2., 2., 2., 2., 2.]])
y + z # wait, what????
array([[2., 2., 2., 2., 2.],
[2., 2., 2., 2., 2.],
[2., 2., 2., 2., 2.],
[2., 2., 2., 2., 2.],
[2., 2., 2., 2., 2.]])
Above: this is called “broadcasting” and will be discussed in the next course (DSCI 523).
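As a brief preview only (the details are left to the later course), the y + z result can be reproduced by stretching both arrays to a common shape by hand. The sketch uses np.broadcast_to to make the implicit stretching explicit; the names y_stretched and z_stretched are made up here.

```python
import numpy as np

y = np.ones((1, 5))   # one row, five columns
z = np.ones((5, 1))   # five rows, one column

# Repeat y along the rows and z along the columns until both are (5, 5)
y_stretched = np.broadcast_to(y, (5, 5))
z_stretched = np.broadcast_to(z, (5, 5))

print((y + z).shape)                                     # (5, 5)
print(np.array_equal(y + z, y_stretched + z_stretched))  # True
```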
Indexing and slicing (10 min)#
x = np.arange(10)
x
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
x[3]
3
x[2:]
array([2, 3, 4, 5, 6, 7, 8, 9])
x[:4]
array([0, 1, 2, 3])
x[2:5]
array([2, 3, 4])
x[2:3]
array([2])
x[-1]
9
x[-2]
8
x[5:0:-1]
array([5, 4, 3, 2, 1])
For 2D arrays:
x = np.random.randint(10,size=(4,6))
x
array([[1, 5, 6, 1, 5, 0],
[0, 6, 5, 7, 0, 5],
[2, 7, 3, 8, 7, 0],
[6, 2, 0, 9, 1, 9]])
# row, then column
x[0,0]
1
x[3,4] # do this
1
x[3][4] # i do not like this as much
1
x[3]
array([6, 2, 0, 9, 1, 9])
x
array([[1, 5, 6, 1, 5, 0],
[0, 6, 5, 7, 0, 5],
[2, 7, 3, 8, 7, 0],
[6, 2, 0, 9, 1, 9]])
len(x) # generally, just confusing
4
x.shape
(4, 6)
x[:,2] # column number 2
array([6, 5, 3, 0])
x[2:,:3]
array([[2, 7, 3],
[6, 2, 0]])
x.T
array([[1, 0, 2, 6],
[5, 6, 7, 2],
[6, 5, 3, 0],
[1, 7, 8, 9],
[5, 0, 7, 1],
[0, 5, 0, 9]])
x
array([[1, 5, 6, 1, 5, 0],
[0, 6, 5, 7, 0, 5],
[2, 7, 3, 8, 7, 0],
[6, 2, 0, 9, 1, 9]])
x[1,1] = 555555
x
array([[ 1, 5, 6, 1, 5, 0],
[ 0, 555555, 5, 7, 0, 5],
[ 2, 7, 3, 8, 7, 0],
[ 6, 2, 0, 9, 1, 9]])
z = np.zeros(5)
z
array([0., 0., 0., 0., 0.])
z[0] = 5
z
array([5., 0., 0., 0., 0.])
Boolean indexing#
x = np.random.rand(10)
x
array([0.49980207, 0.26116012, 0.02009277, 0.41642499, 0.51661019,
0.01805233, 0.70602944, 0.53841982, 0.45866609, 0.27672178])
x + 1
array([1.49980207, 1.26116012, 1.02009277, 1.41642499, 1.51661019,
1.01805233, 1.70602944, 1.53841982, 1.45866609, 1.27672178])
x_thresh = x > 0.5
x_thresh
array([False, False, False, False, True, False, True, True, False,
False])
x[x_thresh]
array([0.51661019, 0.70602944, 0.53841982])
x[x_thresh] = 100 # set all elements > 0.5 to be equal to 100
x
array([4.99802069e-01, 2.61160121e-01, 2.00927704e-02, 4.16424989e-01,
1.00000000e+02, 1.80523348e-02, 1.00000000e+02, 1.00000000e+02,
4.58666090e-01, 2.76721778e-01])
x = np.random.rand(10)
x
array([0.97937487, 0.81041273, 0.0042411 , 0.86982107, 0.42289727,
0.46260313, 0.97150723, 0.69490841, 0.89918386, 0.4716509 ])
x[x > 0.5] = 0.5
x
array([0.5 , 0.5 , 0.0042411 , 0.5 , 0.42289727,
0.46260313, 0.5 , 0.5 , 0.5 , 0.4716509 ])
new_list = [1,1.0,2.0,2]
new_list
[1, 1.0, 2.0, 2]
np.array(new_list, dtype=int)
array([1, 1, 2, 2])
Comments#
Comments are important for understanding your code.
While docstrings cover what a function does, your comments will help document how your code achieves its goal.
There are PEP 8 guidelines on the length, spacing, and capitalization of comments.
But, like variable names, this is not sufficient for a good comment.
Below, here is an example of a reasonable comment:
Here are some BAD EXAMPLES of comments: