DSCI 531 Lecture 3
In this class, we will watch the third of four lectures by Dr. Mike Gelbart, option co-director of the UBC-Vancouver MDS program.
from random import random
Lecture Outline:
Functions as a data type (5 min)
Anonymous functions (5 min)
Exceptions, try/except (15 min)
Style guides and coding style (15 min)
Python debugger (pdb) (5 min)
Break (5 min)
Numpy arrays (10 min)
Numpy array shapes (10 min)
Numpy indexing and slicing (10 min)
Attribution
The original version of these Python lectures was by Patrick Walls.
These lectures were delivered by Mike Gelbart and are available publicly here.
Functions as a data type (5 min)
In Python, functions are a data type just like anything else.
We often say functions are “first-class objects”.
def do_nothing(x):
    return x
type(do_nothing)
function
print(do_nothing)
<function do_nothing at 0x7f80883d0560>
# do_nothing = 5
This means you can pass functions as arguments into other functions.
def square(y):
    return y**2
def evaluate_function_on_x_plus_1(fun, x):
    return fun(x+1)
evaluate_function_on_x_plus_1(square, 5)
36
Above: what happened here?
fun(x+1) becomes square(5+1)
square(6) becomes 36
(optional) You can also write functions that return functions, or define functions inside of other functions.
I don’t do these often.
But they are important ideas in software engineering.
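A minimal sketch of both ideas at once, using a hypothetical `make_adder` helper that is not part of the lecture: it defines a function inside another function and then returns it.

```python
def make_adder(n):
    """Return a new function that adds n to its argument."""
    def adder(x):
        # adder "remembers" the value of n from the enclosing call
        return x + n
    return adder


add_three = make_adder(3)   # add_three is itself a function
print(add_three(10))        # 13
print(make_adder(1)(7.2))   # 8.2
```

Because `make_adder(1)` evaluates to a function, it can be called immediately, just like any other function value.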
You can end up with pretty weird stuff:
do_nothing(do_nothing)
<function __main__.do_nothing(x)>
do_nothing(do_nothing)(5)
5
Above:
First we call do_nothing(do_nothing), which returns the function do_nothing.
Then we call do_nothing(5), which returns 5.
do_nothing(do_nothing(5))
5
Above:
First we call do_nothing(5), which returns 5.
Then we again call do_nothing(5), which returns 5.
Anonymous functions (5 min)
There are two ways to define functions in Python:
def add_one(x):
    return x+1
add_one(7.2)
8.2
add_one = lambda x: x+1
type(add_one)
function
add_one(7.2)
8.2
The two approaches above are equivalent. The one with lambda is called an anonymous function.
Some differences:
anonymous functions are limited to a single expression (in practice, one line of code), so they aren’t appropriate in most cases.
anonymous functions evaluate to a function (remember, functions are first-class objects) immediately, so we can do weird stuff with them.
(lambda x,y: x+y)(6,7)
13
evaluate_function_on_x_plus_1(lambda x: x**2, 5)
36
Above:
First, lambda x: x**2 evaluates to a value of type function.
Notice that this function is never given a name - hence “anonymous functions”!
Then, the function and the integer 5 are passed into evaluate_function_on_x_plus_1.
At which point the anonymous function is evaluated on 5+1, and we get 36.
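One place anonymous functions commonly show up is as a short `key` argument to built-ins such as `sorted`. The sketch below is only an illustration; the `words` list is made up.

```python
words = ["banana", "fig", "cherry", "kiwi"]

# Sort by word length instead of alphabetically; the lambda is
# created, passed in, and never given a name.
by_length = sorted(words, key=lambda w: len(w))
print(by_length)  # ['fig', 'kiwi', 'banana', 'cherry']
```

Words of equal length keep their original relative order, because `sorted` is stable.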
Exceptions, try/except (10 min)
Above: the Blue Screen of Death. Some amusing examples here!
If something goes wrong, we don’t want the code to crash - we want it to fail gracefully.
In Python, this can be accomplished using try/except:
Here is a basic example:
this_variable_does_not_exist
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-18-ce7872e25149> in <module>
----> 1 this_variable_does_not_exist
NameError: name 'this_variable_does_not_exist' is not defined
try:
    this_variable_does_not_exist
except:
    # pass
    print("You did something bad!")
You did something bad!
Python tries to execute the code in the try block.
If an error is encountered, we “catch” this in the except block (also called try/catch in other languages).
There are many different error types, or exceptions - we saw NameError above.
5/0
---------------------------------------------------------------------------
ZeroDivisionError Traceback (most recent call last)
<ipython-input-25-0106664d39e8> in <module>
----> 1 5/0
ZeroDivisionError: division by zero
my_list = [1,2,3]
my_list[5]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-26-9deea645a317> in <module>
1 my_list = [1,2,3]
----> 2 my_list[5]
IndexError: list index out of range
# (note: this is also valid syntax, just very confusing)
[1,2,3][5]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-27-408f46526dbe> in <module>
1 # (note: this is also valid syntax, just very confusing)
----> 2 [1,2,3][5]
IndexError: list index out of range
my_tuple = (1,2,3)
my_tuple[0] = 0
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-28-ead1b17878d2> in <module>
1 my_tuple = (1,2,3)
----> 2 my_tuple[0] = 0
TypeError: 'tuple' object does not support item assignment
Ok, so there are apparently a bunch of different errors one could run into.
With try/except you can also catch the exception itself:
try:
    this_variable_does_not_exist
except Exception as ex:
    print("You did something bad!")
    print(ex)
    print(type(ex))
You did something bad!
name 'this_variable_does_not_exist' is not defined
<class 'NameError'>
In the above, we caught the exception and assigned it to the variable ex so that we could print it out.
This is useful because you can see what the error message would have been, without crashing your program.
You can also catch specific exception types, like so:
try:
    this_variable_does_not_exist
except TypeError:
    print("You made a type error!")
except NameError:
    print("You made a name error!")
except:
    print("You made some other sort of error")
You made a name error!
The final except would trigger if the error is none of the above types, so this sort of has an if/elif/else feel to it.
There are some extra features, in particular an else and finally block; if you are interested, see e.g., here.
try:
    5/0
except TypeError:
    print("You made a type error!")
except NameError:
    print("You made a name error!")
except Exception as ex:
    print("You made some other sort of error")
You made some other sort of error
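The else and finally blocks mentioned above can be sketched as follows. This is a small illustration, not part of the original lecture; `safe_divide` and its `messages` list are hypothetical names used only to record which blocks ran.

```python
def safe_divide(a, b):
    """Divide a by b, recording which blocks were executed."""
    messages = []
    try:
        result = a / b
    except ZeroDivisionError:
        messages.append("division by zero")
        result = None
    else:
        # Runs only if the try block raised no exception
        messages.append("no error")
    finally:
        # Runs no matter what, error or not
        messages.append("done")
    return result, messages


print(safe_divide(6, 3))  # (2.0, ['no error', 'done'])
print(safe_divide(5, 0))  # (None, ['division by zero', 'done'])
```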
Ideally, try to make your try/except blocks specific, and try not to put more errors inside the except…
try:
    this_variable_does_not_exist
except:
    5/0
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-32-a39e9781c1b0> in <module>
1 try:
----> 2 this_variable_does_not_exist
3 except:
NameError: name 'this_variable_does_not_exist' is not defined
During handling of the above exception, another exception occurred:
ZeroDivisionError Traceback (most recent call last)
<ipython-input-32-a39e9781c1b0> in <module>
2 this_variable_does_not_exist
3 except:
----> 4 5/0
ZeroDivisionError: division by zero
This is a bit much, but it does happen sometimes :(
Using raise
You can also write code that raises an exception on purpose, using raise:
def add_one(x):
    return x+1
add_one("blah")
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-34-96e0142692a3> in <module>
----> 1 add_one("blah")
<ipython-input-33-666bb314e8dd> in add_one(x)
1 def add_one(x):
----> 2 return x+1
TypeError: can only concatenate str (not "int") to str
def add_one(x):
    if not isinstance(x, float) and not isinstance(x, int):
        raise Exception("Sorry, x must be numeric")
    return x+1
add_one("blah")
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
<ipython-input-36-96e0142692a3> in <module>
----> 1 add_one("blah")
<ipython-input-35-d504e0880b37> in add_one(x)
1 def add_one(x):
2 if not isinstance(x, float) and not isinstance(x, int):
----> 3 raise Exception("Sorry, x must be numeric")
4
5 return x+1
Exception: Sorry, x must be numeric
This is useful when your function is complicated and would fail in a complicated way, with a weird error message.
You can make the cause of the error much clearer to the caller of the function.
Thus, your function is more usable this way.
If you do this, you should ideally describe these exceptions in the function documentation, so a user knows what to expect if they call your function.
You can also raise other types of exceptions, or even define your own exception types, as in lab 2.
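As a rough sketch of a custom exception type combined with a docstring that documents it: `NotNumericError` is a hypothetical name chosen for this example, and this is not necessarily how lab 2 approaches it.

```python
class NotNumericError(TypeError):
    """Hypothetical exception type for non-numeric input."""


def add_one(x):
    """Return x plus one.

    Raises
    ------
    NotNumericError
        If x is not an int or a float.
    """
    if not isinstance(x, (int, float)):
        raise NotNumericError(f"Sorry, x must be numeric, got {type(x).__name__}")
    return x + 1


try:
    add_one("blah")
except NotNumericError as ex:
    print(ex)  # Sorry, x must be numeric, got str
```

Subclassing TypeError means callers who already catch TypeError will still catch this more specific exception.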
You can also use raise by itself to raise whatever exception was going on:
try:
    this_variable_does_not_exist
except:
    print("You did something bad!")
    raise
You did something bad!
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-37-efbefb33d98b> in <module>
1 try:
----> 2 this_variable_does_not_exist
3 except:
4 print("You did something bad!")
5 raise
NameError: name 'this_variable_does_not_exist' is not defined
Here, the original exception is raised after we ran some other code.
Style guides and coding style (15 min)
It is incorrect to think that if code works then you are done.
Code has two “users” - the computer (which turns it into machine instructions) and humans, who will likely read and/or modify the code in the future.
This section is about how to make your code suitable to that second audience, humans.
What is style?
Style encompasses many things.
We already talked about the DRY principle, which could be considered under this umbrella, since it affects humans rather than the machines.
Today we will talk about:
variable names
magic numbers
comments
whitespace
Style guides
It is common for style conventions to be brought together into a style guide.
If everyone follows the same style guide, it makes it easier to read code written by others.
“Code is read much more often than it is written.”
For Python, we will follow the PEP 8 style guide.
It is worth skimming through PEP 8, but here are some highlights:
Indent using 4 spaces
Have whitespace around operators, e.g. x = 1 not x=1
But avoid extra whitespace, e.g. f(1) not f (1)
Single and double quotes both fine for strings, but only use """triple double quotes""", not '''triple single quotes'''
Variable and function names use underscores_between_words
And much more…
Automatic style checking
This is not required, but I found it handy to install an automatic PEP 8 formatter. These commands should work; see instructions here.
pip install autopep8
jupyter labextension install @ryantam626/jupyterlab_code_formatter
pip install jupyterlab_code_formatter
jupyter serverextension enable --py jupyterlab_code_formatter
blah = [5, 3, 4, 5, 4]
blah2 = 5
# This code is so great
Guidelines that cannot be checked automatically
Variable names should use underscores (PEP 8), but also need to make sense.
e.g. spin_times is a reasonable variable name
my_list_of_thingies adheres to PEP 8 but is NOT a reasonable variable name
same for lst - fine for explaining a concept, but not as part of a script that will be reused
DRY (we talked about this last week)
Magic numbers
Comments
Magic numbers
# NOT RECOMMENDED BECAUSE "4" IS A MAGIC NUMBER
def num_labs(num_weeks):
    """Compute the number of labs an MDS student attends in num_weeks weeks."""
    return num_weeks * 4
# BETTER
def num_labs(num_weeks, labs_per_week=4):
    """Compute the number of labs an MDS student attends in num_weeks weeks."""
    return num_weeks * labs_per_week
# ALSO FINE
LABS_PER_WEEK = 4
def num_labs(num_weeks):
    """Compute the number of labs an MDS student attends in num_weeks weeks."""
    return num_weeks * LABS_PER_WEEK
In the above, LABS_PER_WEEK is being set as a global constant.
More on this next class.
So, why avoid magic numbers?
They make the code hard to read. Once you give the number a name, the code is much clearer.
You may need to use them in multiple places, in which case you’d be violating DRY.
The worst situation:
def num_labs(num_weeks):
    """Compute the number of labs an MDS student attends in num_weeks weeks."""
    return num_weeks * 4
def num_wheels(num_cars):
    """Compute the number of wheels in a collection of num_cars cars."""
    return num_cars * 4
And then one day MDS students take 3 labs per week so you, or someone else, goes and changes the code to
def num_labs(num_weeks):
    """Compute the number of labs an MDS student attends in num_weeks weeks."""
    return num_weeks * 3
def num_wheels(num_cars):
    """Compute the number of wheels in a collection of num_cars cars."""
    return num_cars * 3
And that is bad!
Comments
Comments are important for understanding your code.
While docstrings cover what a function does, your comments will help document how your code achieves its goal.
There are PEP 8 guidelines on the length, spacing, and capitalization of comments.
But, like variable names, this is not sufficient for a good comment.
Here is an example of a reasonable comment:
def random_walker(T):
    x = 0
    y = 0
    for i in range(T):
        # Generate a random number between 0 and 1.
        # Then, go right, left, up or down if the number
        # is in the interval [0,0.25), [0.25,0.5),
        # [0.5,0.75) or [0.75,1) respectively.
        r = random()
        if r < 0.25:
            x += 1 # Go right
        elif r < 0.5:
            x -= 1 # Go left
        elif r < 0.75:
            y += 1 # Go up
        else:
            y -= 1 # Go down
        print((x,y))
    return x**2 + y**2
Here are some BAD EXAMPLES of comments:
def random_walker(T):
    # intalize cooords
    x = 0
    y = 0
    for i in range(T): # loop T times
        r = random()
        if r < 0.25:
            x += 1 # go right
        elif r < 0.5:
            x -= 1 # go left
        elif r < 0.75:
            y += 1 # go up
        else:
            y -= 1
        # Print the location
        print((x,y))
    # In Python, the ** operator means exponentiation.
    return x**2 + y**2
Python debugger (pdb) (5 min)
My Python code doesn’t work: what do I do?
Example: random_walker from lab 1:
import pdb  # needed for pdb.set_trace() below
def random_walker(T):
    """
    Simulates T steps of a 2D random walk, and prints the result of each step.
    Returns the squared distance from the origin.
    Arguments:
    T -- (int) the number of steps to take
    """
    x = 0
    y = 0
    for i in range(T):
        r = random()
        # print(r)
        pdb.set_trace()
        if r < 0.25:
            # print("I'm going right!")
            x += 1
        if r < 0.5:
            # print("I'm going left!")
            x -= 1
        if r < 0.75:
            y += 1
        else:
            y -= 1
        print((x,y))
    return x**2 + y**2
random_walker(10)
> <ipython-input-50-06cca0b6c5f3>(17)random_walker()
-> if r < 0.25:
(Pdb) print(r)
0.42169116928133177
(Pdb) print(x)
0
(Pdb) z = x
(Pdb) x = 1
(Pdb) n
> <ipython-input-50-06cca0b6c5f3>(20)random_walker()
-> if r < 0.5:
(Pdb) c
(0, 1)
> <ipython-input-50-06cca0b6c5f3>(16)random_walker()
-> pdb.set_trace()
(Pdb) exit
---------------------------------------------------------------------------
BdbQuit Traceback (most recent call last)
<ipython-input-50-06cca0b6c5f3> in <module>
30 return x**2 + y**2
31
---> 32 random_walker(10)
<ipython-input-50-06cca0b6c5f3> in random_walker(T)
14 r = random()
15 # print(r)
---> 16 pdb.set_trace()
17 if r < 0.25:
18 # print("I'm going right!")
<ipython-input-50-06cca0b6c5f3> in random_walker(T)
14 r = random()
15 # print(r)
---> 16 pdb.set_trace()
17 if r < 0.25:
18 # print("I'm going right!")
~/anaconda3/lib/python3.7/bdb.py in trace_dispatch(self, frame, event, arg)
86 return # None
87 if event == 'line':
---> 88 return self.dispatch_line(frame)
89 if event == 'call':
90 return self.dispatch_call(frame, arg)
~/anaconda3/lib/python3.7/bdb.py in dispatch_line(self, frame)
111 if self.stop_here(frame) or self.break_here(frame):
112 self.user_line(frame)
--> 113 if self.quitting: raise BdbQuit
114 return self.trace_dispatch
115
BdbQuit:
Looks good, right?
But wait, why does it always go left?
Let’s add some print statements inside the if blocks to see what’s going on.
Alternative: pdb
import pdb
# pdb.set_trace()
See the pdb docs here.
Break (5 min)
Numpy arrays (10 min)
import numpy as np
Numpy array shapes
A numpy array is sort of like a list:
my_list = [1,2,3,4,5]
my_list
[1, 2, 3, 4, 5]
my_array = np.array((1,2,3,4,5))
my_array
array([1, 2, 3, 4, 5])
type(my_array)
numpy.ndarray
However, unlike a list, it can only hold a single type (usually numbers):
my_list = [1,"hi"]
my_array = np.array((1, "hi"))
my_array
array(['1', 'hi'], dtype='<U21')
Above: it converted the integer 1 into the string '1' (just avoid this!).
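To see which single type numpy settled on, you can inspect the array's dtype attribute. A small sketch with made-up arrays:

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])    # the ints are upcast to floats
strings = np.array([1, "hi"])    # everything becomes a string

print(ints.dtype.kind)     # i  (an integer type; the exact width depends on the platform)
print(mixed.dtype)         # float64
print(strings.dtype.kind)  # U  (unicode string)
```

Checking the dtype is a quick way to notice an accidental conversion like the one above.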
Creating arrays
Several ways to create numpy arrays:
x = np.zeros(10) # an array of zeros with size 10
x
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
x = np.ones(4) # an array of ones with size 4
x
array([1., 1., 1., 1.])
x = np.arange(1,5) # from 1 inclusive to 5 exclusive
x
array([1, 2, 3, 4])
x = np.arange(1,5,0.5) # step by 0.5
x
array([1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5])
x = np.linspace(1,5,20) # 20 equally spaced points between 1 and 5
x
array([1. , 1.21052632, 1.42105263, 1.63157895, 1.84210526,
2.05263158, 2.26315789, 2.47368421, 2.68421053, 2.89473684,
3.10526316, 3.31578947, 3.52631579, 3.73684211, 3.94736842,
4.15789474, 4.36842105, 4.57894737, 4.78947368, 5. ])
x = np.random.rand(5) # random numbers uniformly distributed from 0 to 1
x
array([0.60294432, 0.45408616, 0.16198369, 0.04883807, 0.09359963])
Elementwise operations
x = np.ones(4)
x
array([1., 1., 1., 1.])
y = x + 1
y
array([2., 2., 2., 2.])
x - y
array([-1., -1., -1., -1.])
x == y
array([False, False, False, False])
x * y
array([2., 2., 2., 2.])
x ** y
array([1., 1., 1., 1.])
x / y
array([0.5, 0.5, 0.5, 0.5])
np.array_equal(x,y)
False
Array shapes (10 min)
The above are 1-D arrays:
x.shape
(4,)
Aside: tuples with 1 element
[1]
[1]
(1)
1
t = (1,) # tuple with 1 element
t
(1,)
type(t)
tuple
len(x)
4
Just like a list of lists
x = [[1,2],[3,4],[5,6]]
x
[[1, 2], [3, 4], [5, 6]]
You can have 2-D numpy arrays:
x = np.zeros((3,6))
x
array([[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.]])
x.T # transpose
array([[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]])
x.shape
(3, 6)
x.size # total number of elements
18
x.ndim # len(x.shape)
2
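Annoying things: np.random.rand takes the dimensions as separate arguments, not as a single tuple like np.zeros((3,6)):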
np.random.rand(3,4)
array([[0.29430582, 0.29657717, 0.86937079, 0.50214988],
[0.22139675, 0.19843037, 0.72430929, 0.34951465],
[0.7294458 , 0.05811196, 0.13680871, 0.8286903 ]])
Other types:
np.zeros(6, dtype=int)
array([0, 0, 0, 0, 0, 0])
np.zeros(6).astype(int)
array([0, 0, 0, 0, 0, 0])
“dimension” and “length”
The word dimension has 2 meanings (not my fault!)
We refer to the length of a vector as its dimension, because we think of it as a point in \(d\)-dimensional space
But in terms of being a container holding numbers, it’s a 1-dimensional container regardless of its length
Make sure you understand this! (and see below)
random_walker_location = np.zeros(2)
random_walker_location
array([0., 0.])
random_walker_location.ndim
1
x = np.random.rand(5)
x
array([0.07471064, 0.47155974, 0.20281899, 0.25839375, 0.58787817])
len(x)
5
Above: in linear algebra terms, we call this a 5-dimensional vector because it’s a point in 5-dimensional space.
But in numpy it’s a 1-dimensional array.
We could say it’s a vector of length 5, but that wouldn’t be much better; “length” is also a broken word.
It could mean len(x) or it could mean \(\sqrt{\sum_i x_i^2}\), which is the Euclidean “length” of a vector from linear algebra.
There is no perfect solution here - just try to be very clear about what you mean and what other people mean.
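The different meanings discussed above can be checked side by side. A minimal sketch, using a made-up vector `v`:

```python
import numpy as np

v = np.array([3.0, 4.0])

print(v.ndim)             # 1: a 1-dimensional container
print(len(v))             # 2: number of elements (a "2-dimensional vector")
print(np.linalg.norm(v))  # 5.0: Euclidean length, sqrt(3**2 + 4**2)
```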
x = np.random.rand(2,3,4) # a 3-D array
x.shape
(2, 3, 4)
x.size
24
x
array([[[0.15449557, 0.60940399, 0.24177921, 0.14511164],
[0.16004214, 0.0905055 , 0.13894775, 0.33383699],
[0.83780548, 0.1352112 , 0.4569923 , 0.54667053]],
[[0.18213166, 0.15180786, 0.54256792, 0.315431 ],
[0.54409351, 0.54948664, 0.14565978, 0.89951562],
[0.23192264, 0.22408342, 0.35214408, 0.25585036]]])
One of the most confusing things about numpy: what I call a “1-D array” can have 3 possible shapes:
x = np.ones(5)
print(x)
print("size:", x.size)
print("ndim:", x.ndim)
print("shape:",x.shape)
[1. 1. 1. 1. 1.]
size: 5
ndim: 1
shape: (5,)
y = np.ones((1,5))
print(y)
print("size:", y.size)
print("ndim:", y.ndim)
print("shape:",y.shape)
[[1. 1. 1. 1. 1.]]
size: 5
ndim: 2
shape: (1, 5)
z = np.ones((5,1))
print(z)
print("size:", z.size)
print("ndim:", z.ndim)
print("shape:",z.shape)
[[1.]
[1.]
[1.]
[1.]
[1.]]
size: 5
ndim: 2
shape: (5, 1)
np.array_equal(x,y)
False
np.array_equal(x,z)
False
np.array_equal(y,z)
False
x + y # makes sense
array([[2., 2., 2., 2., 2.]])
y + z # wait, what????
array([[2., 2., 2., 2., 2.],
[2., 2., 2., 2., 2.],
[2., 2., 2., 2., 2.],
[2., 2., 2., 2., 2.],
[2., 2., 2., 2., 2.]])
Above: this is called “broadcasting” and will be discussed in the next course (DSCI 523).
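Without going into the broadcasting rules themselves, one way to keep track of what is happening is to print the shape of each result. The sketch below also uses reshape and ravel to move between the three shapes; it is an illustration only.

```python
import numpy as np

x = np.ones(5)        # shape (5,)
y = x.reshape(1, 5)   # shape (1, 5), a "row"
z = x.reshape(5, 1)   # shape (5, 1), a "column"

print((x + y).shape)    # (1, 5)
print((y + z).shape)    # (5, 5): both arrays are stretched to a common shape
print(z.ravel().shape)  # (5,): back to a plain 1-D array
```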
Indexing and slicing (10 min)
x = np.arange(10)
x
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
x[3]
3
x[2:]
array([2, 3, 4, 5, 6, 7, 8, 9])
x[:4]
array([0, 1, 2, 3])
x[2:5]
array([2, 3, 4])
x[2:3]
array([2])
x[-1]
9
x[-2]
8
x[5:0:-1]
array([5, 4, 3, 2, 1])
For 2D arrays:
x = np.random.randint(10,size=(4,6))
x
array([[0, 2, 2, 5, 6, 6],
[9, 5, 8, 3, 1, 2],
[0, 1, 3, 8, 8, 5],
[2, 8, 0, 3, 7, 7]])
x[3,4] # do this
7
x[3][4] # i do not like this as much
7
x[3]
array([2, 8, 0, 3, 7, 7])
len(x) # generally, just confusing
4
x.shape
(4, 6)
x[:,2] # column number 2
array([2, 8, 3, 0])
x[2:,:3]
array([[0, 1, 3],
[2, 8, 0]])
x.T
array([[0, 9, 0, 2],
[2, 5, 1, 8],
[2, 8, 3, 0],
[5, 3, 8, 3],
[6, 1, 8, 7],
[6, 2, 5, 7]])
x
array([[0, 2, 2, 5, 6, 6],
[9, 5, 8, 3, 1, 2],
[0, 1, 3, 8, 8, 5],
[2, 8, 0, 3, 7, 7]])
x[1,1] = 555555
x
array([[ 0, 2, 2, 5, 6, 6],
[ 9, 555555, 8, 3, 1, 2],
[ 0, 1, 3, 8, 8, 5],
[ 2, 8, 0, 3, 7, 7]])
z = np.zeros(5)
z
array([0., 0., 0., 0., 0.])
z[0] = 5
z
array([5., 0., 0., 0., 0.])
Boolean indexing
x = np.random.rand(10)
x
array([0.51112542, 0.07549209, 0.75990943, 0.30332327, 0.49273735,
0.65559496, 0.95427368, 0.11408001, 0.70675736, 0.85908288])
x + 1
array([1.51112542, 1.07549209, 1.75990943, 1.30332327, 1.49273735,
1.65559496, 1.95427368, 1.11408001, 1.70675736, 1.85908288])
x_thresh = x > 0.5
x_thresh
array([ True, False, True, False, False, True, True, False, True,
True])
x[x_thresh] = 0.5 # set all elements > 0.5 to be equal to 0.5
x
array([0.5 , 0.07549209, 0.5 , 0.30332327, 0.49273735,
0.5 , 0.5 , 0.11408001, 0.5 , 0.5 ])
x = np.random.rand(10)
x
array([0.87792822, 0.14482622, 0.04383749, 0.90660235, 0.85626749,
0.24290128, 0.44308545, 0.03931421, 0.61184713, 0.54742504])
x[x > 0.5] = 0.5
x
array([0.5 , 0.14482622, 0.04383749, 0.5 , 0.5 ,
0.24290128, 0.44308545, 0.03931421, 0.5 , 0.5 ])
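A few related things a boolean mask can do, sketched with a small fixed array so the results are reproducible. The use of np.minimum is an alternative to the in-place assignment above, not something from the lecture.

```python
import numpy as np

x = np.array([0.9, 0.1, 0.7, 0.3, 0.5])

mask = x > 0.5
print(mask.sum())   # 2: True counts as 1
print(x[mask])      # [0.9 0.7]

# Combine conditions with & (and) or | (or); the parentheses are required
middle = x[(x > 0.2) & (x < 0.8)]
print(middle)       # [0.7 0.3 0.5]

# Same capping as x[x > 0.5] = 0.5, but returns a new array
capped = np.minimum(x, 0.5)
print(capped)       # [0.5 0.1 0.5 0.3 0.5]
```

Unlike `x[x > 0.5] = 0.5`, the np.minimum version leaves the original `x` unchanged.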