DSCI 531 Lecture 4¶
In this class, we will watch the last lecture on Python by Dr. Mike Gelbart, option co-director of the UBC-Vancouver MDS program.
import numpy as np
Lecture Outline:¶
Python classes (20 min)
Python `import` (10 min)
Importing your own functions (5 min)
Break (5 min)
Intriguing behaviour in Python (5 min)
References (10 min)
Function calls and references (5 min)
`copy` and `deepcopy` (10 min)
Scoping (10 min)
Attribution¶
The original version of these Python lectures was by Patrick Walls.
These lectures were delivered by Mike Gelbart and are available publicly here.
Python Classes (20 min)¶
We’ve seen data types like `dict` (built in to Python) and `np.ndarray` (3rd party library).
Today we’ll see how to create our own data types.
These are called classes and an instance is called an object. (Classes documentation here.)
For our purposes, a type and a class are the same thing. Some discussion of the differences here.
The general approach to programming using classes and objects is called object-oriented programming.
d = dict()
Here, `d` is an object, whereas `dict` is a type.
type(d)
dict
type(dict)
type
We say `d` is an instance of type `dict`. Hence:
isinstance(d, dict)
True
Why create your own types/classes?¶
Example: a circle in 2D space
You want to be able to change the circle in several ways: move it or make it bigger or smaller.
You want to be able to compute properties of the circle: its area, circumference, and its distance to the origin.
x = 2.0
y = 3.0
r = 1.0 # radius
def area(r):
    """Compute the area of a circle with radius r."""
    return np.pi * r**2

def circumference(r):
    """Compute the circumference of a circle with radius r."""
    return 2.0 * np.pi * r

def dist(x, y, r):
    """Compute the distance to the origin from a circle with centre x, y and radius r."""
    return np.abs(np.sqrt(x**2 + y**2) - r)
dist(x, y, r)
2.605551275463989
area(r)
3.141592653589793
Now let’s say you want two circles…
x2 = -3
y2 = 4
r2 = 0.5
dist(x2, y2, r2)
4.5
This approach is very clunky. What if you accidentally call
dist(x2, y2, r) # use the radius of the other circle by accident
4.0
Ok, so maybe you can wrap everything in dictionaries:
circle1 = {"x" : x,
"y" : y,
"r" : r}
circle2 = {"x" : x2,
"y" : y2,
"r" : r2}
dist(**circle1) # fancy syntax to "unpack" a dictionary into the arguments of a function, assuming the keys of the dictionary match the expected argument names
2.605551275463989
The above is slightly better, but still awkward. For example, you might accidentally do
circle3 = {"x" : 5,
"z" : 2, # now circle3 has different property names by accident
"r" : 3}
dist(**circle3)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-13-5a4ef956265a> in <module>
----> 1 dist(**circle3)
TypeError: dist() got an unexpected keyword argument 'z'
Classes allow us to enforce the structure of our data.
That is, a circle contains an \(x\), a \(y\), and an \(r\).
Classes also make functions easier to write, as you’ll see.
Above, all our functions had to take in the same data and re-explain the arguments.
Making a class¶
The syntax below creates a class, or type, called `Circle`.
The functions defined inside a class are called methods.
The `__init__` method is run when you create a new instance of the class (i.e. a new `Circle` object).
class Circle:
    """A circle with a centre (x,y) and radius r."""

    def __init__(self, x, y, r):
        self.x = x
        self.y = y
        self.r = r
Let’s re-create `circle1`:
circle1 = Circle(2.0, 3.0, 1.0)
type(circle1)
__main__.Circle
circle1.x # retrieve one of the fields
2.0
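To see the structure enforcement concretely, here is a minimal sketch that repeats the `Circle` definition above and passes the misspelled `z` field from the dictionary example. The failure now happens when the object is created, not later inside `dist`. The names `bad_fields` and `error_message` are illustrative only.

```python
class Circle:
    """A circle with a centre (x, y) and radius r."""

    def __init__(self, x, y, r):
        self.x = x
        self.y = y
        self.r = r


# Same mistake as circle3 above: "z" instead of "y".
bad_fields = {"x": 5, "z": 2, "r": 3}

error_message = None
try:
    Circle(**bad_fields)
except TypeError as err:
    # The constructor call fails, so no malformed circle is ever created.
    error_message = str(err)

print(error_message)
```

A dictionary would have accepted the wrong key silently; the class refuses it at the point where the mistake is made.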
Let’s now implement the methods:
class Circle:
    """A circle with a centre (x,y) and radius r."""

    def __init__(self, x, y, r=1.0):
        # For those familiar with a "constructor" - this is it!
        self.x = x
        self.y = y
        self.r = r

    def area(self):
        return np.pi * self.r**2

    def circumference(self):
        return 2.0 * np.pi * self.r

    def dist(self):
        """Compute the distance to the origin."""
        return np.abs(np.sqrt(self.x**2 + self.y**2) - self.r)
Some things to note:
The inputs to the methods are just `self`.
This `self` object is literally itself; thus, it gives you access to all the data inside the object using `self.x`, etc.
No need to re-explain the arguments each time, just explain the data at the start of the class.
This makes the code cleaner, more reusable and more modular.
We call the methods with a `.`:
circle1 = Circle(2.0, 3.0, 1.0)
circle1.area()
3.141592653589793
circle1.dist()
2.605551275463989
In fact, we’ve seen this before:
d = dict()
for key, val in d.items():
    pass
This is the same `.`, because `items` is a method of the `dict` class.
a = np.random.randint(10, size=8) # make a numpy array
a
array([5, 3, 4, 5, 2, 7, 0, 5])
a.shape
(8,)
a.size
8
These are fields of the `ndarray` object. Here is a method:
a.sort()
a
array([0, 2, 3, 4, 5, 5, 5, 7])
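The dot syntax is shorthand for passing the object as the first argument, which is what `self` receives. As a small sketch (using `math.pi` instead of `np.pi` only to keep the example self-contained), calling the method through the class with the instance passed by hand gives the same result:

```python
import math


class Circle:
    """A circle with a centre (x, y) and radius r."""

    def __init__(self, x, y, r=1.0):
        self.x = x
        self.y = y
        self.r = r

    def area(self):
        return math.pi * self.r**2


circle1 = Circle(2.0, 3.0, 1.0)

via_instance = circle1.area()      # Python fills in self for us
via_class = Circle.area(circle1)   # here we pass self explicitly

print(via_instance == via_class)
```

In everyday code you would always write `circle1.area()`; the second form is shown only to make the role of `self` visible.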
Now imagine we also wanted a function to compute the distance between two circles.
This would have been a pain before:
def dist_between(x1, y1, r1, x2, y2, r2):
    """
    Compute the distance between one circle and another circle.

    Arguments:
    x1 -- (float) x-coordinate of the centre of the first circle
    y1 -- (float) y-coordinate of the centre of the first circle
    r1 -- (float) radius of the first circle
    x2 -- (float) x-coordinate of the centre of the second circle
    y2 -- (float) y-coordinate of the centre of the second circle
    r2 -- (float) radius of the second circle
    """
    return np.sqrt((x1 - x2)**2 + (y1 - y2)**2) - (r1 + r2)
dist_between(x, y, r, x2, y2, r2)
3.5990195135927845
What a mess!
Now it’s much cleaner (and yes I’m violating DRY, but just for teaching purposes!):
class Circle:
    """A circle with a centre (x,y) and radius r."""

    def __init__(self, x, y, r):
        self.x = x
        self.y = y
        self.r = r

    def area(self):
        return np.pi * self.r**2

    def circumference(self):
        return 2.0 * np.pi * self.r

    def dist(self):
        """Compute the distance to the origin."""
        return np.abs(np.sqrt(self.x**2 + self.y**2) - self.r)

    def dist_between(self, other):
        """
        Compute the distance between this circle and another circle.

        Parameters
        ----------
        other : Circle
            the other circle.
        """
        if not isinstance(other, Circle):
            # TypeError is the conventional exception for an argument of the wrong type
            raise TypeError("other must be a Circle")
        return np.sqrt((self.x - other.x)**2 + (self.y - other.y)**2) - (self.r + other.r)
circle1 = Circle(2.0, 3.0, 1.0)
circle2 = Circle(8,9,0.1)
circle2.dist_between(circle1)
7.38528137423857
Changing data in a class¶
Classes you create are generally mutable.
You can directly change the data like this:
circle1.circumference()
6.283185307179586
circle1.r = 10
circle1.circumference()
62.83185307179586
You can also create methods that allow the user to change the object:
class Circle:
    """A circle with a centre (x,y) and radius r."""

    def __init__(self, x, y, r):
        self.x = x
        self.y = y
        self.r = r

    def area(self):
        return np.pi * self.r**2

    def circumference(self):
        return 2.0 * np.pi * self.r

    def dist(self):
        """Compute the distance to the origin."""
        return np.abs(np.sqrt(self.x**2 + self.y**2) - self.r)

    def dist_between(self, other):
        """Compute the distance between this circle and another circle."""
        return np.sqrt((self.x - other.x)**2 + (self.y - other.y)**2) - (self.r + other.r)

    def translate(self, Δx, Δy):
        """Move the circle by (Δx, Δy)"""
        self.x += Δx
        self.y += Δy
        return self  # This is not needed, but is sometimes convenient.
circle1 = Circle(2.0, 3.0, 1.0)
circle1.dist()
2.605551275463989
circle1.translate(10, 10)
circle1.dist()
16.69180601295413
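One situation where `return self` is convenient is chaining several calls on one line. The sketch below is a cut-down version of the class above; it uses the ASCII names `dx` and `dy` in place of `Δx` and `Δy`, which is a choice made for this example only.

```python
class Circle:
    """A circle with a centre (x, y) and radius r."""

    def __init__(self, x, y, r):
        self.x = x
        self.y = y
        self.r = r

    def translate(self, dx, dy):
        """Move the circle by (dx, dy) and return the same object."""
        self.x += dx
        self.y += dy
        return self


circle1 = Circle(2.0, 3.0, 1.0)

# Each call returns circle1 itself, so the next call can follow directly.
returned = circle1.translate(10, 10).translate(-2, 0)

position = (circle1.x, circle1.y)
print(position)             # (10.0, 13.0)
print(returned is circle1)  # True: no new circle was created
```

Note that chaining still modifies `circle1` in place; the returned object is the original, not a copy.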
Other special methods¶
Aside from `__init__`, there are other special methods you might find useful.
For example, what if we want to print our object?
print(circle1)
<__main__.Circle object at 0x106ce4dd8>
This doesn’t look very good.
But other objects, like numpy arrays, print out nicely:
print(a)
[0 2 3 4 5 5 5 7]
To specify how our object is printed, we can define a method called `__str__` (Python documentation).
class Circle:
    """A circle with a centre (x,y) and radius r."""

    def __init__(self, x, y, r):
        self.x = x
        self.y = y
        self.r = r

    def area(self):
        return np.pi * self.r**2

    def circumference(self):
        return 2.0 * np.pi * self.r

    def dist(self):
        """Compute the distance to the origin."""
        return np.abs(np.sqrt(self.x**2 + self.y**2) - self.r)

    def dist_between(self, other):
        """Compute the distance between this circle and another circle."""
        return np.sqrt((self.x - other.x)**2 + (self.y - other.y)**2) - (self.r + other.r)

    def translate(self, Δx, Δy):
        """Move the circle by (Δx, Δy)"""
        self.x += Δx
        self.y += Δy
        return self  # This is not needed, but is sometimes convenient.

    def __str__(self):
        return "A Circle at (%.1f, %.1f) with radius %.1f." % (self.x, self.y, self.r)
circle1 = Circle(2.0, 3.0, 1.0)
print(circle1)
A Circle at (2.0, 3.0) with radius 1.0.
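`__str__` only affects `print` and `str`. Showing the object as the last line of a notebook cell, or inside a list, uses `__repr__` instead. Here is a hedged sketch that defines both; the exact wording of the two strings is a choice made for this example, not something from the lecture.

```python
class Circle:
    """A circle with a centre (x, y) and radius r."""

    def __init__(self, x, y, r):
        self.x = x
        self.y = y
        self.r = r

    def __str__(self):
        # Human-friendly text, used by print() and str().
        return f"A Circle at ({self.x:.1f}, {self.y:.1f}) with radius {self.r:.1f}."

    def __repr__(self):
        # Unambiguous text, used by the interactive prompt and by containers.
        return f"Circle({self.x!r}, {self.y!r}, {self.r!r})"


circle1 = Circle(2.0, 3.0, 1.0)

print(str(circle1))   # A Circle at (2.0, 3.0) with radius 1.0.
print(repr(circle1))  # Circle(2.0, 3.0, 1.0)
print([circle1])      # [Circle(2.0, 3.0, 1.0)]
```

A common convention is to make `__repr__` look like the code that would recreate the object, as done here.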
Python `import` (10 min)¶
It is often useful to collect a bunch of classes and functions into modules or packages (Python package documentation).
For example, numpy is a package that contains both classes (e.g. `np.ndarray`) and functions (e.g. `np.sqrt`) and even constants (e.g. `np.pi`).
We will discuss packages in depth in DSCI 524.
For now, we’ll just discuss importing packages.
Unfortunately, this is a bit confusing.
Ways of importing things¶
Let’s use `numpy` as an example, and import it in various ways.
Import a package:
import numpy
numpy.sqrt(5)
2.23606797749979
Import a package, but refer to it by a different name:
import numpy as np
np.sqrt(5)
2.23606797749979
np.random.randn()
-0.26086894926921717
Import a particular function from a package:
from numpy.random import randn
randn() # now I can refer to it without the package/module names
-0.44897876253709507
from numpy.random import randn as random_gaussian
random_gaussian()
0.898816560829361
np.random.rand()
0.6725552471462851
It’s also possible to import everything in a module, though this is generally not recommended:
from numpy.random import *
binomial(10, 0.1)
1
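One reason `import *` is discouraged is that imported names silently replace any existing names that match. The sketch below shows the effect with a single explicit import from the standard library so it is easy to follow; a star import can do the same thing for many names at once without listing them.

```python
import builtins
from math import pow  # rebinds the name pow, hiding the built-in function

# math.pow always works in floating point ...
shadowed_result = pow(2, 3)

# ... while the built-in pow keeps integers as integers.
builtin_result = builtins.pow(2, 3)

print(shadowed_result)  # 8.0
print(builtin_result)   # 8
```

With an explicit import the shadowing is at least visible on the import line; with `from module import *` there is nothing in the code to warn you.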
Some annoying facts of life¶
The module and the function might have the same name:
import random
random.random()
0.31015211304267387
from random import random
random()
0.04886840168047635
Sometimes you may need to explicitly import submodules to use them:
import scipy
scipy.stats
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-72-d2e1a58f1dd7> in <module>
----> 1 scipy.stats
AttributeError: module 'scipy' has no attribute 'stats'
import scipy.stats
scipy.stats
<module 'scipy.stats' from '/Users/mgelbart/anaconda3/lib/python3.7/site-packages/scipy/stats/__init__.py'>
In Python, the import name and the install name do not necessarily match:
import sklearn
To install, run `pip install scikit-learn`.
`dir`¶
You can use `dir` to look up what can be done with an object:
dir(circle1)
['__class__',
'__delattr__',
'__dict__',
'__dir__',
'__doc__',
'__eq__',
'__format__',
'__ge__',
'__getattribute__',
'__gt__',
'__hash__',
'__init__',
'__init_subclass__',
'__le__',
'__lt__',
'__module__',
'__ne__',
'__new__',
'__reduce__',
'__reduce_ex__',
'__repr__',
'__setattr__',
'__sizeof__',
'__str__',
'__subclasshook__',
'__weakref__',
'area',
'circumference',
'dist',
'dist_between',
'r',
'translate',
'x',
'y']
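Most of that output is special methods that every object has. A small sketch of filtering them out, so that only the names we defined ourselves remain (the cut-down `Circle` and the name `public_names` are for illustration only):

```python
import math


class Circle:
    """A circle with a centre (x, y) and radius r."""

    def __init__(self, x, y, r):
        self.x = x
        self.y = y
        self.r = r

    def area(self):
        return math.pi * self.r**2


circle1 = Circle(2.0, 3.0, 1.0)

# dir() returns a sorted list of names; keep those without a leading underscore.
public_names = [name for name in dir(circle1) if not name.startswith("_")]

print(public_names)  # ['area', 'r', 'x', 'y']
```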
Importing your own functions (5 min)¶
In many MDS courses we only work in Jupyter - it is a great teaching & learning environment.
However, when we write larger pieces of code we will need to move to `.py` files.
Let’s restart the kernel so that `Circle` is no longer in the environment.
circle = Circle(1,2,3)
Luckily, I have a file in this directory named `circle.py` - let’s take a look.
from circle import Circle
c = Circle(1,2,3)
my_function()
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-80-7bebf01be998> in <module>
----> 1 my_function()
NameError: name 'my_function' is not defined
from circle import *
my_function()
MY_CONSTANT
5
We imported not only a class, but also a function and a single variable.
It makes sense that we can import all of these, because they are all objects in Python, just with different types:
type(Circle)
type
type(my_function)
function
type(MY_CONSTANT)
int
And `c` itself has a type that we defined:
type(c)
circle.Circle
np.pi
3.141592653589793
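The contents of `circle.py` are not shown in these notes. As a rough sketch of the mechanism, the example below writes a small module to a temporary directory and imports it. The module name `circle_demo`, its contents, and the return value of `my_function` are all hypothetical stand-ins, not the lecture's actual file.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module contents; the real circle.py is not shown in the lecture.
MODULE_SOURCE = '''
MY_CONSTANT = 5


def my_function():
    return "hello from the module"


class Circle:
    def __init__(self, x, y, r):
        self.x = x
        self.y = y
        self.r = r
'''

# Write the module to a fresh directory as circle_demo.py.
module_dir = tempfile.mkdtemp()
Path(module_dir, "circle_demo.py").write_text(MODULE_SOURCE)

# Python looks through the directories in sys.path when it sees an import.
sys.path.insert(0, module_dir)
importlib.invalidate_caches()
circle_demo = importlib.import_module("circle_demo")

c = circle_demo.Circle(1, 2, 3)
print(type(c).__name__, circle_demo.MY_CONSTANT, circle_demo.my_function())
```

In the notebook above none of this setup is needed, because the directory the notebook runs from is already on the import path; `from circle import Circle` just works.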
Break (5 min)¶
import numpy as np
Intriguing behaviour in Python (5 min)¶
What do you think the code below will print?
x = 1
y = x
x = 2
y
1
And how about the next one?
x = [1]
y = x
x[0] = 2
y
[2]
References (10 min)¶
In Python, the list `x` is a reference to some location in the computer’s memory.
When you set `y = x` these two variables now refer to the same location in memory - the one that `x` referred to.
Setting `x[0] = 2` goes and modifies that memory. So `x` and `y` are both modified.
It makes no difference if you set `x[0] = 2` or `y[0] = 2`, both modify the same memory.
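However, some basic built-in types (`int`, `float`, `bool`, etc.) appear to be exceptions to this logic:
When you set `y = x` it behaves as if the value `1` were copied, so `x` and `y` are decoupled. (Strictly speaking, these types are immutable, so `x = 2` points the name `x` at a new object instead of modifying the shared one.)
Thus, the list example is actually the typical case, the integer example is the “special” case.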
Analogy:
I share a Dropbox folder (or git repo) with you, and you modify it – I sent you the location of the stuff (this is like the list case)
I send you an email with a file attached, you download it and modify the file – I sent you the stuff itself (this is like the integer case)
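The two cases above can be checked directly with `is`, which tests whether two names refer to the very same object. A minimal sketch (the names `a` and `b` are just for this example):

```python
# List case: both names refer to one shared list.
x = [1]
y = x
print(y is x)   # True: one list, two names

y[0] = 2        # modify the list through y ...
print(x)        # [2]: ... and the change is visible through x

# Integer case: assigning to a does not affect b.
a = 1
b = a
a = 2           # a now refers to a different object
print(b)        # 1
```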
And this?
x = [1]
y = x
x = [2] # before we had x[0] = 2
y
[1]
Here `y` is unchanged: we are not modifying the contents of `x`, we are setting `x` to refer to a new list `[2]`.
Additional weirdness¶
x = np.array([1,2,3,4,5])
y = x
x = x + 5
y
array([1, 2, 3, 4, 5])
x = np.array([1,2,3,4,5])
y = x
x += 5
y
array([ 6, 7, 8, 9, 10])
So, it turns out `x += 5` is not identical to `x = x + 5`.
The former modifies the contents of `x`.
The latter first evaluates `x + 5` to a new array of the same size, and then overwrites the name `x` with a reference to this new array.
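The same distinction shows up with plain Python lists, which makes it easy to check without numpy. The sketch below uses `is` to confirm whether `x` still refers to the original object after each kind of update; the variable names are illustrative.

```python
# In-place update: x and y still share one list afterwards.
x = [1, 2, 3]
y = x
x += [4]
same_after_iadd = x is y
y_after_iadd = list(y)
print(same_after_iadd, y_after_iadd)  # True [1, 2, 3, 4]

# Rebinding: x + [4] builds a new list, and the name x is moved to it.
x = [1, 2, 3]
y = x
x = x + [4]
same_after_add = x is y
y_after_add = list(y)
print(same_after_add, y_after_add)    # False [1, 2, 3]
```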
Function calls and references (5 min)¶
How about these?
def foo(y):
    y = "Hello from inside foo!"
    return y
x = "I'm outside."
foo(x)
x
"I'm outside."
def bar(y):
    y[0] = "Hello from inside foo!"
x = ["I'm outside."]
bar(x)
x
['Hello from inside foo!']
Above: the fact that you called a function is not relevant.
When you pass the value of `x` into the function and it becomes `y` in the function, that is basically like the `y = x` we had above.
In the latter case, we say the function has a side effect.
x = "I'm outside."
x = foo(x)
x
'Hello from inside foo!'
Above: in this case, `x` is not getting modified inside `foo`.
Rather it’s getting overwritten after the function call.
(Optional) If you’re interested, there is a bunch of terminology you can look up:
pass by value (call by value)
pass by reference (call by reference)
copy-on-modify
lazy copying
…
Good news: we don’t need to memorize special rules for calling functions.
In effect, copying happens with `int`, `float`, `bool` and other immutable types (such as `str` and `tuple`); the rest is “by reference”.
Now you see why we care if objects are mutable or immutable… passing around a reference can be dangerous!
General rule: if you do `x = ...` then you’re not modifying the original, but if you do `x.SOMETHING = y` or `x[SOMETHING] = y` or `x *= y` then you probably are.
Note: In R, life is simpler: you’re never “modifying the original” inside a function.
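To make the general rule concrete, here is a sketch of the same operation written twice: once with a side effect on its argument, and once working on a copy. Both function names are hypothetical and chosen for this example.

```python
def replace_first_in_place(items, value):
    """Side effect: modifies the caller's list via items[...] = ..."""
    items[0] = value


def replace_first_copy(items, value):
    """No side effect: copies the list first, then returns the new list."""
    result = list(items)
    result[0] = value
    return result


original = ["I'm outside."]

changed = replace_first_copy(original, "Hello!")
original_after_copy = list(original)       # snapshot: still the old contents

replace_first_in_place(original, "Hello!")
original_after_in_place = list(original)   # snapshot: now modified

print(changed, original_after_copy, original_after_in_place)
```

The copying version costs a little extra memory, but the caller never has to wonder whether their list was changed.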
`copy` and `deepcopy` (10 min)¶
import copy
x = [1]
y = x
x[0] = 2
y
[2]
x = [1]
y = copy.copy(x)
x[0] = 2
y
[1]
Ok, so what do you think will happen here?
x = [[1], [2,99], [3, "hi"]] # a list of lists
y = copy.copy(x)
x[0][0] = "pikachu"
print(x)
print(y)
[['pikachu'], [2, 99], [3, 'hi']]
[['pikachu'], [2, 99], [3, 'hi']]
What happened?
`copy` makes the containers different, i.e. the outer list.
But the outer lists both point to the same data.
After `y = copy.copy(x)`, `x` and `y` are two different outer lists whose elements are the very same inner lists.
We can use `is` to tell apart these scenarios.
x == y # they are both lists of the same lists
True
x is y # but they are not the *same* lists of that stuff
False
So, by that logic…
y.append(5)
print(x)
print(y)
[['pikachu'], [2, 99], [3, 'hi']]
[['pikachu'], [2, 99], [3, 'hi'], 5]
x == y
False
That makes sense, as weird as it seems.
In short, `copy` copies one level down.
What if we want to copy everything?
Enter our friend `deepcopy`:
x = [[1], [2,99], [3, "hi"]]
y = copy.deepcopy(x)
x[0][0] = "pikachu"
print(x)
print(y)
[['pikachu'], [2, 99], [3, 'hi']]
[[1], [2, 99], [3, 'hi']]
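The difference between the two kinds of copy can also be checked with `is` on the inner lists. A minimal sketch, with variable names chosen for this example:

```python
import copy

x = [[1], [2, 99]]
shallow = copy.copy(x)
deep = copy.deepcopy(x)

# Both copies are new outer lists ...
outer_lists_differ = (shallow is not x) and (deep is not x)

# ... but only the shallow copy still shares the inner lists with x.
shallow_shares_inner = shallow[0] is x[0]
deep_shares_inner = deep[0] is x[0]

print(outer_lists_differ, shallow_shares_inner, deep_shares_inner)  # True True False
```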
Scoping (10 min)¶
def f():
    x = 10
x = 5
f()
x
5
def f():
    new_variable = 10
f()
new_variable
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-107-07ad50490e16> in <module>
3
4 f()
----> 5 new_variable
NameError: name 'new_variable' is not defined
It looks like the `x` inside and outside the function are different.
It looks like `new_variable` is defined only for use inside the function.
That is generally a good way of thinking, and is more true in other languages.
This is called scope (see Wikipedia article).
However, in Python things are dangerously loose and permissive, so be careful.
def bat():
    print(s)
s = "hello world"
bat()
hello world
def bat(s):
    print(s)
s = "hello world"
bat("another string")
another string
What happened?
In the first case, `s` was not defined inside the function, so it was borrowed from the scope outside the function.
In the second case, `s` was passed in directly, so it was used.
This is very worrying, because of the following:
def modify_the_stuff():
    the_stuff[0] = 99999
the_stuff = [1,2,3]
modify_the_stuff()
the_stuff
[99999, 2, 3]
Above: `modify_the_stuff` modified a variable that was not even passed in as an argument!
So functions can really mess with your stuff without you knowing.
Please do not write code like this!
Safest: functions with no side effects.
Acceptable: functions with side effects, clearly documented.
Disaster: functions with undocumented side effects on their arguments.
Complete disaster: functions modifying stuff that you didn’t even pass into the function.
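Reading from the outer scope has a related cost even when nothing is modified: the function's result depends on state that does not appear in its signature. A sketch with hypothetical names (`tax_rate`, `total_hidden`, `total_explicit`):

```python
tax_rate = 0.25  # a global that the first function quietly depends on


def total_hidden(price):
    """Borrows tax_rate from the enclosing scope."""
    return price * (1 + tax_rate)


def total_explicit(price, rate):
    """Everything the result depends on is passed in."""
    return price * (1 + rate)


before = total_hidden(100)

tax_rate = 0.5  # someone changes the global elsewhere in the program

after = total_hidden(100)              # same call, different answer
explicit = total_explicit(100, 0.25)   # unaffected by the global

print(before, after, explicit)  # 125.0 150.0 125.0
```

The explicit version is slightly more typing at each call, but the call site tells you everything the result depends on.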
Some other things to avoid:
def func(s, len):
    print(len(s))
func("hello", 5)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-111-14060809a19f> in <module>
2 print(len(s))
3
----> 4 func("hello", 5)
<ipython-input-111-14060809a19f> in func(s, len)
1 def func(s, len):
----> 2 print(len(s))
3
4 func("hello", 5)
TypeError: 'int' object is not callable
Above: don’t do this - inside the function there’s a variable called `len` which is shadowing the built-in `len` function.
Below: functions can access other functions if they are all in the global scope:
def hello(a):
    a = a + 5
    return a
a = 1
hello(a) # hello(1)
6
def f():
    print("Hello from f!")

def g():
    f()
g()
Hello from f!
That is, there’s no need to pass the function `f` into `g` to call it, because `f` is “global”.