DSCI 531 Lecture 4

In this class, we will watch the last lecture on Python by Dr. Mike Gelbart, option co-director of the UBC-Vancouver MDS program.

import numpy as np

Lecture Outline:

  • Python classes (20 min)

  • Python import (10 min)

  • Importing your own functions (5 min)

  • Break (5 min)

  • Intriguing behaviour in Python (5 min)

  • References (10 min)

  • Function calls and references (5 min)

  • copy and deepcopy (10 min)

  • Scoping (10 min)

Attribution

Python Classes (20 min)

  • We’ve seen data types like dict (built into Python) and np.ndarray (from a third-party library).

  • Today we’ll see how to create our own data types.

  • These are called classes and an instance is called an object. (Classes documentation here.)

  • For our purposes, a type and a class are the same thing. Some discussion of the differences here.

  • The general approach to programming using classes and objects is called object-oriented programming.

d = dict()

Here, d is an object, whereas dict is a type.

type(d)
dict
type(dict)
type

We say d is an instance of type dict. Hence

isinstance(d, dict)
True

Why create your own types/classes?

  • Example: a circle in 2D space

  • You want to be able to change the circle in several ways: move it or make it bigger or smaller.

  • You want to be able to compute properties of the circle: its area, circumference, and its distance to the origin.

x = 2.0
y = 3.0
r = 1.0 # radius

def area(r):
    """Compute the area of a circle with radius r."""
    return np.pi * r**2

def circumference(r):
    """Compute the circumference of a circle with radius r."""
    return 2.0 * np.pi * r

def dist(x, y, r):
    """Compute the distance to the origin from a circle with centre x, y and radius r."""
    return np.abs(np.sqrt(x**2 + y**2) - r)
dist(x, y, r)
2.605551275463989
area(r)
3.141592653589793

Now let’s say you want two circles…

x2 = -3
y2 = 4
r2 = 0.5

dist(x2, y2, r2)
4.5

This approach is very clunky. What if you accidentally mix up variables from the two circles?

dist(x2, y2, r) # use the radius of the other circle by accident
4.0

Ok, so maybe you can wrap everything in dictionaries:

circle1 = {"x" : x,
           "y" : y,
           "r" : r}

circle2 = {"x" : x2,
           "y" : y2,
           "r" : r2}

dist(**circle1) # fancy syntax to "unpack" a dictionary into the arguments of a function, assuming the keys of the dictionary match the expected argument names
2.605551275463989

The above is slightly better, but still awkward. For example, you might accidentally do

circle3 = {"x" : 5,
           "z" : 2,  # now circle3 has different property names by accident
           "r" : 3}
dist(**circle3)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-13-5a4ef956265a> in <module>
----> 1 dist(**circle3)

TypeError: dist() got an unexpected keyword argument 'z'
  • Classes allow us to enforce the structure of our data.

    • That is, a circle contains an \(x\), a \(y\), and an \(r\).

  • They also help with writing functions, as you’ll see.

    • Above, all our functions had to take in the same data and re-explain the arguments.

Making a class

  • The syntax below creates a class, or type, called Circle.

  • The functions defined inside a class are called methods.

  • The __init__ method is run when you create a new instance of the class (i.e. a new Circle object).

class Circle:
    """A circle with a centre (x,y) and radius r."""
    
    def __init__(self, x, y, r):
        self.x = x
        self.y = y
        self.r = r

Let’s re-create circle1:

circle1 = Circle(2.0, 3.0, 1.0)
type(circle1)
__main__.Circle
circle1.x # retrieve one of the fields
2.0

Let’s now implement the methods:

class Circle:
    """A circle with a centre (x,y) and radius r."""
    
    def __init__(self, x, y, r=1.0):
        # For those familiar with a "constructor" - this is it!
        self.x = x
        self.y = y
        self.r = r
        
    def area(self):
        return np.pi * self.r**2

    def circumference(self):
        return 2.0 * np.pi * self.r

    def dist(self):
        """Compute the distance to the origin."""
        return np.abs(np.sqrt(self.x**2 + self.y**2) - self.r)

Some things to note:

  • The inputs to the methods are just self.

  • self is the object itself (the instance the method was called on), so it gives you access to all the data stored in that object via self.x, etc.

  • No need to re-explain the arguments each time, just explain the data at the start of the class.

    • This makes the code cleaner, more reusable and more modular.

  • We call methods using dot notation, i.e. object.method():

circle1 = Circle(2.0, 3.0, 1.0)
circle1.area()
3.141592653589793
circle1.dist()
2.605551275463989
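To confirm that self really is the object, note that circle1.area() is shorthand for calling the function stored on the class and passing circle1 as the first argument. A minimal sketch with a trimmed-down Circle:

```python
import numpy as np

class Circle:
    def __init__(self, x, y, r=1.0):
        self.x = x
        self.y = y
        self.r = r

    def area(self):
        return np.pi * self.r**2

circle1 = Circle(2.0, 3.0, 1.0)

# These two calls are equivalent: Python passes circle1 in as `self`.
via_method = circle1.area()
via_class = Circle.area(circle1)
print(via_method == via_class)  # True
```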

In fact, we’ve seen this before:

d = dict()

for key, val in d.items():
    pass

This is the same dot notation: items is a method of the dict class.

a = np.random.randint(10, size=8) # make a numpy array
a
array([5, 3, 4, 5, 2, 7, 0, 5])
a.shape
(8,)
a.size
8

These are fields of the ndarray object. Here is a method:

a.sort()
a
array([0, 2, 3, 4, 5, 5, 5, 7])
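Sorting is a good place to see why the method/function distinction matters. The method a.sort() changes the array it is called on. The function np.sort(a) returns a new sorted array and leaves a alone. A short sketch using the same values as above:

```python
import numpy as np

a = np.array([5, 3, 4, 5, 2, 7, 0, 5])

# np.sort is a function: it returns a sorted copy and leaves `a` unchanged.
b = np.sort(a)
print(a)  # [5 3 4 5 2 7 0 5]
print(b)  # [0 2 3 4 5 5 5 7]

# a.sort() is a method: it sorts `a` in place and returns None.
result = a.sort()
print(a)       # [0 2 3 4 5 5 5 7]
print(result)  # None
```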
  • Now imagine we also wanted a function to compute the distance between two circles.

  • This would have been a pain before:

def dist_between(x1, y1, r1, x2, y2, r2):
    """
    Compute the distance between one circle and another circle.
    
    Arguments:
    x1 -- (float) x-coordinate of the centre of the first circle
    y1 -- (float) y-coordinate of the centre of the first circle
    r1 -- (float) radius of the first circle
    x2 -- (float) x-coordinate of the centre of the second circle
    y2 -- (float) y-coordinate of the centre of the second circle
    r2 -- (float) radius of the second circle
    """
    return np.sqrt((x1 - x2)**2 + (y1 - y2)**2) - (r1 + r2)

dist_between(x, y, r, x2, y2, r2)
3.5990195135927845
  • What a mess!

  • Now it’s much cleaner (and yes I’m violating DRY, but just for teaching purposes!):

class Circle:
    """A circle with a centre (x,y) and radius r."""
    
    def __init__(self, x, y, r):
        self.x = x
        self.y = y
        self.r = r
        
    def area(self):
        return np.pi * self.r**2

    def circumference(self):
        return 2.0 * np.pi * self.r

    def dist(self):
        """Compute the distance to the origin."""
        return np.abs(np.sqrt(self.x**2 + self.y**2) - self.r)
    
    def dist_between(self, other):
        """
        Compute the distance between this circle and another circle.
        
        Parameters
        ----------
        other : Circle
            the other circle.
        """
        if not isinstance(other, Circle):
            raise TypeError("other must be a Circle")
        
        return np.sqrt((self.x - other.x)**2 + (self.y - other.y)**2) - (self.r + other.r)
circle1 = Circle(2.0, 3.0, 1.0)
circle2 = Circle(8,9,0.1)
circle2.dist_between(circle1)
7.38528137423857
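The isinstance guard in dist_between can be exercised directly. This sketch keeps only the parts of the class it needs. It raises TypeError, the built-in exception conventionally used for an argument of the wrong type. The flag caught is only there to record what happened:

```python
import numpy as np

class Circle:
    """Trimmed-down Circle with just what dist_between needs."""

    def __init__(self, x, y, r):
        self.x = x
        self.y = y
        self.r = r

    def dist_between(self, other):
        """Distance between the edges of this circle and another circle."""
        if not isinstance(other, Circle):
            raise TypeError("other must be a Circle")
        return np.sqrt((self.x - other.x)**2 + (self.y - other.y)**2) - (self.r + other.r)

circle1 = Circle(2.0, 3.0, 1.0)
circle2 = Circle(8, 9, 0.1)
print(circle2.dist_between(circle1))

# Passing a plain tuple instead of a Circle is rejected right away.
caught = False
try:
    circle2.dist_between((2.0, 3.0, 1.0))
except TypeError as err:
    caught = True
    print("TypeError:", err)
```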

Changing data in a class

  • Classes you create are generally mutable.

  • You can directly change the data like this:

circle1.circumference()
6.283185307179586
circle1.r = 10
circle1.circumference()
62.83185307179586

You can also create methods that allow the user to change the object:

class Circle:
    """A circle with a centre (x,y) and radius r."""
    
    def __init__(self, x, y, r):
        self.x = x
        self.y = y
        self.r = r
        
    def area(self):
        return np.pi * self.r**2

    def circumference(self):
        return 2.0 * np.pi * self.r

    def dist(self):
        """Compute the distance to the origin."""
        return np.abs(np.sqrt(self.x**2 + self.y**2) - self.r)
    
    def dist_between(self, other):
        """Compute the distance between this circle and another circle."""
        return np.sqrt((self.x - other.x)**2 + (self.y - other.y)**2) - (self.r + other.r)
    
    def translate(self, Δx, Δy):
        """Move the circle by (Δx, Δy)"""
        self.x += Δx
        self.y += Δy
        return self # This is not needed, but is sometimes convenient.
circle1 = Circle(2.0, 3.0, 1.0)
circle1.dist()
2.605551275463989
circle1.translate(10, 10)
circle1.dist()
16.69180601295413
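The comment on return self says it is "sometimes convenient". One common reason is method chaining: because translate returns the same object, you can call another method on the result. A minimal sketch, using ASCII names dx and dy in place of Δx and Δy:

```python
class Circle:
    def __init__(self, x, y, r):
        self.x = x
        self.y = y
        self.r = r

    def translate(self, dx, dy):
        """Move the circle by (dx, dy) and return the circle itself."""
        self.x += dx
        self.y += dy
        return self  # returning self is what makes chaining possible

c = Circle(2.0, 3.0, 1.0)
same = c.translate(10, 10).translate(-1, 0)  # two moves in one expression
print(c.x, c.y)   # 11.0 13.0
print(same is c)  # True: no new Circle was created
```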

Other special methods

  • Aside from __init__, there are other special methods you might find useful.

  • For example, what if we want to print our object?

print(circle1)
<__main__.Circle object at 0x106ce4dd8>
  • This doesn’t look very good.

  • But other objects, like numpy arrays, print out nicely:

print(a)
[0 2 3 4 5 5 5 7]
  • To specify how our object is printed, we can define a method called __str__ (Python documentation).

class Circle:
    """A circle with a centre (x,y) and radius r."""
    
    def __init__(self, x, y, r):
        self.x = x
        self.y = y
        self.r = r
        
    def area(self):
        return np.pi * self.r**2

    def circumference(self):
        return 2.0 * np.pi * self.r

    def dist(self):
        """Compute the distance to the origin."""
        return np.abs(np.sqrt(self.x**2 + self.y**2) - self.r)
    
    def dist_between(self, other):
        """Compute the distance between this circle and another circle."""
        return np.sqrt((self.x - other.x)**2 + (self.y - other.y)**2) - (self.r + other.r)
    
    def translate(self, Δx, Δy):
        """Move the circle by (Δx, Δy)"""
        self.x += Δx
        self.y += Δy
        return self # This is not needed, but is sometimes convenient.
        
    def __str__(self):
        return "A Circle at (%.1f, %.1f) with radius %.1f." % (self.x, self.y, self.r)
circle1 = Circle(2.0, 3.0, 1.0)
print(circle1)
A Circle at (2.0, 3.0) with radius 1.0.
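A closely related special method is __repr__. print() and str() use __str__. When a bare expression is displayed (for example, the last line of a Jupyter cell) or the object sits inside a container such as a list, Python uses __repr__ instead. A sketch defining both; the exact repr format here is just one reasonable choice:

```python
class Circle:
    def __init__(self, x, y, r):
        self.x = x
        self.y = y
        self.r = r

    def __str__(self):
        # Used by print() and str(): a friendly description.
        return f"A Circle at ({self.x:.1f}, {self.y:.1f}) with radius {self.r:.1f}."

    def __repr__(self):
        # Used for bare expressions and inside containers such as lists.
        return f"Circle({self.x!r}, {self.y!r}, {self.r!r})"

c = Circle(2.0, 3.0, 1.0)
print(c)        # A Circle at (2.0, 3.0) with radius 1.0.
print(repr(c))  # Circle(2.0, 3.0, 1.0)
print([c])      # [Circle(2.0, 3.0, 1.0)]
```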

Python import (10 min)

  • It is often useful to collect a bunch of classes and functions into modules or packages (Python package documentation).

    • For example, numpy is a package that contains both classes (e.g. np.ndarray) and functions (e.g. np.sqrt) and even constants (e.g. np.pi).

  • We will discuss packages in depth in DSCI 524.

  • For now, we’ll just discuss importing packages.

  • Unfortunately, this is a bit confusing.

Ways of importing things

Let’s use numpy as an example, and import it in various ways.

Import a package:

import numpy
numpy.sqrt(5)
2.23606797749979

Import a package, but refer to it by a different name:

import numpy as np
np.sqrt(5)
2.23606797749979
np.random.randn()
-0.26086894926921717

Import a particular function from a package:

from numpy.random import randn
randn() # now I can refer to it without the package/module names
-0.44897876253709507
from numpy.random import randn as random_gaussian
random_gaussian()
0.898816560829361
np.random.rand()
0.6725552471462851

It’s also possible to import everything in a module, though this is generally not recommended:

from numpy.random import *
binomial(10, 0.1)
1
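One reason star imports are discouraged is that they can silently replace names you already rely on. Here is a small standard-library illustration (not from the lecture): math defines its own pow, and a star import makes it shadow the built-in pow:

```python
before = pow(2, 3)   # the built-in pow: returns an int
print(before)        # 8

from math import *   # also imports math.pow, which shadows the built-in

after = pow(2, 3)    # now this is math.pow, which always returns a float
print(after)         # 8.0
```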

Some annoying facts of life

The module and the function might have the same name:

import random
random.random()
0.31015211304267387
from random import random
random()
0.04886840168047635
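The shared name is more than a curiosity. After from random import random, the name random refers to the function, so code that still expects the module breaks. A minimal sketch; the flag failed is only there for illustration:

```python
import random
print(random.random())      # works: `random` is the module here

from random import random   # rebinds the name `random` to the function

failed = False
try:
    random.random()         # the function has no attribute called `random`
except AttributeError as err:
    failed = True
    print("AttributeError:", err)
```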

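Sometimes you may need to explicitly import submodules to use them. Recent SciPy releases load submodules such as scipy.stats lazily, so this particular example may not fail on your version: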

import scipy
scipy.stats
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-72-d2e1a58f1dd7> in <module>
----> 1 scipy.stats

AttributeError: module 'scipy' has no attribute 'stats'
import scipy.stats
scipy.stats
<module 'scipy.stats' from '/Users/mgelbart/anaconda3/lib/python3.7/site-packages/scipy/stats/__init__.py'>

In Python, the import name and the install name do not necessarily match:

import sklearn

To install, run pip install scikit-learn.

dir

You can use dir to look up what can be done with an object:

dir(circle1)
['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'area',
 'circumference',
 'dist',
 'dist_between',
 'r',
 'translate',
 'x',
 'y']
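Most of that list is special methods that every object inherits. Two small tricks make the output easier to read. Filtering out names that start with an underscore leaves the "public" data and methods. The built-in vars shows only the data stored on that particular instance. A sketch with a small Circle:

```python
import math

class Circle:
    def __init__(self, x, y, r):
        self.x = x
        self.y = y
        self.r = r

    def area(self):
        return math.pi * self.r**2

c = Circle(2.0, 3.0, 1.0)

# dir() lists everything reachable from c; drop the underscore names.
public = [name for name in dir(c) if not name.startswith("_")]
print(public)   # ['area', 'r', 'x', 'y']

# vars() shows only the data stored on this particular instance.
print(vars(c))  # {'x': 2.0, 'y': 3.0, 'r': 1.0}
```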

Importing your own functions (5 min)

  • In many MDS courses we only work in Jupyter - it is a great teaching & learning environment.

  • However, when we write larger pieces of code we will need to move to .py files.

  • Let’s restart the kernel so that Circle is no longer in the environment.

circle = Circle(1,2,3)  # after the restart this raises NameError: Circle is no longer defined
  • Luckily, I have a file in this directory named circle.py - let’s take a look.
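The lecture does not show the contents of circle.py. Based only on the names used below (Circle, my_function, MY_CONSTANT), here is a hypothetical sketch of what such a file could contain. The body of my_function is a placeholder, and the numpy import is an assumption:

```python
# circle.py (hypothetical contents; the real file is not shown in the lecture)
import numpy as np

MY_CONSTANT = 5


def my_function():
    """Placeholder function; the real body is not shown in the lecture."""
    return None


class Circle:
    """A circle with a centre (x,y) and radius r."""

    def __init__(self, x, y, r):
        self.x = x
        self.y = y
        self.r = r

    def area(self):
        return np.pi * self.r**2
```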

from circle import Circle
c = Circle(1,2,3)
my_function()
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-80-7bebf01be998> in <module>
----> 1 my_function()

NameError: name 'my_function' is not defined
from circle import *
my_function()
MY_CONSTANT
5
  • We imported not only a class, but also a function and a single variable.

  • It makes sense that we can import all of these, because they are all objects in Python, just with different types:

type(Circle)
type
type(my_function)
function
type(MY_CONSTANT)
int

And c itself has a type that we defined:

type(c)
circle.Circle
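Notice that np also works here, even though we restarted the kernel and never ran import numpy as np. This is a side effect of from circle import *: circle.py presumably imports numpy as np itself, so the star import brought the name np into our namespace as well.

np.pi
3.141592653589793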

Break (5 min)

import numpy as np

Intriguing behaviour in Python (5 min)

What do you think the code below will print?

x = 1
y = x
x = 2
y
1

And how about the next one?

x = [1]
y = x
x[0] = 2
y
[2]

References (10 min)

  • In Python, the list x is a reference to some location in the computer’s memory.

  • When you set y = x these two variables now refer to the same location in memory - the one that x referred to.

  • Setting x[0] = 2 goes and modifies that memory. So x and y are both modified.

    • It makes no difference whether you set x[0] = 2 or y[0] = 2; both modify the same memory.

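  • However, some basic built-in types such as int, float and bool appear to be exceptions to this logic:

    • They are immutable, so the value 1 can never be changed in place. The only way to “change” x is to rebind it (x = 2). That leaves y referring to the old value, so x and y end up decoupled, just as if the value had been copied. (Strictly speaking, y = x makes both names refer to the same int object; nothing is copied.)

    • Thus, the list example is actually the typical case; the integer example only looks “special” because integers cannot be modified.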

  • Analogy:

    • I share a Dropbox folder (or git repo) with you, and you modify it – I sent you the location of the stuff (this is like the list case)

    • I send you an email with a file attached, you download it and modify the file – I sent you the stuff itself (this is like the integer case)
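The is operator tests whether two names refer to the very same object, and it is a handy way to check what y = x actually does. In the sketch below, y = x produces the same object in both the int and the list case. The difference is in what happens next: x = 2 rebinds x, while x[0] = 2 changes the shared list.

```python
# Integers: y = x makes both names refer to the same object...
x = 1
y = x
print(x is y)   # True
# ...but x = 2 rebinds x to a different object; y is untouched.
x = 2
print(y)        # 1

# Lists: x[0] = 2 changes the one shared object, so y sees the change.
x = [1]
y = x
print(x is y)   # True
x[0] = 2
print(y)        # [2]
```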

And this?

x = [1]
y = x
x = [2] # before we had x[0] = 2
y
[1]




Here we are not modifying the contents of x; we are rebinding x to refer to a new list [2]. y still refers to the original list [1].

Additional weirdness

x = np.array([1,2,3,4,5])
y = x
x = x + 5
y
array([1, 2, 3, 4, 5])
x = np.array([1,2,3,4,5])
y = x
x += 5
y
array([ 6,  7,  8,  9, 10])

So it turns out that, for NumPy arrays, x += 5 is not identical to x = x + 5.

  • The former modifies the existing array in place, so every name referring to it (here, y) sees the change.

  • The latter first evaluates x + 5 to a new array of the same size, and then overwrites the name x with a reference to this new array.
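The same distinction applies to Python lists, where += extends the existing list in place. Integers behave differently: they are immutable, so even += has to create a new object and rebind the name. A sketch using separate variable pairs for each case:

```python
# Lists: += modifies the existing object, so y1 sees the change.
x1 = [1, 2]
y1 = x1
x1 += [3]
print(y1)   # [1, 2, 3]

# Lists: x2 = x2 + [3] builds a new list and rebinds x2; y2 keeps the old one.
x2 = [1, 2]
y2 = x2
x2 = x2 + [3]
print(y2)   # [1, 2]

# Integers are immutable: += creates a new int and rebinds x3.
x3 = 1
y3 = x3
x3 += 5
print(y3)   # 1
```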

Function calls and references (5 min)

How about these?

def foo(y):
    y = "Hello from inside foo!"
    return y

x = "I'm outside."
foo(x)
x
"I'm outside."
def bar(y):
    y[0] = "Hello from inside bar!"
x = ["I'm outside."]
bar(x)
x
['Hello from inside bar!']
  • Above: the fact that you called a function is not relevant.

  • When you pass x into the function and it becomes y inside the function, that is basically like the y = x we had above.

  • In the latter case, we say the function has a side effect.

x = "I'm outside."
x = foo(x)
x
'Hello from inside foo!'
  • Above: in this case, x is not getting modified inside foo.

  • Rather it’s getting overwritten after the function call.

  • (Optional) If you’re interested, there is a bunch of terminology you can look up

    • pass by value (call by value)

    • pass by reference (call by reference)

    • copy-on-modify

    • lazy copying

    • …

  • Good news: we don’t need to memorize special rules for calling functions.

  • Immutable types such as int, float, bool and str behave as if they were copied, because they can never be modified in place; everything else effectively behaves “by reference”.

  • Now you see why we care whether objects are mutable or immutable: passing around a reference to a mutable object can be dangerous!

  • General rule: if you do x = ... then you’re not modifying the original, but if you do x.SOMETHING = y or x[SOMETHING] = y or x *= y then you probably are.
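The general rule in the last bullet can be checked directly. Below are three small hypothetical functions, each applied to a fresh list [1], with one function for each style of statement in the rule:

```python
def rebind(y):
    y = [0]          # `y = ...` style: y now names a new list; the caller's list is untouched

def set_item(y):
    y[0] = 0         # `y[SOMETHING] = ...` style: modifies the caller's list

def augment(y):
    y *= 2           # `y *= ...` style: for lists, *= repeats the list in place

results = {}
for func in (rebind, set_item, augment):
    x = [1]
    func(x)
    results[func.__name__] = x
    print(func.__name__, x)
# rebind [1]
# set_item [0]
# augment [1, 1]
```

The "probably" in the rule matters: if y were an int, y *= 2 inside augment would rebind the local name and leave the caller's value unchanged.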

Note: In R, life is simpler: R uses copy-on-modify semantics, so a function never “modifies the original” object you pass in.

copy and deepcopy (10 min)

import copy

x = [1]
y = x
x[0] = 2
y
[2]
x = [1]
y = copy.copy(x)
x[0] = 2
y
[1]

Ok, so what do you think will happen here?

x = [[1], [2,99], [3, "hi"]] # a list of lists

y = copy.copy(x) 

x[0][0] = "pikachu"
print(x)
print(y)
[['pikachu'], [2, 99], [3, 'hi']]
[['pikachu'], [2, 99], [3, 'hi']]




What happened?

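  • copy creates a new container, i.e. a new outer list.

  • But the elements of both outer lists are the same inner lists, so the inner data is shared.

  • In other words, after y = copy.copy(x), x and y are two different outer lists holding references to the same three inner lists.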

We can use is to tell apart these scenarios.

x == y       # they are both lists of the same lists
True
x is y       # but they are not the *same* list object
False

So, by that logic…

y.append(5)
print(x)
print(y)
[['pikachu'], [2, 99], [3, 'hi']]
[['pikachu'], [2, 99], [3, 'hi'], 5]
x == y
False




That makes sense, as weird as it seems.

  • In short, copy.copy makes a shallow copy: it copies only one level down.

  • What if we want to copy everything?

  • Enter our friend deepcopy:

x = [[1], [2,99], [3, "hi"]] 

y = copy.deepcopy(x)

x[0][0] = "pikachu"
print(x)
print(y)
[['pikachu'], [2, 99], [3, 'hi']]
[[1], [2, 99], [3, 'hi']]
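copy.copy is not the only way to make a shallow copy of a list. Slicing (x[:]), list(x) and the list.copy() method also create a new outer list that shares the inner lists. The sketch below compares all four against deepcopy:

```python
import copy

x = [[1], [2, 99], [3, "hi"]]

# All of these make a new outer list but share the inner lists (shallow copies).
shallow_copies = [copy.copy(x), x[:], list(x), x.copy()]
deep = copy.deepcopy(x)

x[0][0] = "pikachu"

for s in shallow_copies:
    print(s[0])   # ['pikachu'] each time: the inner list is shared
print(deep[0])    # [1]: deepcopy copied the inner lists too
print(all(s is not x for s in shallow_copies))  # True: the outer lists are new
```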

Scoping (10 min)

def f():
    x = 10

x = 5
f()
x
5
def f():
    new_variable = 10

f()
new_variable
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-107-07ad50490e16> in <module>
      3 
      4 f()
----> 5 new_variable

NameError: name 'new_variable' is not defined
  • It looks like the x inside and outside the function are different.

  • It looks like new_variable is defined only for use inside the function.

  • That is generally a good way of thinking, and is more true in other languages.

  • This is called scope (see Wikipedia article).

  • However, in Python things are dangerously loose and permissive, so be careful.
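The rule behind the first example is this: unless a name is declared global or nonlocal, assigning to it anywhere in a function body makes it local to the whole function. Names that are only read are looked up outside the function. This sketch shows both sides, plus the surprising error you get when you read a name before assigning to it:

```python
x = 5

def f():
    x = 10        # assignment creates a local x; the global x is untouched
    return x

def g():
    return x      # no assignment to x here, so Python looks it up globally

def h():
    x += 1        # assignment makes x local to ALL of h, so this read fails
    return x

print(f(), g(), x)   # 10 5 5

h_failed = False
try:
    h()
except UnboundLocalError as err:
    h_failed = True
    print("UnboundLocalError:", err)
```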

def bat():
    print(s)
    
s = "hello world"
bat()
hello world
def bat(s):
    print(s)
    
s = "hello world"    
bat("another string")
another string

What happened?

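  • In the first case, s was not defined inside the function, so Python looked it up in the scope outside the function (here, the global scope).

  • In the second case, s is a parameter, i.e. a local variable holding the value that was passed in, and it takes precedence over the global s.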

  • This is very worrying, because of the following:

def modify_the_stuff():
    the_stuff[0] = 99999
    
the_stuff = [1,2,3]
modify_the_stuff()
the_stuff
[99999, 2, 3]
  • Above: modify_the_stuff modified a variable that was not even passed in as an argument!

  • So functions can really mess with your stuff without you knowing.

  • Please do not write code like this!

    • Safest: functions with no side effects.

    • Acceptable: functions with side effects, clearly documented.

    • Disaster: functions with undocumented side effects on their arguments.

    • Complete disaster: functions modifying stuff that you didn’t even pass into the function.
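The "safest" option can be sketched as a rewrite of modify_the_stuff. The data comes in as an argument, and the function returns a modified copy instead of touching the caller's list. The function name with_first_replaced is hypothetical:

```python
def with_first_replaced(stuff, value):
    """Return a copy of `stuff` with its first element replaced by `value`.

    No side effects: the caller's list is not modified, and the function
    only uses what is passed in as arguments.
    """
    new_stuff = list(stuff)   # shallow copy of the outer list
    new_stuff[0] = value
    return new_stuff

the_stuff = [1, 2, 3]
updated = with_first_replaced(the_stuff, 99999)
print(the_stuff)  # [1, 2, 3]
print(updated)    # [99999, 2, 3]
```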

Some other things to avoid:

def func(s, len):
    print(len(s))
    
func("hello", 5)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-111-14060809a19f> in <module>
      2     print(len(s))
      3 
----> 4 func("hello", 5)

<ipython-input-111-14060809a19f> in func(s, len)
      1 def func(s, len):
----> 2     print(len(s))
      3 
      4 func("hello", 5)

TypeError: 'int' object is not callable
  • Above: don’t do this - inside the function there’s a variable called len which is overwriting the built-in len function.

  • Below: a parameter can share its name with a global variable without affecting it, and functions can call other functions that are defined in the global scope:

def hello(a):
    a = a + 5 
    return a

a = 1
hello(a) # hello(1)
6
def f():
    print("Hello from f!")
    
def g():
    f()
    
g()
Hello from f!

That is, there’s no need to pass the function f into g to call it, because f is “global”.
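Going back to the len example: the fix is simply to choose a parameter name that does not shadow a built-in. If you do shadow one, the effect is limited to that function, and the original is still reachable through the standard builtins module. A sketch; the function names are hypothetical:

```python
import builtins

def func(s, n):
    # The parameter is called n, so len still refers to the built-in.
    return len(s) + n

def func_shadowed(s, len):
    # Here the parameter `len` hides the built-in inside this function only;
    # the original is still reachable as builtins.len.
    return builtins.len(s) + len

print(func("hello", 5))           # 10
print(func_shadowed("hello", 5))  # 10
print(len("hello"))               # 5: outside the function, len is the built-in
```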

That’s all, folks!