from IPython.display import IFrame
from IPython.display import Markdown
# Additional styling ; should be moved into helpers
from IPython.display import display, HTML
HTML("<style>{}</style>".format(open("rise.css").read()))
Class 4C: Introduction to Programming in Python II#
We will begin soon! Until then, feel free to use the chat to socialize, and enjoy the music!

Announcements#
Lab 1 feedback has been released; please check it!
Milestone 1 was due last night at 6 PM, but there’s still a 48-hour grace period
Bonus Test 1 will be next week (5 mins)
Demo of how to book a reservation
Python II#
In this class, we go through a notebook by a former colleague, Dr. Mike Gelbart, option co-director of the UBC-Vancouver MDS program.
If you prefer, you can also watch his recording of the same material.
Class Outline#
Comments (0 min)
Why Python? (0 min)
Introduction to Pandas (10 min)
Loops (15 min)
Comprehensions (5 min)
Functions intro (10 min)
Attribution#
The original version of these Python lectures was by Patrick Walls.
These lectures were delivered by Mike Gelbart and are available publicly here.
Why Python? (0 min)#
Why did we choose Python in DATA 301?
Extremely popular in DS (and beyond!)
Relatively easy to learn
Good documentation
Huge user community
Lots of Stack Overflow and other forums
Lots of useful packages (more on this next week)
Introduction to Pandas#
Pandas DataFrames#
At the most basic level, Pandas objects are structured objects in which the rows and columns are identified with labels (rather than simple integer indices).
As we will see during the course of this chapter, Pandas provides a host of useful tools, methods, and functionality on top of the basic data structures, but nearly everything that follows will require an understanding of what these structures are.
You can import pandas using the following convention:
import pandas as pd
The fundamental structure in Pandas is the DataFrame.
Like the Series object discussed in the previous section, the DataFrame can be thought of either as a generalization of a NumPy array, or as a specialization of a Python dictionary.
We’ll now take a look at each of these perspectives.
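The dictionary perspective can be sketched as follows: a DataFrame built from a dictionary maps each column label to a column of values. This is only a minimal illustration, assuming a hand-typed subset of the fruits data used later in this class; the names fruit_columns and fruit_table are made up for the example.

```python
import pandas as pd

# A small, hand-typed dictionary (illustrative values only):
# each key becomes a column label, each list becomes a column.
fruit_columns = {
    "Fruit Name": ["Apple", "Banana", "Lemon"],
    "Mass(g)": [200, 250, 200],
}
fruit_table = pd.DataFrame(fruit_columns)

# Like a dictionary: look up a whole column by its label.
masses = fruit_table["Mass(g)"]

# Like a labelled array: look up one value by row label and column label.
first_fruit = fruit_table.loc[0, "Fruit Name"]

print(list(masses))
print(first_fruit)
```

Looking up a column by label mirrors looking up a value by key in a plain dictionary, which is why the two perspectives sit so close together.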
Loading Data into a Jupyter Notebook#
In Milestone 2 you will be required to load your dataset into a Jupyter Notebook.
To load your dataset into a Jupyter notebook:#
# - Start a new Jupyter Lab session in your project repo
# - Create a new notebook in the analysis directory/folder
# - import pandas as pd
# - Use pd.read_csv('path_to_data')
import pandas as pd
pd.read_csv('../data/raw/data.csv')
DataFrame from URL (demo)#
If your dataset exists on the web as a publicly accessible file, you can create a DataFrame directly from the URL to the CSV.
read_csv(path) is a function from the Pandas package that creates a DataFrame from a CSV file. The argument path can be a URL or a reference to a local file.
import pandas as pd
pd.read_csv("https://github.com/firasm/bits/raw/master/fruits.csv")
|   | Fruit Name | Mass(g) | Colour | Rating |
|---|---|---|---|---|
| 0 | Apple | 200 | Red | 8 |
| 1 | Banana | 250 | Yellow | 9 |
| 2 | Cantoloupe | 600 | Orange | 10 |
| 3 | Cranberry | 50 | Red | 6 |
| 4 | Blueberry | 20 | Blue | 9 |
| 5 | Strawberry | 120 | Red | 10 |
| 6 | Papaya | 220 | Green | 8 |
| 7 | Lemon | 200 | Yellow | 9 |
| 8 | Avocado | 300 | Green | 7 |
| 9 | Jackfruit | 500 | Yellow | 8 |
I can store the dataframe as an object like this:
fruits = pd.read_csv("https://github.com/firasm/bits/raw/master/fruits.csv")
fruits
|   | Fruit Name | Mass(g) | Colour | Rating |
|---|---|---|---|---|
| 0 | Apple | 200 | Red | 8 |
| 1 | Banana | 250 | Yellow | 9 |
| 2 | Cantoloupe | 600 | Orange | 10 |
| 3 | Cranberry | 50 | Red | 6 |
| 4 | Blueberry | 20 | Blue | 9 |
| 5 | Strawberry | 120 | Red | 10 |
| 6 | Papaya | 220 | Green | 8 |
| 7 | Lemon | 200 | Yellow | 9 |
| 8 | Avocado | 300 | Green | 7 |
| 9 | Jackfruit | 500 | Yellow | 8 |
More about this next week!
Loops (10 min)#
Loops allow us to execute a block of code multiple times.
We will focus on for loops.
for n in [2, 7, -1, 5]:
    print("The number is", n, "its square is", n**2)
    # this is inside the loop
    print(n)
# this is outside the loop
n
The number is 2 its square is 4
2
The number is 7 its square is 49
7
The number is -1 its square is 1
-1
The number is 5 its square is 25
5
5
The main points to notice:
Keyword for begins the loop
Colon : ends the first line of the loop
We can iterate over any kind of iterable: list, tuple, range, string. In this case, we are iterating over the values in a list
Block of code indented is executed for each value in the list (hence the name “for” loops, sometimes also called “for each” loops)
The loop ends after the variable n has taken all the values in the list
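To illustrate the point about iterables, here is a small sketch that loops over a tuple, a range, and a string in turn. The variable names are chosen only for this example.

```python
# Collect what each loop visits so the results are easy to inspect.
visited = []

for item in (10, 20):        # a tuple
    visited.append(item)

for item in range(2):        # a range: 0, then 1
    visited.append(item)

for item in "hi":            # a string: one character at a time
    visited.append(item)

print(visited)
```

The loop syntax is identical in all three cases; only the object after the keyword in changes.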
word = "Python"
for letter in word:
    print("Gimme a " + letter + "!")

print("What's that spell?!! " + word + "!")
Gimme a P!
Gimme a y!
Gimme a t!
Gimme a h!
Gimme a o!
Gimme a n!
What's that spell?!! Python!
A very common pattern is to use for with range. range gives you a sequence of integers up to (but not including) some value.
for i in range(0, 10):
    print(i)
0
1
2
3
4
5
6
7
8
9
We can also specify a start value and a skip-by value with range:
for i in range(1, 101, 10):
    print(i)
1
11
21
31
41
51
61
71
81
91
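A quick way to check what a range will produce, without writing a loop, is to convert it to a list. The sketch below uses made-up variable names and shows the three common call forms.

```python
# range(stop): starts at 0 and stops before `stop`.
up_to_five = list(range(5))

# range(start, stop, step): the stop value itself is never included.
by_tens = list(range(1, 101, 10))

# A negative step counts down.
countdown = list(range(3, 0, -1))

print(up_to_five)
print(by_tens)
print(countdown)
```

Note that 101 never appears in by_tens: the sequence stops at the last value that is still below the stop value.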
We can write a loop inside another loop to iterate over multiple dimensions of data. Consider the following loop as enumerating the coordinates in a 3 by 3 grid of points.
for x in [1, 2, 3]:
    for y in ["a", "b", "c"]:
        print((x, y))
(1, 'a')
(1, 'b')
(1, 'c')
(2, 'a')
(2, 'b')
(2, 'c')
(3, 'a')
(3, 'b')
(3, 'c')
list_1 = [1, 2, 3]
list_2 = ["a", "b", "c"]
for i in range(3):
    print(list_1[i], list_2[i])
1 a
2 b
3 c
We can loop through key-value pairs of a dictionary using .items():
courses = {521: "awesome", 551: "riveting", 511: "naptime!"}
for course_num, description in courses.items():
    print(f"DSCI {course_num} is {description}")
DSCI 521 is awesome
DSCI 551 is riveting
DSCI 511 is naptime!
for course_num in courses.keys():
    print(f"key: {course_num}")
key: 521
key: 551
key: 511
for description in courses.values():
    print(f"value: {description}")
value: awesome
value: riveting
value: naptime!
for course_num in courses:
    print(course_num, courses[course_num])
521 awesome
551 riveting
511 naptime!
Above: the general syntax is for key, value in dictionary.items():
while loops#
We can also use a while loop to execute a block of code several times.
In reality, I rarely use these.
Beware! If the conditional expression is always True, then you’ve got an infinite loop!
(Use the “Stop” button in the toolbar above, or Ctrl-C in the terminal, to kill the program if you get an infinite loop.)
n = 10
while n > 0:
    print(n)
    n = n - 1

print("Blast off!")
10
9
8
7
6
5
4
3
2
1
Blast off!
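One defensive habit while experimenting with while loops is to cap the number of iterations. This was not part of the lecture and is only one possible approach; MAX_STEPS and steps are hypothetical names chosen for this sketch. The loop below contains a deliberate bug that would otherwise run forever.

```python
# Hypothetical safety cap; the name and value are chosen for illustration.
MAX_STEPS = 1000

n = 10
steps = 0
while n > 0 and steps < MAX_STEPS:
    # Bug on purpose: n is never decreased, so `n > 0` stays True forever.
    steps += 1

# The cap, not the original condition, is what ended the loop.
stopped_by_cap = n > 0

print(steps, stopped_by_cap)
```

Once the loop behaves as intended, the cap can be removed or kept as a guard, depending on how much you trust the condition.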
Comprehensions (5 min)#
Comprehensions allow us to build lists/tuples/sets/dictionaries in one convenient, compact line of code.
words = ["hello", "goodbye", "the", "antidisestablishmentarianism"]
y = list()
for word in words:
    y.append(word[-1])
y
['o', 'e', 'e', 'm']
y = [word[-1] for word in words] # NEW: list comprehension
y
['o', 'e', 'e', 'm']
Fun with List comprehensions#
[word[-1] for word in words if word[-1] != "e"]
['o', 'm']
[word[-1] for word in words if ("e" not in word[-1]) and ("o" not in word[-1])]
['m']
word_lengths = {word: len(word) for word in words} # dictionary comprehension
word_lengths
{'hello': 5, 'goodbye': 7, 'the': 3, 'antidisestablishmentarianism': 28}
word_lengths = {}
for word in words:
    word_lengths[word] = len(word)
word_lengths
{'hello': 5, 'goodbye': 7, 'the': 3, 'antidisestablishmentarianism': 28}
y = {word[-1] for word in words} # set comprehension
print(y)
{'o', 'm', 'e'}
# this is NOT a tuple comprehension - more on generators in another course
y = (word[-1] for word in words)
y
<generator object <genexpr> at 0x7f3aabecdf50>
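If a tuple is what you actually want, one option is to pass the generator expression to tuple(). The sketch below reuses the words list from this section; the other variable names are made up for the example.

```python
words = ["hello", "goodbye", "the", "antidisestablishmentarianism"]

# Parentheses alone create a generator: nothing is computed yet.
last_letters = (word[-1] for word in words)

# Passing the generator to tuple() consumes it and builds the tuple.
as_tuple = tuple(last_letters)

# A generator can only be consumed once; a second pass yields nothing.
second_pass = tuple(last_letters)

print(as_tuple)
print(second_pass)
```

The empty second result is the main practical difference from a list comprehension, which can be looped over as many times as you like.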
Functions intro (5 min)#
Define a function to re-use a block of code with different input parameters, also known as arguments.
For example, define a function called square which takes one input parameter n and returns the square n**2.
def square(n):
    n_squared = n**2
    return n_squared
square(2)
4
square(100)
10000
res = square(12345)
res
152399025
Begins with def keyword, function name, input parameters and then colon (:)
Function block defined by indentation
Output or “return” value of the function is given by the return keyword
Side effects#
If a function changes the variables passed into it, then it is said to have side effects
Example:
def silly_sum(sri):
    sri.append(0)
    return sum(sri)
silly_sum([1, 2, 3, 4])
10
Looks good, like it sums the numbers? But wait…
lst = [1, 2, 3, 4]
silly_sum(lst)
10
silly_sum(lst)
10
lst
[1, 2, 3, 4, 0, 0]
If your function has side effects like this, you must mention it in the documentation (later today).
In general, avoid side effects!
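One way to avoid the side effect in silly_sum is to build a new list inside the function rather than appending to the list that was passed in. The name tidy_sum is hypothetical, and this is a sketch of one possible fix rather than the only one.

```python
def tidy_sum(numbers):
    # numbers + [0] creates a new list; the caller's list is not modified.
    padded = numbers + [0]
    return sum(padded)


lst = [1, 2, 3, 4]
total = tidy_sum(lst)

print(total)
print(lst)
```

Calling tidy_sum repeatedly leaves lst the same length, unlike silly_sum, which grows the list by one element on every call.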
Null return type#
If you do not specify a return value, the function returns None when it terminates:
def f(x):
    x + 1  # no return!
    if x == 999:
        return


print(f(0))
None
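A side-by-side comparison may make the difference clearer. The two function names below are made up for this sketch; the only difference between them is the return keyword.

```python
def add_one_no_return(x):
    x + 1          # the result is computed and then discarded


def add_one(x):
    return x + 1   # the result is handed back to the caller


missing = add_one_no_return(0)
present = add_one(0)

print(missing is None)
print(present)
```

Comparing with is None is the usual way to check whether a function gave back nothing.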
Attribution#
This notebook contains an excerpt from the Python Data Science Handbook by Jake VanderPlas; the content is available on GitHub.
The text is released under the CC-BY-NC-ND license, and code is released under the MIT license. If you find this content useful, please consider supporting the work by buying the book!
Comments in Python (0 min)#