from IPython.display import IFrame
from IPython.display import Markdown

# Additional styling; should be moved into helpers
from IPython.display import display, HTML

HTML("<style>{}</style>".format(open("rise.css").read()))

Class 4C: Introduction to Programming in Python II#

We will begin soon! Until then, feel free to use the chat to socialize, and enjoy the music!

../../../_images/programming4.jpg


Photo by Christina Morillo from Pexels
Firas Moosvi

Announcements#

  1. Lab 1 feedback has been released; please check it!

  2. Milestone 1 was due last night at 6 PM, but there is still a 48-hour grace period.

  3. Bonus Test 1 will be next week (5 mins).

Python II#

In this class, we go through a notebook by a former colleague, Dr. Mike Gelbart, option co-director of the UBC-Vancouver MDS program.

If you prefer, you can also watch his recording of the same material.

Class Outline#

  • Comments (0 min)

  • Why Python? (0 min)

  • Introduction to Pandas (10 min)

  • Loops (15 min)

  • Comprehensions (5 min)

  • Functions intro (10 min)

Comments in Python (0 min)#

x = 1  # this is a comment
"""
this is a string, which does nothing
and can be used as a comment
"""

7


x = 1

Why Python? (0 min)#

  • Why did we choose Python in DATA 301?

    • Extremely popular in DS (and beyond!)

    • Relatively easy to learn

    • Good documentation

    • Huge user community

      • Lots of Stack Overflow and other forums

      • Lots of useful packages (more on this next week)

Introduction to Pandas#

Pandas DataFrames#

At the most basic level, Pandas objects are structured objects in which the rows and columns are identified with labels (rather than simple integer indices).

As we will see during the course of this chapter, Pandas provides a host of useful tools, methods, and functionality on top of the basic data structures, but nearly everything that follows will require an understanding of what these structures are.

You can import pandas using the following convention:

import pandas as pd

The fundamental structure in Pandas is the DataFrame. Like the Series object discussed in the previous section, the DataFrame can be thought of either as a generalization of a NumPy array, or as a specialization of a Python dictionary. We’ll now take a look at each of these perspectives.
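
As a minimal sketch of these two perspectives (the small dataset and the variable names here are made up for illustration and are not part of the original notebook), the same DataFrame can be built from a dictionary of columns or from a NumPy array with labels attached:

```python
import numpy as np
import pandas as pd

# Dictionary-like view: each key is a column label that maps to a column of values.
df_from_dict = pd.DataFrame(
    {"mass": [200, 250], "rating": [8, 9]}, index=["Apple", "Banana"]
)
print(df_from_dict["mass"])  # look up a column by label, like a dictionary key

# Array-like view: a 2D NumPy array with row and column labels attached.
values = np.array([[200, 8], [250, 9]])
df_from_array = pd.DataFrame(
    values, index=["Apple", "Banana"], columns=["mass", "rating"]
)
print(df_from_array.shape)  # (rows, columns), as for a NumPy array
```

Both constructions hold the same values; they differ only in how the data is handed to Pandas.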

Loading Data into a Jupyter Notebook#

In Milestone 2 you will be required to load your dataset into a Jupyter Notebook.

To load your dataset into a Jupyter notebook:#

# - Start a new Jupyter Lab session in your project repo
# - Create a new notebook in the analysis directory/folder
# - Import pandas with: import pandas as pd
# - Call pd.read_csv('path_to_data')

import pandas as pd
pd.read_csv('../data/raw/data.csv')
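
To try the same call without depending on a particular file, read_csv can also be pointed at in-memory text. This is only a sketch: `csv_text` is a made-up stand-in for a real file, and in your project you would pass the path to your own dataset instead.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for a file on disk.
csv_text = "Fruit Name,Mass(g)\nApple,200\nBanana,250\n"

df = pd.read_csv(io.StringIO(csv_text))  # read_csv also accepts file-like objects
print(df.head())  # show the first rows to confirm the load worked
print(df.shape)   # (number of rows, number of columns)
```

Checking `head()` and `shape` right after loading is a quick way to confirm that the data came in as expected.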

DataFrame from URL (demo)#

If your dataset exists on the web as a publicly accessible file, you can create a DataFrame directly from the URL to the CSV.

  • read_csv(path) is a function from the Pandas package that creates a DataFrame from a CSV file.

    • The argument path can be a URL or a reference to a local file.

import pandas as pd

pd.read_csv("https://github.com/firasm/bits/raw/master/fruits.csv")
   Fruit Name  Mass(g)  Colour  Rating
0       Apple      200     Red       8
1      Banana      250  Yellow       9
2  Cantoloupe      600  Orange      10
3   Cranberry       50     Red       6
4   Blueberry       20    Blue       9
5  Strawberry      120     Red      10
6      Papaya      220   Green       8
7       Lemon      200  Yellow       9
8     Avocado      300   Green       7
9   Jackfruit      500  Yellow       8

I can store the DataFrame in a variable like this:

fruits = pd.read_csv("https://github.com/firasm/bits/raw/master/fruits.csv")

fruits
   Fruit Name  Mass(g)  Colour  Rating
0       Apple      200     Red       8
1      Banana      250  Yellow       9
2  Cantoloupe      600  Orange      10
3   Cranberry       50     Red       6
4   Blueberry       20    Blue       9
5  Strawberry      120     Red      10
6      Papaya      220   Green       8
7       Lemon      200  Yellow       9
8     Avocado      300   Green       7
9   Jackfruit      500  Yellow       8

More about this next week!

Loops (10 min)#

  • Loops allow us to execute a block of code multiple times.

  • We will focus on for loops

for n in [2, 7, -1, 5]:
    print("The number is", n, "its square is", n**2)
    # this is inside the loop
    print(n)

# this is outside the loop
n
The number is 2 its square is 4
2
The number is 7 its square is 49
7
The number is -1 its square is 1
-1
The number is 5 its square is 25
5
5

The main points to notice:

  • Keyword for begins the loop

  • Colon : ends the first line of the loop

  • We can iterate over any kind of iterable: list, tuple, range, string. In this case, we are iterating over the values in a list.

  • The indented block of code is executed once for each value in the list (hence the name “for” loops, sometimes also called “for each” loops)

  • The loop ends after the variable n has taken all the values in the list

word = "Python"
for letter in word:
    print("Gimme a " + letter + "!")

print("What's that spell?!! " + word + "!")
Gimme a P!
Gimme a y!
Gimme a t!
Gimme a h!
Gimme a o!
Gimme a n!
What's that spell?!! Python!
  • A very common pattern is to use for with range.

  • range gives you a sequence of integers up to, but not including, some value.

for i in range(0, 10):
    print(i)
0
1
2
3
4
5
6
7
8
9

We can also specify a start value and a step (skip-by) value with range:

for i in range(1, 101, 10):
    print(i)
1
11
21
31
41
51
61
71
81
91
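
A quick way to see which values a range will produce is to wrap it in list(). The sketch below uses values chosen only for illustration:

```python
# Wrapping range in list() shows the values it would produce.
up_to_five = list(range(5))         # stop only: starts at 0 and stops before 5
print(up_to_five)

two_to_four = list(range(2, 5))     # start and stop: the stop value is excluded
print(two_to_four)

countdown = list(range(10, 0, -3))  # a negative step counts down
print(countdown)
```

In every case the stop value itself is never included.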

We can write a loop inside another loop to iterate over multiple dimensions of data. Consider the following loop as enumerating the coordinates in a 3 by 3 grid of points.

for x in [1, 2, 3]:
    for y in ["a", "b", "c"]:
        print((x, y))
(1, 'a')
(1, 'b')
(1, 'c')
(2, 'a')
(2, 'b')
(2, 'c')
(3, 'a')
(3, 'b')
(3, 'c')
list_1 = [1, 2, 3]
list_2 = ["a", "b", "c"]
for i in range(3):
    print(list_1[i], list_2[i])
1 a
2 b
3 c
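
The index-based loop above walks two lists in parallel. As an alternative sketch (not part of the original notebook), the built-in zip and enumerate functions express the same idea without managing the index by hand:

```python
list_1 = [1, 2, 3]
list_2 = ["a", "b", "c"]

# zip pairs up the items of both lists, so no index variable is needed.
for number, letter in zip(list_1, list_2):
    print(number, letter)

# enumerate yields (index, item) pairs when the position is also needed.
for i, letter in enumerate(list_2):
    print(i, letter)

pairs = list(zip(list_1, list_2))
print(pairs)
```

Note that zip stops at the end of the shorter input, so it is safest when both lists have the same length.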

We can loop through key-value pairs of a dictionary using .items():

courses = {521: "awesome", 551: "riveting", 511: "naptime!"}

for course_num, description in courses.items():
    print(f"DSCI {course_num} is {description}")
DSCI 521 is awesome
DSCI 551 is riveting
DSCI 511 is naptime!
for course_num in courses.keys():
    print(f"key: {course_num}")
key: 521
key: 551
key: 511
for description in courses.values():
    print(f"value: {description}")
value: awesome
value: riveting
value: naptime!
for course_num in courses:
    print(course_num, courses[course_num])
521 awesome
551 riveting
511 naptime!

Above: the general syntax is for key, value in dictionary.items():

while loops#

  • We can also use a while loop to execute a block of code several times.

  • In reality, I rarely use these.

  • Beware! If the conditional expression is always True, then you’ve got an infinite loop!

    • (Use the “Stop” button in the toolbar above, or Ctrl-C in the terminal, to kill the program if you get an infinite loop.)

n = 10
while n > 0:
    print(n)
    n = n - 1

print("Blast off!")
10
9
8
7
6
5
4
3
2
1
Blast off!
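
One possible way to guard against an accidental infinite loop is to cap the number of iterations and break out of the loop once the cap is reached. This is only a sketch; `max_iterations` is a hypothetical safety limit and is not part of the original example.

```python
n = 10
max_iterations = 100  # hypothetical safety cap, chosen arbitrarily
iterations = 0

while n > 0:
    n = n - 3
    iterations += 1
    if iterations >= max_iterations:
        print("Stopping early: too many iterations")
        break  # leave the loop even though the condition is still True

print(n, iterations)
```

Here the loop ends on its own after four passes, so the cap is never reached; it only matters if the condition fails to become False.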

Comprehensions (5 min)#

Comprehensions allow us to build lists, sets, and dictionaries in one convenient, compact line of code. (There is no tuple comprehension; the same syntax in parentheses creates a generator instead.)

words = ["hello", "goodbye", "the", "antidisestablishmentarianism"]
y = list()
for word in words:
    y.append(word[-1])
y
['o', 'e', 'e', 'm']
y = [word[-1] for word in words]  # NEW: list comprehension
y
['o', 'e', 'e', 'm']

Fun with List comprehensions#

[word[-1] for word in words if word[-1] != "e"]
['o', 'm']
[word[-1] for word in words if ("e" not in word[-1]) and ("o" not in word[-1])]
['m']
word_lengths = {word: len(word) for word in words}  # dictionary comprehension
word_lengths
{'hello': 5, 'goodbye': 7, 'the': 3, 'antidisestablishmentarianism': 28}
word_lengths = {}

for word in words:
    word_lengths[word] = len(word)

word_lengths
{'hello': 5, 'goodbye': 7, 'the': 3, 'antidisestablishmentarianism': 28}
y = {word[-1] for word in words}  # set comprehension
print(y)
{'o', 'm', 'e'}
# this is NOT a tuple comprehension - more on generators in another course

y = (word[-1] for word in words)
y
<generator object <genexpr> at 0x7f3aabecdf50>
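
If a tuple is what you actually want, one option is to pass the generator to tuple(). The sketch below also shows that a generator produces its values only once:

```python
words = ["hello", "goodbye", "the", "antidisestablishmentarianism"]

gen = (word[-1] for word in words)  # generator expression: nothing is computed yet
as_tuple = tuple(gen)               # consuming the generator produces the values
print(as_tuple)

leftover = list(gen)                # the generator is now exhausted
print(leftover)
```

The second conversion gives an empty list because the values were already consumed by the first one.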

Functions intro (5 min)#

  • Define a function to re-use a block of code with different input parameters, also known as arguments.

  • For example, define a function called square which takes one input parameter n and returns the square n**2.

def square(n):
    n_squared = n**2
    return n_squared
square(2)
4
square(100)
10000
res = square(12345)

res
152399025
  • A function definition begins with the def keyword, followed by the function name, the input parameters in parentheses, and then a colon (:)

  • Function block defined by indentation

  • Output or “return” value of the function is given by the return keyword
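
Because a function packages a block of code for reuse, it combines naturally with the loops and comprehensions from earlier in this class. A small sketch, reusing the numbers from the first loop example:

```python
def square(n):
    n_squared = n**2
    return n_squared


# Call the same function once per value instead of repeating the arithmetic.
squares = [square(n) for n in [2, 7, -1, 5]]
print(squares)
```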

Side effects#

  • If a function changes the variables passed into it, then it is said to have side effects

  • Example:

def silly_sum(sri):
    sri.append(0)
    return sum(sri)
silly_sum([1, 2, 3, 4])
10

Looks good, like it sums the numbers? But wait…

lst = [1, 2, 3, 4]
silly_sum(lst)
10
silly_sum(lst)
10
lst
[1, 2, 3, 4, 0, 0]
  • If your function has side effects like this, you must mention it in the documentation (later today).

  • In general, avoid side effects!
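
One way to avoid the side effect in silly_sum is to build a new list inside the function instead of appending to the one passed in. This is a sketch of that idea; the name `safer_sum` is made up for illustration.

```python
def safer_sum(numbers):
    # numbers + [0] creates a new list, so the caller's list is left untouched.
    extended = numbers + [0]
    return sum(extended)


lst = [1, 2, 3, 4]
total = safer_sum(lst)
total_again = safer_sum(lst)
print(total, total_again, lst)
```

Calling the function repeatedly now leaves lst exactly as it was.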

Null return type#

If you do not specify a return value, the function returns None when it terminates:

def f(x):
    x + 1  # no return!
    if x == 999:
        return


print(f(0))
None
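
A forgotten return is a common source of unexpected None values. The sketch below (function names are made up for illustration) contrasts a function that computes a result and discards it with one that returns it:

```python
def add_one_forgot_return(x):
    x + 1  # the result is computed and then discarded


def add_one(x):
    return x + 1


result = add_one_forgot_return(0)
print(result is None)  # compare with None using "is"
print(add_one(0))
```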

Attribution#

This notebook contains an excerpt from the Python Data Science Handbook by Jake VanderPlas; the content is available on GitHub.

The text is released under the CC-BY-NC-ND license, and code is released under the MIT license. If you find this content useful, please consider supporting the work by buying the book!