Lecture 1 - Welcome to Platforms for Data Science¶

from pathlib import Path

from IPython.display import HTML, IFrame, Markdown, display

# Additional styling; should be moved into helpers
# read_text() opens and closes the stylesheet, so no file handle is left open.
HTML('<style>{}</style>'.format(Path('styler.css').read_text()))

Lecture Outline¶

  1. Introduction (40 mins)

  2. Break (10 mins)

  3. Course Tools Survey (2 mins)

  4. Syllabus walk-through (10 minutes)

  5. Demo of DATA 530 Lab and GitHub (20 mins)

Introduction¶

The Essence of the course¶

The overall goal of this course is for you to:

Install, configure, and use a variety of data analysis tools and software packages

This course covers how to configure data analysis environments, select appropriate tools for particular tasks, read documentation and get help, and use a variety of software packages for data analysis.

- Other courses will build on this course by going deeper into the application of these systems and techniques.

Course Objectives¶

  1. Install and set up a variety of software tools and programs used by data analysts

  2. Perform basic and advanced data analysis and visualization in Excel

  3. Set up IDEs and write small programs in Python and R

  4. Understand the pros and cons of each tool/software package and the criteria for selecting the best tool for the job

Course Goals (for me)¶

  1. Provide the information in a simple, concise, and effective way for learning.

  2. Strive for all students to understand the material and excel at the course.

  3. Be available for questions during class time, office hours, and at other times as needed.

  4. Provide an introduction to a variety of data analysis software and systems.

  5. Emphasize the use of Excel as an easy-to-use, general tool for data analysis.

Academic Integrity¶

Cheating is strictly prohibited and is taken very seriously by UBC. A guideline to what constitutes cheating:

  • Labs

    • Submitting code produced by others.

    • Working in groups to solve questions and/or comparing answers to questions once they have been solved (except for group assignments).

    • Discussing HOW to solve a particular question instead of WHAT the question involves.

  • Exams (Quizzes)

    • DATA 530 and 531 Quizzes are open book

    • No communication about course content is permitted (with classmates, or others)

How to Excel in this course¶

  • Attend class

    • If required, read the notes before class as preparation and try the questions on your own

    • Participate in class activities and questions (clickers)

  • Attend and complete all labs

    • Labs let you practice fundamental, employable skills in addition to counting for marks.

  • Practice on your own. Practice makes perfect.

    • Do more questions than in the labs.

    • Read the additional reference material and perform practice questions.

Systems and tools¶

  1. Course Material:

    • Canvas, this site, and GitHub (all contain identical material)

  2. Marks:

    • Canvas

  3. Feedback:

    • Canvas and (sometimes) GitHub

  4. Hardware:

    • Your laptop (mostly)

    • Cloud computing (when needed): https://ubc.syzygy.ca

Lab Assignments¶

  • Weekly lab assignments are worth 40% of your overall grade.

  • Lab assignments will likely take more than the 1.5 hours of lab time.

  • You have until after the following lab to complete each lab.

    • Late labs will not be accepted (except within the grace period)

  • Lab assignments are done individually but can be worked on collaboratively.

  • The lab assignments are critical to learning the material and are designed both to prepare you for the exams and build up your skills!

Labs during COVID-19¶

  • The 3-hour lab block is split into two 1.5-hour sections to ensure physical distancing

    • Section 1: 12:30 - 14:00

    • Section 2: 14:00 - 15:30

  • If there are 16 people who would like to attend the physical lab, we should split evenly so that half are in each section

  • We need to coordinate and figure out who would like to attend labs in person

    • I don’t have a good system at the moment; maybe we can brainstorm together?
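As a starting point for the brainstorm, here is a minimal sketch of the "split evenly" idea from the list above. The roster and the function name `split_sections` are hypothetical; this is only an illustration, not a proposed sign-up system, and it ignores time zones and personal preferences.

```python
def split_sections(attendees):
    """Split attendees into two sections whose sizes differ by at most one."""
    half = (len(attendees) + 1) // 2
    return attendees[:half], attendees[half:]


# Hypothetical roster of 16 students who want to attend in person.
roster = [f"student_{i:02d}" for i in range(1, 17)]
section_1, section_2 = split_sections(roster)

print(len(section_1), len(section_2))
```

Slicing a list is the simplest possible rule; a real scheme would likely start from each student's stated availability.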

Breakout rooms activity [10 mins]¶

We will split into breakout rooms for this activity, in groups of about 4.

Your Task: In your small groups, brainstorm ways to organize “who goes where, when”.¶

  • Keep in mind that many of us will be attending remotely, and there are timezones to consider!

  • Write your ideas on Slack in the DATA 530 channel - use threads to comment on others’ ideas!

  • Consider, “The Zen of Python”:

import this 
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

In-class “Clicker questions” AKA Participation¶

To promote understanding, 10% of your overall grade is allocated to answering in-class questions (called Clickers).

  • These questions are answered using Sli.do and you will need to create an account.

  • You should download the app and bookmark the site for convenience

  • At different times during the lectures, questions reviewing the material will be asked. Responses are given using Sli.do polls.

  • We will do several of these today to get you familiar with the system
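To see how the stated weights translate into points, here is a small sketch using the two weights mentioned so far (labs 40%, participation 10%). The component scores are made up, and the remaining 50% is left out because its breakdown is not given in these slides.

```python
# Weights stated in this lecture; the remaining 50% is not broken down here.
WEIGHTS = {"labs": 0.40, "participation": 0.10}


def contribution(scores, weights=WEIGHTS):
    """Return the points (out of 100) each component adds to the overall grade."""
    return {name: weights[name] * scores[name] for name in weights}


# Hypothetical component scores, each out of 100.
example = contribution({"labs": 85, "participation": 100})
print(example)
print(round(sum(example.values()), 2))
```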

Activity: A typical Clicker Sequence¶

  • Part 1: Instructor asks a question and students answer

  • Part 2: Instructor shows the results (Optional) and opens breakout rooms for students to discuss their choices with others

  • Part 3: Instructor asks the same question again and we see if there is convergence
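One way to make "convergence" in Part 3 concrete is to compare how large a share of the class picked the most popular answer in each round. The sketch below uses made-up responses and a hypothetical helper, `top_share`; Sli.do reports its own results, so this only illustrates the idea.

```python
from collections import Counter


def top_share(votes):
    """Return the most common answer and the fraction of votes it received."""
    answer, count = Counter(votes).most_common(1)[0]
    return answer, count / len(votes)


# Hypothetical responses to the same question, before and after discussion.
first_round = ["A", "B", "B", "C", "A", "B", "D", "C"]
second_round = ["B", "B", "B", "C", "B", "B", "B", "C"]

before = top_share(first_round)
after = top_share(second_round)

# Treat a larger share for the leading answer as a sign of convergence.
converged = after[1] > before[1]
print(before, after, converged)
```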

Clicker Question Activity [5 mins]¶

IFrame('https://app.sli.do/event/0oandysw/embed/polls/bb021733-e27c-454b-9be7-1cdc591346ef', width=400, height=500)

Results and breakout rooms¶

(Part 2: Instructor shows the results (optional) and opens breakout rooms for students to discuss their choices with others)

Re-vote and Debrief¶

(Part 3: Instructor asks the same question again and we see if there is convergence)

IFrame('https://app.sli.do/event/0oandysw/embed/polls/bcf12c99-53c2-467c-9635-b6564db67e11', width=450, height=550)

Break (10 mins)¶

Course Tools Survey (2 mins)¶

IFrame('https://ubc.ca1.qualtrics.com/jfe/form/SV_eboA3fyVCFkhixn', width=500, height=1300)

Syllabus walk-through (10 minutes)¶

Content covered¶

  • How learning works

  • Piazza for Q&A, Slack for conversations

  • Canvas structure

  • Course structure

  • Course policies

  • Due dates and grace period (for DATA 530 and 531 only)

  • Quiz policies

  • Labs

Demo of DATA 530 Lab and GitHub (20 mins)¶

Content covered¶

  • Navigating GitHub

  • Uploading files on GitHub

    • Uploading to a specific folder

  • Accepting a lab assignment

  • Cloning a repository (repo)

    • git clone <https://...>

  • Making changes to a local version of a repo

    • code README.md and then make changes

  • Commit your changes to the repository

    • git add -A

    • git commit -m "This is a test commit"

  • Pushing the changes above

    • git push
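Since the rest of these slides use Python, here is a hedged sketch of the add, commit, and push steps above driven from Python with `subprocess`. The helper names `git_steps` and `run_steps` are hypothetical, and the sketch assumes `git` is installed and the repository has already been cloned; in the lab you would normally just type the commands in a terminal.

```python
import subprocess
from pathlib import Path


def git_steps(message):
    """Return the add/commit/push commands from the demo as argument lists."""
    return [
        ["git", "add", "-A"],
        ["git", "commit", "-m", message],
        ["git", "push"],
    ]


def run_steps(repo_dir, message):
    """Run each command inside repo_dir, stopping at the first failure."""
    for cmd in git_steps(message):
        subprocess.run(cmd, cwd=Path(repo_dir), check=True)


# Example call (the path is a placeholder for your own cloned repo):
# run_steps("path/to/your/cloned/repo", "This is a test commit")
```

Passing each command as a list avoids shell quoting problems in the commit message, and `check=True` stops the sequence if a step fails, so a failed commit is never followed by a push.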

RISE Template¶

from pathlib import Path

from traitlets.config.manager import BaseJSONConfigManager

# Store the slideshow settings in the "rise" section of the Jupyter nbconfig directory.
path = Path.home() / ".jupyter" / "nbconfig"
cm = BaseJSONConfigManager(config_dir=str(path))
tmp = cm.update(
    "rise",
    {
        "theme": "sky",  # https://revealjs.com/themes/
        "transition": "fade",
        "start_slideshow_at": "selected",
        "autolaunch": False,
        "width": "100%",
        "height": "100%",
        "header": "",
        "footer": "",
        "scroll": True,
        "enable_chalkboard": True,
        "slideNumber": True,  # listed once; the original repeated this key
        "center": False,
        "controlsLayout": "edges",
        "hash": True,
    },
)
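To check what was saved, the sketch below reads the settings back using only the standard library. It assumes the config manager writes the "rise" section to a file named `rise.json` inside the config directory; the helper `load_rise_config` is hypothetical and returns an empty dict if no such file exists.

```python
import json
from pathlib import Path


def load_rise_config(config_dir=None):
    """Return the saved "rise" settings, or an empty dict if none exist."""
    if config_dir is None:
        config_dir = Path.home() / ".jupyter" / "nbconfig"
    # Assumption: the "rise" section is stored as rise.json in config_dir.
    config_file = Path(config_dir) / "rise.json"
    if not config_file.exists():
        return {}
    return json.loads(config_file.read_text())


settings = load_rise_config()
print(settings.get("theme"), settings.get("transition"))
```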