Lecture 1 - Welcome to Platforms for Data Science¶
from IPython.display import IFrame
from IPython.display import Markdown
# Additional styling; should be moved into helpers
from IPython.display import display, HTML
# Read the stylesheet with a context manager so the file handle is closed
with open('styler.css') as f:
    css = f.read()
HTML('<style>{}</style>'.format(css))
Lecture Outline¶
Introduction (40 mins)
Break (10 mins)
Course Tools survey (2 mins)
Syllabus walk-through (10 minutes)
Demo of DATA 530 Lab and GitHub (20 mins)
Introduction¶
The Essence of the course¶
The overall goal of this course is for you to:
Install, configure, and use a variety of data analysis tools and software packages
This course covers how to configure data analysis environments, select appropriate tools for particular tasks, read documentation and get help, and use a variety of software packages for data analysis.
- Other courses will build on this course by going deeper into the application of these systems and techniques.
Course Objectives¶
Install and set up a variety of software tools and programs used by data analysts
Perform basic and advanced data analysis and visualization in Excel
Set up IDEs and write small programs in Python and R
Understand the pros and cons of each tool/software package and criteria to select the best tool for the job
Course Goals (for me)¶
Provide the information in a simple, concise, and effective way for learning.
Strive for all students to understand the material and excel at the course.
Be available for questions during class time, office hours, and at other times as needed.
Provide an introduction to a variety of data analysis software and systems.
Emphasize the use of Excel as an easy-to-use, general tool for data analysis.
Academic Integrity¶
Cheating is strictly prohibited and is taken very seriously by UBC. A guideline to what constitutes cheating:
Labs
Submitting code produced by others.
Working in groups to solve questions and/or comparing answers to questions once they have been solved (except for group assignments).
Discussing HOW to solve a particular question instead of WHAT the question involves.
Exams (Quizzes)
DATA 530 and 531 Quizzes are open book
No communication about course content is permitted (with classmates, or others)
How to Excel in this course¶
Attend class
If required, read the notes before class as preparation and try the questions on your own
Participate in class activities and questions (clickers)
Attend and complete all labs
Labs count for marks and let you practice fundamental, employable skills.
Practice on your own. Practice makes perfect.
Do more questions than in the labs.
Read the additional reference material and perform practice questions.
Systems and tools¶
Course Material:
Canvas, this site, and GitHub (the material is identical on all three)
Marks:
Canvas
Feedback
Canvas and (sometimes) GitHub
Hardware
Your laptop (mostly)
Cloud computing (when needed): https://ubc.syzygy.ca
Lab Assignments¶
Weekly lab assignments are worth 40% of your overall grade.
Lab assignments will likely take more than the 1.5 hours of lab time.
You have until after the following lab to complete each lab.
Late labs will not be accepted (except within the grace period)
Lab assignments are done individually but can be worked on collaboratively.
The lab assignments are critical to learning the material and are designed both to prepare you for the exams and build up your skills!
Labs during COVID-19¶
The 3-hour lab block is split into two sections to ensure physical distancing
Section 1: 12:30 - 14:00
Section 2: 14:00 - 15:30
If 16 people would like to attend the lab in person, we should split evenly so that 8 are in each section
Need to coordinate and figure out who would like to attend labs
I don’t have a good system at the moment, maybe we can brainstorm together?
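As one possible starting point for the brainstorm (not a decided policy), here is a minimal Python sketch that deals a list of in-person attendees evenly into the two sections. The roster names and the helper `split_into_sections` are hypothetical and exist only for illustration.

```python
def split_into_sections(attendees, n_sections=2):
    """Deal attendees round-robin so section sizes differ by at most one."""
    sections = [[] for _ in range(n_sections)]
    for i, person in enumerate(attendees):
        sections[i % n_sections].append(person)
    return sections


# Hypothetical roster of 16 people who want to attend in person
roster = [f"student_{i:02d}" for i in range(1, 17)]
section_1, section_2 = split_into_sections(roster)

print("Section 1 (12:30 - 14:00):", len(section_1), "people")
print("Section 2 (14:00 - 15:30):", len(section_2), "people")
```

A round-robin deal keeps the sections balanced even when the number of attendees is odd; it does not account for time zones or individual preferences, which the group would still need to handle.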
Breakout rooms activity [10 mins]¶
We will split into breakout rooms for this activity, in groups of about 4
Your Task: In your small groups, brainstorm ways to organize “who goes where, when”.¶
Keep in mind that many of us will be attending remotely, and there are timezones to consider!
Write your ideas on Slack in the DATA 530 channel - use threads to comment on others’ ideas!
Consider, “The Zen of Python”:
import this
The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
In-class “Clicker questions” AKA Participation¶
To promote understanding, 10% of your overall grade is allocated to answering in-class questions (called Clickers).
These questions are answered using Sli.do and you will need to create an account.
You should download the apps and bookmark the site for convenience
At different times during all the lectures, questions reviewing material will be asked. Responses are given using Sli.do polls.
We will do several of these today to get you familiar with the system
Activity: A typical Clicker Sequence¶
Part 1: Instructor asks a question and students answer
Part 2: Instructor shows the results (Optional) and opens breakout rooms for students to discuss their choices with others
Part 3: Instructor asks the same question again and we see if there is convergence
Clicker Question Activity [5 mins]¶
IFrame('https://app.sli.do/event/0oandysw/embed/polls/bb021733-e27c-454b-9be7-1cdc591346ef', width=400, height=500)
Results and breakout rooms¶
(Part 2: Instructor shows the results (optional) and opens breakout rooms for students to discuss their choices with others)
Re-vote and Debrief¶
(Part 3: Instructor asks the same question again and we see if there is convergence)
IFrame('https://app.sli.do/event/0oandysw/embed/polls/bcf12c99-53c2-467c-9635-b6564db67e11', width=450, height=550)
Break (10 mins)¶
Course Tools Survey (2 mins)¶
IFrame('https://ubc.ca1.qualtrics.com/jfe/form/SV_eboA3fyVCFkhixn',width=500,height=1300)
Syllabus walk-through (10 minutes)¶
Content covered¶
How learning works
Piazza for Q&A, Slack for conversations
Canvas structure
Course structure
Course policies
Due dates and grace period (for DATA 530 and 531 only)
Quiz policies
Labs
Demo of DATA 530 Lab and GitHub (20 mins)¶
Content covered¶
Navigating GitHub
Uploading files on GitHub
Uploading to a specific folder
Accepting a lab assignment
Cloning a repository (repo)
git clone <https://...>
Making changes to a local version of a repo
code README.md
and then make changes
Commit your changes to the repository
git add -A
git commit -m "This is a test commit"
Pushing the changes above
git push
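The same steps can also be scripted from a notebook cell. The sketch below is one way to do that with Python's `subprocess` module, under some assumptions: the helper names `lab_git_commands` and `run_commands` and the `example.com` URL are hypothetical, and by default the script only prints the commands (a dry run) rather than executing them.

```python
import shlex
import subprocess


def lab_git_commands(repo_url, message):
    """Build the clone / add / commit / push commands as argument lists."""
    # git clone creates a folder named after the repo, minus a trailing ".git"
    repo_dir = repo_url.rstrip("/").rsplit("/", 1)[-1]
    if repo_dir.endswith(".git"):
        repo_dir = repo_dir[:-4]
    return [
        ["git", "clone", repo_url],
        # Edit files (for example README.md) by hand after cloning
        ["git", "-C", repo_dir, "add", "-A"],
        ["git", "-C", repo_dir, "commit", "-m", message],
        ["git", "-C", repo_dir, "push"],
    ]


def run_commands(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# Hypothetical repository URL, used only for illustration
commands = lab_git_commands(
    "https://example.com/your-username/lab1.git", "This is a test commit"
)
run_commands(commands)  # dry run: prints the commands without running git
```

In a real run you would execute the clone first, edit your files, and only then run the remaining three commands, since `git commit` exits with an error when there is nothing to commit.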
RISE Template¶
from traitlets.config.manager import BaseJSONConfigManager
from pathlib import Path

# Persist RISE slideshow settings in the user's Jupyter nbconfig directory
path = Path.home() / ".jupyter" / "nbconfig"
cm = BaseJSONConfigManager(config_dir=str(path))
tmp = cm.update(
    "rise",
    {
        "theme": "sky",  # https://revealjs.com/themes/
        "transition": "fade",
        "start_slideshow_at": "selected",
        "autolaunch": False,
        "width": "100%",
        "height": "100%",
        "header": "",
        "footer": "",
        "scroll": True,
        "enable_chalkboard": True,
        "slideNumber": True,
        "center": False,
        "controlsLayout": "edges",
        "hash": True,
    },
)