Lab 4A¶
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
Task 1: Join/Merge three datasets together¶
Your task is to take three separate datasets (grades1a.csv
, grades1b.csv
, grades1c.csv
) and merge them together.
You may find the documentation helpful as it contains some visualizations to help you understand the different merges and joins that are possible.
Note: In your Explainer Video, remember to explain your code in detail. Explain not just what you did, but also why you did it that way.
The final dataset should look like this:
Sample Output¶
final_df
Name |
Student ID |
Chemistry |
Physics |
Math |
|
---|---|---|---|---|---|
0 |
Lila Oni |
12001 |
59 |
90 |
45 |
1 |
Amina Chimwala |
12002 |
54 |
42 |
85 |
2 |
Neda Makena |
12003 |
42 |
88 |
54 |
3 |
Shanthi Catrina |
12004 |
66 |
48 |
64 |
4 |
Deirbhile Bhavna |
12005 |
60 |
80 |
72 |
5 |
Ige Aifric |
12006 |
78 |
73 |
64 |
6 |
Firouzeh Rudo |
12007 |
64 |
43 |
67 |
7 |
Desta Jahanara |
12008 |
82 |
69 |
70 |
8 |
Taiwo Sona |
12009 |
63 |
54 |
80 |
9 |
Fíona Finnguala |
12010 |
41 |
52 |
70 |
# Your Answer Here
Task 2: Combine two datasets with different keys¶
Your task is to merge two datasets together on the “Student ID” column in grades2a.csv
and the “SID” column in the file grades2b.csv
.
You can do this any way you like that matches the Sample Output (there are multiple ways of doing it) but please explain how you did it (and why).
Sample Output¶
Name |
Student ID |
Chemistry |
Physics |
Math |
|
---|---|---|---|---|---|
0 |
Lila Oni |
12001 |
59 |
90 |
45 |
1 |
Amina Chimwala |
12002 |
54 |
42 |
85 |
2 |
Neda Makena |
12003 |
42 |
88 |
54 |
3 |
Shanthi Catrina |
12004 |
66 |
48 |
64 |
4 |
Deirbhile Bhavna |
12005 |
60 |
80 |
72 |
5 |
Ige Aifric |
12006 |
78 |
73 |
64 |
6 |
Firouzeh Rudo |
12007 |
64 |
43 |
67 |
7 |
Desta Jahanara |
12008 |
82 |
69 |
70 |
8 |
Taiwo Sona |
12009 |
63 |
54 |
80 |
9 |
Fíona Finnguala |
12010 |
41 |
52 |
70 |
# Your Solution Here
Task 3: Merge dataframes and keep only overlapping rows¶
Often it is useful to try a bunch of different merge operations on a dataset to identify what parts of the data overlap, and what parts are disparate. With the overlapping rows, you can then look at the value from the left, and the value from the right to compare them to see if there is any ambiguity if you chose to merge the dataframes.
In this question, you will use the same data as in Task 2 (grades2a.csv
and grades2b.csv
) and try to identify which rows are duplicates, or appear in both datasets.
Restriction: For this Task, you CANNOT rename columns.
Sample Output¶
Name_x |
Student ID |
Chemistry_x |
Physics_x |
Math_x |
Name_y |
SID |
Chemistry_y |
Physics_y |
Math_y |
|
---|---|---|---|---|---|---|---|---|---|---|
0 |
Shanthi Catrina |
12004 |
66 |
48 |
64 |
Shanthi Catrina |
12004 |
66 |
48 |
64 |
1 |
Deirbhile Bhavna |
12005 |
60 |
80 |
72 |
Deirbhile Bhavna |
12005 |
60 |
80 |
72 |
2 |
Ige Aifric |
12006 |
78 |
73 |
64 |
Ige Aifric |
12006 |
78 |
73 |
64 |
# Your Solution here
Task 4: Explain the different merge types¶
Task 4.1 - Table¶
The first part of this question is to fill out a table in words. For each merge type, write a sentence or two to explain each merge type.
Task 4.2 - Visualization¶
The second part of this question is to find (with citation) or create a diagram to explain the merge types visually.
Task 4.1 - Table¶
Merge Type |
Explanation |
---|---|
Left join |
# Your sentence here |
Right join |
# Your sentence here |
Outer join |
# Your sentence here |
Inner join |
# Your sentence here |
Cross join |
# Your sentence here |
Task 4.2 - Visualization¶
Find (or create/sketch) a visualization/graphic/picture that makes sense to you on the different types of join, and explain it (in words) so you can crystallize your understanding of the merge types and how they work practically.
Note: Please clearly cite and reference any graphic with a link and some text that you choose to use from external sources.
# Your solution here
That’s it, nice work!