Task 1: Practice with Pandas
import pandas as pd
Task 1.1: Join/Merge three datasets together
Your task is to take three separate datasets (grades1a.csv, grades1b.csv, and grades1c.csv; find them in the data folder) and merge them together.
You may find the pandas documentation helpful, as it contains visualizations that illustrate the different merges and joins that are possible.
Restriction: There are no restrictions for this task.
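As a rough illustration of the mechanics, here is a minimal sketch of chaining two merges on a shared key. The three small dataframes are hypothetical stand-ins built from the first two rows of the Sample Output; how the columns are actually split across grades1a.csv, grades1b.csv, and grades1c.csv is an assumption, so inspect the real files before adapting this.

```python
import pandas as pd

# Hypothetical stand-ins for the three CSV files (the real layout may differ).
chem_df = pd.DataFrame({
    "Name": ["Lila Oni", "Amina Chimwala"],
    "Student ID": [12001, 12002],
    "Chemistry": [59, 54],
})
phys_df = pd.DataFrame({"Student ID": [12001, 12002], "Physics": [90, 42]})
math_df = pd.DataFrame({"Student ID": [12002, 12001], "Math": [85, 45]})

# Chain two merges on the shared key; how="inner" keeps only the
# student IDs that are present in every dataframe.
merged = (
    chem_df
    .merge(phys_df, on="Student ID", how="inner")
    .merge(math_df, on="Student ID", how="inner")
)
print(merged)
```

With real files, each dataframe would come from `pd.read_csv` instead of being built by hand.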
The final dataset should look like this:
Sample Output
| | Name | Student ID | Chemistry | Physics | Math |
|---|---|---|---|---|---|
| 0 | Lila Oni | 12001 | 59 | 90 | 45 |
| 1 | Amina Chimwala | 12002 | 54 | 42 | 85 |
| 2 | Neda Makena | 12003 | 42 | 88 | 54 |
| 3 | Shanthi Catrina | 12004 | 66 | 48 | 64 |
| 4 | Deirbhile Bhavna | 12005 | 60 | 80 | 72 |
| 5 | Ige Aifric | 12006 | 78 | 73 | 64 |
| 6 | Firouzeh Rudo | 12007 | 64 | 43 | 67 |
| 7 | Desta Jahanara | 12008 | 82 | 69 | 70 |
| 8 | Taiwo Sona | 12009 | 63 | 54 | 80 |
| 9 | Fíona Finnguala | 12010 | 41 | 52 | 70 |
#Your Answer Here
Task 1.2: Combine two datasets with different keys
Your task is to merge two datasets together on the “Student ID” column in grades2a.csv and the “SID” column in grades2b.csv (the CSV files are located in the data folder).
You may use any approach that reproduces the Sample Output (there are several ways to do it), but please explain how you did it and why.
Restriction: There are no restrictions for this task.
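One possible approach, sketched under assumptions: the Task 1.3 Sample Output suggests both files carry the same Name and subject columns and differ only in the key name, so aligning the key name, stacking the rows, and dropping repeats is one option. The dataframes below are hypothetical stand-ins with a single subject column, not the contents of the real files.

```python
import pandas as pd

# Hypothetical stand-ins for grades2a.csv and grades2b.csv.
grades_a = pd.DataFrame({
    "Name": ["Lila Oni", "Shanthi Catrina"],
    "Student ID": [12001, 12004],
    "Chemistry": [59, 66],
})
grades_b = pd.DataFrame({
    "Name": ["Shanthi Catrina", "Taiwo Sona"],
    "SID": [12004, 12009],
    "Chemistry": [66, 63],
})

# Give both frames the same key name, stack them, then keep one row per student.
combined = (
    pd.concat([grades_a, grades_b.rename(columns={"SID": "Student ID"})],
              ignore_index=True)
    .drop_duplicates(subset="Student ID")
    .sort_values("Student ID")
    .reset_index(drop=True)
)
print(combined)
```

Note that `drop_duplicates(subset="Student ID")` keeps the first row it sees for each ID, so it would silently hide any disagreement between the two files. An outer `merge` with `left_on="Student ID"` and `right_on="SID"` is an alternative that keeps both versions side by side.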
Sample Output
| | Name | Student ID | Chemistry | Physics | Math |
|---|---|---|---|---|---|
| 0 | Lila Oni | 12001 | 59 | 90 | 45 |
| 1 | Amina Chimwala | 12002 | 54 | 42 | 85 |
| 2 | Neda Makena | 12003 | 42 | 88 | 54 |
| 3 | Shanthi Catrina | 12004 | 66 | 48 | 64 |
| 4 | Deirbhile Bhavna | 12005 | 60 | 80 | 72 |
| 5 | Ige Aifric | 12006 | 78 | 73 | 64 |
| 6 | Firouzeh Rudo | 12007 | 64 | 43 | 67 |
| 7 | Desta Jahanara | 12008 | 82 | 69 | 70 |
| 8 | Taiwo Sona | 12009 | 63 | 54 | 80 |
| 9 | Fíona Finnguala | 12010 | 41 | 52 | 70 |
#Your Answer Here
Task 1.3: Merge dataframes and keep only overlapping rows
It is often useful to try several different merge operations on a dataset to identify which parts of the data overlap and which are disjoint. For the overlapping rows, you can then compare the value from the left dataframe with the value from the right to check whether merging the dataframes would introduce any ambiguity.
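A minimal sketch of this kind of exploration uses the `indicator=True` option of `DataFrame.merge`, which adds a `_merge` column recording where each row came from. The two dataframes and the `_left`/`_right` suffixes are hypothetical choices for illustration, and here both frames share the same key name for simplicity.

```python
import pandas as pd

# Hypothetical dataframes that overlap on one student ID.
left = pd.DataFrame({"Student ID": [12003, 12004], "Math": [54, 64]})
right = pd.DataFrame({"Student ID": [12004, 12007], "Math": [64, 67]})

# An outer merge keeps every row; _merge is "left_only", "right_only", or "both".
audit = left.merge(
    right,
    on="Student ID",
    how="outer",
    suffixes=("_left", "_right"),
    indicator=True,
)

# Rows present in both frames, with left and right values side by side.
both = audit[audit["_merge"] == "both"]
values_agree = (both["Math_left"] == both["Math_right"]).all()
print(audit)
print("Overlapping values agree:", values_agree)
```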
In this question, you will use the same data as in Task 1.2 (grades2a.csv and grades2b.csv) and try to identify which rows are duplicates, that is, which rows appear in both datasets.
Restriction: For this Task, you CANNOT rename columns.
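For reference, pandas can match differently named key columns without renaming them, via the `left_on` and `right_on` parameters of `merge`. The sketch below uses hypothetical two-row stand-ins for the real files; any non-key columns that share a name receive the default `_x` and `_y` suffixes.

```python
import pandas as pd

# Hypothetical stand-ins for grades2a.csv and grades2b.csv.
grades_a = pd.DataFrame({
    "Name": ["Neda Makena", "Shanthi Catrina"],
    "Student ID": [12003, 12004],
    "Math": [54, 64],
})
grades_b = pd.DataFrame({
    "Name": ["Shanthi Catrina", "Firouzeh Rudo"],
    "SID": [12004, 12007],
    "Math": [64, 67],
})

# An inner merge keeps only rows whose key appears in both frames.
# Both key columns are retained because their names differ.
overlap = grades_a.merge(grades_b, left_on="Student ID", right_on="SID", how="inner")
print(overlap)
```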
Sample Output
| | Name_x | Student ID | Chemistry_x | Physics_x | Math_x | Name_y | SID | Chemistry_y | Physics_y | Math_y |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Shanthi Catrina | 12004 | 66 | 48 | 64 | Shanthi Catrina | 12004 | 66 | 48 | 64 |
| 1 | Deirbhile Bhavna | 12005 | 60 | 80 | 72 | Deirbhile Bhavna | 12005 | 60 | 80 | 72 |
| 2 | Ige Aifric | 12006 | 78 | 73 | 64 | Ige Aifric | 12006 | 78 | 73 | 64 |
#Your Answer Here