Task 1: Practice with Pandas

import pandas as pd

Task 1.1: Join/Merge three datasets together

Your task is to take three separate datasets (grades1a.csv, grades1b.csv and grades1c.csv, all in the data folder) and merge them into one. The pandas documentation on merging and joining may help, since it includes visualizations of the different kinds of merges and joins that are possible.
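Below is a minimal sketch of one approach. It assumes, although the task does not say so, that each of the three files holds the Name and Student ID columns plus one subject column. The tiny frames are hypothetical stand-ins built from the first two rows of the Sample Output, and `functools.reduce` applies `pd.merge` pairwise from left to right.

```python
from functools import reduce

import pandas as pd

# Hypothetical stand-ins for grades1a.csv, grades1b.csv and grades1c.csv
# (the real files would be loaded with pd.read_csv; the path is not shown here).
# ASSUMPTION: each file shares "Name" and "Student ID" and adds one subject.
grades_a = pd.DataFrame({
    "Name": ["Lila Oni", "Amina Chimwala"],
    "Student ID": [12001, 12002],
    "Chemistry": [59, 54],
})
grades_b = pd.DataFrame({
    "Name": ["Lila Oni", "Amina Chimwala"],
    "Student ID": [12001, 12002],
    "Physics": [90, 42],
})
grades_c = pd.DataFrame({
    "Name": ["Lila Oni", "Amina Chimwala"],
    "Student ID": [12001, 12002],
    "Math": [45, 85],
})

# Merge the frames pairwise on the shared key columns:
# ((grades_a + grades_b) + grades_c).
merged = reduce(
    lambda left, right: pd.merge(left, right, on=["Name", "Student ID"], how="inner"),
    [grades_a, grades_b, grades_c],
)
print(merged)
```

An inner join keeps only students present in all three frames and preserves the row order of the left frame. If the files instead split the students across rows (same columns, different people), `pd.concat` rather than `pd.merge` would be the appropriate tool.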

Restriction: There are no restrictions for this task.

The final dataset should look like this:

Sample Output

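|   | Name | Student ID | Chemistry | Physics | Math |
|---|------|------------|-----------|---------|------|
| 0 | Lila Oni | 12001 | 59 | 90 | 45 |
| 1 | Amina Chimwala | 12002 | 54 | 42 | 85 |
| 2 | Neda Makena | 12003 | 42 | 88 | 54 |
| 3 | Shanthi Catrina | 12004 | 66 | 48 | 64 |
| 4 | Deirbhile Bhavna | 12005 | 60 | 80 | 72 |
| 5 | Ige Aifric | 12006 | 78 | 73 | 64 |
| 6 | Firouzeh Rudo | 12007 | 64 | 43 | 67 |
| 7 | Desta Jahanara | 12008 | 82 | 69 | 70 |
| 8 | Taiwo Sona | 12009 | 63 | 54 | 80 |
| 9 | Fíona Finnguala | 12010 | 41 | 52 | 70 |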

#Your Answer Here

Task 1.2: Combine two datasets with different keys

Your task is to merge two datasets, matching the “Student ID” column in grades2a.csv with the “SID” column in grades2b.csv (both CSV files are in the data folder).

You may use any approach that reproduces the Sample Output (there are several), but please explain how you did it and why.
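One possible approach, sketched under an assumption: judging from the Task 1.3 Sample Output, both files appear to share the Name, Chemistry, Physics and Math columns and differ only in the name of the ID column. Renaming SID to Student ID (allowed here, since this task has no restrictions) lets `pd.merge` match on every shared column, so a student who appears in both files collapses into a single row. The two-row frames are hypothetical stand-ins, not the real files.

```python
import pandas as pd

# Hypothetical stand-ins for grades2a.csv and grades2b.csv (values taken from
# the Sample Output). ASSUMPTION: both files share Name/Chemistry/Physics/Math
# and differ only in the name of the ID column.
grades_2a = pd.DataFrame({
    "Name": ["Lila Oni", "Shanthi Catrina"],
    "Student ID": [12001, 12004],
    "Chemistry": [59, 66],
    "Physics": [90, 48],
    "Math": [45, 64],
})
grades_2b = pd.DataFrame({
    "Name": ["Shanthi Catrina", "Fíona Finnguala"],
    "SID": [12004, 12010],
    "Chemistry": [66, 41],
    "Physics": [48, 52],
    "Math": [64, 70],
})

# Give the ID column the same name in both frames so merge() can match on it.
grades_2b_renamed = grades_2b.rename(columns={"SID": "Student ID"})

# With no on= argument, merge() joins on every column the frames share.
# how="outer" keeps rows from both sides; identical rows become one row.
combined = (
    pd.merge(grades_2a, grades_2b_renamed, how="outer")
    .sort_values("Student ID")
    .reset_index(drop=True)
)
print(combined)
```

The final sort is there because the pandas documentation notes that an outer merge sorts the join keys lexicographically, and here the keys begin with Name. Another option is `pd.merge(..., left_on="Student ID", right_on="SID")`, but that keeps both ID columns and suffixes the other shared columns, so it would need extra clean-up to match this Sample Output.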

Restriction: There are no restrictions for this task.

Sample Output

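|   | Name | Student ID | Chemistry | Physics | Math |
|---|------|------------|-----------|---------|------|
| 0 | Lila Oni | 12001 | 59 | 90 | 45 |
| 1 | Amina Chimwala | 12002 | 54 | 42 | 85 |
| 2 | Neda Makena | 12003 | 42 | 88 | 54 |
| 3 | Shanthi Catrina | 12004 | 66 | 48 | 64 |
| 4 | Deirbhile Bhavna | 12005 | 60 | 80 | 72 |
| 5 | Ige Aifric | 12006 | 78 | 73 | 64 |
| 6 | Firouzeh Rudo | 12007 | 64 | 43 | 67 |
| 7 | Desta Jahanara | 12008 | 82 | 69 | 70 |
| 8 | Taiwo Sona | 12009 | 63 | 54 | 80 |
| 9 | Fíona Finnguala | 12010 | 41 | 52 | 70 |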

#Your Answer Here

Task 1.3: Merge dataframes and keep only overlapping rows

It is often useful to try several different merge operations on a dataset to identify which parts of the data overlap and which are disjoint. For the overlapping rows, you can then compare the value from the left dataframe with the value from the right to check whether merging the dataframes would introduce any ambiguity.
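A minimal sketch of one such exploratory merge: `indicator=True` (a documented `pd.merge` parameter) adds a `_merge` column that labels each row `left_only`, `right_only` or `both`, and the rows labelled `both` can then be compared column by column. The toy frames are hypothetical and keep only a Math column for brevity.

```python
import pandas as pd

# Hypothetical toy frames (not the real grades2a/grades2b files).
# "Student ID" and "SID" play the roles of the differently named keys.
left = pd.DataFrame({"Student ID": [12001, 12004], "Math": [45, 64]})
right = pd.DataFrame({"SID": [12004, 12010], "Math": [64, 70]})

# An outer merge keeps every row; indicator=True records where each row was found.
audit = pd.merge(left, right, left_on="Student ID", right_on="SID",
                 how="outer", indicator=True)
print(audit)

# For rows present on both sides, compare the suffixed values to spot conflicts.
both = audit[audit["_merge"] == "both"]
conflicts = both[both["Math_x"] != both["Math_y"]]
print(conflicts)  # empty here: the overlapping row agrees on both sides
```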

In this question, you will use the same data as in Task 1.2 (grades2a.csv and grades2b.csv) and try to identify which rows are duplicates, that is, which rows appear in both datasets.

Restriction: For this Task, you CANNOT rename columns.
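A sketch of an inner merge that respects the no-renaming restriction: `left_on`/`right_on` match the differently named keys, and pandas appends its default `_x`/`_y` suffixes to the other shared columns, which appears to be where the `Name_x`/`Name_y` headers in the Sample Output come from. The frames are hypothetical stand-ins trimmed to a single subject.

```python
import pandas as pd

# Hypothetical stand-ins for grades2a.csv and grades2b.csv, trimmed to one subject.
grades_2a = pd.DataFrame({
    "Name": ["Lila Oni", "Shanthi Catrina"],
    "Student ID": [12001, 12004],
    "Math": [45, 64],
})
grades_2b = pd.DataFrame({
    "Name": ["Shanthi Catrina", "Fíona Finnguala"],
    "SID": [12004, 12010],
    "Math": [64, 70],
})

# how="inner" keeps only IDs present in both frames. left_on/right_on match the
# key columns without renaming them; other shared columns get "_x"/"_y" suffixes.
overlap = pd.merge(grades_2a, grades_2b, how="inner",
                   left_on="Student ID", right_on="SID")
print(overlap)
```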

Sample Output

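|   | Name_x | Student ID | Chemistry_x | Physics_x | Math_x | Name_y | SID | Chemistry_y | Physics_y | Math_y |
|---|--------|------------|-------------|-----------|--------|--------|-----|-------------|-----------|--------|
| 0 | Shanthi Catrina | 12004 | 66 | 48 | 64 | Shanthi Catrina | 12004 | 66 | 48 | 64 |
| 1 | Deirbhile Bhavna | 12005 | 60 | 80 | 72 | Deirbhile Bhavna | 12005 | 60 | 80 | 72 |
| 2 | Ige Aifric | 12006 | 78 | 73 | 64 | Ige Aifric | 12006 | 78 | 73 | 64 |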

#Your Answer Here