Lab 4A

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Task 1: Join/Merge three datasets together

Your task is to take three separate datasets (grades1a.csv, grades1b.csv, grades1c.csv) and merge them together. You may find the documentation helpful as it contains some visualizations to help you understand the different merges and joins that are possible.

Note: In your Explainer Video, remember to explain your code in detail. Explain not just what you did, but also why you did it that way.

The final dataset should look like this:

Sample Output

final_df

Name

Student ID

Chemistry

Physics

Math

0

Lila Oni

12001

59

90

45

1

Amina Chimwala

12002

54

42

85

2

Neda Makena

12003

42

88

54

3

Shanthi Catrina

12004

66

48

64

4

Deirbhile Bhavna

12005

60

80

72

5

Ige Aifric

12006

78

73

64

6

Firouzeh Rudo

12007

64

43

67

7

Desta Jahanara

12008

82

69

70

8

Taiwo Sona

12009

63

54

80

9

Fíona Finnguala

12010

41

52

70

# Your Answer Here

Task 2: Combine two datasets with different keys

Your task is to merge two datasets together on the “Student ID” column in grades2a.csv and the “SID” column in the file grades2b.csv.

You can do this any way you like that matches the Sample Output (there are multiple ways of doing it) but please explain how you did it (and why).

Sample Output

Name

Student ID

Chemistry

Physics

Math

0

Lila Oni

12001

59

90

45

1

Amina Chimwala

12002

54

42

85

2

Neda Makena

12003

42

88

54

3

Shanthi Catrina

12004

66

48

64

4

Deirbhile Bhavna

12005

60

80

72

5

Ige Aifric

12006

78

73

64

6

Firouzeh Rudo

12007

64

43

67

7

Desta Jahanara

12008

82

69

70

8

Taiwo Sona

12009

63

54

80

9

Fíona Finnguala

12010

41

52

70

# Your Solution Here

Task 3: Merge dataframes and keep only overlapping rows

Often it is useful to try a bunch of different merge operations on a dataset to identify what parts of the data overlap, and what parts are disparate. With the overlapping rows, you can then look at the value from the left, and the value from the right to compare them to see if there is any ambiguity if you chose to merge the dataframes.

In this question, you will use the same data as in Task 2 (grades2a.csv and grades2b.csv) and try to identify which rows are duplicates, or appear in both datasets.

Restriction: For this Task, you CANNOT rename columns.

Sample Output

Name_x

Student ID

Chemistry_x

Physics_x

Math_x

Name_y

SID

Chemistry_y

Physics_y

Math_y

0

Shanthi Catrina

12004

66

48

64

Shanthi Catrina

12004

66

48

64

1

Deirbhile Bhavna

12005

60

80

72

Deirbhile Bhavna

12005

60

80

72

2

Ige Aifric

12006

78

73

64

Ige Aifric

12006

78

73

64

# Your Solution here

Task 4: Explain the different merge types

Task 4.1 - Table

The first part of this question is to fill out a table in words. For each merge type, write a sentence or two to explain each merge type.

Task 4.2 - Visualization

The second part of this question is to find (with citation) or create a diagram to explain the merge types visually.

Task 4.1 - Table

Merge Type

Explanation

Left join

# Your sentence here

Right join

# Your sentence here

Outer join

# Your sentence here

Inner join

# Your sentence here

Cross join

# Your sentence here

Task 4.2 - Visualization

Find (or create/sketch) a visualization/graphic/picture that makes sense to you on the different types of join, and explain it (in words) so you can crystallize your understanding of the merge types and how they work practically.

Note: Please clearly cite and reference any graphic with a link and some text that you choose to use from external sources.

# Your solution here

That’s it, nice work!