DATA 301 Lab 8: SQL DDLΒΆ

ObjectivesΒΆ

  1. Connect to a Sqlite database using Python

  2. Write SQL DDL commands for CREATE TABLE, INSERT, UPDATE, and DELETE.

  3. Query and visualize data in a database

Getting StartedΒΆ

  1. Make sure that you have python packages pandas and matplotlib installed.

  2. Download the db file from here

!pip install ipython-sql pandas matplotlib
Requirement already satisfied: pandas in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (1.1.5)
Requirement already satisfied: matplotlib in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (3.3.3)
Collecting ipython-sql
  Downloading ipython_sql-0.4.0-py3-none-any.whl (19 kB)
Requirement already satisfied: ipython-genutils>=0.1.0 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from ipython-sql) (0.2.0)
Requirement already satisfied: sqlalchemy>=0.6.7 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from ipython-sql) (1.3.22)
Requirement already satisfied: ipython>=1.0 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from ipython-sql) (7.19.0)
Requirement already satisfied: six in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from ipython-sql) (1.15.0)
Requirement already satisfied: cycler>=0.10 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from matplotlib) (0.10.0)
Requirement already satisfied: numpy>=1.15 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from matplotlib) (1.19.4)
Requirement already satisfied: kiwisolver>=1.0.1 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from matplotlib) (1.3.1)
Requirement already satisfied: python-dateutil>=2.1 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from matplotlib) (2.8.1)
Requirement already satisfied: pillow>=6.2.0 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from matplotlib) (8.0.1)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.3 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from matplotlib) (2.4.7)
Requirement already satisfied: pytz>=2017.2 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from pandas) (2020.4)
Requirement already satisfied: python-dateutil>=2.1 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from matplotlib) (2.8.1)
Requirement already satisfied: numpy>=1.15 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from matplotlib) (1.19.4)
Requirement already satisfied: six in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from ipython-sql) (1.15.0)
Requirement already satisfied: setuptools>=18.5 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from ipython>=1.0->ipython-sql) (47.1.0)
Requirement already satisfied: traitlets>=4.2 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from ipython>=1.0->ipython-sql) (5.0.5)
Requirement already satisfied: prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from ipython>=1.0->ipython-sql) (3.0.8)
Requirement already satisfied: pickleshare in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from ipython>=1.0->ipython-sql) (0.7.5)
Requirement already satisfied: pexpect>4.3 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from ipython>=1.0->ipython-sql) (4.8.0)
Requirement already satisfied: pygments in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from ipython>=1.0->ipython-sql) (2.7.3)
Requirement already satisfied: jedi>=0.10 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from ipython>=1.0->ipython-sql) (0.17.2)
Requirement already satisfied: decorator in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from ipython>=1.0->ipython-sql) (4.4.2)
Requirement already satisfied: backcall in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from ipython>=1.0->ipython-sql) (0.2.0)
Requirement already satisfied: parso<0.8.0,>=0.7.0 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from jedi>=0.10->ipython>=1.0->ipython-sql) (0.7.1)
Requirement already satisfied: ptyprocess>=0.5 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from pexpect>4.3->ipython>=1.0->ipython-sql) (0.6.0)
Collecting prettytable<1
  Downloading prettytable-0.7.2.zip (28 kB)
Requirement already satisfied: wcwidth in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0->ipython>=1.0->ipython-sql) (0.2.5)
Requirement already satisfied: six in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from ipython-sql) (1.15.0)
Collecting sqlparse
  Downloading sqlparse-0.4.1-py3-none-any.whl (42 kB)
?25l
     |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                        | 10 kB 32.4 MB/s eta 0:00:01
     |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                | 20 kB 19.6 MB/s eta 0:00:01
     |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž        | 30 kB 24.6 MB/s eta 0:00:01
     |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 40 kB 29.1 MB/s eta 0:00:01
     |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 42 kB 1.7 MB/s 
?25hRequirement already satisfied: ipython-genutils>=0.1.0 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from ipython-sql) (0.2.0)
Building wheels for collected packages: prettytable
  Building wheel for prettytable (setup.py) ... ?25l-
 done
?25h  Created wheel for prettytable: filename=prettytable-0.7.2-py3-none-any.whl size=13699 sha256=43a400c4972bb5ef6d498707f81d125f2966f441076186f5957977d6ba3ca929
  Stored in directory: /home/runner/.cache/pip/wheels/b2/7f/f6/f180315b584f00445045ff1699b550fa895d09471337ce21c6
Successfully built prettytable
Installing collected packages: sqlparse, prettytable, ipython-sql
Successfully installed ipython-sql-0.4.0 prettytable-0.7.2 sqlparse-0.4.1
WARNING: You are using pip version 20.3.1; however, version 20.3.3 is available.
You should consider upgrading via the '/opt/hostedtoolcache/Python/3.7.9/x64/bin/python -m pip install --upgrade pip' command.
%load_ext sql

Configuration (1 mark)ΒΆ

  1. Connect to a sqlite database data540 using the cell below.

%sql sqlite:///data540.db
%sql PRAGMA foreign_keys = ON;
 * sqlite:///data540.db
Done.
[]

You can list all of the tables in your database using the query below. Initially, the database should be empty.

%%sql

SELECT name FROM sqlite_master 
WHERE type IN ('table','view') 
AND name NOT LIKE 'sqlite_%'
ORDER BY 1;
 * sqlite:///data540.db
Done.
name
Artist
Person
PlaySong
RecordLabel
Song

Make sure to drop any existing tables to clean the schema before running the lab.

%%sql

DROP TABLE IF EXISTS PlaySong;
DROP TABLE IF EXISTS Song;
DROP TABLE IF EXISTS Person;
DROP TABLE IF EXISTS Artist;
DROP TABLE IF EXISTS RecordLabel;
 * sqlite:///data540.db
Done.
Done.
Done.
Done.
Done.
[]

Question 1 (10 marks)ΒΆ

Write the SQL DDL to create the following database about music consisting of 5 tables: Artist, RecordLabel, Song, Person, and PlaySong:

  1. A RecordLabel table where each record label has a labelName (string up to 20 characters), countryCode (exactly 3 characters), and a revenue capable of storing value up to $999,999,999.99. (2 marks)

%%sql

CREATE TABLE RecordLabel (
    labelName   VARCHAR(20),
    countryCode CHAR(3),
    revenue     DECIMAL(11,2),
    PRIMARY KEY (labelName)
);
 * sqlite:///data540.db
Done.
[]
  1. An Artist table where each musical artist is identified by a field called artistId that is an integer. Other attributes include artistName (string up to 30 characters), age (integer), and labelName (string up to 20 characters). Create a foreign key as appropriate for labelName. Make foreign key set to null on delete and cascade on update. (2 marks)

%%sql

CREATE TABLE Artist (
    artistId    int,
    artistName  VARCHAR(30),
    age         int,
    labelName   VARCHAR(20),
    PRIMARY KEY (artistId),
    FOREIGN KEY (labelName) REFERENCES RecordLabel(labelName) ON DELETE SET NULL ON UPDATE CASCADE
);
 * sqlite:///data540.db
Done.
[]
  1. A Person table with personId as key (integer), firstname (up to 15 characters), lastname (up to 20 characters), totalPlays (integer), and balance (capable of storing up to $99,999.99). (2 marks)

%%sql

CREATE TABLE Person (
    personId    int,
    firstName   VARCHAR(15),
    lastName    VARCHAR(20),
    totalPlays  int,
    balance     DECIMAL(7,2),    
    PRIMARY KEY (personId)
);
 * sqlite:///data540.db
Done.
[]
  1. A Song table with two fields as the primary key: artistId (integer) and songName (up to 20 characters). Other fields include: duration (integer), releaseDate (should store date info), and cost (capable of storing up to $99.99). Create an appropriate foreign key. Make foreign key cascade on delete and cascade on update. (2 marks)

%%sql

CREATE TABLE Song (
    artistId    int,
    songName    VARCHAR(20),
    duration    int,
    releaseDate DATE,
    cost        DECIMAL(4,2),    
    PRIMARY KEY (artistId, songName),
    FOREIGN KEY (artistId) REFERENCES Artist(artistId) ON DELETE CASCADE ON UPDATE CASCADE
);
 * sqlite:///data540.db
Done.
[]
  1. A PlaySong table that tracks whenever a person plays a song. Key fields: personId (integer), artistId (integer), songName (up to 20 characters), playDate (datetime). Other field: cost (capable of storing up to $99.99). Create appropriate foreign keys. Make foreign keys cascade on delete and cascade on update. (2 marks)

%%sql

CREATE TABLE PlaySong (
    personId     int,
    artistId     int,
    songName     VARCHAR(20),    
    playDate     DATETIME,
    cost         DECIMAL(4,2),    
    PRIMARY KEY (personId, artistId, songName, playDate),
    FOREIGN KEY (personId) REFERENCES Person(personId) ON DELETE CASCADE ON UPDATE CASCADE,
    FOREIGN KEY (artistId,songName) REFERENCES Song(artistId,songName) ON DELETE CASCADE ON UPDATE CASCADE
);
 * sqlite:///data540.db
Done.
[]

Question 2 (9 marks)ΒΆ

Write the SQL DDL to perform the following modifications to the database created in Question 1.

Insert the following records into the RecordLabel table (1 mark)ΒΆ

("Republic Records", "USA", '250456123.55')
("Island Records", "CDN", '100000000.00')
%%sql

INSERT INTO RecordLabel VALUES("Republic Records", "USA", '250456123.55');
INSERT INTO RecordLabel VALUES("Island Records", "CDN", '100000000.00');
 * sqlite:///data540.db
1 rows affected.
1 rows affected.
[]

Insert the following records into the Artist table (1 mark)ΒΆ

(1, "Ariana Grande", 25, "Republic Records")
(2, "Justin Bieber", 24, "Island Records")
(3, "Amy Grant", 48, "Island Records")
%%sql

INSERT INTO Artist VALUES(1, "Ariana Grande", 25, "Republic Records");
INSERT INTO Artist VALUES(2, "Justin Bieber", 24, "Island Records");
INSERT INTO Artist VALUES(3, "Amy Grant", 48, "Island Records");
 * sqlite:///data540.db
1 rows affected.
1 rows affected.
1 rows affected.
[]

Insert the following records into the Person table (1 mark)ΒΆ

(1, "Sarah", "Smith", 25, '254.00')
(2, "Fred", "Funk", 62, '22.22')
%%sql

INSERT INTO Person VALUES(1, "Sarah", "Smith", 25, '254.00');
INSERT INTO Person VALUES(2, "Fred", "Funk", 62, '22.22');
 * sqlite:///data540.db
1 rows affected.
1 rows affected.
[]

Insert the following records into the Song table (1 mark)ΒΆ

(1, 'No Tears Left to Cry', 206, '2018-04-20', '1.99')
(2, 'Despacito', 227, '2017-01-12', '0.99')
%%sql

INSERT INTO Song VALUES(1, 'No Tears Left to Cry', 206, '2018-04-20', '1.99');
INSERT INTO Song VALUES(2, 'Despacito', 227, '2017-01-12', '0.99');
 * sqlite:///data540.db
1 rows affected.
1 rows affected.
[]

Insert the following records into the PlaySong table (1 mark)ΒΆ

(1, 1, 'No Tears Left to Cry', '2018-10-01 10:00:00', '1.99')
(1, 1, 'No Tears Left to Cry', '2018-10-01 10:13:30', '0.99')
(2, 2, 'Despacito', '2018-10-05 23:15:00', '0.79')
%%sql

INSERT INTO PlaySong VALUES(1, 1, 'No Tears Left to Cry', '2018-10-01 10:00:00', '1.99');
INSERT INTO PlaySong VALUES(1, 1, 'No Tears Left to Cry', '2018-10-01 10:13:30', '0.99');
INSERT INTO PlaySong VALUES(2, 2, 'Despacito', '2018-10-05 23:15:00', '0.79');
 * sqlite:///data540.db
1 rows affected.
1 rows affected.
1 rows affected.
[]

UpdateΒΆ

Write the SQL DDL to perform the following modifications to the database created in Question 1.

  1. Add $50 to Fred Funk’s balance. (1 mark)

%%sql

UPDATE Person SET balance=balance+50 WHERE firstName = 'Fred' and lastName = 'Funk';
 * sqlite:///data540.db
1 rows affected.
[]
  1. Modify all play times for Sarah Smith (personId=1) to be 1 hour later. Note: To add an hour to a date use the syntax DATE_ADD(yourDateField, INTERVAL 1 HOUR). (1 mark)

%%sql

UPDATE PlaySong SET playDate = DATETIME(playDate, '+1 hour') WHERE personId = 1;
 * sqlite:///data540.db
2 rows affected.
[]

DeleteΒΆ

Write the SQL DDL to perform the following modifications to the database created in Question 1.

  1. Delete any artist over 30. (1 mark)

%%sql

UPDATE PlaySong SET playDate = DATETIME(playDate, '+1 hour') WHERE personId = (SELECT personId from Person WHERE firstName = 'Sarah' and lastName = 'Smith');
 * sqlite:///data540.db
2 rows affected.
[]
  1. Delete all PlaySong records for Justin Bieber. (1 mark)

Note: You do not have to use a subquery for this question but try if you can!

Question 3 (4 marks)ΒΆ

Create a successful connection similar to that above for the tpch database. The cell will look like: %sql mysql+mysqldb://<USERNAME>:<PASSWORD>@127.0.0.1/<DATABASE>. (1 mark)

%sql sqlite:///tpch.db
%sql PRAGMA foreign_keys = ON;
   sqlite:///data540.db
 * sqlite:///tpch.db
Done.
[]

Note: Since you now have two database connections configured, it’s best to explicitly select one in future queries with the username and database as shown below.

%%sql tpch@tpch

select * from orders limit 5;
(sqlite3.OperationalError) no such table: orders
[SQL: select * from orders limit 5;]
(Background on this error at: http://sqlalche.me/e/13/e3q8)

Note: you can assign query results in Jupyter with the syntax %%sql works << SELECT ...

%%sql tpch@tpch order <<
select * from orders;
(sqlite3.OperationalError) no such table: orders
[SQL: select * from orders;]
(Background on this error at: http://sqlalche.me/e/13/e3q8)

Note: you can convert results in a Pandas DataFrame by calling the .DataFrame() method on the results object.

df = order.DataFrame()
df[:5]
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-24-71c6560b4a8b> in <module>
----> 1 df = order.DataFrame()
      2 df[:5]

NameError: name 'order' is not defined

Write and execute an SQL query that retrieves all entries from the lineitem table. Store the results in the lineitem variable. (1 mark)

%%sql tpch@tpch lineitem <<
select * from lineitem;
Done.
Returning data to local variable lineitem

Create a dataframe using the .DataFrame() method on lineitem object. (1 mark)

df_lineitem = lineitem.DataFrame()

Create a scatter plot of quantity vs price from the dataframe object using the pandas Dataframe.plot.scatter function. On the x-axis put l_quantity and on the y-axis put l_extendedprice. (1 mark)

df_lineitem.plot.scatter(x = 'l_quantity', y = 'l_extendedprice')
<AxesSubplot:xlabel='l_quantity', ylabel='l_extendedprice'>
../../_images/lab8_solutions_57_1.png

Question 4 (6 bonus marks)ΒΆ

Write and execute an SQL query that retrieves for each nation name and stores it in the nation column, and the number of orders for the country and stores it in the order_count column. Limit data to the year 2018. Store it in the variable orders_by_country. To see the schema, login to the tpch at adminer at http://localhost:8080 using tpch as both the username and password. (3 marks)

%%sql tpch@tpch orders_by_country <<
SELECT n_name AS nation, count(o_orderdate) AS order_count
  FROM orders JOIN customer ON customer.c_custkey = orders.o_custkey
              JOIN nation ON nation.n_nationkey = customer.c_nationkey
 WHERE strftime('%Y', o_orderdate) = "2018"
 GROUP BY n_name;
Done.
Returning data to local variable orders_by_country

Convert the orders_by_country result into a pandas DataFrame orders_by_country_df. (1 mark)

orders_by_country_df = orders_by_country.DataFrame()

Create a bar graph in Pandas that visualizes the information. (2 marks)

orders_by_country_df.plot.bar(x='nation', y='order_count')
<AxesSubplot:xlabel='nation'>
../../_images/lab8_solutions_63_1.png