DATA 301 Lab 8: SQL DDLΒΆ
ObjectivesΒΆ
Connect to a Sqlite database using Python
Write SQL DDL commands for CREATE TABLE, INSERT, UPDATE, and DELETE.
Query and visualize data in a database
Getting StartedΒΆ
Make sure that you have python packages
pandas
andmatplotlib
installed.Download the db file from here
!pip install ipython-sql pandas matplotlib
Requirement already satisfied: pandas in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (1.1.5)
Requirement already satisfied: matplotlib in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (3.3.3)
Collecting ipython-sql
Downloading ipython_sql-0.4.0-py3-none-any.whl (19 kB)
Requirement already satisfied: ipython-genutils>=0.1.0 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from ipython-sql) (0.2.0)
Requirement already satisfied: sqlalchemy>=0.6.7 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from ipython-sql) (1.3.22)
Requirement already satisfied: ipython>=1.0 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from ipython-sql) (7.19.0)
Requirement already satisfied: six in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from ipython-sql) (1.15.0)
Requirement already satisfied: cycler>=0.10 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from matplotlib) (0.10.0)
Requirement already satisfied: numpy>=1.15 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from matplotlib) (1.19.4)
Requirement already satisfied: kiwisolver>=1.0.1 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from matplotlib) (1.3.1)
Requirement already satisfied: python-dateutil>=2.1 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from matplotlib) (2.8.1)
Requirement already satisfied: pillow>=6.2.0 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from matplotlib) (8.0.1)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.3 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from matplotlib) (2.4.7)
Requirement already satisfied: pytz>=2017.2 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from pandas) (2020.4)
Requirement already satisfied: python-dateutil>=2.1 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from matplotlib) (2.8.1)
Requirement already satisfied: numpy>=1.15 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from matplotlib) (1.19.4)
Requirement already satisfied: six in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from ipython-sql) (1.15.0)
Requirement already satisfied: setuptools>=18.5 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from ipython>=1.0->ipython-sql) (47.1.0)
Requirement already satisfied: traitlets>=4.2 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from ipython>=1.0->ipython-sql) (5.0.5)
Requirement already satisfied: prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from ipython>=1.0->ipython-sql) (3.0.8)
Requirement already satisfied: pickleshare in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from ipython>=1.0->ipython-sql) (0.7.5)
Requirement already satisfied: pexpect>4.3 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from ipython>=1.0->ipython-sql) (4.8.0)
Requirement already satisfied: pygments in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from ipython>=1.0->ipython-sql) (2.7.3)
Requirement already satisfied: jedi>=0.10 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from ipython>=1.0->ipython-sql) (0.17.2)
Requirement already satisfied: decorator in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from ipython>=1.0->ipython-sql) (4.4.2)
Requirement already satisfied: backcall in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from ipython>=1.0->ipython-sql) (0.2.0)
Requirement already satisfied: parso<0.8.0,>=0.7.0 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from jedi>=0.10->ipython>=1.0->ipython-sql) (0.7.1)
Requirement already satisfied: ptyprocess>=0.5 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from pexpect>4.3->ipython>=1.0->ipython-sql) (0.6.0)
Collecting prettytable<1
Downloading prettytable-0.7.2.zip (28 kB)
Requirement already satisfied: wcwidth in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0->ipython>=1.0->ipython-sql) (0.2.5)
Requirement already satisfied: six in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from ipython-sql) (1.15.0)
Collecting sqlparse
Downloading sqlparse-0.4.1-py3-none-any.whl (42 kB)
?25l
|ββββββββ | 10 kB 32.4 MB/s eta 0:00:01
|ββββββββββββββββ | 20 kB 19.6 MB/s eta 0:00:01
|ββββββββββββββββββββββββ | 30 kB 24.6 MB/s eta 0:00:01
|βββββββββββββββββββββββββββββββ | 40 kB 29.1 MB/s eta 0:00:01
|ββββββββββββββββββββββββββββββββ| 42 kB 1.7 MB/s
?25hRequirement already satisfied: ipython-genutils>=0.1.0 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from ipython-sql) (0.2.0)
Building wheels for collected packages: prettytable
Building wheel for prettytable (setup.py) ... ?25l-
done
?25h Created wheel for prettytable: filename=prettytable-0.7.2-py3-none-any.whl size=13699 sha256=43a400c4972bb5ef6d498707f81d125f2966f441076186f5957977d6ba3ca929
Stored in directory: /home/runner/.cache/pip/wheels/b2/7f/f6/f180315b584f00445045ff1699b550fa895d09471337ce21c6
Successfully built prettytable
Installing collected packages: sqlparse, prettytable, ipython-sql
Successfully installed ipython-sql-0.4.0 prettytable-0.7.2 sqlparse-0.4.1
WARNING: You are using pip version 20.3.1; however, version 20.3.3 is available.
You should consider upgrading via the '/opt/hostedtoolcache/Python/3.7.9/x64/bin/python -m pip install --upgrade pip' command.
%load_ext sql
Configuration (1 mark)ΒΆ
Connect to a sqlite database
data540
using the cell below.
%sql sqlite:///data540.db
%sql PRAGMA foreign_keys = ON;
* sqlite:///data540.db
Done.
[]
You can list all of the tables in your database using the query below. Initially, the database should be empty.
%%sql
SELECT name FROM sqlite_master
WHERE type IN ('table','view')
AND name NOT LIKE 'sqlite_%'
ORDER BY 1;
* sqlite:///data540.db
Done.
name |
---|
Artist |
Person |
PlaySong |
RecordLabel |
Song |
Make sure to drop any existing tables to clean the schema before running the lab.
%%sql
DROP TABLE IF EXISTS PlaySong;
DROP TABLE IF EXISTS Song;
DROP TABLE IF EXISTS Person;
DROP TABLE IF EXISTS Artist;
DROP TABLE IF EXISTS RecordLabel;
* sqlite:///data540.db
Done.
Done.
Done.
Done.
Done.
[]
Question 1 (10 marks)ΒΆ
Write the SQL DDL to create the following database about music consisting of 5 tables: Artist
, RecordLabel
, Song
, Person
, and PlaySong
:
A RecordLabel table where each record label has a labelName (string up to 20 characters), countryCode (exactly 3 characters), and a revenue capable of storing value up to $999,999,999.99. (2 marks)
%%sql
CREATE TABLE RecordLabel (
labelName VARCHAR(20),
countryCode CHAR(3),
revenue DECIMAL(11,2),
PRIMARY KEY (labelName)
);
* sqlite:///data540.db
Done.
[]
An Artist table where each musical artist is identified by a field called artistId that is an integer. Other attributes include artistName (string up to 30 characters), age (integer), and labelName (string up to 20 characters). Create a foreign key as appropriate for labelName. Make foreign key set to null on delete and cascade on update. (2 marks)
%%sql
CREATE TABLE Artist (
artistId int,
artistName VARCHAR(30),
age int,
labelName VARCHAR(20),
PRIMARY KEY (artistId),
FOREIGN KEY (labelName) REFERENCES RecordLabel(labelName) ON DELETE SET NULL ON UPDATE CASCADE
);
* sqlite:///data540.db
Done.
[]
A Person table with personId as key (integer), firstname (up to 15 characters), lastname (up to 20 characters), totalPlays (integer), and balance (capable of storing up to $99,999.99). (2 marks)
%%sql
CREATE TABLE Person (
personId int,
firstName VARCHAR(15),
lastName VARCHAR(20),
totalPlays int,
balance DECIMAL(7,2),
PRIMARY KEY (personId)
);
* sqlite:///data540.db
Done.
[]
A Song table with two fields as the primary key: artistId (integer) and songName (up to 20 characters). Other fields include: duration (integer), releaseDate (should store date info), and cost (capable of storing up to $99.99). Create an appropriate foreign key. Make foreign key cascade on delete and cascade on update. (2 marks)
%%sql
CREATE TABLE Song (
artistId int,
songName VARCHAR(20),
duration int,
releaseDate DATE,
cost DECIMAL(4,2),
PRIMARY KEY (artistId, songName),
FOREIGN KEY (artistId) REFERENCES Artist(artistId) ON DELETE CASCADE ON UPDATE CASCADE
);
* sqlite:///data540.db
Done.
[]
A PlaySong table that tracks whenever a person plays a song. Key fields: personId (integer), artistId (integer), songName (up to 20 characters), playDate (datetime). Other field: cost (capable of storing up to $99.99). Create appropriate foreign keys. Make foreign keys cascade on delete and cascade on update. (2 marks)
%%sql
CREATE TABLE PlaySong (
personId int,
artistId int,
songName VARCHAR(20),
playDate DATETIME,
cost DECIMAL(4,2),
PRIMARY KEY (personId, artistId, songName, playDate),
FOREIGN KEY (personId) REFERENCES Person(personId) ON DELETE CASCADE ON UPDATE CASCADE,
FOREIGN KEY (artistId,songName) REFERENCES Song(artistId,songName) ON DELETE CASCADE ON UPDATE CASCADE
);
* sqlite:///data540.db
Done.
[]
Question 2 (9 marks)ΒΆ
Write the SQL DDL to perform the following modifications to the database created in Question 1.
Insert the following records into the RecordLabel table (1 mark)ΒΆ
("Republic Records", "USA", '250456123.55')
("Island Records", "CDN", '100000000.00')
%%sql
INSERT INTO RecordLabel VALUES("Republic Records", "USA", '250456123.55');
INSERT INTO RecordLabel VALUES("Island Records", "CDN", '100000000.00');
* sqlite:///data540.db
1 rows affected.
1 rows affected.
[]
Insert the following records into the Artist table (1 mark)ΒΆ
(1, "Ariana Grande", 25, "Republic Records")
(2, "Justin Bieber", 24, "Island Records")
(3, "Amy Grant", 48, "Island Records")
%%sql
INSERT INTO Artist VALUES(1, "Ariana Grande", 25, "Republic Records");
INSERT INTO Artist VALUES(2, "Justin Bieber", 24, "Island Records");
INSERT INTO Artist VALUES(3, "Amy Grant", 48, "Island Records");
* sqlite:///data540.db
1 rows affected.
1 rows affected.
1 rows affected.
[]
Insert the following records into the Person table (1 mark)ΒΆ
(1, "Sarah", "Smith", 25, '254.00')
(2, "Fred", "Funk", 62, '22.22')
%%sql
INSERT INTO Person VALUES(1, "Sarah", "Smith", 25, '254.00');
INSERT INTO Person VALUES(2, "Fred", "Funk", 62, '22.22');
* sqlite:///data540.db
1 rows affected.
1 rows affected.
[]
Insert the following records into the Song table (1 mark)ΒΆ
(1, 'No Tears Left to Cry', 206, '2018-04-20', '1.99')
(2, 'Despacito', 227, '2017-01-12', '0.99')
%%sql
INSERT INTO Song VALUES(1, 'No Tears Left to Cry', 206, '2018-04-20', '1.99');
INSERT INTO Song VALUES(2, 'Despacito', 227, '2017-01-12', '0.99');
* sqlite:///data540.db
1 rows affected.
1 rows affected.
[]
Insert the following records into the PlaySong table (1 mark)ΒΆ
(1, 1, 'No Tears Left to Cry', '2018-10-01 10:00:00', '1.99')
(1, 1, 'No Tears Left to Cry', '2018-10-01 10:13:30', '0.99')
(2, 2, 'Despacito', '2018-10-05 23:15:00', '0.79')
%%sql
INSERT INTO PlaySong VALUES(1, 1, 'No Tears Left to Cry', '2018-10-01 10:00:00', '1.99');
INSERT INTO PlaySong VALUES(1, 1, 'No Tears Left to Cry', '2018-10-01 10:13:30', '0.99');
INSERT INTO PlaySong VALUES(2, 2, 'Despacito', '2018-10-05 23:15:00', '0.79');
* sqlite:///data540.db
1 rows affected.
1 rows affected.
1 rows affected.
[]
UpdateΒΆ
Write the SQL DDL to perform the following modifications to the database created in Question 1.
Add $50 to Fred Funkβs balance. (1 mark)
%%sql
UPDATE Person SET balance=balance+50 WHERE firstName = 'Fred' and lastName = 'Funk';
* sqlite:///data540.db
1 rows affected.
[]
Modify all play times for Sarah Smith (personId=1) to be 1 hour later. Note: To add an hour to a date use the syntax DATE_ADD(yourDateField, INTERVAL 1 HOUR). (1 mark)
%%sql
UPDATE PlaySong SET playDate = DATETIME(playDate, '+1 hour') WHERE personId = 1;
* sqlite:///data540.db
2 rows affected.
[]
DeleteΒΆ
Write the SQL DDL to perform the following modifications to the database created in Question 1.
Delete any artist over 30. (1 mark)
%%sql
UPDATE PlaySong SET playDate = DATETIME(playDate, '+1 hour') WHERE personId = (SELECT personId from Person WHERE firstName = 'Sarah' and lastName = 'Smith');
* sqlite:///data540.db
2 rows affected.
[]
Delete all PlaySong records for Justin Bieber. (1 mark)
Note: You do not have to use a subquery for this question but try if you can!
Question 3 (4 marks)ΒΆ
Create a successful connection similar to that above for the tpch
database. The cell will look like: %sql mysql+mysqldb://<USERNAME>:<PASSWORD>@127.0.0.1/<DATABASE>
. (1 mark)
%sql sqlite:///tpch.db
%sql PRAGMA foreign_keys = ON;
sqlite:///data540.db
* sqlite:///tpch.db
Done.
[]
Note: Since you now have two database connections configured, itβs best to explicitly select one in future queries with the username and database as shown below.
%%sql tpch@tpch
select * from orders limit 5;
(sqlite3.OperationalError) no such table: orders
[SQL: select * from orders limit 5;]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
Note: you can assign query results in Jupyter with the syntax %%sql works << SELECT ...
%%sql tpch@tpch order <<
select * from orders;
(sqlite3.OperationalError) no such table: orders
[SQL: select * from orders;]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
Note: you can convert results in a Pandas DataFrame by calling the .DataFrame() method on the results object.
df = order.DataFrame()
df[:5]
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-24-71c6560b4a8b> in <module>
----> 1 df = order.DataFrame()
2 df[:5]
NameError: name 'order' is not defined
Write and execute an SQL query that retrieves all entries from the lineitem
table. Store the results in the lineitem
variable. (1 mark)
%%sql tpch@tpch lineitem <<
select * from lineitem;
Done.
Returning data to local variable lineitem
Create a dataframe using the .DataFrame()
method on lineitem
object. (1 mark)
df_lineitem = lineitem.DataFrame()
Create a scatter plot of quantity vs price from the dataframe object using the pandas Dataframe.plot.scatter
function. On the x-axis put l_quantity
and on the y-axis put l_extendedprice
. (1 mark)
df_lineitem.plot.scatter(x = 'l_quantity', y = 'l_extendedprice')
<AxesSubplot:xlabel='l_quantity', ylabel='l_extendedprice'>

Question 4 (6 bonus marks)ΒΆ
Write and execute an SQL query that retrieves for each nation name and stores it in the nation
column, and the number of orders for the country and stores it in the order_count
column. Limit data to the year 2018. Store it in the variable orders_by_country
. To see the schema, login to the tpch
at adminer at http://localhost:8080
using tpch
as both the username and password. (3 marks)
%%sql tpch@tpch orders_by_country <<
SELECT n_name AS nation, count(o_orderdate) AS order_count
FROM orders JOIN customer ON customer.c_custkey = orders.o_custkey
JOIN nation ON nation.n_nationkey = customer.c_nationkey
WHERE strftime('%Y', o_orderdate) = "2018"
GROUP BY n_name;
Done.
Returning data to local variable orders_by_country
Convert the orders_by_country
result into a pandas DataFrame orders_by_country_df
. (1 mark)
orders_by_country_df = orders_by_country.DataFrame()
Create a bar graph in Pandas that visualizes the information. (2 marks)
orders_by_country_df.plot.bar(x='nation', y='order_count')
<AxesSubplot:xlabel='nation'>
