Capstone Projects in SQL
As part of my professional development and data analysis interests, I have completed various lessons and projects in SQL. A number of them are highlighted below, but for a more complete collection of SQL skills I have developed, please click here.
Use the links below to jump to your desired project:
Social Media Analytics: Window Functions and Subqueries
Funnel for Mattresses and More: Joins and Aggregates for Business Analytics
Songify Music Streaming: Basic and Advanced Queries, Joins and Date Functions
Social Media Analytics
-Window Functions and Subqueries-
Introduction
A digital marketing agent is looking to study the ad campaigns used by some of the most-followed Instagram accounts in the world. To achieve this task, assistance is required to analyse monthly account data including the change in the number of followers as well as the amount of content posted by each account.
The database consists of one table:
social_media
The database schema is shown below:
Column | Description |
---|---|
username | Username of Instagram account |
month | Month number for which data is recorded |
posts | Number of posts by each account in a given month |
followers_change | Change in account followers for a given month |
Queries
SQL queries are used to answer the following questions:
1. Explore the dataset
SELECT * FROM social_media;
username | month | posts | followers_change |
---|---|---|---|
ladygaga | 1 | 32 | 6.27 |
ladygaga | 2 | 39 | 4.71 |
ladygaga | 3 | 39 | 5.73 |
ladygaga | 4 | 41 | 5.43 |
ladygaga | 5 | 45 | 4.99 |
ladygaga | 6 | 5 | 2.52 |
ladygaga | 7 | 41 | 4.61 |
ladygaga | 8 | 44 | 3.91 |
cristiano | 1 | 21 | 5.33 |
cristiano | 2 | 19 | 4.93 |
cristiano | 3 | 7 | 4.3 |
cristiano | 4 | 17 | 5.74 |
cristiano | 5 | 20 | 5.81 |
cristiano | 6 | 19 | 5.74 |
cristiano | 7 | 15 | 5.99 |
cristiano | 8 | 21 | 4.6 |
arianagrande | 1 | 12 | 3.69 |
arianagrande | 2 | 22 | 4.4 |
arianagrande | 3 | 14 | 4.93 |
arianagrande | 4 | 22 | 4.12 |
arianagrande | 5 | 48 | 5.47 |
arianagrande | 6 | 14 | 4.01 |
arianagrande | 7 | 23 | 3.71 |
arianagrande | 8 | 24 | 4.02 |
therock | 1 | 44 | 3.75 |
therock | 2 | 44 | 3.15 |
therock | 3 | 36 | 3.5 |
therock | 4 | 56 | 4.4 |
therock | 5 | 70 | 3.77 |
therock | 6 | 19 | 3.25 |
therock | 7 | 62 | 3.54 |
therock | 8 | 56 | 4.32 |
aliaabhatt | 1 | 9 | 1.15 |
aliaabhatt | 2 | 14 | 1.16 |
aliaabhatt | 3 | 25 | 1.52 |
aliaabhatt | 4 | 9 | 1.57 |
aliaabhatt | 5 | 9 | 1.47 |
aliaabhatt | 6 | 7 | -0.76 |
aliaabhatt | 7 | 5 | 1.09 |
aliaabhatt | 8 | 13 | 1 |
theellenshow | 1 | 129 | 1.76 |
theellenshow | 2 | 123 | 1.43 |
theellenshow | 3 | 126 | 1.88 |
theellenshow | 4 | 104 | 1.41 |
theellenshow | 5 | 121 | 1.73 |
theellenshow | 6 | 72 | 1.26 |
theellenshow | 7 | 30 | 1.16 |
theellenshow | 8 | 0 | 0.79 |
leomessi | 1 | 2 | 0.03 |
leomessi | 2 | 4 | 0.01 |
leomessi | 3 | 1 | 0.02 |
leomessi | 4 | 3 | 0.01 |
leomessi | 5 | 2 | 0.01 |
leomessi | 6 | 1 | 0.01 |
leomessi | 7 | 0 | 0.03 |
leomessi | 8 | 2 | 0.05 |
2. Select all the monthly data stored in the dataset, but only for accounts with a total number of posts greater than 300 over the 8 months. Add a column to the table showing the running total of followers after each month
WITH filtered_table AS( SELECT * FROM social_media WHERE username IN( SELECT username FROM social_media GROUP BY username HAVING SUM(posts) > 300 ) ) SELECT username, month, posts, followers_change, ROUND(SUM(followers_change) OVER ( PARTITION BY username ORDER BY month ), 2) AS 'followers_running_tot' FROM filtered_table;
username | month | posts | followers_change | followers_running_tot |
---|---|---|---|---|
theellenshow | 1 | 129 | 1.76 | 1.76 |
theellenshow | 2 | 123 | 1.43 | 3.19 |
theellenshow | 3 | 126 | 1.88 | 5.07 |
theellenshow | 4 | 104 | 1.41 | 6.48 |
theellenshow | 5 | 121 | 1.73 | 8.21 |
theellenshow | 6 | 72 | 1.26 | 9.47 |
theellenshow | 7 | 30 | 1.16 | 10.63 |
theellenshow | 8 | 0 | 0.79 | 11.42 |
therock | 1 | 44 | 3.75 | 3.75 |
therock | 2 | 44 | 3.15 | 6.9 |
therock | 3 | 36 | 3.5 | 10.4 |
therock | 4 | 56 | 4.4 | 14.8 |
therock | 5 | 70 | 3.77 | 18.57 |
therock | 6 | 19 | 3.25 | 21.82 |
therock | 7 | 62 | 3.54 | 25.36 |
therock | 8 | 56 | 4.32 | 29.68 |
Visualisation of Results
The results were imported into RStudio (data analysis in R programming language) to visualise the results using the ggplot2 library. There was particular interest in how the total number of followers for each account changed throughout the months analysed:
Funnel for Mattresses and More
-Joins and Aggregates for Business Analytics-
Introduction
Mattresses and More is a bedding essentials e-commerce store. Their purchase funnel is:
User browses products and adds them to shopping cart
User views shopping cart and proceeds to checkout page
Use enters payment details and purchases the products
The Management Team is requesting to see the conversion of users from the browse to the checkout stage, and from the checkout to the purchase stage. As a sales analyst, I am tasked to provide this information, as well as to examine the trend in the data on the approach to Christmas Day.
The database consists of three tables:
browse
checkout
purchase
The database schema is shown below:
Column | Description |
---|---|
user_id | A unique identifier for each user |
browse_date | The date when the user browsed the product |
item_id | A unique identifier for each product |
Column | Description |
---|---|
user_id | A unique identifier for each user |
checkout_date | The date when the user entered the checkout page |
item_id | A unique identifier for each product |
Column | Description |
---|---|
user_id | A unique identifier for each user |
purchase_date | The date when the user purchased the product |
item_id | A unique identifier for each product |
Queries
SQL queries are used to answer the following question:
1. Create a funnel for Mattresses and More which displays the number of users at each stage (browse, checkout and purchase) for each date available
WITH funnels AS ( SELECT DISTINCT b.browse_date, b.user_id, c.user_id IS NOT NULL AS 'is_checkout', p.user_id IS NOT NULL AS 'is_purchase' FROM browse AS 'b' LEFT JOIN checkout AS 'c' ON c.user_id = b.user_id LEFT JOIN purchase AS 'p' ON p.user_id = c.user_id) SELECT browse_date, COUNT(*) AS 'num_browse', SUM(is_checkout) AS 'num_checkout', SUM(is_purchase) AS 'num_purchase', ROUND(1.0 * SUM(is_checkout) / COUNT(user_id), 3) AS 'browse_to_checkout', ROUND(1.0 * SUM(is_purchase) / SUM(is_checkout), 3) AS 'checkout_to_purchase' FROM funnels GROUP BY browse_date ORDER BY browse_date;
browse_date | num_browse | num_checkout | num_purchase | browse_to_checkout |
checkout_to_purchase
|
---|---|---|---|---|---|
2017-12-20 |
100 |
20 | 16 | 0.2 | 0.8 |
2017-12-21 | 150 | 33 | 28 | 0.22 | 0.848 |
2017-12-22 | 250 | 62 | 55 | 0.248 | 0.887 |
2017-12-23 | 275 | 68 | 64 | 0.247 | 0.941 |
Analysis of Results
As suspected, there was an increase in sales on the approach to Christmas Day (look at the increase in checkout_to_purchase rate). This analysis also shows that once a user has entered the checkout page, they are more likely to complete the purchase compared to the rate of users who enter the checkout page after only browsing (for example 0.941 vs 0.247 on 2017-12-23).
Songify Music Streaming
-Basic and Advanced Queries, Joins and Date Functions-
Introduction
A music streaming company offers both a free product and a premium paid product. Assistance is needed to analyse their customer database which consists of six tables:
plans
users
premium_users
songs
months
plays
The database schema is shown below:
Column | Description |
---|---|
id | A unique identifier for each plan |
price | The monthly cost of the plan |
description | A description of the plan |
Column | Description |
---|---|
id | A unique identifier for each user |
first_name | The first name of the user |
last_name | The last name of the user |
age | The age of the user |
gender | The gender of the user |
Column | Description |
---|---|
user_id | A unique identifier for each user. Matches users.id |
membership_plan_id | An ID for which payment plan that user is on. Matches plans.id |
purchase_date | The date when the user purchased their premium plan |
cancel_date | The date when the user canceled their plan (which can be `NULL` if they haven't canceled yet) |
Column | Description |
---|---|
id | A unique identifier for each song |
title | The title of the song |
artist | The artist who recorded the song |
year | The year that the song was released |
Column | Description |
---|---|
user_id | A unique identifier for each user. Matches users.id |
song_id | An ID for which payment plan that user is on. Matches songs.id |
play_date | The date when the user played this song |
play_hour | The hour of day (0 - 23) when the user played this song |
Column | Description |
---|---|
months | The first date of a month |
Queries
SQL queries are used to answer the following questions:
1. Which plans are used by which premium members?
SELECT premium_users.user_id, plans.description FROM premium_users JOIN plans ON premium_users.membership_plan_id = plans.id;
user_id | description |
---|---|
e9005106-b5d3-49d8-bb80-c6dac246140b |
unlimited songs |
a4c787c5-fa70-42a9-a139-b288a6d6c8ec | limited songs per month |
42038fb3-b538-4c25-9c76-cabf57c29e16 | unlimited songs |
c89db18c-94c2-45e3-8cb2-a5c5df2d75f7 | unlimited songs |
... | ... |
2. Which titles of songs are played by each user?
SELECT plays.user_id, plays.play_date, songs.title FROM plays JOIN songs ON plays.song_id = songs.id;
user_id | play_date | title |
---|---|---|
220201d9-21ad-4380-94f2-148ce930be06 |
2017-01-03 |
What About Love?
|
0fd581d4-e543-4651-8aa0-0a8e84d18405 | 2017-01-03 |
Illegal Smile
|
220201d9-21ad-4380-94f2-148ce930be06 | 2017-01-03 |
What's Your Name
|
42038fb3-b538-4c25-9c76-cabf57c29e16 | 2017-01-04 |
Small Town
|
... | ... | ... |
3. Which users aren’t premium users?
SELECT users.id FROM users LEFT JOIN premium_users ON users.id = premium_users.user_id WHERE premium_users.user_id IS NULL;
id |
---|
90038787-ceed-41db-9bb6-4a3d1d644bbe
|
9244d64e-8dbb-4382-a4a5-5318661da692
|
aac29bd6-61be-4cbd-8891-f05e703dff01
|
8a08151d-ff10-47a0-be3f-2164459981fa
|
... |
4. Find which users played songs in January, but not February
WITH january AS ( SELECT * FROM plays WHERE strftime("%m", play_date) = '01' ), february AS ( SELECT * FROM plays WHERE strftime("%m", play_date) = '02' ) SELECT january.user_id FROM january LEFT JOIN february ON january.user_id = february.user_id WHERE february.user_id IS NULL;
user_id |
---|
220201d9-21ad-4380-94f2-148ce930be06
|
0fd581d4-e543-4651-8aa0-0a8e84d18405
|
220201d9-21ad-4380-94f2-148ce930be06
|
42038fb3-b538-4c25-9c76-cabf57c29e16
|
... |
5. For each month, we want to know if each user in 'premium_users' was active or canceled.
SELECT premium_users.user_id, months.months, CASE WHEN ( premium_users.purchase_date <= months.months ) AND ( premium_users.cancel_date >= months.months OR premium_users.cancel_date IS NULL ) THEN 'active' ELSE 'not_active' END AS 'status' FROM premium_users CROSS JOIN months;
user_id | months | status |
---|---|---|
e9005106-b5d3-49d8-bb80-c6dac246140b |
2017-01-01 |
not_active
|
e9005106-b5d3-49d8-bb80-c6dac246140b | 2017-02-01 |
active
|
e9005106-b5d3-49d8-bb80-c6dac246140b | 2017-03-01 |
not_active
|
a4c787c5-fa70-42a9-a139-b288a6d6c8ec | 2017-01-01 |
not_active
|
... | ... | ... |
6. Songify has added some new songs to their catalog. Combine the existing songs with the new bonus songs.
SELECT * FROM songs UNION SELECT * FROM bonus_songs;
index | id | title | artist | year |
---|---|---|---|---|
0 |
0 |
Caught Up in You
|
.38 Special | 1982 |
0 | 2229 |
Can't Stop the Feeling
|
Justin Timberlake | 2016 |
1 | 1 |
Fantasy Girl
|
.38 Special | NULL |
1 | 2230 |
Send My Love (To Your New Lover)
|
Adele | 2016 |
... | ... | ... | ... | ... |
7. Find how many times each song was played, displaying the title, artist and play count.
WITH play_count AS( SELECT song_id, COUNT(*) AS 'times_played' FROM plays GROUP BY song_id ) SELECT songs.title, songs.artist, play_count.times_played FROM songs JOIN play_count ON songs.id = play_count.song_id;
title | artist | times_played |
---|---|---|
Ballroom Blitz |
Sweet |
7
|
Karn Evil 9 | Emerson, Lake & Palmer |
10
|
Hello It's Me | Jefferson Starship |
8
|
Jane | Slaughter |
9
|
... | ... | ... |