Commit 12dcccf

Merge pull request #2 from Chun-YuanChen/assignment-two
UofT-DSI | sql - Assignment 2
2 parents 71db80a + a57a885 commit 12dcccf

4 files changed

Lines changed: 238 additions & 10 deletions

02_activities/assignments/Assignment2.md

Lines changed: 44 additions & 1 deletion
@@ -54,7 +54,50 @@ The store wants to keep customer addresses. Propose two architectures for the CU
 **HINT:** search type 1 vs type 2 slowly changing dimensions.

 ```
-Your answer...
+
+
+Module: SQL
+Assignment: 2
+Section: 1
+Prompt: 3
+Name: Chun-Yuan Chen
+
+
+SCD Type 1: Overwriting the old address with the new one (i.e., old records overwritten)
+
+| customer_id | province | city          | street_name  | street_number | unit_number | postal_code | last_update_date |
+|-------------|----------|---------------|--------------|---------------|-------------|-------------|------------------|
+| 566         | ON       | Toronto       | Yonge Street | 12            | 503         | M5E 1R4     | 2025-08-15       |
+| 889         | ON       | Richmond Hill | Yonge Street | 8868          | 702E        | L4C 1Z8     | 2025-08-15       |
+
+In a Type 1 architecture, when a customer's address changes, the old address is overwritten with the new one,
+so the table generally keeps only the most recent address for each customer.
+In the example above, I added a 'last_update_date' column so we can see when the address was last changed.
+
+
+SCD Type 2: Keeping the old address and adding a new row for the new one (i.e., changes retained)
+
+| customer_id | province | city          | street_name     | street_number | unit_number | postal_code | effective_date_start | effective_date_end |
+|-------------|----------|---------------|-----------------|---------------|-------------|-------------|----------------------|--------------------|
+| 566         | ON       | Markham       | Main Street N   | 68            | 311         | L3P 0N5     | 2023-01-25           | 2025-08-14         |
+| 566         | ON       | Toronto       | Yonge Street    | 12            | 503         | M5E 1R4     | 2025-08-15           | NULL               |
+| 889         | ON       | Hamilton      | Barton Street E | 2782          | 814         | L8E 2J8     | 2020-06-17           | 2025-08-14         |
+| 889         | ON       | Richmond Hill | Yonge Street    | 8868          | 702E        | L4C 1Z8     | 2025-08-15           | NULL               |
+
+In contrast to the Type 1 architecture, the Type 2 architecture retains all the old addresses in the table when a customer's address changes and
+adds the new one as a new row. In addition, two date columns (i.e., 'effective_date_start' and 'effective_date_end')
+give the effective period for each address. For the current address, the 'effective_date_end' column is NULL because it is still active.
+
+
+In my view, if the bookstore is small and has very limited storage resources, the Type 1 architecture would be easier to manage and query.
+However, the Type 2 architecture keeps past records available for review,
+which can be useful for investigating logistics and delivery issues that occurred before the address update.
+
+
+Chun-Yuan Chen
+2025-08-15
+
+
 ```

 ***
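The two architectures described in the Assignment2.md answer can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the table names `customer_address_type1` and `customer_address_type2`, the helper functions `move_type1` and `move_type2`, and the reduced column set are hypothetical and do not appear in the commit. The rule that a Type 2 row is closed the day before the new address takes effect is inferred from the example rows in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical, trimmed-down tables (only a few address columns are kept).
cur.executescript("""
    CREATE TABLE customer_address_type1 (
        customer_id      INTEGER PRIMARY KEY,
        city             TEXT,
        postal_code      TEXT,
        last_update_date TEXT
    );
    CREATE TABLE customer_address_type2 (
        customer_id          INTEGER,
        city                 TEXT,
        postal_code          TEXT,
        effective_date_start TEXT,
        effective_date_end   TEXT   -- NULL marks the current address
    );
    INSERT INTO customer_address_type1 VALUES (566, 'Markham', 'L3P 0N5', '2023-01-25');
    INSERT INTO customer_address_type2 VALUES (566, 'Markham', 'L3P 0N5', '2023-01-25', NULL);
""")


def move_type1(customer_id, city, postal_code, change_date):
    """Type 1: overwrite the existing row, so the old address is lost."""
    cur.execute(
        "UPDATE customer_address_type1 "
        "SET city = ?, postal_code = ?, last_update_date = ? "
        "WHERE customer_id = ?",
        (city, postal_code, change_date, customer_id),
    )


def move_type2(customer_id, city, postal_code, change_date):
    """Type 2: close the current row, then add a new row for the new address."""
    # Assumption: the old address ends the day before the new one starts.
    cur.execute(
        "UPDATE customer_address_type2 "
        "SET effective_date_end = DATE(?, '-1 day') "
        "WHERE customer_id = ? AND effective_date_end IS NULL",
        (change_date, customer_id),
    )
    cur.execute(
        "INSERT INTO customer_address_type2 VALUES (?, ?, ?, ?, NULL)",
        (customer_id, city, postal_code, change_date),
    )


# Customer 566 moves from Markham to Toronto on 2025-08-15.
move_type1(566, "Toronto", "M5E 1R4", "2025-08-15")
move_type2(566, "Toronto", "M5E 1R4", "2025-08-15")

type1_rows = cur.execute("SELECT * FROM customer_address_type1").fetchall()
type2_rows = cur.execute(
    "SELECT * FROM customer_address_type2 ORDER BY effective_date_start"
).fetchall()
```

After the move, `type1_rows` holds a single overwritten row for the customer, while `type2_rows` holds both the closed Markham row and the open Toronto row, which is the trade-off between simplicity and history that the answer describes.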
72.9 KB
Binary file not shown.
75.3 KB
Binary file not shown.

02_activities/assignments/assignment2.sql

Lines changed: 194 additions & 9 deletions
@@ -1,6 +1,16 @@
+
+--Module: SQL
+--Name: Chun-Yuan Chen
+--Assignment: 2
+--Sections: 2 & 3
+
+
+
 /* ASSIGNMENT 2 */
 /* SECTION 2 */

+
+
 -- COALESCE
 /* 1. Our favourite manager wants a detailed long list of products, but is afraid of tables!
 We tell them, no problem! We can produce a list with all of the appropriate details.
@@ -12,14 +22,33 @@ product_name || ', ' || product_size|| ' (' || product_qty_type || ')'
 FROM product

 But wait! The product table has some bad data (a few NULL values).
-Find the NULLs and then using COALESCE, replace the NULL with a
-blank for the first problem, and 'unit' for the second problem.
+Find the NULLs and then using COALESCE, replace the NULL with a blank for the first column with nulls, and
+'unit' for the second column with nulls.

 HINT: keep the syntax the same, but edit the correct components with the string.
 The `||` values concatenate the columns into strings.
 Edit the appropriate columns -- you're making two edits -- and the NULL rows will be fixed.
 All the other rows will remain the same.) */

+SELECT
+    /* Notes:
+    1. Initially, I checked for any empty strings in each column of interest
+    and, if found, converted them to NULLs.
+    2. I see these lines as a data cleaning step, although a blank can sometimes
+    carry a specific meaning depending on the context.
+    */
+    NULLIF(product_name, '') AS product_name,
+    NULLIF(product_size, '') AS product_size,
+    NULLIF(product_qty_type, '') AS product_qty_type,
+
+    COALESCE(product_name, '') || ', ' || COALESCE(product_size, '') || ' (' || COALESCE(product_qty_type, 'unit') || ')' AS product_list
+    /* Notes:
+    1. Although the product_name column contains no NULLs, I still applied COALESCE for consistency and robustness.
+    2. In the new product_list column, the two NULLs originally in product_size have now been replaced with a blank.
+    3. In the new product_list column, the two NULLs originally in product_qty_type have now been replaced with 'unit'.
+    */
+FROM product;
+


 --Windowed Functions
@@ -32,17 +61,36 @@ each new market date for each customer, or select only the unique market dates p
 (without purchase details) and number those visits.
 HINT: One of these approaches uses ROW_NUMBER() and one uses DENSE_RANK(). */

+SELECT DISTINCT customer_id, market_date, /* Notes: I added DISTINCT to ensure that only unique market dates per customer are returned. */
+    DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY market_date ASC) AS visit_number_asc
+    /* Notes: Based on the question, I assumed that multiple transactions on the same date, regardless of time,
+    count as the same visit. Therefore, I did not use transaction_time in the query. */
+FROM customer_purchases;
+


 /* 2. Reverse the numbering of the query from a part so each customer’s most recent visit is labeled 1,
 then write another query that uses this one as a subquery (or temp table) and filters the results to
 only the customer’s most recent visit. */

+SELECT customer_id, market_date
+FROM (
+    SELECT DISTINCT customer_id, market_date,
+        DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY market_date DESC) AS visit_number_desc
+        /* Notes: I used DESC to ensure each customer’s most recent visit is labeled 1. */
+    FROM customer_purchases
+)
+WHERE visit_number_desc = 1; /* Notes: Now, only the most recent visit for each customer is returned. */
+


 /* 3. Using a COUNT() window function, include a value along with each row of the
 customer_purchases table that indicates how many different times that customer has purchased that product_id. */

+PRAGMA table_info(customer_purchases); /* Notes: I used this just to take a quick look at all the columns. */
+SELECT *, COUNT(product_id) OVER (PARTITION BY customer_id, product_id) AS product_purchase_count
+FROM customer_purchases;
+


 -- String manipulations
@@ -51,16 +99,26 @@ These are separated from the product name with a hyphen.
 Create a column using SUBSTR (and a couple of other commands) that captures these, but is otherwise NULL.
 Remove any trailing or leading whitespaces. Don't just use a case statement for each product!

-| product_name | description |
-|----------------------------|-------------|
-| Habanero Peppers - Organic | Organic |
+| product_name | description |
+|---------------------------- |-------------|
+| Habanero Peppers - Organic | Organic |

 Hint: you might need to use INSTR(product_name,'-') to find the hyphens. INSTR will help split the column. */

+SELECT product_name,
+    CASE
+        /* INSTR returns 0 when there is no hyphen, so compare explicitly against 0. */
+        WHEN INSTR(product_name, '-') > 0 THEN TRIM(SUBSTR(product_name, INSTR(product_name, '-') + 1))
+        ELSE NULL
+    END AS description
+FROM product;
+


 /* 2. Filter the query to show any product_size value that contain a number with REGEXP. */

+SELECT * FROM product
+WHERE product_size REGEXP '[0-9]';
+


 -- UNION
@@ -73,6 +131,30 @@ HINT: There are a possibly a few ways to do this query, but if you're struggling
 3) Query the second temp table twice, once for the best day, once for the worst day,
 with a UNION binding them. */

+DROP TABLE IF EXISTS temp.sales_values_by_date; /* Notes: I found that the temp. prefix does not appear to be strictly required. */
+CREATE TEMP TABLE sales_values_by_date AS
+SELECT market_date, SUM(quantity * cost_to_customer_per_qty) AS total_sales_values
+FROM customer_purchases
+GROUP BY market_date;
+
+
+DROP TABLE IF EXISTS temp.sales_values_ranked;
+CREATE TEMP TABLE sales_values_ranked AS
+SELECT market_date, total_sales_values,
+    RANK() OVER (ORDER BY total_sales_values DESC) AS total_sales_values_desc,
+    RANK() OVER (ORDER BY total_sales_values ASC) AS total_sales_values_asc
+FROM sales_values_by_date;
+
+
+SELECT market_date, total_sales_values, 'best day' AS total_sales_values_marked
+FROM sales_values_ranked
+WHERE total_sales_values_desc = 1
+
+UNION
+
+SELECT market_date, total_sales_values, 'worst day' AS total_sales_values_marked
+FROM sales_values_ranked
+WHERE total_sales_values_asc = 1;



@@ -90,27 +172,97 @@ How many customers are there (y).
 Before your final group by you should have the product of those two queries (x*y). */


-
+/* Notes:
+1. This question is really no walk in the park, pretty hard!
+2. Original tables needed: customer, vendor, product, vendor_inventory
+3. Derived tables in my case: total_number_customers, all_possible_vendor_product_pairs, vendor_original_prices, how_much_vendor_make_per_product.
+*/
+
+WITH
+total_number_customers AS (
+    SELECT COUNT(DISTINCT c.customer_id) AS num_customers FROM customer c),
+/* Notes: Get the total number of customers first (26) and apply this number later,
+because the question emphasizes 'every customer on record'. */
+
+all_possible_vendor_product_pairs AS (
+    SELECT v.vendor_id, v.vendor_name, p.product_id, p.product_name FROM vendor v
+    CROSS JOIN product p),
+/* Notes: Get all possible vendor-product pairs: 9 vendors x 23 products = 207 pairs. */
+
+vendor_original_prices AS (
+    SELECT DISTINCT vendor_id, product_id, original_price FROM vendor_inventory),
+/* Notes: Get the original price of each product listed by the three vendors in this table. */
+
+how_much_vendor_make_per_product AS (
+    SELECT
+        apvpp.vendor_id,
+        apvpp.vendor_name,
+        apvpp.product_id,
+        apvpp.product_name,
+        vop.original_price,
+        tnc.num_customers,
+        5 * vop.original_price * tnc.num_customers AS vendor_revenue
+
+    FROM all_possible_vendor_product_pairs apvpp
+    LEFT JOIN vendor_original_prices vop ON apvpp.vendor_id = vop.vendor_id AND apvpp.product_id = vop.product_id
+    CROSS JOIN total_number_customers tnc)
+/* Notes: Derive the revenue variable. */
+
+SELECT vendor_name, product_name, original_price, num_customers, COALESCE(vendor_revenue, 0) AS vendor_revenue
+FROM how_much_vendor_make_per_product
+WHERE original_price IS NOT NULL
+ORDER BY vendor_name, product_name;
+
+
+
 -- INSERT
 /*1. Create a new table "product_units".
 This table will contain only products where the `product_qty_type = 'unit'`.
 It should use all of the columns from the product table, as well as a new column for the `CURRENT_TIMESTAMP`.
 Name the timestamp column `snapshot_timestamp`. */

+DROP TABLE IF EXISTS product_units;
+CREATE TABLE product_units AS
+SELECT *, DATETIME('now', 'localtime') AS snapshot_timestamp
+FROM product
+WHERE product_qty_type = 'unit';
+
+SELECT * FROM product_units; /* Notes: This line is just a check for myself. */
+


 /*2. Using `INSERT`, add a new row to the product_units table (with an updated timestamp).
 This can be any product you desire (e.g. add another record for Apple Pie). */

+INSERT INTO product_units (product_id, product_name, product_size, product_category_id, product_qty_type, snapshot_timestamp)
+VALUES (3, 'Poblano Peppers - Organic', 'large', 1, 'unit', DATETIME('now', 'localtime'));
+/* Notes: Now there are two records for product_id = 3 that are identical except for snapshot_timestamp:
+one old and one new in my case. */
+
+SELECT * FROM product_units; /* Notes: This line is just a check for myself. */
+


 -- DELETE
 /* 1. Delete the older record for the whatever product you added.

 HINT: If you don't specify a WHERE clause, you are going to have a bad time.*/
-
-
-
+
+DELETE FROM product_units AS pu1
+WHERE pu1.snapshot_timestamp < (
+    SELECT MAX(snapshot_timestamp)
+    FROM product_units AS pu2
+    WHERE pu2.product_id = pu1.product_id
+    AND pu2.product_name = pu1.product_name
+    AND pu2.product_size = pu1.product_size
+    AND pu2.product_category_id = pu1.product_category_id
+    AND pu2.product_qty_type = pu1.product_qty_type
+);
+
+SELECT * FROM product_units; /* Notes: This line is just a check for myself. */
+
+
+
 -- UPDATE
 /* 1.We want to add the current_quantity to the product_units table.
 First, add a new column, current_quantity to the table using the following syntax.
@@ -128,6 +280,39 @@ Finally, make sure you have a WHERE statement to update the right row,
 you'll need to use product_units.product_id to refer to the correct row within the product_units table.
 When you have all of these components, you can run the update statement. */

+ALTER TABLE product_units ADD current_quantity INT;
+SELECT * FROM product_units; /* Notes: This line is just a check for myself. */
+
+DROP TABLE IF EXISTS vendor_inventory_copy;
+CREATE TABLE vendor_inventory_copy AS SELECT * FROM vendor_inventory;
+/* Notes: I made a copy of vendor_inventory because I didn't want to affect the original table. */
+
+ALTER TABLE vendor_inventory_copy ADD COLUMN current_quantity INT;
+SELECT * FROM vendor_inventory_copy; /* Notes: This line is just a check for myself. */
+
+UPDATE vendor_inventory_copy
+/* Notes: It appears that an alias cannot be used for vendor_inventory_copy in the UPDATE command here. */
+SET current_quantity = (
+    SELECT quantity
+    FROM vendor_inventory vi
+    WHERE vi.product_id = vendor_inventory_copy.product_id
+    ORDER BY market_date DESC
+    LIMIT 1
+);
+
+SELECT * FROM vendor_inventory_copy; /* Notes: This line is just a check for myself. */
+
+
+DROP TABLE IF EXISTS vendor_inventory_current_quantity;
+CREATE TABLE vendor_inventory_current_quantity AS
+SELECT DISTINCT product_id, current_quantity
+FROM vendor_inventory_copy;


+UPDATE product_units
+SET current_quantity = COALESCE(
+    (SELECT vicq.current_quantity FROM vendor_inventory_current_quantity vicq WHERE vicq.product_id = product_units.product_id),
+    0); /* Notes: If not matched, then just use 0 instead. */
+
+SELECT * FROM product_units; /* Notes: This line is just a check for myself. */
