Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
45 changes: 44 additions & 1 deletion 02_activities/assignments/Assignment2.md
Original file line number Diff line number Diff line change
Expand Up @@ -54,7 +54,50 @@ The store wants to keep customer addresses. Propose two architectures for the CU
**HINT:** search type 1 vs type 2 slowly changing dimensions.

```
Your answer...


Module: SQL
Assignment: 2
Section: 1
Prompt: 3
Name: Chun-Yuan Chen


SCD Type 1: Overwriting the old address with the new one (i.e., old records overwritten)

|customer_id|province|city |street_name |street_number|unit_number|postal_code| last_update_date |
|-----------|--------|---------------|----------------|-------------|-----------|-----------|------------------|
| 566 | ON | Toronto | Yonge Street | 12 | 503 |M5E 1R4 |2025-08-15 |
| 889 | ON | Richmond Hill | Yonge Street | 8868 | 702E |L4C 1Z8 |2025-08-15 |

In a Type 1 architecture, when a customer's address changes, the old address is overwritten with the new one,
in the general case so the table keeps only the most recent address for each customer.
In the illustrated example above, I put an 'last_update_date' column so can see when the address was last changed.


SCD Type 2: While Keeping the old address, creating new rows for the new one (i.e., changes retained)

|customer_id|province|city |street_name |street_number|unit_number|postal_code|effective_date_start|effective_date_end|
|-----------|--------|---------------|----------------|-------------|-----------|-----------|--------------------|------------------|
| 566 | ON | Markham | Main Street N | 68 | 311 |L3P 0N5 |2023-01-25 |2025-08-14 |
| 566 | ON | Toronto | Yonge Street | 12 | 503 |M5E 1R4 |2025-08-15 |NULL |
| 889 | ON | Hamilton | Barton Street E| 2782 | 814 |L8E 2J8 |2020-06-17 |2025-08-14 |
| 889 | ON | Richmond Hill | Yonge Street | 8868 | 702E |L4C 1Z8 |2025-08-15 |NULL |

In contrast to the Type 1 architecture, the Type 2 architecture retains all the old addresses in the table when a customer's address changes and
adds the new one as a new row. In addition, two date columns (i.e., 'effective_date_start' and 'effective_date_end')
present the effective period for each address. For the current address, the 'effective_date_end' column is NULL because it is still active.


In my personal view, if the bookstore is small and has very limited storage resources, the Type 1 architecture would be easier to manage and query.
However, the Type 2 architecture offers a window to review past records,
which can be useful for checking back logistics and delivery issues that occurred before the address update.


Chun-Yuan Chen
2025-08-15


```

***
Expand Down
Binary file added 02_activities/assignments/ERD1_Chun-YuanChen.pdf
Binary file not shown.
Binary file added 02_activities/assignments/ERD2_Chun-YuanChen.pdf
Binary file not shown.
203 changes: 194 additions & 9 deletions 02_activities/assignments/assignment2.sql
Original file line number Diff line number Diff line change
@@ -1,6 +1,16 @@

--Module: SQL
--Name: Chun-Yuan Chen
--Assignment: 2
--Sections: 2 & 3



/* ASSIGNMENT 2 */
/* SECTION 2 */



-- COALESCE
/* 1. Our favourite manager wants a detailed long list of products, but is afraid of tables!
We tell them, no problem! We can produce a list with all of the appropriate details.
Expand All @@ -12,14 +22,33 @@ product_name || ', ' || product_size|| ' (' || product_qty_type || ')'
FROM product

But wait! The product table has some bad data (a few NULL values).
Find the NULLs and then using COALESCE, replace the NULL with a
blank for the first problem, and 'unit' for the second problem.
Find the NULLs and then using COALESCE, replace the NULL with a blank for the first column with nulls, and
'unit' for the second column with nulls.

HINT: keep the syntax the same, but edited the correct components with the string.
The `||` values concatenate the columns into strings.
Edit the appropriate columns -- you're making two edits -- and the NULL rows will be fixed.
All the other rows will remain the same.) */

SELECT
/* Notes:
1. Initially, I checked for any empty strings in each separate column of interest,
and if found, converted them to NULLs.
2. In this case, I see these lines of code as a data cleaning step, although sometimes a blank can
represent specific meaning depending on the context.
*/
NULLIF(product_name, '') AS product_name,
NULLIF(product_size, '') AS product_size,
NULLIF(product_qty_type, '') AS product_qty_type,

COALESCE(product_name, '') || ', ' || COALESCE(product_size, '') || ' (' || COALESCE(product_qty_type, 'unit') || ')' AS product_list
/* Notes:
1. Although the product_name column contains no NULLs, I still applied COALESCE for consistency and to ensure robustness.
2. In the new product_list column, the two NULLs originally in product_size have now been replaced with blank.
3. In the new product_list column, the two NULLs originally in product_qty_type have now been replaced with 'unit'.
*/
FROM product;



--Windowed Functions
Expand All @@ -32,17 +61,36 @@ each new market date for each customer, or select only the unique market dates p
(without purchase details) and number those visits.
HINT: One of these approaches uses ROW_NUMBER() and one uses DENSE_RANK(). */

SELECT DISTINCT customer_id, market_date, /* Notes: I added DISTINCT to ensure that only unique market dates per customer are returned. */
DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY market_date ASC) AS visit_number_asc
/* Notes: Based on the question, I assumed that multiple transactions on the same date, regardless of time,
count as the same visit. Therefore, I did not bring transaction_time into the code.*/
FROM customer_purchases;



/* 2. Reverse the numbering of the query from a part so each customer’s most recent visit is labeled 1,
then write another query that uses this one as a subquery (or temp table) and filters the results to
only the customer’s most recent visit. */

SELECT customer_id, market_date
FROM (
SELECT DISTINCT customer_id, market_date,
DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY market_date DESC) AS visit_number_desc
/* Notes: I used DESC to ensure each customer’s most recent visit is labeled 1. */
FROM customer_purchases
)
WHERE visit_number_desc = 1; /* Notes: Now, only the most recent visit for each customer is returned. */



/* 3. Using a COUNT() window function, include a value along with each row of the
customer_purchases table that indicates how many different times that customer has purchased that product_id. */

PRAGMA table_info(customer_purchases); /* Notes: I used this just to get a quick look myself at all the columns. */
SELECT *, COUNT(product_id) OVER (PARTITION BY customer_id, product_id) AS product_purchase_count
FROM customer_purchases;



-- String manipulations
Expand All @@ -51,16 +99,26 @@ These are separated from the product name with a hyphen.
Create a column using SUBSTR (and a couple of other commands) that captures these, but is otherwise NULL.
Remove any trailing or leading whitespaces. Don't just use a case statement for each product!

| product_name | description |
|----------------------------|-------------|
| Habanero Peppers - Organic | Organic |
| product_name | description |
|---------------------------- |-------------|
| Habanero Peppers - Organic | Organic |

Hint: you might need to use INSTR(product_name,'-') to find the hyphens. INSTR will help split the column. */

SELECT product_name,
CASE
WHEN INSTR(product_name, '-') THEN TRIM(SUBSTR(product_name, INSTR(product_name, '-')+1))
ELSE NULL
END AS description
FROM product;



/* 2. Filter the query to show any product_size value that contain a number with REGEXP. */

SELECT * FROM product
WHERE product_size REGEXP '[0-9]';



-- UNION
Expand All @@ -73,6 +131,30 @@ HINT: There are a possibly a few ways to do this query, but if you're struggling
3) Query the second temp table twice, once for the best day, once for the worst day,
with a UNION binding them. */

DROP TABLE IF EXISTS temp.sales_values_by_date; /* Notes: I found temp. appears to be not necessarily required. */
CREATE TEMP TABLE sales_values_by_date AS
SELECT market_date, SUM(quantity * cost_to_customer_per_qty) AS total_sales_values
FROM customer_purchases
GROUP BY market_date;


DROP TABLE IF EXISTS temp.sales_values_ranked;
CREATE TEMP TABLE sales_values_ranked AS
SELECT market_date, total_sales_values,
RANK() OVER (ORDER BY total_sales_values DESC) AS total_sales_values_desc,
RANK() OVER (ORDER BY total_sales_values ASC) AS total_sales_values_asc
FROM sales_values_by_date;


SELECT market_date, total_sales_values, 'best day' AS total_sales_values_marked
FROM sales_values_ranked
WHERE total_sales_values_desc = 1

UNION

SELECT market_date, total_sales_values, 'worst day' AS total_sales_values_marked
FROM sales_values_ranked
WHERE total_sales_values_asc = 1;



Expand All @@ -90,27 +172,97 @@ How many customers are there (y).
Before your final group by you should have the product of those two queries (x*y). */



/* Notes:
1. This question is really no walk in the park, pretty hard!
2. Original tables needed: customer, vendor, product, vendor_inventory
3. Derived tables in my case: all_possible_vendor_product_pairs, vendor_original_prices, how_much_vendor_make_per_product.
*/

WITH
total_number_customers AS (
SELECT COUNT(DISTINCT c.customer_id) AS num_customers FROM customer c),
/* Notes: Get total #customers first, 26, and apply this number later,
because the question highlighted 'every customer on record'. */

all_possible_vendor_product_pairs AS (
SELECT v.vendor_id, v.vendor_name, p.product_id, p.product_name FROM vendor v
CROSS JOIN product p),
/* Notes: Get all possible vendor-product pair, 9 vendors x 23 products, 207 pairs. */

vendor_original_prices AS (
SELECT DISTINCT vendor_id, product_id, original_price FROM vendor_inventory),
/* Notes: Get each original price for each of the products listed from the three vendors in this table. */

how_much_vendor_make_per_product AS (
SELECT
apvpp.vendor_id,
apvpp.vendor_name,
apvpp.product_id,
apvpp.product_name,
vop.original_price,
tnc.num_customers,
5 * vop.original_price * tnc.num_customers AS vendor_revenue

FROM all_possible_vendor_product_pairs apvpp
LEFT JOIN vendor_original_prices vop ON apvpp.vendor_id = vop.vendor_id AND apvpp.product_id = vop.product_id
CROSS JOIN total_number_customers tnc)
/* Notes: Derive the revenue variable. */

SELECT vendor_name, product_name, original_price, num_customers, COALESCE(vendor_revenue, 0) AS vendor_revenue
FROM how_much_vendor_make_per_product
WHERE original_price IS NOT NULL
ORDER BY vendor_name, product_name;



-- INSERT
/*1. Create a new table "product_units".
This table will contain only products where the `product_qty_type = 'unit'`.
It should use all of the columns from the product table, as well as a new column for the `CURRENT_TIMESTAMP`.
Name the timestamp column `snapshot_timestamp`. */

DROP TABLE IF EXISTS product_units;
CREATE TABLE product_units AS
SELECT *, DATETIME('now', 'localtime') AS snapshot_timestamp
FROM product
WHERE product_qty_type = 'unit';

SELECT * FROM product_units; /* Notes: This line of code just for myself to do a check. */



/*2. Using `INSERT`, add a new row to the product_units table (with an updated timestamp).
This can be any product you desire (e.g. add another record for Apple Pie). */

INSERT INTO product_units (product_id, product_name, product_size, product_category_id, product_qty_type, snapshot_timestamp)
VALUES (3, 'Poblano Peppers - Organic', 'large', 1, 'unit', DATETIME('now', 'localtime'));
/* Notes: So, now there are two same records (product_id = 3) except snapshot_timestamp,
one is old and the other new in my case. */

SELECT * FROM product_units; /* Notes: This line of code just for myself to do a check. */



-- DELETE
/* 1. Delete the older record for the whatever product you added.

HINT: If you don't specify a WHERE clause, you are going to have a bad time.*/




DELETE FROM product_units AS pu1
WHERE pu1.snapshot_timestamp < (
SELECT MAX(snapshot_timestamp)
FROM product_units AS pu2
WHERE pu2.product_id = pu1.product_id
AND pu2.product_name = pu1.product_name
AND pu2.product_size = pu1.product_size
AND pu2.product_category_id = pu1.product_category_id
AND pu2.product_qty_type = pu1.product_qty_type
);

SELECT * FROM product_units; /* Notes: This line of code just for myself to do a check. */



-- UPDATE
/* 1.We want to add the current_quantity to the product_units table.
First, add a new column, current_quantity to the table using the following syntax.
Expand All @@ -128,6 +280,39 @@ Finally, make sure you have a WHERE statement to update the right row,
you'll need to use product_units.product_id to refer to the correct row within the product_units table.
When you have all of these components, you can run the update statement. */

ALTER TABLE product_units ADD current_quantity INT;
SELECT * FROM product_units; /* Notes: This line of code just for myself to do a check. */

DROP TABLE IF EXISTS vendor_inventory_copy;
CREATE TABLE vendor_inventory_copy AS SELECT * FROM vendor_inventory;
/*Notes: I made a copy to vendor_inventory, didn't want to affect the original one. */

ALTER TABLE vendor_inventory_copy ADD COLUMN current_quantity INT;
SELECT * FROM vendor_inventory_copy; /* Notes: This line of code just for myself to do a check. */

UPDATE vendor_inventory_copy
/* Notes: It appears to not able to use alias for vendor_inventory_copy in update command here. */
SET current_quantity = (
SELECT quantity
FROM vendor_inventory vi
WHERE vi.product_id = vendor_inventory_copy.product_id
ORDER BY market_date DESC
LIMIT 1
);

SELECT * FROM vendor_inventory_copy; /* Notes: This line of code just for myself to do a check. */


DROP TABLE IF EXISTS vendor_inventory_current_quantity;
CREATE TABLE vendor_inventory_current_quantity AS
SELECT DISTINCT product_id, current_quantity
FROM vendor_inventory_copy;


UPDATE product_units
SET current_quantity = COALESCE(
(SELECT vicq.current_quantity FROM vendor_inventory_current_quantity vicq WHERE vicq.product_id = product_units.product_id),
0); /* Notes: If not matched, then just use 0 instead. */

SELECT * FROM product_units; /* Notes: This line of code just for myself to do a check. */