diff --git a/02_activities/assignments/DC_Cohort/Assignment2.md b/02_activities/assignments/DC_Cohort/Assignment2.md
index 9b804e9ee..768a36294 100644
--- a/02_activities/assignments/DC_Cohort/Assignment2.md
+++ b/02_activities/assignments/DC_Cohort/Assignment2.md
@@ -44,6 +44,12 @@ Additionally, include a date table.
 There are several tools online you can use, I'd recommend [Draw.io](https://www.drawio.com/) or [LucidChart](https://www.lucidchart.com/pages/).
 **HINT:** You do not need to create any data for this prompt. This is a conceptual model only.
+**Relationships of my tables**
+customer to order: 1-to-many
+employee to order: 1-to-many
+order to sales: 1-to-many
+book to sales: 1-to-many
+date to order: 1-to-many
 
 #### Prompt 2
 We want to create employee shifts, splitting up the day into morning and evening. Add this to the ERD.
@@ -51,12 +57,19 @@ We want to create employee shifts, splitting up the day into morning and evening
 
 #### Prompt 3
 The store wants to keep customer addresses. Propose two architectures for the CUSTOMER_ADDRESS table, one that will retain changes, and another that will overwrite. Which is type 1, which is type 2?
-**HINT:** search type 1 vs type 2 slowly changing dimensions.
+**HINT:** search type 1 vs type 2 slowly changing dimensions (SCD).
 
 ```
 Your answer...
 ```
 
+'''Type 1 SCD is an architecture that overwrites the old CUSTOMER_ADDRESS information. It retains only the most recent address entered, so the customer's address history is not saved permanently. This SCD type can be used for information such as a shipping address.
+
+WHILE
+
+Type 2 SCD is an architecture that retains changes to CUSTOMER_ADDRESS because a unique ID is assigned to each address entry. It keeps all previous addresses, and every time an address changes, a new record is added with effective dates. Therefore, you can see historical addresses.'''
+
 ***
 
 ## Section 2:
@@ -185,3 +198,7 @@ Consider, for example, concepts of labour, bias, LLM proliferation, moderating c
 ```
 Your thoughts...
 ```
+The article touched on many ethical issues, beginning with how technological databases are built by human labor that often goes unrecognized, undervalued, and underpaid. Their example of how humans make garments hit home for me personally, as I worry about how quickly fast fashion and overconsumption are becoming normal in the world we live in, and how we should probably be asking far more questions about the ethics and fairness accorded to manual workers.
+Another big point discussed in this article is how bias is introduced into data and models. Since human labor and decisions in labeling and taxonomy are how these AI models are built, it is very easy to introduce cultural, social, and subjective bias into AI systems. These biases beg the question of who gets to decide what is “safe” or “offensive”, what and who is labelled, and why. What criteria justify the way data is labelled? Finally, who would be held responsible for the human errors or harms that are coded into the data?
+A personal example from a few years back (around 2022) was when Black women like me realized that when we googled “Professional hairstyles for women”, not a single Black woman’s hair was highlighted. This sparked a lot of conversation about the bias built into Google’s database of what a professional woman should look like. The experience highlights how biases about identities or categories that are underrepresented or misrepresented in training datasets can be scaled into bigger databases, which can lead to unfair or harmful outputs.
+Overall, the author highlights how AI development is not purely technical but social, and why we need ethical governance that recognizes the human work behind the models. They also recommend that we acknowledge and fairly compensate the human labor that makes these systems possible, which I agree with.
diff --git a/02_activities/assignments/DC_Cohort/assignment2.sql b/02_activities/assignments/DC_Cohort/assignment2.sql
index 5ad40748a..3998e8351 100644
--- a/02_activities/assignments/DC_Cohort/assignment2.sql
+++ b/02_activities/assignments/DC_Cohort/assignment2.sql
@@ -20,6 +20,13 @@ The `||` values concatenate the columns into strings.
 Edit the appropriate columns -- you're making two edits -- and the NULL rows will be fixed.
 All the other rows will remain the same.) */
+SELECT
+product_name || ', ' || product_size || ' (' || product_qty_type || ')'
+FROM product;
+
+SELECT
+product_name || ', ' || COALESCE(product_size, '') || ' (' || COALESCE(product_qty_type, 'unit') || ')' AS product_list -- (added a column alias because the unnamed column looked odd)
+FROM product;
 
 --Windowed Functions
@@ -32,18 +39,44 @@ each new market date for each customer, or select only the unique market dates p
 (without purchase details) and number those visits.
 HINT: One of these approaches uses ROW_NUMBER() and one uses DENSE_RANK(). */
+-- counter changing on EACH NEW MARKET DATE (so 1 count for each visit day) for each customer
+SELECT
+    customer_id,
+    market_date,
+    ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY market_date) AS customer_visit_number
+FROM (SELECT DISTINCT customer_id, market_date FROM customer_purchases); -- used DISTINCT because sometimes customers visited multiple times a day
+
+-- OR THIS
+
+SELECT
+    customer_id,
+    market_date,
+    DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY market_date) AS customer_visit_number
+FROM customer_purchases; -- DENSE_RANK gives repeat purchases on the same date the same visit number, so the DISTINCT subquery is not needed here
 
 /* 2. Reverse the numbering of the query from a part so each customer’s most recent visit is labeled 1,
 then write another query that uses this one as a subquery (or temp table) and filters the results to
 only the customer’s most recent visit.
 */
+SELECT *
+FROM (
+    SELECT
+        customer_id,
+        market_date,
+        ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY market_date DESC) AS customer_visit_number
+    FROM (SELECT DISTINCT customer_id, market_date FROM customer_purchases)
+)
+WHERE customer_visit_number = 1;
 
 /* 3. Using a COUNT() window function, include a value along with each row of the
 customer_purchases table that indicates how many different times that customer has
 purchased that product_id. */
-
+SELECT DISTINCT
+    customer_id,
+    product_id,
+    COUNT(*) OVER (PARTITION BY customer_id, product_id) AS times_purchased
+FROM customer_purchases;
 
 -- String manipulations
 /* 1. Some product names in the product table have descriptions like "Jar" or "Organic".
@@ -58,9 +91,60 @@ Remove any trailing or leading whitespaces. Don't just use a case statement for
 Hint: you might need to use INSTR(product_name,'-') to find the hyphens.
 INSTR will help split the column. */
+SELECT
+    product_name,
+    TRIM(
+        SUBSTR(
+            product_name,
+            INSTR(product_name, '-') + 1
+        )
+    ) AS product_description
+FROM product
+WHERE INSTR(product_name, '-') > 0;
+
+-- I am not sure whether you wanted to avoid CASE statements entirely, but I could not show the full list including the NULL items without using CASE, so I have included both queries.
+
+SELECT
+    product_name,
+    CASE
+        WHEN INSTR(product_name, '-') > 0
+        THEN TRIM(SUBSTR(product_name, INSTR(product_name, '-') + 1))
+        ELSE NULL
+    END AS product_description
+FROM product;
+
+/* I wrote this part just to experiment, because I was curious to see if I could completely sever the product name from the description to create two new columns */
+
+SELECT
+    -- extract the main name BEFORE the hyphen into its own column
+    TRIM(
+        CASE
+            WHEN INSTR(product_name, '-') > 0
+            THEN SUBSTR(product_name, 1, INSTR(product_name, '-') - 1)
+            ELSE product_name
+        END) AS product_main_name,
+
+    -- extract the description AFTER the hyphen into its own column
+    TRIM(
+        CASE
+            WHEN INSTR(product_name, '-') > 0
+            THEN SUBSTR(product_name, INSTR(product_name, '-') + 1)
+            ELSE NULL
+        END) AS product_description
+FROM product;
+
 /* 2. Filter the query to show any product_size value that contain a number with REGEXP. */
+SELECT
+    product_size, product_name,
+    CASE
+        WHEN INSTR(product_name, '-') > 0
+        THEN TRIM(SUBSTR(product_name, INSTR(product_name, '-') + 1))
+        ELSE NULL
+    END AS product_description
+FROM product
+WHERE product_size REGEXP '[0-9]';
 
 -- UNION
@@ -73,6 +157,47 @@ HINT: There are a possibly a few ways to do this query, but if you're struggling
 3) Query the second temp table twice, once for the best day, once for the worst day,
 with a UNION binding them. */
+/*SELECT
+    product_id, market_date,
+    SUM(quantity * cost_to_customer_per_qty) AS total_sales
+FROM customer_purchases
+GROUP BY market_date*/
+
+-- to calculate total sales per market_date
+-- (product_id was dropped from the SELECT: it is not in the GROUP BY and is not needed here)
+WITH sales_by_date AS (
+    SELECT
+        market_date,
+        SUM(quantity * cost_to_customer_per_qty) AS total_sales
+    FROM customer_purchases
+    GROUP BY market_date
+),
+-- ranking the days by total sales (highest to lowest)
+ranked_sales AS (
+    SELECT
+        market_date,
+        total_sales,
+        RANK() OVER (ORDER BY total_sales DESC) AS best_rank,
+        RANK() OVER (ORDER BY total_sales ASC) AS worst_rank
+    FROM sales_by_date
+)
+
+-- select the best and worst days, and combine them
+SELECT
+    market_date,
+    total_sales,
+    'Highest Sales Day' AS category
+FROM ranked_sales
+WHERE best_rank = 1
+
+UNION
+
+SELECT
+    market_date,
+    total_sales,
+    'Lowest Sales Day' AS category
+FROM ranked_sales
+WHERE worst_rank = 1;
@@ -89,7 +214,35 @@ Think a bit about the row counts: how many distinct vendors, product names are t
 How many customers are there (y).
 Before your final group by you should have the product of those two queries (x*y). */
-
+/* In summary, I want to get "total revenue = (5 units) * (number of customers) * (original price)"
+and output the names of the vendors and the products that they sold.
+We will need to extract these names from the vendor and product tables.
+Looking at the vendor_inventory table, only 3 vendors actually had an inventory, containing a total of 8 products. So our table should have 8 rows. */
+
+-- first, how many customers do we have?
+WITH customer_count AS (
+    SELECT COUNT(*) AS num_customers
+    FROM customer
+),
+-- to get vendor + product details
+vendor_products AS (
+    SELECT
+        v.vendor_name,
+        p.product_name,
+        vi.original_price
+    FROM vendor_inventory vi
+    JOIN vendor v
+        ON v.vendor_id = vi.vendor_id
+    JOIN product p
+        ON p.product_id = vi.product_id
+)
+-- now calculate the hypothetical TOTAL revenue
+SELECT DISTINCT
+    vp.vendor_name,
+    vp.product_name,
+    (5 * cc.num_customers * vp.original_price) AS total_revenue
+FROM vendor_products vp
+CROSS JOIN customer_count cc
+ORDER BY vp.vendor_name, vp.product_name;
 
 -- INSERT
 /*1. Create a new table "product_units".
@@ -97,18 +250,65 @@ This table will contain only products where the `product_qty_type = 'unit'`.
 It should use all of the columns from the product table, as well as a new column for the `CURRENT_TIMESTAMP`.
 Name the timestamp column `snapshot_timestamp`. */
-
+CREATE TABLE product_units (
+    product_id INTEGER,
+    product_name TEXT,
+    product_size TEXT,
+    product_qty_type TEXT,
+    product_category_id INTEGER,
+    snapshot_timestamp TIMESTAMP
+);
+
+INSERT INTO product_units (
+    product_id,
+    product_name,
+    product_size,
+    product_qty_type,
+    product_category_id,
+    snapshot_timestamp
+)
+SELECT
+    product_id,
+    product_name,
+    product_size,
+    product_qty_type,
+    product_category_id,
+    CURRENT_TIMESTAMP
+FROM product
+WHERE product_qty_type = 'unit';
 
 /*2. Using `INSERT`, add a new row to the product_units table (with an updated timestamp).
 This can be any product you desire (e.g. add another record for Apple Pie).
 */
+INSERT INTO product_units (
+    product_id,
+    product_name,
+    product_size,
+    product_qty_type,
+    product_category_id,
+    snapshot_timestamp
+)
+VALUES (
+    72,
+    'Apple Pie',
+    'Large',
+    'unit',
+    9,
+    CURRENT_TIMESTAMP
+);
 
 -- DELETE
 /* 1. Delete the older record for the whatever product you added.
-HINT: If you don't specify a WHERE clause, you are going to have a bad time.*/
+HINT: If you don't specify a WHERE clause, you are going to have a bad time.*/
+DELETE FROM product_units
+WHERE product_name = 'Apple Pie'
+AND snapshot_timestamp = (
+    SELECT MIN(snapshot_timestamp)
+    FROM product_units
+    WHERE product_name = 'Apple Pie'
+);
 
 -- UPDATE
@@ -128,6 +328,36 @@ Finally, make sure you have a WHERE statement to update the right row,
 you'll need to use product_units.product_id to refer to the correct row within the product_units table.
 When you have all of these components, you can run the update statement. */
+ALTER TABLE product_units
+ADD current_quantity INT;
+
+-- to get the latest quantity per product (one row per product)
+SELECT product_id,
+    COALESCE(quantity, 0) AS last_quantity,
+    market_date
+FROM (
+    SELECT
+        product_id,
+        quantity,
+        market_date,
+        ROW_NUMBER() OVER (PARTITION BY product_id
+                           ORDER BY market_date DESC) AS rn
+    FROM vendor_inventory
+) AS t
+WHERE rn = 1;
+
+-- to update the product_units table, filling in the current_quantity column
+UPDATE product_units AS pu
+SET current_quantity = (
+    SELECT COALESCE(vi.quantity, 0)
+    FROM vendor_inventory vi
+    WHERE vi.product_id = pu.product_id
+    ORDER BY vi.market_date DESC
+    LIMIT 1
+)
+WHERE pu.product_id IN (SELECT product_id FROM vendor_inventory);
diff --git a/02_activities/assignments/DC_Cohort/images/PROMPT 1 of assignment-2 logical diagram.drawio.png b/02_activities/assignments/DC_Cohort/images/PROMPT 1 of assignment-2 logical diagram.drawio.png
new file mode 100644
index 000000000..c0c8391df
Binary files /dev/null and b/02_activities/assignments/DC_Cohort/images/PROMPT 1 of assignment-2 logical diagram.drawio.png differ
diff --git a/02_activities/assignments/DC_Cohort/images/Prompt 2 of 2nd logical diagram.drawio copy 2.png b/02_activities/assignments/DC_Cohort/images/Prompt 2 of 2nd logical diagram.drawio copy 2.png
new file mode 100644
index 000000000..a624c4ffe
Binary files /dev/null and b/02_activities/assignments/DC_Cohort/images/Prompt 2 of 2nd logical diagram.drawio copy 2.png differ
diff --git a/02_activities/assignments/DC_Cohort/images/Prompt 3 of 2nd logical diagram.drawio copy.png b/02_activities/assignments/DC_Cohort/images/Prompt 3 of 2nd logical diagram.drawio copy.png
new file mode 100644
index 000000000..4ee4f8942
Binary files /dev/null and b/02_activities/assignments/DC_Cohort/images/Prompt 3 of 2nd logical diagram.drawio copy.png differ
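
Reviewer note: the Type 1 vs Type 2 behaviour described in the Prompt 3 answer can be sketched end to end. This is a minimal illustration, assuming SQLite semantics; the `customer_address_t1` / `customer_address_t2` tables, their columns, and the data are all hypothetical, not the assignment's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Type 1: one row per customer; an address change overwrites the old value.
cur.execute("CREATE TABLE customer_address_t1 (customer_id INTEGER PRIMARY KEY, address TEXT)")
cur.execute("INSERT INTO customer_address_t1 VALUES (1, '12 Elm St')")
cur.execute("UPDATE customer_address_t1 SET address = '99 Oak Ave' WHERE customer_id = 1")

# Type 2: each change appends a new row keyed by a unique address_id,
# with effective dates; a NULL valid_to marks the current address.
cur.execute("""
    CREATE TABLE customer_address_t2 (
        address_id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        address TEXT,
        valid_from TEXT,
        valid_to TEXT
    )
""")
cur.execute("INSERT INTO customer_address_t2 (customer_id, address, valid_from, valid_to) "
            "VALUES (1, '12 Elm St', '2023-01-01', '2024-06-30')")
cur.execute("INSERT INTO customer_address_t2 (customer_id, address, valid_from, valid_to) "
            "VALUES (1, '99 Oak Ave', '2024-07-01', NULL)")

t1_rows = cur.execute("SELECT address FROM customer_address_t1").fetchall()
t2_rows = cur.execute("SELECT address FROM customer_address_t2 ORDER BY address_id").fetchall()
print(t1_rows)  # Type 1 keeps only the current address
print(t2_rows)  # Type 2 retains the full history
```

Type 1 trades history for simplicity; Type 2 is the shape you query when you need "the address as of a given date".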
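Reviewer note: the two visit-numbering approaches in the windowed-functions answer (DISTINCT + ROW_NUMBER vs DENSE_RANK) can be checked against each other on toy data. A small sketch, assuming SQLite 3.25+ for window function support; the rows are invented, not the real customer_purchases data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customer_purchases (customer_id INTEGER, market_date TEXT)")
# Customer 1 purchases twice on 2019-07-06: a same-day repeat visit.
cur.executemany(
    "INSERT INTO customer_purchases VALUES (?, ?)",
    [(1, "2019-07-03"), (1, "2019-07-06"), (1, "2019-07-06"), (2, "2019-07-03")],
)

# Approach A: de-duplicate (customer, date) pairs first, then ROW_NUMBER.
rn = cur.execute("""
    SELECT customer_id, market_date,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY market_date) AS visit_no
    FROM (SELECT DISTINCT customer_id, market_date FROM customer_purchases)
""").fetchall()

# Approach B: DENSE_RANK over the raw table gives same-day rows the same rank,
# so de-duplicating the output yields the identical visit numbering.
dr = cur.execute("""
    SELECT DISTINCT customer_id, market_date,
           DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY market_date) AS visit_no
    FROM customer_purchases
""").fetchall()

print(sorted(rn) == sorted(dr))  # the two approaches agree
```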
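Reviewer note: the final correlated-subquery UPDATE (latest vendor_inventory quantity copied into product_units.current_quantity) is easy to sanity-check in isolation. A sketch assuming SQLite; the table names mirror the assignment's, but the columns are trimmed down and the rows invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE product_units (product_id INTEGER, current_quantity INTEGER)")
cur.execute("CREATE TABLE vendor_inventory (product_id INTEGER, quantity INTEGER, market_date TEXT)")
cur.executemany("INSERT INTO product_units VALUES (?, NULL)", [(1,), (2,)])
# Product 1 appears on two dates; the 2019-07-10 row (quantity 4) is the latest.
cur.executemany(
    "INSERT INTO vendor_inventory VALUES (?, ?, ?)",
    [(1, 10, "2019-07-03"), (1, 4, "2019-07-10"), (2, 7, "2019-07-03")],
)

# Correlated subquery: for each product_units row, pick the quantity from the
# vendor_inventory row with the most recent market_date for that product.
cur.execute("""
    UPDATE product_units
    SET current_quantity = (
        SELECT COALESCE(vi.quantity, 0)
        FROM vendor_inventory vi
        WHERE vi.product_id = product_units.product_id
        ORDER BY vi.market_date DESC
        LIMIT 1
    )
    WHERE product_id IN (SELECT product_id FROM vendor_inventory)
""")
rows = cur.execute(
    "SELECT product_id, current_quantity FROM product_units ORDER BY product_id"
).fetchall()
print(rows)  # each product carries its latest quantity
```

The trailing WHERE clause matters: without it, products absent from vendor_inventory would have their current_quantity overwritten with NULL.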