19 changes: 18 additions & 1 deletion 02_activities/assignments/DC_Cohort/Assignment2.md
@@ -44,19 +44,32 @@ Additionally, include a date table.
There are several tools online you can use, I'd recommend [Draw.io](https://www.drawio.com/) or [LucidChart](https://www.lucidchart.com/pages/).

**HINT:** You do not need to create any data for this prompt. This is a conceptual model only.

**Relationships between my tables:**
- customer to order: 1-to-many
- employee to order: 1-to-many
- order to sales: 1-to-many
- book to sales: 1-to-many
- date to order: 1-to-many
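One way to see what these relationships mean physically: each 1-to-many relationship becomes a foreign key on the "many" side. The sketch below is a minimal, hypothetical schema run through Python's built-in `sqlite3` module. The table and column names (`orders`, used because `order` is a reserved word, `date_dim`, `sale_id`, and so on) are assumptions for illustration only, since the prompt asks for a conceptual model rather than real tables.

```python
import sqlite3

# Hypothetical schema: names are assumptions chosen to mirror the ERD relationships.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when this is on
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, employee_name TEXT);
CREATE TABLE date_dim (date_id INTEGER PRIMARY KEY, calendar_date TEXT);
CREATE TABLE book (book_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id),  -- customer 1-to-many orders
    employee_id INTEGER REFERENCES employee(employee_id),  -- employee 1-to-many orders
    date_id INTEGER REFERENCES date_dim(date_id)           -- date 1-to-many orders
);
CREATE TABLE sales (
    sale_id INTEGER PRIMARY KEY,
    order_id INTEGER REFERENCES orders(order_id),  -- order 1-to-many sales
    book_id INTEGER REFERENCES book(book_id),      -- book 1-to-many sales
    quantity INTEGER
);
""")

# Two orders point at the same customer: the "many" side of customer-to-order.
conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
conn.execute("INSERT INTO employee VALUES (1, 'Sam')")
conn.execute("INSERT INTO date_dim VALUES (1, '2024-01-05')")
conn.executemany("INSERT INTO orders VALUES (?, 1, 1, 1)", [(10,), (11,)])
orders_for_customer = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE customer_id = 1"
).fetchone()[0]

# An order for a customer that does not exist is rejected by the foreign key.
try:
    conn.execute("INSERT INTO orders VALUES (12, 99, 1, 1)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

print(orders_for_customer, fk_rejected)  # 2 True
```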

#### Prompt 2
We want to create employee shifts, splitting up the day into morning and evening. Add this to the ERD.

#### Prompt 3
The store wants to keep customer addresses. Propose two architectures for the CUSTOMER_ADDRESS table, one that will retain changes, and another that will overwrite. Which is type 1, which is type 2?

**HINT:** search type 1 vs type 2 slowly changing dimensions (SCD).

```
Your answer...
```

'''Type 1 SCD: this architecture overwrites the old CUSTOMER_ADDRESS information, so only the most recently entered address is kept. Therefore, it does not keep a history of the customer's previous addresses. One can use this SCD type for information like a shipping address, where only the current value matters.

WHILE

Type 2 SCD: this architecture retains changes in CUSTOMER_ADDRESS because each address version is stored as its own row with a unique ID. It keeps all previous addresses: every time an address changes, a new record is added with effective dates. Therefore, you can see historical addresses.'''
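The two architectures can be contrasted with a small sketch using Python's built-in `sqlite3` module. The table names, the `effective_from`/`effective_to`/`is_current` columns, and the sample addresses are all hypothetical; Type 1 is shown with `INSERT OR REPLACE` (an overwrite) and Type 2 closes the current row and adds a new one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Type 1 (hypothetical): one row per customer; a new address overwrites the old one.
CREATE TABLE customer_address_type1 (
    customer_id INTEGER PRIMARY KEY,
    address TEXT
);
-- Type 2 (hypothetical): one row per address version, with effective dates.
CREATE TABLE customer_address_type2 (
    address_id INTEGER PRIMARY KEY,  -- surrogate key, one per version
    customer_id INTEGER,
    address TEXT,
    effective_from TEXT,
    effective_to TEXT,               -- NULL while the row is current
    is_current INTEGER
);
""")

def move_type1(customer_id, new_address):
    # The primary key conflict makes REPLACE drop the old row: history is lost.
    conn.execute(
        "INSERT OR REPLACE INTO customer_address_type1 (customer_id, address) VALUES (?, ?)",
        (customer_id, new_address),
    )

def move_type2(customer_id, new_address, change_date):
    # Close the current version, then add a new current version: history is kept.
    conn.execute(
        "UPDATE customer_address_type2 SET effective_to = ?, is_current = 0 "
        "WHERE customer_id = ? AND is_current = 1",
        (change_date, customer_id),
    )
    conn.execute(
        "INSERT INTO customer_address_type2 "
        "(customer_id, address, effective_from, effective_to, is_current) "
        "VALUES (?, ?, ?, NULL, 1)",
        (customer_id, new_address, change_date),
    )

# The same customer moves once.
for address, day in [("12 Oak St", "2023-01-01"), ("98 Pine Ave", "2024-06-15")]:
    move_type1(1, address)
    move_type2(1, address, day)

type1_rows = conn.execute("SELECT address FROM customer_address_type1").fetchall()
type2_rows = conn.execute(
    "SELECT address, effective_to, is_current FROM customer_address_type2 ORDER BY address_id"
).fetchall()
print(type1_rows)  # [('98 Pine Ave',)]
print(type2_rows)  # [('12 Oak St', '2024-06-15', 0), ('98 Pine Ave', None, 1)]
```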


***

## Section 2:
@@ -185,3 +198,7 @@ Consider, for example, concepts of labour, bias, LLM proliferation, moderating c
```
Your thoughts...
```
The article touched on many ethical issues, starting with how technological databases are built by human labor that often goes unrecognized and is sometimes undervalued and underpaid. Their example of how humans make garments hit home for me personally, as I worry about how quickly fast fashion and overconsumption are becoming normal in the world we live in, and how we should probably be asking many more questions about the ethics and fairness afforded to manual workers.

Another major point in the article is how bias is introduced into data and models. Because these AI models are built on human labor and human decisions about labelling and taxonomy, it is very easy to introduce cultural, social, and subjective bias into AI systems. These biases raise several questions: who gets to decide what is “safe” or “offensive”? What and who gets labelled, and why? What criteria justify the way data is labelled? Finally, who should be held responsible for the human errors or harms that are coded into the data?

A personal example from a few years back (around 2022): black women like me realized that when we googled “Professional hairstyles for women”, not a single black woman’s hair was shown. This sparked a lot of conversation about the bias built into Google’s database about what a professional woman should look like. The experience shows how biases about identities or categories that are underrepresented or misrepresented in training datasets can be scaled up into larger databases, which can lead to unfair or harmful outputs.

Overall, the author highlights how AI development is not purely technical but also social, and why we need ethical governance that recognizes the human work behind the models. They also recommend that we acknowledge and fairly compensate the human labor that makes these systems possible, which I agree with.
238 changes: 234 additions & 4 deletions 02_activities/assignments/DC_Cohort/assignment2.sql
@@ -20,6 +20,13 @@ The `||` values concatenate the columns into strings.
Edit the appropriate columns -- you're making two edits -- and the NULL rows will be fixed.
All the other rows will remain the same.) */

SELECT
product_name || ', ' || product_size|| ' (' || product_qty_type || ')'
FROM product;

SELECT
product_name || ', ' || COALESCE(product_size, '') || ' (' || COALESCE(product_qty_type, 'unit') || ')' AS product_list -- alias added so the output column has a readable name
FROM product;
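Why the two COALESCE calls fix the NULL rows: in SQLite, concatenating anything with NULL yields NULL, so the original expression collapses to NULL whenever `product_size` or `product_qty_type` is missing. A small sketch with made-up product rows, run through Python's built-in `sqlite3` module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (product_name TEXT, product_size TEXT, product_qty_type TEXT)")
conn.executemany("INSERT INTO product VALUES (?, ?, ?)", [
    ("Habanero Peppers - Organic", "medium", "lbs"),
    ("Sweet Potatoes", None, "lbs"),   # missing size
    ("Apple Pie", "large", None),      # missing quantity type
])

# Original template: any NULL part turns the whole string into NULL.
raw = conn.execute(
    "SELECT product_name || ', ' || product_size || ' (' || product_qty_type || ')' "
    "FROM product ORDER BY rowid"
).fetchall()

# COALESCE substitutes '' for a missing size and 'unit' for a missing quantity type.
fixed = conn.execute(
    "SELECT product_name || ', ' || COALESCE(product_size, '') || ' (' "
    "|| COALESCE(product_qty_type, 'unit') || ')' AS product_list "
    "FROM product ORDER BY rowid"
).fetchall()

print(raw)    # [('Habanero Peppers - Organic, medium (lbs)',), (None,), (None,)]
print(fixed)  # [('Habanero Peppers - Organic, medium (lbs)',), ('Sweet Potatoes,  (lbs)',), ('Apple Pie, large (unit)',)]
```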


--Windowed Functions
@@ -32,18 +39,44 @@ each new market date for each customer, or select only the unique market dates p
(without purchase details) and number those visits.
HINT: One of these approaches uses ROW_NUMBER() and one uses DENSE_RANK(). */

--counter changing on EACH NEW MARKET DATE (so 1 count for each visit day) for each customer
SELECT
customer_id,
market_date,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY market_date) AS customer_visit_number
FROM (SELECT DISTINCT customer_id, market_date FROM customer_purchases); -- DISTINCT because some customers made several purchases on the same day

--OR THIS

SELECT
customer_id,
market_date,
DENSE_RANK() OVER ( PARTITION BY customer_id ORDER BY market_date) AS customer_visit_number
FROM (SELECT DISTINCT customer_id, market_date FROM customer_purchases);
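The prompt's two approaches differ in where the duplicates are handled. DENSE_RANK() can run over every purchase row and still give same-day purchases the same visit number, while ROW_NUMBER() needs the rows reduced to one per visit first. A sketch with hypothetical purchases, using Python's `sqlite3` module (window functions need SQLite 3.25 or newer):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_purchases (customer_id INTEGER, product_id INTEGER, market_date TEXT)")
conn.executemany("INSERT INTO customer_purchases VALUES (?, ?, ?)", [
    (1, 4, "2023-04-01"), (1, 9, "2023-04-01"),  # two purchases on the same day
    (1, 4, "2023-04-08"),
    (2, 4, "2023-04-01"),
])

# Approach 1: keep every purchase row; ties on market_date share a DENSE_RANK value.
dense = conn.execute("""
    SELECT customer_id, product_id, market_date,
           DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY market_date) AS visit
    FROM customer_purchases
    ORDER BY customer_id, market_date, product_id
""").fetchall()

# Approach 2: one row per visit first, then ROW_NUMBER numbers the visits.
rownum = conn.execute("""
    SELECT customer_id, market_date,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY market_date) AS visit
    FROM (SELECT DISTINCT customer_id, market_date FROM customer_purchases)
    ORDER BY customer_id, market_date
""").fetchall()

print(dense)   # [(1, 4, '2023-04-01', 1), (1, 9, '2023-04-01', 1), (1, 4, '2023-04-08', 2), (2, 4, '2023-04-01', 1)]
print(rownum)  # [(1, '2023-04-01', 1), (1, '2023-04-08', 2), (2, '2023-04-01', 1)]
```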

/* 2. Reverse the numbering of the query from a part so each customer’s most recent visit is labeled 1,
then write another query that uses this one as a subquery (or temp table) and filters the results to
only the customer’s most recent visit. */

SELECT *
FROM (
SELECT
customer_id,
market_date,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY market_date DESC) AS customer_visit_number
FROM ( SELECT DISTINCT customer_id, market_date FROM customer_purchases)
)
WHERE customer_visit_number = 1;
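A quick way to check the reversed numbering is to compare it with a plain `MAX(market_date)` per customer, which should pick the same dates. The sketch below uses hypothetical data and Python's `sqlite3` module (window functions need SQLite 3.25 or newer).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_purchases (customer_id INTEGER, market_date TEXT)")
conn.executemany("INSERT INTO customer_purchases VALUES (?, ?)", [
    (1, "2023-04-01"), (1, "2023-04-08"), (1, "2023-04-08"),  # repeat purchase on the last day
    (2, "2023-04-01"),
])

# ORDER BY ... DESC makes the most recent visit number 1; the outer query keeps only that row.
latest = conn.execute("""
    SELECT customer_id, market_date
    FROM (
        SELECT customer_id, market_date,
               ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY market_date DESC) AS visit
        FROM (SELECT DISTINCT customer_id, market_date FROM customer_purchases)
    )
    WHERE visit = 1
    ORDER BY customer_id
""").fetchall()

# Cross-check with an ordinary aggregate.
check = conn.execute("""
    SELECT customer_id, MAX(market_date)
    FROM customer_purchases
    GROUP BY customer_id
    ORDER BY customer_id
""").fetchall()

print(latest)  # [(1, '2023-04-08'), (2, '2023-04-01')]
```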


/* 3. Using a COUNT() window function, include a value along with each row of the
customer_purchases table that indicates how many different times that customer has purchased that product_id. */


SELECT
customer_id,
product_id,
market_date,
COUNT(*) OVER (PARTITION BY customer_id, product_id) AS times_purchased -- shown alongside every purchase row
FROM customer_purchases;
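Unlike `GROUP BY`, a windowed `COUNT(*)` attaches the per-customer, per-product total to each purchase row without collapsing them. A sketch with hypothetical rows via Python's `sqlite3` module (window functions need SQLite 3.25 or newer):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_purchases (customer_id INTEGER, product_id INTEGER, market_date TEXT)")
conn.executemany("INSERT INTO customer_purchases VALUES (?, ?, ?)", [
    (1, 4, "2023-04-01"), (1, 4, "2023-04-08"),  # customer 1 bought product 4 twice
    (1, 9, "2023-04-01"),
    (2, 4, "2023-04-01"),
])

# Every purchase row is kept; the count is repeated on each row of its partition.
rows = conn.execute("""
    SELECT customer_id, product_id, market_date,
           COUNT(*) OVER (PARTITION BY customer_id, product_id) AS times_purchased
    FROM customer_purchases
    ORDER BY customer_id, product_id, market_date
""").fetchall()

for row in rows:
    print(row)
# (1, 4, '2023-04-01', 2)
# (1, 4, '2023-04-08', 2)
# (1, 9, '2023-04-01', 1)
# (2, 4, '2023-04-01', 1)
```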

-- String manipulations
/* 1. Some product names in the product table have descriptions like "Jar" or "Organic".
@@ -58,9 +91,60 @@ Remove any trailing or leading whitespaces. Don't just use a case statement for
Hint: you might need to use INSTR(product_name,'-') to find the hyphens. INSTR will help split the column. */


SELECT
product_name,
TRIM(
SUBSTR(
product_name,
INSTR(product_name, '-') + 1
)
) AS product_description
FROM product
WHERE INSTR(product_name, '-') > 0;

--I was not sure whether CASE statements were allowed, but I could not show the full list (including the rows with a NULL description) without one, so I included both versions.

SELECT
product_name,
CASE
WHEN INSTR(product_name, '-') > 0
THEN TRIM(SUBSTR(product_name, INSTR(product_name, '-') + 1))
ELSE NULL
END AS product_description
FROM product;
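If a CASE statement is best avoided, one alternative (a swapped-in technique, not the approach above) is `NULLIF`: `INSTR` returns 0 when there is no hyphen, `NULLIF(..., 0)` turns that into NULL, and `SUBSTR` with a NULL start position returns NULL instead of the whole name. A sketch with hypothetical product names, using Python's `sqlite3` module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (product_name TEXT)")
conn.executemany("INSERT INTO product VALUES (?)", [
    ("Habanero Peppers - Organic",),
    ("Sweet Potatoes",),  # no hyphen, so no description
    ("Carrots - Organic",),
])

rows = conn.execute("""
    SELECT product_name,
           -- 0 from INSTR becomes NULL, and NULL + 1 is NULL, so SUBSTR and TRIM yield NULL
           TRIM(SUBSTR(product_name, NULLIF(INSTR(product_name, '-'), 0) + 1)) AS product_description
    FROM product
    ORDER BY rowid
""").fetchall()

print(rows)  # [('Habanero Peppers - Organic', 'Organic'), ('Sweet Potatoes', None), ('Carrots - Organic', 'Organic')]
```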

/* I wrote this part to experiment, because I was curious whether I could completely sever the product name from the description and create two new columns. */

SELECT
-- to extract the main name BEFORE the hyphen and create its own column
TRIM(
CASE
WHEN INSTR(product_name, '-') > 0
THEN SUBSTR(product_name, 1, INSTR(product_name, '-') - 1)
ELSE product_name
END) AS product_main_name,

-- to extract the description AFTER the hyphen and create its own column
TRIM(
CASE
WHEN INSTR(product_name, '-') > 0
THEN SUBSTR(product_name, INSTR(product_name, '-') + 1)
ELSE NULL
END) AS product_description
FROM product;


/* 2. Filter the query to show any product_size value that contain a number with REGEXP. */

SELECT
product_size, product_name,
CASE
WHEN INSTR(product_name, '-') > 0
THEN TRIM(SUBSTR(product_name, INSTR(product_name, '-') + 1))
ELSE NULL
END AS product_description
FROM product
WHERE product_size REGEXP '[0-9]';
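Core SQLite defines no `regexp()` function by default, and `X REGEXP Y` is evaluated as a call to `regexp(Y, X)`, so the query above relies on the tool that runs it to supply one. The sketch below registers a minimal version with Python's `sqlite3.Connection.create_function`; the function body and the product rows are assumptions for illustration.

```python
import re
import sqlite3

def regexp(pattern, value):
    # SQLite passes the pattern first and the column value second; NULL values never match.
    return value is not None and re.search(pattern, value) is not None

conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)

conn.execute("CREATE TABLE product (product_name TEXT, product_size TEXT)")
conn.executemany("INSERT INTO product VALUES (?, ?)", [
    ("Habanero Peppers - Organic", "medium"),
    ("Apple Pie", "10 inch"),
    ("Sweet Potatoes", None),
    ("Honey", "16 oz"),
])

rows = conn.execute(
    "SELECT product_name, product_size FROM product "
    "WHERE product_size REGEXP '[0-9]' ORDER BY rowid"
).fetchall()

print(rows)  # [('Apple Pie', '10 inch'), ('Honey', '16 oz')]
```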


-- UNION
@@ -73,6 +157,47 @@ HINT: There are a possibly a few ways to do this query, but if you're struggling
3) Query the second temp table twice, once for the best day, once for the worst day,
with a UNION binding them. */

/*SELECT
product_id, market_date,
SUM(quantity * cost_to_customer_per_qty) AS total_sales
FROM customer_purchases
GROUP BY market_date*/

-- to calculate total sales per market_date
WITH sales_by_date AS (
SELECT
market_date,
SUM(quantity * cost_to_customer_per_qty) AS total_sales
FROM customer_purchases
GROUP BY market_date -- product_id is not selected: it is not grouped, so its value would be arbitrary
),
-- ranking the days by total sales (highest to lowest)
ranked_sales AS (
SELECT
market_date,
total_sales,
RANK() OVER (ORDER BY total_sales DESC) AS best_rank,
RANK() OVER (ORDER BY total_sales ASC) AS worst_rank
FROM sales_by_date
)

-- to select the best and worst days, and combine them
SELECT
market_date,
total_sales,
'Highest Sales Day' AS category
FROM ranked_sales
WHERE best_rank = 1

UNION

SELECT
market_date,
total_sales,
'Lowest Sales Day' AS category
FROM ranked_sales
WHERE worst_rank = 1;
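The CTE, rank, and UNION pattern can be checked end to end on a few made-up purchases with Python's `sqlite3` module (window functions need SQLite 3.25 or newer). The daily totals below are 13.0, 20.0 and 2.0, so the best day is 2023-04-08 and the worst is 2023-04-15.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_purchases (market_date TEXT, quantity INTEGER, cost_to_customer_per_qty REAL)")
conn.executemany("INSERT INTO customer_purchases VALUES (?, ?, ?)", [
    ("2023-04-01", 2, 5.0), ("2023-04-01", 1, 3.0),  # 13.0 for the day
    ("2023-04-08", 4, 5.0),                           # 20.0
    ("2023-04-15", 1, 2.0),                           # 2.0
])

best_worst = conn.execute("""
    WITH sales_by_date AS (          -- one row per market day
        SELECT market_date, SUM(quantity * cost_to_customer_per_qty) AS total_sales
        FROM customer_purchases
        GROUP BY market_date
    ),
    ranked_sales AS (                -- rank the days in both directions
        SELECT market_date, total_sales,
               RANK() OVER (ORDER BY total_sales DESC) AS best_rank,
               RANK() OVER (ORDER BY total_sales ASC) AS worst_rank
        FROM sales_by_date
    )
    SELECT market_date, total_sales, 'Highest Sales Day' AS category
    FROM ranked_sales WHERE best_rank = 1
    UNION
    SELECT market_date, total_sales, 'Lowest Sales Day'
    FROM ranked_sales WHERE worst_rank = 1
    ORDER BY total_sales DESC
""").fetchall()

print(best_worst)  # [('2023-04-08', 20.0, 'Highest Sales Day'), ('2023-04-15', 2.0, 'Lowest Sales Day')]
```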



@@ -89,26 +214,101 @@ Think a bit about the row counts: how many distinct vendors, product names are t
How many customers are there (y).
Before your final group by you should have the product of those two queries (x*y). */


/*--In summary, I want total revenue = 5 units * (number of customers) * (original price),
and to output the names of the vendors and the names of the products they sold. We will need to extract these names from the vendor and product tables.
Looking at the vendor_inventory table, only 3 vendors actually had inventory, containing a total of 8 products, so our table should have 8 rows. */

--first, how many customers do we have?
WITH customer_count AS (
SELECT COUNT(*) AS num_customers
FROM customer
),
-- to get vendor + product details
vendor_products AS (
SELECT
v.vendor_name,
p.product_name,
vi.original_price
FROM vendor_inventory vi
JOIN vendor v
ON v.vendor_id = vi.vendor_id
JOIN product p
ON p.product_id = vi.product_id
)
-- to now calculate hypothetical TOTAL revenue
SELECT DISTINCT
vp.vendor_name,
vp.product_name,
(5 * cc.num_customers * vp.original_price) AS total_revenue
FROM vendor_products vp
CROSS JOIN customer_count cc
ORDER BY vp.vendor_name, vp.product_name;
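The prompt's hint describes a slightly different route: build the x*y rows with a CROSS JOIN against every customer, then GROUP BY. The sketch below runs both routes on tiny hypothetical tables with Python's `sqlite3` module to show they agree. It also shows why some DISTINCT is needed, since the same vendor and product appear on several market dates in vendor_inventory.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vendor (vendor_id INTEGER, vendor_name TEXT);
CREATE TABLE product (product_id INTEGER, product_name TEXT);
CREATE TABLE customer (customer_id INTEGER);
CREATE TABLE vendor_inventory (vendor_id INTEGER, product_id INTEGER, market_date TEXT, original_price REAL);
INSERT INTO vendor VALUES (1, 'Vendor A'), (2, 'Vendor B');
INSERT INTO product VALUES (10, 'Product X'), (20, 'Product Y');
INSERT INTO customer VALUES (1), (2), (3);
INSERT INTO vendor_inventory VALUES
    (1, 10, '2023-04-01', 2.0), (1, 10, '2023-04-08', 2.0),  -- same item on two dates
    (2, 20, '2023-04-01', 4.5);
""")

VENDOR_PRODUCTS = """
    SELECT DISTINCT v.vendor_name, p.product_name, vi.original_price
    FROM vendor_inventory vi
    JOIN vendor v ON v.vendor_id = vi.vendor_id
    JOIN product p ON p.product_id = vi.product_id
"""

# Route 1: multiply by the customer count (as in the query above).
by_count = conn.execute(f"""
    WITH customer_count AS (SELECT COUNT(*) AS num_customers FROM customer),
         vendor_products AS ({VENDOR_PRODUCTS})
    SELECT vp.vendor_name, vp.product_name, 5 * cc.num_customers * vp.original_price
    FROM vendor_products vp CROSS JOIN customer_count cc
    ORDER BY vp.vendor_name, vp.product_name
""").fetchall()

# Route 2: one row per (vendor product, customer), then SUM per vendor product.
by_customer = conn.execute(f"""
    WITH vendor_products AS ({VENDOR_PRODUCTS})
    SELECT vp.vendor_name, vp.product_name, SUM(5 * vp.original_price)
    FROM vendor_products vp CROSS JOIN customer c
    GROUP BY vp.vendor_name, vp.product_name
    ORDER BY vp.vendor_name, vp.product_name
""").fetchall()

print(by_count)  # [('Vendor A', 'Product X', 30.0), ('Vendor B', 'Product Y', 67.5)]
```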

-- INSERT
/*1. Create a new table "product_units".
This table will contain only products where the `product_qty_type = 'unit'`.
It should use all of the columns from the product table, as well as a new column for the `CURRENT_TIMESTAMP`.
Name the timestamp column `snapshot_timestamp`. */


CREATE TABLE product_units (
product_id INTEGER,
product_name TEXT,
product_size TEXT,
product_qty_type TEXT,
product_category_id INTEGER,
snapshot_timestamp TIMESTAMP
);
INSERT INTO product_units (
product_id,
product_name,
product_size,
product_qty_type,
product_category_id,
snapshot_timestamp
)
SELECT
product_id,
product_name,
product_size,
product_qty_type,
product_category_id,
CURRENT_TIMESTAMP
FROM product
WHERE product_qty_type = 'unit';
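The explicit CREATE TABLE plus INSERT above keeps full control over column types. A shorter alternative (not the approach used above) is `CREATE TABLE ... AS SELECT`, which creates and fills the table in one statement, though SQLite then infers the column types from the query instead of using declared ones such as `TIMESTAMP`. A sketch with hypothetical product rows via Python's `sqlite3` module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE product (
    product_id INTEGER, product_name TEXT, product_size TEXT,
    product_qty_type TEXT, product_category_id INTEGER)""")
conn.executemany("INSERT INTO product VALUES (?, ?, ?, ?, ?)", [
    (1, "Habanero Peppers - Organic", "medium", "lbs", 1),
    (7, "Apple Pie", "large", "unit", 3),
    (8, "Cherry Pie", "large", "unit", 3),
])

# One statement: every product column plus a snapshot timestamp, unit products only.
conn.execute("""
    CREATE TABLE product_units AS
    SELECT *, CURRENT_TIMESTAMP AS snapshot_timestamp
    FROM product
    WHERE product_qty_type = 'unit'
""")

cur = conn.execute("SELECT * FROM product_units ORDER BY product_id")
columns = [d[0] for d in cur.description]
rows = cur.fetchall()
print(columns)
print([r[1] for r in rows])  # ['Apple Pie', 'Cherry Pie']
```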

/*2. Using `INSERT`, add a new row to the product_units table (with an updated timestamp).
This can be any product you desire (e.g. add another record for Apple Pie). */

INSERT INTO product_units (
product_id,
product_name,
product_size,
product_qty_type,
product_category_id,
snapshot_timestamp
)
VALUES (
72,
'Apple Pie',
'Large',
'unit',
9,
CURRENT_TIMESTAMP
);


-- DELETE
/* 1. Delete the older record for the whatever product you added.

HINT: If you don't specify a WHERE clause, you are going to have a bad time.*/

DELETE FROM product_units
WHERE product_name = 'Apple Pie'
AND snapshot_timestamp = (
SELECT MIN(snapshot_timestamp)
FROM product_units
WHERE product_name = 'Apple Pie'
);
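One caveat with matching on `MIN(snapshot_timestamp)`: SQLite's `CURRENT_TIMESTAMP` has one-second resolution (`YYYY-MM-DD HH:MM:SS`), so if the original snapshot and the new Apple Pie row were inserted in the same second, both rows equal the minimum and both are deleted. The sketch below shows this with hypothetical timestamps in Python's `sqlite3` module, plus a tie-safe variant (an assumption, not the original approach) that deletes exactly one row by `rowid`.

```python
import sqlite3

def make_table(timestamps):
    # Hypothetical product_units with only the columns the DELETE needs.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE product_units (product_name TEXT, snapshot_timestamp TEXT)")
    conn.executemany("INSERT INTO product_units VALUES ('Apple Pie', ?)",
                     [(ts,) for ts in timestamps])
    return conn

def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM product_units").fetchone()[0]

DELETE_OLDEST = """
    DELETE FROM product_units
    WHERE product_name = 'Apple Pie'
      AND snapshot_timestamp = (SELECT MIN(snapshot_timestamp)
                                FROM product_units
                                WHERE product_name = 'Apple Pie')
"""

# Case 1: snapshots minutes apart, so only the older one is deleted.
conn = make_table(["2024-01-01 10:00:00", "2024-01-01 10:05:00"])
conn.execute(DELETE_OLDEST)
kept = conn.execute("SELECT snapshot_timestamp FROM product_units").fetchall()

# Case 2: both inserts landed in the same second, so both rows match MIN and both go.
conn = make_table(["2024-01-01 10:00:00", "2024-01-01 10:00:00"])
conn.execute(DELETE_OLDEST)
left_after_tie = count_rows(conn)

# Hypothetical tie-safe variant: pick the single oldest row, breaking ties by rowid.
conn = make_table(["2024-01-01 10:00:00", "2024-01-01 10:00:00"])
conn.execute("""
    DELETE FROM product_units
    WHERE rowid = (SELECT rowid FROM product_units
                   WHERE product_name = 'Apple Pie'
                   ORDER BY snapshot_timestamp, rowid
                   LIMIT 1)
""")
left_with_rowid = count_rows(conn)

print(kept, left_after_tie, left_with_rowid)  # [('2024-01-01 10:05:00',)] 0 1
```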


-- UPDATE
@@ -128,6 +328,36 @@ Finally, make sure you have a WHERE statement to update the right row,
you'll need to use product_units.product_id to refer to the correct row within the product_units table.
When you have all of these components, you can run the update statement. */

ALTER TABLE product_units
ADD current_quantity INT;

-- to get the latest quantity per product (one row per product)

SELECT product_id,
COALESCE(quantity, 0) AS last_quantity,
market_date
FROM (
SELECT
product_id,
quantity,
market_date,
ROW_NUMBER() OVER (PARTITION BY product_id
ORDER BY market_date DESC) AS rn
FROM vendor_inventory
) AS t
WHERE rn = 1;

--to update product_units table to add data to the current_quantity column

UPDATE product_units as pu
SET current_quantity = (
SELECT COALESCE(vi.quantity, 0)
FROM vendor_inventory vi
WHERE vi.product_id = pu.product_id
ORDER BY vi.market_date DESC
LIMIT 1
)
WHERE pu.product_id IN (SELECT product_id FROM vendor_inventory);
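In the UPDATE above, COALESCE sits inside the subquery and the `WHERE ... IN` filter skips products that never appear in vendor_inventory, so those rows keep a NULL `current_quantity`. If the goal is for such products to read 0, one variant (an assumption about intent, not the original statement) wraps the whole correlated subquery in COALESCE. A sketch with hypothetical data in Python's `sqlite3` module, referring to the row being updated as `product_units.product_id` rather than through an alias:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_units (product_id INTEGER, product_name TEXT);
CREATE TABLE vendor_inventory (product_id INTEGER, market_date TEXT, quantity INTEGER);
INSERT INTO product_units VALUES (1, 'Apple Pie'), (2, 'Carrots'), (3, 'Honey');
INSERT INTO vendor_inventory VALUES
    (1, '2023-04-01', 10), (1, '2023-04-08', 7),  -- latest quantity for product 1 is 7
    (2, '2023-04-01', 20);                         -- product 3 has no inventory rows
ALTER TABLE product_units ADD COLUMN current_quantity INT;
""")

conn.execute("""
    UPDATE product_units
    SET current_quantity = COALESCE((
        SELECT vi.quantity
        FROM vendor_inventory AS vi
        WHERE vi.product_id = product_units.product_id  -- correlate on the row being updated
        ORDER BY vi.market_date DESC                    -- latest market date first
        LIMIT 1
    ), 0)                                               -- no inventory row at all gives 0
""")

result = conn.execute(
    "SELECT product_id, current_quantity FROM product_units ORDER BY product_id"
).fetchall()
print(result)  # [(1, 7), (2, 20), (3, 0)]
```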


