Step 1: Create an Entity Relationship Diagram (ERD)
- Inspect the provided CSV files to determine the structure of the data.
- Define how many tables are necessary and the relationships among them.
- Use a tool like Quick Database Diagrams to create your ERD.
Hints:
- For the
credit_card
andtransaction
tables, thecard
column should be aVARCHAR(20)
rather than anINT
. - For the
transaction
table, thedate
column should be aTIMESTAMP
rather thanDATE
.
Step 2: Create a Database Schema
- Using your ERD as a blueprint, create the database schema.
- Define data types, primary keys, foreign keys, and other constraints for each table.
- Populate the database with data from the corresponding CSV files.
Part 1: Analyze Potential Fraudulent Transactions
Step 1: Generate Queries
-
Identify Small Transactions:
- Isolate transactions less than $2.00 per cardholder.
- Count the transactions per cardholder.
-
Evidence of Hacked Credit Cards:
- Analyze if there's evidence of hacking based on the number and pattern of small transactions.
-
High Transactions in Specific Time Frame:
- Find the top 100 highest transactions made between 7:00 am and 9:00 am.
- Identify any anomalous transactions and compare the number of fraudulent transactions during this period versus the rest of the day.
-
Merchants Prone to Hacking:
- Identify the top 5 merchants prone to small transaction hacks.
Step 2: Create Views for Queries
- Create views in the database for each query for easy reference and analysis.
Part 2: Detailed Trends Data for Specific Cardholders
-
Important Customers Analysis:
- Cardholder IDs 2 and 18: Verify fraudulent transactions.
- Create a line plot for each cardholder representing the time series of transactions over the year.
- Create a combined line plot for comparison.
-
Corporate Credit Card Analysis:
- Cardholder ID 25: Analyze suspected unauthorized transactions.
- Create a box plot of expenditures from January 2018 to June 2018.
- Identify and analyze outliers and anomalies.
Step 1: Detect Outliers
-
Standard Deviation Method:
- Code a Python function to identify anomalies based on standard deviation for any cardholder.
-
Interquartile Range (IQR) Method:
- Code a Python function to identify anomalies based on IQR for any cardholder.
Reference Articles: