In this lab, you will build an event-driven data processing pipeline using AWS services. You'll create a FastAPI application running on EC2 that automatically receives notifications whenever new CSV files are uploaded to an S3 bucket. This pattern is fundamental to modern cloud architectures and enables real-time data processing workflows.
However, this example is not particularly "cloud-native," in that it runs on an EC2 instance that will likely sit idle 99% of the time. More elegant "serverless" designs will be presented later in the course.
Key Concepts:
- Event-driven architecture patterns
- AWS Simple Notification Service (SNS) for message routing
- S3 event notifications
- HTTP endpoints and webhook patterns
- JSON message parsing and handling
Time Estimate: 60-90 minutes
By the end of this lab, you will be able to:
- Deploy and configure a FastAPI application on EC2 to serve as an HTTP endpoint
- Create and configure SNS topics and subscriptions for event routing
- Enable S3 event notifications to trigger messages on object creation
- Parse and process JSON event payloads from AWS services
- Debug event-driven workflows using application logs and AWS console tools
- Explain the benefits and use cases of event-driven vs. polling-based architectures
Prerequisites:
- Personal AWS account
- SSH client and key pair for EC2 access
- AWS CLI configured on your local machine (for testing)
- Basic familiarity with Python and REST APIs
Architecture:
┌─────────────────┐
│ S3 Bucket │
│ (CSV upload) │
└────────┬────────┘
│ Event
↓
┌─────────────────┐
│ SNS Topic │
│ "DS5220" │
└────────┬────────┘
│ HTTP POST
↓
┌─────────────────┐
│ EC2 Instance │
│ FastAPI App │
│ (port 80) │
└─────────────────┘
Launch a new Ubuntu instance in EC2 with the following specifications:
- AMI: Ubuntu Server 24.04 LTS or newer
- Instance Type: t3.micro (free tier eligible)
- Security Group: Allow inbound traffic on:
- Port 22 (SSH) from your IP
- Port 80 (HTTP) from Anywhere (0.0.0.0/0) - required for SNS to reach your endpoint
- Storage: 8 GB (default is fine)
Create a file named main.py with content from this source.
This simple API sends and receives JSON data payloads. Its primary purpose in this lab is to receive, or "catch," the JSON event messages sent via SNS, which are triggered whenever new files arrive in S3. The /data endpoint of the API will receive each notification via an HTTP POST and then parse its contents.
Either bootstrap your EC2 instance with the steps below, or SSH into your EC2 instance and manually run the following commands:
# Update package lists
sudo apt update
# Install Python and FastAPI dependencies
sudo apt install -y python3-pip python3-fastapi uvicorn
# Create application directory
mkdir -p ~/api
cd ~/api
# Create the main.py file (paste the file above)
# or fetch by URL: https://raw.githubusercontent.com/uvasds-systems/ds5220-cloud/refs/heads/main/labs/lab05/main.py
nano main.py

Run the FastAPI application. Be sure you are in the /home/ubuntu/api/ subdirectory before running this command:
sudo uvicorn main:app --reload --host 0.0.0.0 --port 80

Important: Keep this terminal session open. You'll need to see the log output throughout this lab.
From your local browser, navigate to:
http://YOUR-EC2-PUBLIC-IP/
You should see:
{"message":"Hello World"}

And in your EC2 terminal, you should see a GET request logged.
✅ Checkpoint 1: Screenshot showing both your browser with the JSON response AND your EC2 terminal with the corresponding log entry.
- In the AWS Console, navigate to Simple Notification Service (SNS)
- Click Topics in the left sidebar
- Click Create topic
- Select Standard type
- Name: DS5220
- Display name: DS5220
- Leave other settings as default
- Click Create topic
Note the Topic ARN - you'll need this for configuration.
- Within your newly created topic, click Create subscription
- Configure the subscription:
- Protocol: HTTP (not HTTPS)
- Endpoint: http://YOUR-EC2-PUBLIC-IP/data
- Leave other settings as default
- Click Create subscription
After creating the subscription:
- Check your EC2 terminal - you should see a SubscriptionConfirmation message appear in the logs
- Look for the SubscribeURL in the output
- Copy the entire SubscribeURL (it will be very long)
- Open the URL in a new browser tab to confirm the subscription
- Return to the SNS console and verify the subscription status changes from "Pending confirmation" to "Confirmed"
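Opening the link in a browser is the simplest route; if you'd rather confirm from a script, the sketch below (standard library only; the sample payload shows just the relevant fields, with the token truncated) extracts and visits the SubscribeURL:

```python
import json
import urllib.request

def get_subscribe_url(raw_body: str):
    """Return the SubscribeURL from a SubscriptionConfirmation body, else None."""
    body = json.loads(raw_body)
    if body.get("Type") == "SubscriptionConfirmation":
        return body.get("SubscribeURL")
    return None

def confirm_subscription(raw_body: str) -> None:
    url = get_subscribe_url(raw_body)
    if url:
        # A single GET to this URL flips the subscription to "Confirmed"
        urllib.request.urlopen(url)

# Truncated example of what SNS sends; the real URL is much longer
sample = json.dumps({
    "Type": "SubscriptionConfirmation",
    "SubscribeURL": "https://sns.us-east-1.amazonaws.com/?Action=ConfirmSubscription&Token=...",
})
```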
✅ Checkpoint 2: Screenshot of your SNS subscription showing "Confirmed" status.
Create a new S3 bucket:
- Navigate to S3 in the AWS Console
- Click Create bucket
- Bucket name: YOUR-COMPUTING-ID-data (e.g., mst3k-data)
- Region: Same as your EC2 instance (likely us-east-1)
- Leave other settings as default
- Click Create bucket
- Click into your new bucket
- Select the Properties tab
- Scroll down to Event notifications
- Click Create event notification
Configure the event:
- Event name: NewCSVFile
- Prefix: (leave blank to watch entire bucket)
- Suffix: .csv
- Event types: Check All object create events
- Destination: Select SNS topic
- SNS topic: Select DS5220 from the dropdown
- Click Save changes
Note: If you receive a permissions error, AWS will automatically add the necessary bucket policy to allow S3 to publish to SNS.
✅ Checkpoint 3: Screenshot of your S3 event notification configuration.
On your local machine, create a test script:
#!/bin/bash
# Create a CSV file with test data
cat > test_upload_$(date +%s).csv << 'EOF'
name,age,city,occupation
Alice Johnson,28,Seattle,Software Engineer
Tanya Smith,35,Austin,Data Scientist
Nina Vanayasi,42,Boston,Product Manager
Carlos Rodriguez,31,Denver,DevOps Engineer
EOF
# Upload to S3 (replace with your bucket name)
aws s3 cp test_upload_*.csv s3://YOUR-BUCKET-NAME/

Save the script as test_upload.sh, make it executable, and run it:
chmod +x test_upload.sh
./test_upload.sh

Switch back to your EC2 terminal where the FastAPI app is running. You should see:
- A Notification message type
- Parsed S3 event details including:
- Event name (ObjectCreated:Put)
- Bucket name
- Object key (filename)
- Event timestamp
Example output:
================================================================================
Received SNS Message:
{
"Type": "Notification",
"MessageId": "b2550f11-729e-5eb4-bbdc-6d16858ccdf4",
"TopicArn": "arn:aws:sns:us-east-1:440848399208:ds5220",
"Message": "{\"Records\":[{...}]}"
...
}
================================================================================
NOTIFICATION RECEIVED
Event: ObjectCreated:Put
Bucket: mst3k-data
Object: test_upload_1707753729.csv
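A detail worth noting in the output above: the Message field of the SNS envelope is itself a JSON-encoded string, so it must be decoded a second time before the S3 record is reachable. A sketch of that double parse (field names follow the standard S3 event structure):

```python
import json

def extract_s3_event(sns_body: dict):
    """Return (event_name, bucket, key) from an SNS-wrapped S3 notification."""
    # The S3 event arrives as a JSON string inside the "Message" field
    inner = json.loads(sns_body["Message"])
    record = inner["Records"][0]
    return (
        record["eventName"],
        record["s3"]["bucket"]["name"],
        record["s3"]["object"]["key"],
    )

# Minimal stand-in for the payload shown above
sample = {
    "Type": "Notification",
    "Message": json.dumps({
        "Records": [{
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "mst3k-data"},
                "object": {"key": "test_upload_1707753729.csv"},
            },
        }]
    }),
}
```

One caveat: S3 URL-encodes object keys in these events (spaces arrive as +), so keys containing special characters should be run through urllib.parse.unquote_plus before use.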
✅ Checkpoint 4: Screenshot of your EC2 terminal showing the complete notification with bucket name and object key clearly visible.
Test with different file types and observe what happens:
# This should trigger a notification
echo "test,data,here" > data.csv
aws s3 cp data.csv s3://YOUR-BUCKET-NAME/
# This should NOT trigger a notification (wrong file extension)
echo "test data" > data.txt
aws s3 cp data.txt s3://YOUR-BUCKET-NAME/
# This should trigger a notification
mkdir -p subfolder
echo "more,test,data" > subfolder/nested.csv
aws s3 cp subfolder/nested.csv s3://YOUR-BUCKET-NAME/subfolder/

✅ Checkpoint 5: Explain in 2-3 sentences why the .txt file did not trigger a notification.
Modify your main.py to add the following functionality:
Update the /data endpoint to:
- Extract the bucket name and object key from the S3 event
- Download the CSV file from S3 using boto3
- Parse the CSV and count the number of rows
- Log the row count to the console
You'll need to:
- Install boto3: sudo pip3 install boto3 (on Ubuntu 24.04 you may need sudo pip3 install --break-system-packages boto3)
- Ensure your EC2 instance has an IAM role with S3 read permissions
Hint: Use boto3's S3 client:
import boto3
s3 = boto3.client('s3')
obj = s3.get_object(Bucket=bucket_name, Key=object_key)

✅ Checkpoint 6:
- Submit your updated main.py code
- Screenshot showing the console output with row count after uploading a CSV
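If you get stuck on the row count, one possible shape for it, shown here as a standalone function so the S3 download (via get_object, as in the hint) stays separate from the parsing; names are illustrative:

```python
import csv
import io

def count_csv_rows(data: bytes, has_header: bool = True) -> int:
    """Count the data rows in raw CSV bytes, optionally skipping a header line."""
    rows = list(csv.reader(io.StringIO(data.decode("utf-8"))))
    return max(len(rows) - (1 if has_header else 0), 0)

# Inside your /data endpoint, after extracting bucket_name and object_key:
#   obj = s3.get_object(Bucket=bucket_name, Key=object_key)
#   print("Row count:", count_csv_rows(obj["Body"].read()))
```

Keeping the parsing in a pure function like this also makes it easy to unit test without touching S3.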
Add error handling for:
- Missing or malformed S3 events
- Files that don't exist
- CSV parsing errors
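A sketch of one way to structure this: wrap each fragile step and fall back to a logged message instead of a 500 response. The function name is illustrative; the boto3 error type is shown in a comment because it depends on how you call S3.

```python
import json

def parse_sns_body(raw_body: str):
    """Parse an SNS body defensively, returning None on anything malformed."""
    try:
        body = json.loads(raw_body)
    except json.JSONDecodeError:
        print("Malformed JSON body; ignoring request")
        return None
    if "Message" not in body:
        print("No Message field; not an SNS notification")
        return None
    return body

# For the S3 download, catch the client error rather than crashing:
#   from botocore.exceptions import ClientError
#   try:
#       obj = s3.get_object(Bucket=bucket, Key=key)
#   except ClientError:
#       print(f"Could not fetch s3://{bucket}/{key}")
```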
✅ Checkpoint 7: Demonstrate your error handling by uploading a malformed CSV file and showing how your application handles it gracefully.
Answer the following questions (3-5 sentences each):
1. Event-Driven vs. Polling: Compare this event-driven architecture to a polling-based approach where your application checks S3 every 30 seconds for new files. What are the trade-offs?
2. Scalability: How would this architecture handle 1,000 CSV files being uploaded simultaneously? What components might become bottlenecks?
3. Message Delivery: SNS provides "at-least-once delivery," which means messages might be delivered more than once. How would you modify your application to handle duplicate messages? (Hint: consider idempotency)
4. Real-World Applications: Describe a real-world data engineering scenario where this event-driven pattern would be beneficial. What types of data processing would you trigger?
5. Security Considerations: Currently, your EC2 instance accepts HTTP traffic from anywhere (0.0.0.0/0) on port 80. What are the security implications? How could you restrict this while still allowing SNS to deliver messages?
Submit a single PDF document containing:
- All seven checkpoints (screenshots and code as specified)
- Answers to all five reflection questions
- Your final main.py code (copy/paste into document)
- A brief summary (1 paragraph) of what you learned and any challenges you encountered
To avoid unnecessary charges:
- Delete your S3 bucket (and all objects within it)
- Delete your SNS topic
- Delete your SNS subscription
- Terminate your EC2 instance
If your SNS subscription won't confirm:
- Check that port 80 is open in your security group to 0.0.0.0/0
- Verify your FastAPI application is running
- Check that you're using HTTP (not HTTPS) for the endpoint
- Look at EC2 terminal logs for incoming requests

If uploads don't trigger notifications:
- Verify the file extension matches your suffix filter (.csv)
- Check that the SNS subscription is "Confirmed"
- Ensure your S3 event notification is configured correctly
- Review CloudWatch Logs for SNS delivery failures

If boto3 can't read from S3:
- Ensure your EC2 instance has an IAM role attached
- The IAM role needs s3:GetObject permission for your bucket
- Verify boto3 is installed: pip3 list | grep boto3

If you can't reach the app from your browser:
- Confirm security group allows inbound traffic on port 80
- Verify you're using HTTP (not HTTPS) in your browser
- Check that uvicorn is running: sudo netstat -tlnp | grep 80