Using the AWS command-line tools, practice with the following exercises:
Make a bucket:
aws s3 mb s3://<USERID>-data
# for example s3://nem2p-data
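The &lt;USERID&gt; placeholder is just substituted into the bucket name; a minimal shell sketch of that substitution (the user ID value is the example from above):

```shell
# build the bucket name from your user ID (example value from above)
USERID=nem2p
BUCKET="s3://${USERID}-data"
echo "$BUCKET"
```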
List all your buckets:
aws s3 ls
Copy a single file from the instructor's bucket into your own:
aws s3 cp s3://uvasds-data/taxi/yellow_tripdata_2025-11.parquet s3://nem2p-ds5220-data
Copy all files matching a pattern from the instructor's bucket into your own (filters apply in order: --exclude "*" first drops everything, then --include "*.parquet" re-adds only the Parquet files):
aws s3 cp s3://uvasds-data/taxi/ s3://nem2p-ds5220-data --recursive --exclude "*" --include "*.parquet"
List all files in a bucket or subfolder of a bucket:
aws s3 ls s3://nem2p-ds5220-data/
Sync objects from a source to a destination:
aws s3 sync SOURCE DESTINATION
# Sync objects from a source and remove files on the destination
# that do not exist in the source
aws s3 sync . s3://amzn-s3-demo-bucket --delete
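To see what --delete does without touching S3, here is a local sketch (hypothetical file names) of the set difference it computes: names present in the destination but absent from the source are the deletion candidates.

```shell
# simulate a source and destination with one stale file on the destination
mkdir -p sync-demo/src sync-demo/dst
touch sync-demo/src/a.txt sync-demo/src/b.txt
touch sync-demo/dst/b.txt sync-demo/dst/stale.txt
# names only in dst are what `sync --delete` would remove
comm -13 <(ls sync-demo/src | sort) <(ls sync-demo/dst | sort)
```

The real command also supports --dryrun, which prints the planned copies and deletions without performing them.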
Presign a URL to a private file that expires in 60 seconds:
# know the S3 URI for a known file that already exists in your bucket
aws s3 presign --expires-in 60 s3://nem2p-ds5220-data/yellow_tripdata_2025-11.parquet
This returns a signed URL to the object: https://nem2p-ds5220-data.s3.us-east-1.amazonaws.com/yellow_tripdata_2025-11.parquet?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAWNJE4XNUL4LXBHTQ%2F20260210%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20260210T195611Z&X-Amz-Expires=60&X-Amz-SignedHeaders=host&X-Amz-Signature=1faed501ebe70dfc54a0fdf7f7b7db27cc2709cb766e5778c1a1442dc82faaff
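The query string carries the signature details, including the lifetime. A small sketch (URL truncated to the relevant parameters) pulling the expiry out with standard shell tools:

```shell
# hypothetical signed URL, truncated to the relevant query parameters
URL="https://nem2p-ds5220-data.s3.us-east-1.amazonaws.com/yellow_tripdata_2025-11.parquet?X-Amz-Expires=60&X-Amz-SignedHeaders=host"
# extract the lifetime (in seconds) from the X-Amz-Expires parameter
EXPIRES=$(echo "$URL" | grep -o 'X-Amz-Expires=[0-9]*' | cut -d= -f2)
echo "$EXPIRES"
```

Anyone holding the URL can fetch the object before it expires, e.g. with curl -o local.parquet "$(aws s3 presign --expires-in 60 s3://nem2p-ds5220-data/yellow_tripdata_2025-11.parquet)".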
Using DuckDB, you can query S3 objects directly from the duckdb shell:
-- set up credentials (requires DuckDB's httpfs and aws extensions)
D INSTALL httpfs; LOAD httpfs;
D INSTALL aws; LOAD aws;
D SET s3_use_ssl=true;
D CALL load_aws_credentials();
-- a simple select from an S3 object using duckdb
-- UPDATE this s3 URI to the bucket you own
D select * from 's3://uvasds-data/taxi/yellow_tripdata_2025-11.parquet';

-- create a view over objects in S3
CREATE VIEW my_s3_data AS
SELECT * FROM 's3://bucket/data/*.parquet';

-- Now query the view
SELECT * FROM my_s3_data WHERE condition;

-- select from all objects matching a pattern
-- UPDATE this s3 URI to the bucket you own
select count(*) from 's3://uvasds-data/taxi/*.parquet';

Other
-- Hive-partitioned data
SELECT * FROM 's3://bucket/data/year=*/month=*/*.parquet';
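That glob assumes a Hive-style directory layout, where partition values are encoded as key=value directory names. A local sketch (hypothetical year/month values) of the layout it matches:

```shell
# create a Hive-style partition layout and confirm the glob matches it
mkdir -p hive-demo/data/year=2025/month=11
touch hive-demo/data/year=2025/month=11/part-0.parquet
ls hive-demo/data/year=*/month=*/*.parquet
```

DuckDB's read_parquet also accepts a hive_partitioning flag that exposes the year and month path segments as query columns.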