
Working with Amazon S3

S3 CLI Reference

Using the AWS command-line tools, work through the following exercises:

Make a bucket:

aws s3 mb s3://<USERID>-data
# for example s3://nem2p-data

List all your buckets:

aws s3 ls

Copy a single file from the instructor's bucket into your own:

aws s3 cp s3://uvasds-data/taxi/yellow_tripdata_2025-11.parquet s3://nem2p-ds5220-data

Copy all files matching a pattern from the instructor's bucket into your own:

aws s3 cp s3://uvasds-data/taxi/ s3://nem2p-ds5220-data --recursive --exclude "*" --include "*.parquet" 

List all files in a bucket or subfolder of a bucket:

aws s3 ls s3://nem2p-ds5220-data/

Sync objects from a source to a destination:

aws s3 sync SOURCE DESTINATION

# Sync objects from a source and remove files on the destination
# that do not exist in the source

aws s3 sync . s3://amzn-s3-demo-bucket --delete
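The copy/delete decision behind `sync --delete` can be sketched in a few lines of Python. This is a simplified model, assuming comparison by key name only; the real CLI also compares file size and last-modified time before copying:

```python
# Simplified model of the decisions `aws s3 sync --delete` makes.
# Real sync also checks size and modification time; this version
# looks only at which keys exist on each side.
def sync_plan(source_keys, dest_keys):
    source, dest = set(source_keys), set(dest_keys)
    to_copy = sorted(source - dest)      # in source, missing from destination
    to_delete = sorted(dest - source)    # in destination only; removed by --delete
    return to_copy, to_delete

copy_list, delete_list = sync_plan(
    ["a.parquet", "b.parquet"],
    ["b.parquet", "old.csv"],
)
print(copy_list)    # ['a.parquet']
print(delete_list)  # ['old.csv']
```

Without `--delete`, only the first list is acted on; objects that exist solely on the destination are left in place.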

Presign a URL to a private file that expires in 60 seconds:

# use the S3 URI of a file that already exists in your bucket
aws s3 presign --expires-in 60 s3://nem2p-ds5220-data/yellow_tripdata_2025-11.parquet

This returns a signed URL to the object (the credential, date, and signature in yours will differ):

https://nem2p-ds5220-data.s3.us-east-1.amazonaws.com/yellow_tripdata_2025-11.parquet?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAWNJE4XNUL4LXBHTQ%2F20260210%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20260210T195611Z&X-Amz-Expires=60&X-Amz-SignedHeaders=host&X-Amz-Signature=1faed501ebe70dfc54a0fdf7f7b7db27cc2709cb766e5778c1a1442dc82faaff
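Everything that makes the URL work is carried in its query string: anyone holding the link can fetch the object until `X-Amz-Expires` seconds past `X-Amz-Date`. A quick stdlib way to inspect those fields, using an abbreviated version of the example URL above:

```python
from urllib.parse import urlparse, parse_qs

# Abbreviated presigned URL (credential and signature omitted for brevity).
url = ("https://nem2p-ds5220-data.s3.us-east-1.amazonaws.com/"
       "yellow_tripdata_2025-11.parquet"
       "?X-Amz-Algorithm=AWS4-HMAC-SHA256"
       "&X-Amz-Date=20260210T195611Z"
       "&X-Amz-Expires=60"
       "&X-Amz-SignedHeaders=host")

params = parse_qs(urlparse(url).query)
print(params["X-Amz-Expires"][0])    # 60  (lifetime in seconds)
print(params["X-Amz-Algorithm"][0])  # AWS4-HMAC-SHA256
```

This is why a presigned URL needs no AWS credentials on the receiving end: the signature itself proves the grantor's permission, but only for that one object and that time window.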

DuckDB

Install the DuckDB CLI
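DuckDB publishes a one-line installer script, and it is also available from common package managers. Either route below is a reasonable starting point; check duckdb.org for the instructions specific to your platform:

```shell
# Official installer script (macOS/Linux):
curl https://install.duckdb.org | sh

# Or via Homebrew on macOS:
brew install duckdb
```

Run `duckdb` to launch the CLI; the `D` prompt shown in the examples below is where you enter statements.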

Remote Queries

-- set up credentials (the httpfs and aws extensions autoload in
-- recent DuckDB; on older versions, INSTALL and LOAD them first)
D SET s3_use_ssl=true;
D CALL load_aws_credentials();

-- a simple select from an S3 object using duckdb
-- UPDATE this s3 URI to the bucket you own
D select * from 's3://uvasds-data/taxi/yellow_tripdata_2025-11.parquet';

Reusable Views

CREATE VIEW my_s3_data AS 
SELECT * FROM 's3://bucket/data/*.parquet';

-- Now query the view
SELECT * FROM my_s3_data WHERE condition;

Glob Patterns

-- select from all objects matching a pattern
-- UPDATE this s3 URI to the bucket you own
select count(*) from 's3://uvasds-data/taxi/*.parquet';

Other

-- Hive-partitioned data
SELECT * FROM 's3://bucket/data/year=*/month=*/*.parquet';
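Hive partitioning encodes column values directly in the object path, which is how DuckDB can expose `year` and `month` above as queryable columns without reading any file contents. A stdlib sketch of how those `key=value` segments decode (the path below is illustrative, not one of the course files):

```python
# Decode hive-style key=value path segments into partition columns.
def hive_partitions(key):
    parts = {}
    for segment in key.split("/"):
        if "=" in segment:
            k, v = segment.split("=", 1)
            parts[k] = v
    return parts

print(hive_partitions("data/year=2025/month=11/yellow.parquet"))
# {'year': '2025', 'month': '11'}
```

Because the partition values live in the path, a query that filters on them (e.g. `WHERE year = 2025`) can skip whole prefixes without downloading the excluded files.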