A serverless microservice that provides secure access to package assets and data files in the Pennsieve platform. Built with AWS Lambda, Go, and CloudFront for high-performance, scalable data access.
The Packages Service handles:
- Package restoration from deleted/archived state
- Presigned URL generation for secure S3 access
- CloudFront signed URLs for optimized content delivery
- Unauthenticated proxy for cross-origin requests
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   API Gateway   │───▶│ Lambda Service  │───▶│  RDS Postgres   │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         │             ┌─────────────────┐             │
         │             │   CloudFront    │             │
         │             │  (Signed URLs)  │             │
         │             └─────────────────┘             │
         │                      │                      │
         └──────────────────────┼──────────────────────┘
                                │
                       ┌─────────────────┐
                       │   S3 Package    │
                       │  Assets Bucket  │
                       └─────────────────┘
Restores deleted packages and collections from archived state.
Authentication: Required (dataset-level permissions)
Request:
POST /packages/restore?dataset_id=N:dataset:123
{
"nodeIds": ["N:package:456", "N:collection:789"]
}

Response:
{
"success": ["N:package:456"],
"failures": [
{"id": "N:collection:789", "error": "not found"}
]
}

Generates presigned URLs for direct S3 access to package files and viewer assets.
Authentication: Required (dataset-level permissions)
Request:
GET /packages/presign/s3?dataset_id=N:dataset:123&package_id=N:package:456&path=preview/thumbnail.jpg

Response: HTTP 307 redirect to the presigned S3 URL
Query Parameters:
- `dataset_id` (required): Dataset node ID
- `package_id` (required): Package node ID
- `path` (optional): Path to a viewer asset within the package
- `redirect` (optional): Set to `false` to return JSON instead of a redirect
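As a rough illustration of how a client might assemble this request, the sketch below builds the query string from the parameters listed above. The base URL `https://api.example.com` and the function name `presignRequestURL` are placeholders invented for the example, not part of the service.

```go
package main

import (
	"fmt"
	"net/url"
)

// presignRequestURL builds a GET /packages/presign/s3 request URL.
// baseURL is a placeholder for the API host. path may be empty, and
// redirect=false asks for JSON instead of the default 307 redirect.
func presignRequestURL(baseURL, datasetID, packageID, path string, redirect bool) string {
	q := url.Values{}
	q.Set("dataset_id", datasetID)
	q.Set("package_id", packageID)
	if path != "" {
		q.Set("path", path)
	}
	if !redirect {
		q.Set("redirect", "false")
	}
	// Encode percent-escapes ':' and '/' and sorts parameters by key.
	return baseURL + "/packages/presign/s3?" + q.Encode()
}

func main() {
	fmt.Println(presignRequestURL("https://api.example.com",
		"N:dataset:123", "N:package:456", "preview/thumbnail.jpg", false))
}
```

The encoded form (`N%3Adataset%3A123`) is equivalent to the unescaped form shown in the request example. An `Authorization` header would still be needed, since the endpoint requires authentication.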
Generates CloudFront signed URLs for optimized content delivery with CDN caching.
Authentication: Required (dataset-level permissions)
Request:
GET /packages/cloudfront/sign?dataset_id=N:dataset:123&package_id=N:package:456&path=data.parquet

Response:
{
"signed_url": "https://d1234567890.cloudfront.net/O1/D123/P456/data.parquet?Expires=1699123456&Signature=...",
"expires_at": 1699123456
}

Features:
- 1-hour URL expiration
- Optimized caching for Parquet files (30 days)
- Private distribution (signed URLs required)
- Better performance than direct S3 access
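For readers unfamiliar with how a CloudFront signed URL is put together, here is a minimal standard-library sketch of canned-policy signing (RSA with SHA-1, then CloudFront's URL-safe base64 substitutions). It is not taken from the service, which may well use an AWS SDK helper instead. The key pair ID `K2EXAMPLEKEYID` and the throwaway key are assumptions for the demo.

```go
package main

import (
	"crypto"
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha1"
	"encoding/base64"
	"fmt"
	"strings"
	"time"
)

// cannedPolicy returns the canned-policy JSON that CloudFront rebuilds
// from the URL and the Expires value when it checks the signature.
func cannedPolicy(resourceURL string, expires int64) string {
	return fmt.Sprintf(`{"Statement":[{"Resource":"%s","Condition":{"DateLessThan":{"AWS:EpochTime":%d}}}]}`,
		resourceURL, expires)
}

// cfEncode is standard base64 with CloudFront's URL-safe substitutions.
func cfEncode(b []byte) string {
	s := base64.StdEncoding.EncodeToString(b)
	return strings.NewReplacer("+", "-", "=", "_", "/", "~").Replace(s)
}

// signCanned appends Expires, Signature, and Key-Pair-Id to resourceURL.
// It assumes resourceURL has no query string of its own.
func signCanned(resourceURL, keyPairID string, key *rsa.PrivateKey, expires int64) (string, error) {
	digest := sha1.Sum([]byte(cannedPolicy(resourceURL, expires)))
	sig, err := rsa.SignPKCS1v15(rand.Reader, key, crypto.SHA1, digest[:])
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%s?Expires=%d&Signature=%s&Key-Pair-Id=%s",
		resourceURL, expires, cfEncode(sig), keyPairID), nil
}

func main() {
	// Throwaway key for the demo; the service loads its key from SSM instead.
	key, err := rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		panic(err)
	}
	expires := time.Now().Add(time.Hour).Unix() // 1-hour expiration, as listed above
	signed, err := signCanned("https://d1234567890.cloudfront.net/O1/D123/P456/data.parquet",
		"K2EXAMPLEKEYID", key, expires)
	if err != nil {
		panic(err)
	}
	fmt.Println(signed)
}
```

The `Expires` value in the signed URL is the same epoch timestamp the endpoint reports as `expires_at`.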
Unauthenticated proxy for S3 requests using presigned URLs. Handles CORS for browser clients.
Authentication: None required (validates presigned URL)
Request:
GET /packages/proxy/s3?presigned_url=https://bucket.s3.amazonaws.com/key?X-Amz-Signature=...

Response: HTTP 307 redirect to the presigned URL, with CORS headers
Features:
- Validates presigned URL signatures
- Bucket allowlist support via the `PROXY_ALLOWED_BUCKETS` env var
- CORS headers for browser compatibility
- HEAD request support for metadata
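The allowlist check described above could look roughly like the following sketch. It performs structural checks only (HTTPS, a virtual-hosted S3 host, a non-empty `X-Amz-Signature`) and does not verify the signature cryptographically; how the service actually validates signatures is not shown here. The function names are hypothetical, and treating an empty allowlist as "any bucket" is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// parseAllowlist turns a comma-separated PROXY_ALLOWED_BUCKETS value into a set.
func parseAllowlist(v string) map[string]bool {
	set := make(map[string]bool)
	for _, b := range strings.Split(v, ",") {
		if b = strings.TrimSpace(b); b != "" {
			set[b] = true
		}
	}
	return set
}

// checkPresignedURL runs structural checks only; it does not verify the
// signature. Assumption: an empty allowlist allows any bucket.
func checkPresignedURL(raw string, allowed map[string]bool) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("parse presigned URL: %w", err)
	}
	if u.Scheme != "https" {
		return errors.New("presigned URL must use https")
	}
	// Virtual-hosted style: <bucket>.s3[.<region>].amazonaws.com
	bucket, rest, ok := strings.Cut(u.Hostname(), ".")
	if !ok || !strings.HasPrefix(rest, "s3.") || !strings.HasSuffix(rest, ".amazonaws.com") {
		return errors.New("host is not a virtual-hosted S3 endpoint")
	}
	if u.Query().Get("X-Amz-Signature") == "" {
		return errors.New("missing X-Amz-Signature")
	}
	if len(allowed) > 0 && !allowed[bucket] {
		return fmt.Errorf("bucket %q is not in the allowlist", bucket)
	}
	return nil
}

func main() {
	allowed := parseAllowlist("bucket, other-bucket")
	fmt.Println(checkPresignedURL("https://bucket.s3.amazonaws.com/key?X-Amz-Signature=abc", allowed))
	fmt.Println(checkPresignedURL("https://unknown.s3.amazonaws.com/key?X-Amz-Signature=abc", allowed))
}
```

Bucket names that contain dots would need path-style handling, which this sketch leaves out.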
- Service Lambda (`lambda/service/`)
  - Main API handler for all endpoints
  - Handles authentication and authorization
  - Manages database connections and S3 operations
- Restore Lambda (`lambda/restore/`)
  - Background processing for package restoration
  - Triggered by SQS messages from the service lambda
  - Updates package states and metadata
- Private distribution requiring signed URLs
- Origin Access Control (OAC) for secure S3 access
- Optimized caching for different file types:
- Parquet files: 30-day cache
- General files: 24-hour cache
- Geographic distribution with PriceClass_100 (US, Canada, Europe)
- Package viewer assets storage
- Intelligent Tiering for cost optimization
- Versioning enabled with 30-day cleanup
- Private access (CloudFront and service lambda only)
- Server-side encryption with AES256
- PostgreSQL via RDS Proxy for connection pooling
- Organization-based schema (`"{org_id}".packages`, `"{org_id}".datasets`)
- Package ownership validation through dataset relationships
- JWT token validation via shared authorizer lambda
- Dataset-level permissions enforced for all operations
- Package ownership verification through database queries
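To make the ownership check concrete, the sketch below shows one way a query against the organization-based schema (`"{org_id}".packages`, `"{org_id}".datasets`) might be built. The column names (`node_id`, `dataset_id`, `id`) and the function name are hypothetical; the real schema may differ.

```go
package main

import "fmt"

// ownershipQuery returns SQL that checks a package belongs to a dataset
// inside one organization's schema. orgID is an int, so only digits can
// reach the schema name; node IDs are bound as $1 and $2.
// Column names (node_id, dataset_id, id) are hypothetical.
func ownershipQuery(orgID int) string {
	return fmt.Sprintf(
		`SELECT 1 FROM "%d".packages p JOIN "%d".datasets d ON p.dataset_id = d.id WHERE p.node_id = $1 AND d.node_id = $2`,
		orgID, orgID)
}

func main() {
	fmt.Println(ownershipQuery(42))
}
```

Because schema names cannot be passed as bind parameters in PostgreSQL, restricting the organization ID to an integer type is a simple way to keep the interpolated part of the query safe.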
- S3 bucket policy denies direct access
- CloudFront signed URLs with time-based expiration
- Presigned URL validation in proxy endpoint
- Origin Access Control for CloudFront-to-S3 communication
- CloudFront signing keys stored in AWS SSM Parameter Store
- Private key encrypted with KMS SecureString
- Keys fetched once during Lambda cold start (not per request)
- Manual key rotation through AWS Console
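Fetching the key once per cold start can be sketched with `sync.OnceValues` (Go 1.21+). The fetch function below stands in for an SSM call and is purely illustrative; `newKeyLoader` is a hypothetical name.

```go
package main

import (
	"fmt"
	"sync"
)

// newKeyLoader wraps fetch so it runs at most once per process, which in
// Lambda terms means once per cold start. Later calls return the cached
// result, including a cached error if the first fetch failed.
func newKeyLoader(fetch func() (string, error)) func() (string, error) {
	return sync.OnceValues(fetch)
}

func main() {
	calls := 0
	loadKey := newKeyLoader(func() (string, error) {
		calls++ // stands in for an SSM GetParameter call with decryption
		return "-----BEGIN RSA PRIVATE KEY-----...", nil
	})
	for i := 0; i < 3; i++ {
		if _, err := loadKey(); err != nil {
			panic(err)
		}
	}
	fmt.Println("fetches:", calls) // prints "fetches: 1"
}
```

One consequence of this pattern is that a rotated key is only picked up when a new execution environment starts.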
- Go 1.23+
- Docker & Docker Compose
- AWS CLI configured
- PostgreSQL client (for local development)
# Run tests locally
make test
# Run CI tests (Docker-based)
make test-ci
# Start local services
make local-services
# Build Lambda packages
make package
# Deploy to S3
make publish

The service includes comprehensive tests covering:
- Unit tests for individual handlers
- Integration tests with PostgreSQL, MinIO, and DynamoDB
- Authorization tests for cross-dataset access
- CORS and proxy functionality tests
| Variable | Description | Required |
|---|---|---|
| `ENV` | Environment name (dev/prod) | ✓ |
| `PENNSIEVE_DOMAIN` | API domain | ✓ |
| `RDS_PROXY_ENDPOINT` | Database connection endpoint | ✓ |
| `VIEWER_ASSETS_BUCKET` | S3 bucket for package assets | ✓ |
| `CLOUDFRONT_DISTRIBUTION_DOMAIN` | CloudFront distribution domain | ✓ |
| `CLOUDFRONT_KEY_ID` | CloudFront public key ID | ✓ |
| `CLOUDFRONT_PRIVATE_KEY_SSM_PARAM` | SSM parameter name for private key | ✓ |
| `PROXY_ALLOWED_BUCKETS` | Comma-separated list of allowed S3 buckets | - |
| `RESTORE_PACKAGE_QUEUE_URL` | SQS queue for restore operations | ✓ |
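A startup check for the required variables in the table might look like this sketch. The function name `missingVars` is hypothetical, and the lookup is injected so the check can be exercised without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// required lists the variables marked as required in the table above.
var required = []string{
	"ENV", "PENNSIEVE_DOMAIN", "RDS_PROXY_ENDPOINT", "VIEWER_ASSETS_BUCKET",
	"CLOUDFRONT_DISTRIBUTION_DOMAIN", "CLOUDFRONT_KEY_ID",
	"CLOUDFRONT_PRIVATE_KEY_SSM_PARAM", "RESTORE_PACKAGE_QUEUE_URL",
}

// missingVars returns the required variables that lookup reports as
// unset or empty. Pass os.LookupEnv in real use.
func missingVars(lookup func(string) (string, bool)) []string {
	var missing []string
	for _, name := range required {
		if v, ok := lookup(name); !ok || v == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	if m := missingVars(os.LookupEnv); len(m) > 0 {
		fmt.Println("missing required environment variables:", strings.Join(m, ", "))
		return
	}
	fmt.Println("configuration OK")
}
```

`PROXY_ALLOWED_BUCKETS` is deliberately absent from the list because the table marks it as optional.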
The service is deployed using:
- Terraform for infrastructure provisioning
- AWS Lambda for serverless compute
- API Gateway for HTTP routing and authentication
- CloudFormation for resource orchestration
- Deploy with dummy keys (CI/CD safe):
terraform apply

The deployment uses dummy keys by default, so all infrastructure can be created automatically without real signing keys.
- Replace with production keys (after deployment):
# Generate real RSA key pair
cd terraform
./generate-cloudfront-keys.sh
# Update SSM parameters with real keys
aws ssm put-parameter \
--name "/{environment}/{service}/cloudfront/private-key" \
--value "$(cat .cloudfront-keys/private_key_base64.txt)" \
--type "SecureString" \
--overwrite
aws ssm put-parameter \
--name "/{environment}/{service}/cloudfront/public-key" \
--value "$(cat .cloudfront-keys/public_key.pem)" \
--type "String" \
--overwrite
# Create new CloudFront public key and update key group
aws cloudfront create-public-key \
--public-key-config Name="pkg-assets-{environment}-key",CallerReference="pkg-assets-$(date +%s)",EncodedKey="$(cat .cloudfront-keys/public_key.pem)" \
--query 'PublicKey.Id' --output text

- Update CloudFront key group with the new public key ID:
# Get the new public key ID from the previous command, then update key group
NEW_KEY_ID="<public-key-id-from-previous-command>"
aws cloudfront update-key-group \
--id "$(terraform output -raw cloudfront_key_group_id)" \
--key-group-config Items="$NEW_KEY_ID",Name="package-assets-{environment}-key-group",Comment="Key group for package assets CloudFront signed URLs"

- Generate keys locally:
cd terraform
./generate-cloudfront-keys.sh

- Deploy with local keys (if overriding variables):
terraform apply -var="cloudfront_public_key_pem=$(cat .cloudfront-keys/public_key.pem)" \
-var="cloudfront_private_key_base64=$(cat .cloudfront-keys/private_key_base64.txt)"

Security Note: The `.cloudfront-keys/` directory is gitignored. Never commit real signing keys to version control. The dummy keys in `variables.tf` are safe for CI/CD and testing but should be replaced with real keys in production environments.
- CloudWatch Logs for Lambda execution logs
- X-Ray tracing for request flow analysis
- CloudWatch Metrics for performance monitoring
- Structured logging with request IDs for correlation
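Structured logging with a request ID can be sketched with the standard `log/slog` package. The attribute name `request_id` and both helper names are assumptions for illustration; the service's actual logger and field names are not shown in this README.

```go
package main

import (
	"fmt"
	"io"
	"log/slog"
	"strings"
)

// newRequestLogger returns a JSON logger that stamps every line with the
// request ID, so lines from one invocation can be correlated in CloudWatch.
func newRequestLogger(w io.Writer, requestID string) *slog.Logger {
	return slog.New(slog.NewJSONHandler(w, nil)).With("request_id", requestID)
}

// emit logs one message through a request-scoped logger and returns the
// JSON line. It exists only to make the output easy to inspect here.
func emit(requestID, msg string, args ...any) string {
	var sb strings.Builder
	newRequestLogger(&sb, requestID).Info(msg, args...)
	return sb.String()
}

func main() {
	fmt.Print(emit("req-123", "presign requested", "dataset_id", "N:dataset:123"))
}
```

In a Lambda handler the logger would normally write to `os.Stdout`, with the ID taken from the invocation context.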
- Follow Go conventions and gofmt formatting
- Add tests for new functionality
- Update API documentation in `terraform/packages-service.yml`
- Ensure all tests pass: `make test && make test-ci`
- Update this README for significant changes