This is a serverless Python application that helps you push data from Kinesis Data Firehose to your MongoDB Atlas cluster. The project uses an AWS Lambda function as a resolver to write the data into the Atlas cluster; the data reaches the Lambda through an API Gateway endpoint that serves as the HTTP endpoint destination of a Kinesis Data Firehose stream.
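To illustrate the flow, here is a minimal sketch of such a resolver handler: it decodes the base64-encoded records in the Firehose HTTP endpoint delivery request forwarded by API Gateway and inserts them into Atlas. It assumes `pymongo` and hypothetical `MONGODB_URI`, `DB_NAME`, and `COLLECTION_NAME` environment variables; the handler shipped with the application may differ in detail.

```python
# Minimal sketch of the resolver Lambda (assumptions: pymongo is bundled,
# and MONGODB_URI / DB_NAME / COLLECTION_NAME are placeholder env var names).
import base64
import json
import os

from pymongo import MongoClient

# Create the client outside the handler so warm invocations reuse the connection.
client = MongoClient(os.environ["MONGODB_URI"])
collection = client[os.environ["DB_NAME"]][os.environ["COLLECTION_NAME"]]


def lambda_handler(event, context):
    # API Gateway proxy integration passes the Firehose delivery request as a JSON string.
    body = json.loads(event["body"])
    # Each Firehose record carries a base64-encoded payload.
    docs = [json.loads(base64.b64decode(record["data"])) for record in body["records"]]
    if docs:
        collection.insert_many(docs)
    # Firehose expects the requestId and timestamp echoed back in the response body.
    return {
        "statusCode": 200,
        "body": json.dumps({"requestId": body["requestId"], "timestamp": body["timestamp"]}),
    }
```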
Before proceeding, ensure you have the following prerequisites in place:
- Install AWS CLI
- Create an IAM user for the AWS CLI and generate an access key and secret key for it
- Configure the AWS CLI by running `aws configure` with your Access Key, Secret Key, and Region (see the sanity-check sketch after this list)
- Install SAM
- This application requires Python 3.9 or later to run. You can install Python 3.9 from Install Python
- A Firehose stream to move data from source to destination. In this case, the source is `Direct PUT` and the destination is `HTTP Endpoint`. To create your Firehose stream, refer to the documentation
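Before deploying, it can help to confirm that the credentials and region configured above are visible to the AWS SDK. A minimal check, assuming `boto3` is installed locally (it is not part of the application itself):

```python
# Sanity check that the credentials from `aws configure` are picked up.
import boto3

identity = boto3.client("sts").get_caller_identity()
print(identity["Account"], identity["Arn"])
```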
You have to create a Kinesis Data Firehose stream that moves the data from the configured source to the destination (a boto3 sketch of the equivalent API call follows the list below):
- Click here to go to the Kinesis Data Firehose console
- Click on `Create Firehose stream`
- Select your desired data source from the `Source` dropdown
- Select the `HTTP Endpoint` option in the `Destination` dropdown
- Enter the API Gateway URL that we created in the previous step in the `HTTP Endpoint URL` field
- Copy the API Key value generated in the previous step and paste it into the `Access Key` field
- Under the `Backup Settings` section, configure an S3 bucket to store the source record backup in case the data transformation doesn't produce the desired results
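If you prefer to script the stream creation instead of using the console, the same configuration can be expressed with boto3 roughly as below. The stream name, role ARN, and bucket ARN are placeholders, and the endpoint URL and access key are the API Gateway values described in the deployment steps.

```python
# Sketch of creating the Firehose stream programmatically.
# All <...> values and the stream name are placeholders.
import boto3

firehose = boto3.client("firehose")

firehose.create_delivery_stream(
    DeliveryStreamName="my-firehose-to-atlas",
    DeliveryStreamType="DirectPut",  # source: Direct PUT
    HttpEndpointDestinationConfiguration={  # destination: HTTP Endpoint
        "EndpointConfiguration": {
            "Name": "mongodb-atlas-ingestion",
            "Url": "https://<api-id>.execute-api.<region>.amazonaws.com/Prod",
            "AccessKey": "<api-gateway-api-key-value>",
        },
        # Back up source records to S3 when delivery or transformation fails.
        "S3BackupMode": "FailedDataOnly",
        "S3Configuration": {
            "RoleARN": "arn:aws:iam::<account-id>:role/<firehose-delivery-role>",
            "BucketARN": "arn:aws:s3:::<backup-bucket>",
        },
    },
)
```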
- Go to the Lambda section of your AWS console
- Click on the `Applications` section in the left navigation bar, then click on Create application
- Type MongoDB-Firehose-Ingestion-App in the search bar and check the "Show apps that create custom IAM roles or resource policies" checkbox
- Fill in the required information and click on Deploy
- Go to the Outputs section of your stack in the CloudFormation console and review the outputs of the deployed resources. Keep this tab open
- Copy the API Key ID shown in the Value column alongside the ApiKeyValue key. Go to Lambda > select the Authorizer Lambda function > Environment Variables > paste the copied API Key ID there (a boto3 sketch of this update follows below)
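The environment-variable update from the last step can also be done with boto3. The function name and variable key below are placeholders; check the deployed Authorizer Lambda for the exact names the application expects. Note that `update_function_configuration` replaces the whole variable set, so merge in any existing variables.

```python
# Sketch of setting the Authorizer Lambda's environment variable.
# FunctionName and the variable key are placeholders.
import boto3

lambda_client = boto3.client("lambda")

# This call replaces the function's environment; include existing variables as needed.
lambda_client.update_function_configuration(
    FunctionName="<authorizer-function-name>",
    Environment={"Variables": {"API_KEY_ID": "<api-key-id-from-outputs>"}},
)
```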
- Go to the API Gateway console, click on Resources > ANY > Edit under Method request settings. Disable the `API key required` flag. After saving the changes, deploy the API to the `Prod` stage for the changes to take effect
- Copy the API Gateway endpoint URL for the Prod stage from the CloudFormation Outputs section, and copy the API Key value from the API Keys section in API Gateway (a programmatic alternative is sketched below)
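The stack outputs and the API key value can also be read programmatically, which is handy when scripting the Firehose configuration in the next step. The stack name and output key below are placeholders; match them to what your CloudFormation Outputs tab shows.

```python
# Sketch of retrieving the Prod endpoint URL and API key value with boto3.
import boto3

cfn = boto3.client("cloudformation")
apigw = boto3.client("apigateway")

outputs = cfn.describe_stacks(StackName="<your-stack-name>")["Stacks"][0]["Outputs"]
endpoint_url = next(
    o["OutputValue"] for o in outputs if o["OutputKey"] == "<endpoint-output-key>"
)

# includeValues=True is required to get the actual key material back;
# filter the items by name if you have more than one API key.
api_key_value = apigw.get_api_keys(includeValues=True)["items"][0]["value"]

print(endpoint_url, api_key_value)
```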
- Go to the Firehose console, click on the stream that you created, then go to Configuration > Destination settings and click on Edit. Paste the API Key value and the API Gateway endpoint URL into the corresponding fields and click on Save changes
- In your Firehose stream, click on Start sending demo data under the Test with demo data section, or push a test record yourself as sketched below
- Go to your MongoDB Atlas cluster and check whether the records are being inserted into your collection
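Instead of the console's demo data, you can push a record directly into the stream with `put_record`; the stream name and the document shape below are placeholders. Firehose buffers incoming records, so allow for the configured buffering interval before checking the collection.

```python
# Sketch of sending a single test document through the Direct PUT source.
import json

import boto3

firehose = boto3.client("firehose")

firehose.put_record(
    DeliveryStreamName="<your-firehose-stream-name>",
    Record={"Data": json.dumps({"ticker": "TEST", "price": 1.23}).encode("utf-8")},
)
```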
- For demo purposes, we have allowed access from anywhere (`0.0.0.0/0`) under the Network Access section of the MongoDB Atlas project. We strongly advise against this for production scenarios; for production usage, establish a Private Endpoint instead.