Streaming DocumentDB events to Amazon S3

Make sure the AWS CloudFormation stack from the previous step has finished deploying. To test the deployment, complete the following steps:

  1. In your AWS Cloud9 environment, enter the following code to insert sample data into your Amazon DocumentDB cluster. For the purposes of this walkthrough, we insert a few tweets from New Year’s Eve in 2014:

    # Read the stack outputs saved in jsonData.json into environment variables
    export USERNAME=$(jq -r '.[] | select(.OutputKey == "DocDBUser") | .OutputValue' jsonData.json)
    export PASSWORD=$(jq -r '.[] | select(.OutputKey == "DocDBPassword") | .OutputValue' jsonData.json)
    export DOCDB_ENDPOINT=$(jq -r '.[] | select(.OutputKey == "ClusterEndpoint") | .OutputValue' jsonData.json)
       
    python es-test.py
    
  2. Validate that the documents were inserted by connecting to your Amazon DocumentDB cluster with the mongo shell and reading them back:

    mongo --ssl --host $DOCDB_ENDPOINT:27017 --sslCAFile rds-combined-ca-bundle.pem --username $USERNAME --password $PASSWORD
    use sampledb
    db.tweets.find()
    

    Log out from the DocumentDB cluster:

    quit()
    

    After the data is inserted into your Amazon DocumentDB cluster, it is streamed to your Amazon S3 bucket the next time the Lambda function runs. To automate the solution, we use Amazon EventBridge to publish a message to Amazon Simple Notification Service (Amazon SNS) every 120 seconds, and Amazon SNS in turn invokes the Lambda function.

  3. After the Lambda function is triggered, validate that the events have been streamed by entering the following command from your workspace:

    aws s3 ls s3://$S3BucketEvents
    

    You should see a new folder containing the data from your Amazon DocumentDB cluster. If not, wait a couple of minutes for the Lambda function to run, because it is scheduled to run every 120 seconds.
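
If you prefer to run the check in step 3 from Python rather than the AWS CLI, the following is a minimal sketch. It assumes boto3 is installed and that the bucket name is available to you; the function name `list_streamed_keys` is hypothetical and not part of the deployed solution. The S3 client is passed in as an argument so the function can be exercised without AWS credentials.

```python
def list_streamed_keys(s3_client, bucket):
    """Return the keys of all objects currently in `bucket`.

    `s3_client` is expected to behave like boto3.client("s3").
    """
    # list_objects_v2 returns at most 1,000 keys per call, so use a paginator.
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket):
        # "Contents" is absent from a page when the bucket has no objects.
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys
```

Calling `list_streamed_keys(boto3.client("s3"), os.environ["S3BucketEvents"])` should return an empty list until the Lambda function has run at least once after the insert.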

You can also use change streams to integrate Amazon DocumentDB with other AWS services. For example, you can replicate change stream events to Amazon Managed Streaming for Apache Kafka (or any other Apache Kafka distribution), Amazon Kinesis Data Streams, and Amazon Simple Queue Service (Amazon SQS).
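
As an illustration of the Amazon SQS case, here is a rough sketch of forwarding a batch of change events to a queue. It is not the code used by this walkthrough: the function name `forward_change_events` and the queue URL are hypothetical, and the events are assumed to be dictionaries such as those yielded by a change stream cursor. The SQS client is expected to behave like `boto3.client("sqs")`.

```python
import json


def forward_change_events(events, sqs_client, queue_url):
    """Send each change event to an SQS queue as JSON; return the number sent."""
    sent = 0
    for event in events:
        # Change events can carry BSON types (for example ObjectId) that the
        # json module cannot encode; default=str is a simple, lossy fallback.
        body = json.dumps(event, default=str)
        sqs_client.send_message(QueueUrl=queue_url, MessageBody=body)
        sent += 1
    return sent
```

In a scheduled Lambda function you would likely pass a bounded batch of events rather than iterate over an open change stream, because iterating an open stream blocks while it waits for new changes.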