ANALYTICS & ARCHIVING WITH AMAZON DOCUMENTDB CHANGE STREAMS

Prerequisites: Complete the Prerequisites section before starting this lab. If you have already completed it, continue below.

The change streams feature in Amazon DocumentDB (with MongoDB compatibility) provides a time-ordered sequence of change events that occur within your cluster’s collections. You can read events from a change stream to implement many different use cases, including the following:

  • Change notification
  • Full-text search with Amazon Elasticsearch Service (Amazon ES)
  • Analytics with Amazon Redshift or Amazon S3/Athena
  • Archiving to Amazon S3

Change stream events are ordered as they occur on the cluster and, by default, are retained for 3 hours after they are recorded. You can extend the retention period to up to 7 days with the change_stream_log_retention_duration cluster parameter.
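The sketch below shows one way to change the retention period with the AWS SDK for Python (boto3) `docdb` client. It assumes the parameter value is expressed in seconds, so the 3-hour default is 10800 and the 7-day ceiling is 604800. It also assumes the cluster uses a custom cluster parameter group, because default groups cannot be modified. The helper names are hypothetical. The `ApplyMethod` value assumes the parameter is dynamic.

```python
HOUR_S = 3600
DEFAULT_RETENTION_S = 3 * HOUR_S        # 10800 s: the 3-hour default
MAX_RETENTION_S = 7 * 24 * HOUR_S       # 604800 s: the 7-day ceiling

PARAMETER_NAME = "change_stream_log_retention_duration"


def retention_parameter(days: float) -> dict:
    """Build the parameter entry for a retention period given in days."""
    seconds = int(days * 24 * HOUR_S)
    if not 0 < seconds <= MAX_RETENTION_S:
        raise ValueError(f"retention must be > 0 and at most 7 days, got {days} days")
    return {
        "ParameterName": PARAMETER_NAME,
        "ParameterValue": str(seconds),
        # Assumption: the parameter is dynamic; use "pending-reboot" if it is not.
        "ApplyMethod": "immediate",
    }


def apply_retention(parameter_group: str, days: float) -> None:
    """Update a custom cluster parameter group with the new retention period."""
    import boto3  # imported here so retention_parameter works without the AWS SDK
    docdb = boto3.client("docdb")
    docdb.modify_db_cluster_parameter_group(
        DBClusterParameterGroupName=parameter_group,
        Parameters=[retention_parameter(days)],
    )
```

For example, `apply_retention("my-docdb-params", 7)` would request the full 7 days for a parameter group named `my-docdb-params`, which is a placeholder name.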

In this lab you will integrate Amazon DocumentDB with Amazon S3. Combined with Amazon S3 lifecycle management, the same setup can also be used for archiving. We will show you how to use an AWS Lambda function to stream events from your Amazon DocumentDB cluster’s change stream to an Amazon S3 bucket, and how to query that data with Amazon Athena. To automate the solution, an Amazon EventBridge rule publishes a message to Amazon Simple Notification Service (Amazon SNS) every 120 seconds. SNS then invokes the Lambda function, so the function runs on a fixed schedule.
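On each scheduled run, the Lambda function turns a batch of change events into an S3 object that Athena can query. The sketch below is one possible layout, not the lab's actual function. It writes newline-delimited JSON, one event per line, under Hive-style `year=/month=/day=` prefixes. The record field names, the key layout and the helper names are all hypothetical. `json.dumps(..., default=str)` turns values such as ObjectId and datetime into plain strings. That conversion is lossy, and `bson.json_util` would preserve the types. Uploading uses the boto3 S3 client's `put_object` call.

```python
import json
from datetime import datetime, timezone
from typing import Any, Iterable, List, Optional


def to_record(event: dict) -> dict:
    """Flatten one change event into a hypothetical analytics record."""
    ns = event.get("ns", {})
    return {
        "operation": event.get("operationType"),   # insert, update, delete, ...
        "database": ns.get("db"),
        "collection": ns.get("coll"),
        "document_key": event.get("documentKey"),
        "document": event.get("fullDocument"),      # None for deletes
    }


def to_json_lines(events: Iterable[dict]) -> str:
    """Serialize events as newline-delimited JSON (one object per line)."""
    return "".join(json.dumps(to_record(e), default=str) + "\n" for e in events)


def object_key(prefix: str, now: datetime) -> str:
    """Build a Hive-style partitioned key, e.g. prefix/year=2021/month=03/day=05/..."""
    return (f"{prefix}/year={now:%Y}/month={now:%m}/day={now:%d}/"
            f"changes-{now:%H%M%S}.jsonl")


def write_batch(s3: Any, bucket: str, prefix: str, events: List[dict],
                now: Optional[datetime] = None) -> Optional[str]:
    """Upload one batch to S3 and return its key, or None if there was nothing to write."""
    if not events:
        return None
    now = now or datetime.now(timezone.utc)
    key = object_key(prefix, now)
    s3.put_object(Bucket=bucket, Key=key, Body=to_json_lines(events).encode("utf-8"))
    return key
```

In a handler, `s3` would be `boto3.client("s3")`. With a 120-second schedule, the timestamp in the key keeps each run's object separate. The date prefixes let an Athena table declared with matching partitions scan only the days a query needs.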

The following diagram shows the deployment architecture you will build in this lab.

Final Deployment Architecture