Archiving/Retaining DocumentDB events

To delete a record from your Amazon DocumentDB cluster, go back to the Cloud9 environment and run the following commands:

mongo --ssl --host $docdbEndpoint:27017 --sslCAFile rds-combined-ca-bundle.pem --username $docdbUser --password $docdbPass
use sampledb
db.tweets.findOneAndDelete({"tweet_id" : NumberLong("550363000000000000"),"tweet_location":"New Jersey"},{projection:{"_id":1}});
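If you prefer to script this step, the same delete can be issued from Python. This is a hypothetical sketch using pymongo (not part of the workshop); the endpoint and credential environment variable names mirror the shell command above, and the TLS options are assumptions.

```python
# Hypothetical pymongo equivalent of the findOneAndDelete shell command above.
import os

# Filter matching the document deleted in the shell example; plain Python ints
# are sent over the wire as 64-bit integers (NumberLong).
delete_filter = {
    "tweet_id": 550363000000000000,
    "tweet_location": "New Jersey",
}

def delete_tweet():
    from pymongo import MongoClient  # assumption: pymongo is installed
    client = MongoClient(
        host=os.environ["docdbEndpoint"],
        port=27017,
        username=os.environ["docdbUser"],
        password=os.environ["docdbPass"],
        tls=True,
        tlsCAFile="rds-combined-ca-bundle.pem",
    )
    # Returns the matched document (or None); the projection limits the
    # returned fields to _id, matching the shell example.
    return client["sampledb"]["tweets"].find_one_and_delete(
        delete_filter, projection={"_id": 1}
    )
```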

You should get output similar to the following. Make a note of this "_id", as you will use it later for validation.

{ "_id" : ObjectId("612939a717d82b167533ed58") }

You can verify that the record has been deleted from the DocumentDB cluster by running:

db.tweets.findOne({"tweet_id" : NumberLong("550363000000000000"),"tweet_location":"New Jersey"});

Before comparing records in Athena, run the AWS Glue Crawler again as you did in step 12 of the Query events with Athena module. Go back to Athena and run the following command to count records in your DocumentDB cluster:

SELECT count(*) FROM "lambda:docdb".sampledb.tweets;

Note that there is one fewer record. Now do the same for your S3 bucket:

SELECT count(*) FROM "change-streams"."sampledb";

The S3 bucket stores the original record plus one additional record for the delete operation you just performed in your DocumentDB cluster. Run the following query to verify, using the "_id" returned by the findOneAndDelete command above.

SELECT * FROM "change-streams"."sampledb" WHERE _id = '<Use id from findOneAndDelete>';
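The queries above can also be run programmatically through the Athena API. The following is a sketch using boto3; the workgroup name is an assumption (it must have a query result location configured), and `event_query` is an illustrative helper, not part of the workshop.

```python
# Hypothetical sketch: running the verification queries via boto3 instead of
# the Athena console.
import time

COUNT_DOCDB = 'SELECT count(*) FROM "lambda:docdb".sampledb.tweets'
COUNT_S3 = 'SELECT count(*) FROM "change-streams"."sampledb"'

def event_query(object_id):
    # Lookup for the _id returned by the findOneAndDelete command earlier.
    return f'SELECT * FROM "change-streams"."sampledb" WHERE _id = \'{object_id}\''

def run_query(sql, workgroup="primary"):
    import boto3  # deferred so the query helpers above work without boto3 installed
    athena = boto3.client("athena")
    qid = athena.start_query_execution(
        QueryString=sql, WorkGroup=workgroup
    )["QueryExecutionId"]
    # Poll until the query reaches a terminal state, then fetch the results.
    while True:
        state = athena.get_query_execution(QueryExecutionId=qid)[
            "QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return athena.get_query_results(QueryExecutionId=qid)
        time.sleep(1)
```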

Once your DocumentDB events are in S3, you can manage the objects so that they are stored cost-effectively throughout their lifecycle; see Managing your storage lifecycle in the Amazon S3 documentation.
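As one possible sketch of such a lifecycle policy: the rule below transitions archived change-stream objects to Glacier after 90 days and expires them after five years. The rule ID, prefix, and day counts are illustrative assumptions, not values from this workshop.

```python
# Hypothetical S3 lifecycle configuration for archived DocumentDB events.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-docdb-events",          # assumption: illustrative name
            "Status": "Enabled",
            "Filter": {"Prefix": "sampledb/"},      # assumption: archive prefix
            "Transitions": [
                # Move objects to Glacier 90 days after creation.
                {"Days": 90, "StorageClass": "GLACIER"}
            ],
            # Delete objects after roughly five years.
            "Expiration": {"Days": 1825},
        }
    ]
}

def apply_lifecycle(bucket_name):
    import boto3  # deferred so the configuration above works without boto3
    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket_name,
        LifecycleConfiguration=lifecycle_configuration,
    )
```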

For archiving purposes, you can move data from S3 to Amazon S3 Glacier; however, moving small objects is not recommended because of the per-object cost associated with the transition. Before moving DocumentDB events to Glacier, consider bundling several events into larger objects; see the sample solution S3 Bundler.
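The bundling idea can be sketched as follows: pack many small event records into one gzip-compressed newline-delimited JSON payload, so the Glacier transition cost is paid once per bundle rather than once per event. This is a minimal illustration, not the S3 Bundler solution itself; the function names are illustrative.

```python
# Hypothetical sketch: bundle small event records into one compressed object.
import gzip
import json

def bundle_events(events):
    """Serialize a list of event dicts as one gzip-compressed NDJSON payload."""
    ndjson = "\n".join(json.dumps(e) for e in events).encode("utf-8")
    return gzip.compress(ndjson)

def unbundle_events(payload):
    """Recover the individual event dicts from a bundled payload."""
    lines = gzip.decompress(payload).decode("utf-8").splitlines()
    return [json.loads(line) for line in lines]
```

A bundle produced this way can be uploaded to S3 as a single object and later transitioned to Glacier as one unit.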