Run analytics with Athena on DocumentDB events

Before you can query Amazon DocumentDB events in the S3 bucket, complete the following steps to create an AWS Glue crawler that crawls the bucket.

  1. In the AWS services console, search for Glue.

  1. On the AWS Glue menu, select Crawlers. Choose Add crawler.

  2. Enter crawler-change-streams as the crawler name for the initial data load. Optionally, enter a description. Choose Next.

  3. Choose Data stores, choose Crawl all folders, and choose Next.

  4. On the Add a data store section, make the following selections:

    • For Choose a data store, click the drop-down box and select S3.
    • For Crawl data in, select Specified path in my account.
    • For Include path, browse to the sampledb folder in the S3 bucket you created earlier (e.g. s3://<CHANGE_STREAMS_BUCKET>/sampledb).
    • Leave everything else as default.

    Choose Next.

  5. On the Add another data store section, select No and choose Next.

  6. On the Choose an IAM role section, make the following selections:

    • Select Create an IAM role.
    • For IAM role, enter change-streams.

    Choose Next.

  7. On the Create a schedule for this crawler section, for Frequency select Run on demand and choose Next.

  8. On the Configure the crawler’s output section, choose Add database to create a new database in the AWS Glue Data Catalog.

  9. Enter change-streams as the database name, leave everything else as is, and choose Create. Then choose Next.

  10. Review the summary page, noting the Include path (Data stores section) and the Database (Output section). Choose Finish. The crawler is now ready to run.

  11. Select the crawler-change-streams crawler and choose the Run crawler button.

The crawler status moves from Starting to Stopping as it runs. Wait until the status returns to Ready, which takes a few minutes. The crawler list then shows that it added 1 table.
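
For readers who prefer scripting over the console, the crawler steps above can be sketched with boto3. This is a minimal sketch, not part of the original lab. It assumes boto3 is installed and AWS credentials are configured, and that an IAM role Glue can assume already exists. The role name AWSGlueServiceRole-change-streams is an assumption about what the console creates, and the helper names (crawler_request, wait_until_ready, create_and_run) are hypothetical. Leaving out a schedule in the request gives a crawler that only runs on demand.

```python
import time


def crawler_request(bucket: str) -> dict:
    """Build create_crawler arguments that mirror the console choices."""
    return {
        "Name": "crawler-change-streams",
        # Assumption: this role already exists and Glue is allowed to assume it.
        "Role": "AWSGlueServiceRole-change-streams",
        "DatabaseName": "change-streams",
        # Crawl everything under the sampledb prefix of the bucket.
        "Targets": {"S3Targets": [{"Path": f"s3://{bucket}/sampledb"}]},
    }


def wait_until_ready(glue, name: str, poll_seconds: float = 15.0) -> str:
    """Poll get_crawler until the crawler state is READY again."""
    while True:
        state = glue.get_crawler(Name=name)["Crawler"]["State"]
        if state == "READY":
            return state
        time.sleep(poll_seconds)


def create_and_run(bucket: str) -> None:
    """Create the catalog database and the crawler, then run it once."""
    import boto3  # imported here so the helpers above work without boto3

    glue = boto3.client("glue")
    glue.create_database(DatabaseInput={"Name": "change-streams"})
    request = crawler_request(bucket)
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])
    wait_until_ready(glue, request["Name"])
```

Calling create_and_run with the name of your change streams bucket would perform the same work as steps 1 through 11; the function is not invoked here because it creates real AWS resources.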

Query Data with Amazon Athena

  1. In the AWS services console, search for Athena.
  2. If you are using Amazon Athena for the first time, choose Get Started on the introduction screen.
  3. Choose set up a query result location in Amazon S3.
  4. Choose the Select icon to the right of the Query result location text field. Choose the S3BucketFunctionCode bucket that was created when you deployed the CloudFormation template in the Account setup section of the lab (e.g. s3://< S3BucketReplicationCode >-). Choose Select.
  5. Append athena/ to the end of the S3 bucket name. Choose Save.
  6. In the Query Editor, select your newly created database, e.g. “change-streams”.
  7. Choose the table named sampledb to inspect its fields.
  8. Choose the three dots to the right of sampledb and select Preview table.

The preview shows the data stored in the S3 bucket.
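
The same preview can be run outside the console. The following is a minimal boto3 sketch under stated assumptions: boto3 is installed, credentials are configured, and output_location is the query result location you saved earlier (an s3:// path ending in athena/). The helper names are hypothetical, and the LIMIT 10 query is an approximation of what Preview table runs. The database name is double-quoted in the query because it contains a hyphen.

```python
import time


def preview_request(database: str, table: str, output_location: str) -> dict:
    """Build start_query_execution arguments for a 10-row table preview."""
    return {
        "QueryString": f'SELECT * FROM "{database}"."{table}" LIMIT 10',
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(athena, request: dict, poll_seconds: float = 2.0) -> list:
    """Start a query, wait for it to finish, and return its rows as lists."""
    query_id = athena.start_query_execution(**request)["QueryExecutionId"]
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query ended in state {state}")
    result = athena.get_query_results(QueryExecutionId=query_id)
    # Each row is {"Data": [{"VarCharValue": ...}, ...]}; the first row
    # of a SELECT result holds the column names.
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in result["ResultSet"]["Rows"]
    ]


def preview_sampledb(output_location: str) -> list:
    """Preview the sampledb table in the change-streams database."""
    import boto3  # imported here so the helpers above work without boto3

    athena = boto3.client("athena")
    request = preview_request("change-streams", "sampledb", output_location)
    return run_query(athena, request)
```

Polling is needed because start_query_execution returns as soon as the query is queued; the rows only become available once the state reaches SUCCEEDED.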
