Monitor the Primary Instance

To get started, let’s quickly review the metrics to track for the Amazon DocumentDB primary (writer) node. The definitive monitoring reference is the monitoring chapter in the Amazon DocumentDB Developer Guide. That guide points out several types of metrics you should consider:

  • CPU, RAM, and IOPS. These metrics can run consistently high, depending on your application. Monitor them so that you understand your baseline levels.
  • Storage volume consumption. Your storage volume consumption should not exceed 85% of the cluster's storage limit (64 TiB when this workshop was written; the Developer Guide lists the current limit).
  • Network traffic. You should understand the network traffic patterns for your database and database clients. If your network traffic exceeds the available throughput for your database instances, you may see performance impacts.
  • Database connections. Monitor steady-state and peak database connections, and put safeguards in place if a high connection count is causing performance degradation.

Note that the goal of monitoring is simply to provide information that you can act on. If you see concerning patterns in your database metrics, you have several options, including resizing the database, adding a caching layer, or limiting the number of concurrent connections.
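The 85% storage guideline translates into a concrete byte threshold that you can compare against the VolumeBytesUsed metric. The following is a minimal sketch of that arithmetic, assuming the 64 TiB limit quoted in the list above; the function names are hypothetical and the limit is a parameter so you can substitute the value that applies to your cluster.

```python
TIB = 2**40  # bytes in one tebibyte


def storage_threshold_bytes(limit_tib: float = 64, fraction: float = 0.85) -> int:
    """Return the byte count at which usage reaches `fraction` of the limit."""
    return int(limit_tib * TIB * fraction)


def storage_exceeds_guideline(volume_bytes_used: int,
                              limit_tib: float = 64,
                              fraction: float = 0.85) -> bool:
    """True when the reported VolumeBytesUsed value is above the guideline."""
    return volume_bytes_used > storage_threshold_bytes(limit_tib, fraction)


if __name__ == "__main__":
    threshold = storage_threshold_bytes()
    print(f"Guideline threshold: {threshold / TIB:.1f} TiB")
```

With the default arguments the threshold works out to 0.85 × 64 = 54.4 TiB.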

Understanding Amazon CloudWatch metrics

As a quick review, Amazon CloudWatch stores metrics for up to 15 months. Metrics are time-ordered sets of data points. Metrics have a name, a namespace, and optionally dimensions. Dimensions are categories that help you refine metrics of interest.
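To make the name, namespace, and dimensions vocabulary concrete, here is a rough sketch of how those three pieces appear in a request to the CloudWatch GetMetricStatistics API through boto3. The helper names are hypothetical, the `AWS/DocDB` namespace is the one Amazon DocumentDB metrics are published under, and the dimension you pass in is whatever identifies your own resource.

```python
from datetime import datetime, timedelta, timezone


def build_metric_request(metric_name, dimensions, hours=3, period=300,
                         stat="Average", now=None):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/DocDB",                      # namespace
        "MetricName": metric_name,                     # name
        "Dimensions": [{"Name": k, "Value": v}         # optional dimensions
                       for k, v in dimensions.items()],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period,                              # seconds per data point
        "Statistics": [stat],
    }


def fetch_datapoints(request):
    """Send the request and return data points in time order."""
    import boto3  # imported lazily so building a request needs no AWS SDK
    response = boto3.client("cloudwatch").get_metric_statistics(**request)
    return sorted(response["Datapoints"], key=lambda p: p["Timestamp"])
```

Calling `fetch_datapoints` requires AWS credentials and a Region to be configured; `build_metric_request` on its own only assembles the dictionary.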

The diagram below shows a view of a single Amazon CloudWatch metric, ReadLatency. You may choose to see this metric for a single database instance, for all the read replicas in a cluster, or for an entire cluster.

Amazon CloudWatch Metrics

The Amazon CloudWatch Concepts documentation has more information on metrics and dimensions.
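The three views of ReadLatency described above differ only in the dimensions attached to the metric. The sketch below illustrates that idea; the dimension names (`DBInstanceIdentifier`, `DBClusterIdentifier`, `Role`) and the `READER` role value are assumptions based on how the DocDB namespace is grouped in the console, and `dimensions_for` is a hypothetical helper.

```python
def dimensions_for(scope, cluster_id=None, instance_id=None):
    """Return the dimension set for one of the three views of a metric."""
    if scope == "instance":
        # A single database instance
        return {"DBInstanceIdentifier": instance_id}
    if scope == "readers":
        # All read replicas in a cluster, aggregated by role
        return {"DBClusterIdentifier": cluster_id, "Role": "READER"}
    if scope == "cluster":
        # The entire cluster
        return {"DBClusterIdentifier": cluster_id}
    raise ValueError(f"unknown scope: {scope!r}")


if __name__ == "__main__":
    for scope in ("instance", "readers", "cluster"):
        print(scope, dimensions_for(scope, "my-cluster", "my-instance"))
```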

Choosing metrics

For this workshop, you will monitor several metrics on the primary node:

  • CPU, RAM, and IOPS
    • CPUUtilization
    • FreeableMemory
    • VolumeWriteIOPs
    • VolumeReadIOPs
  • Storage volume consumption
    • VolumeBytesUsed
  • Network traffic
    • NetworkThroughput
  • Database connections and cursors
    • DatabaseConnections
    • DatabaseConnectionsMax
    • DatabaseCursors
    • DatabaseCursorsMax
  • Others
    • ReadLatency
    • WriteLatency
    • EngineUptime
    • OpcountersCommand
    • BufferCacheHitRatio
    • IndexBufferCacheHitRatio

There are several other metrics available, but this set is a good basic start.
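If you want to reuse this list in scripts, one option is to record it as data. The sketch below simply restates the categories and metric names from the list above; the variable and function names are hypothetical.

```python
# Metrics to monitor on the primary node, grouped as in the list above.
PRIMARY_METRICS = {
    "CPU, RAM, and IOPS": [
        "CPUUtilization", "FreeableMemory", "VolumeWriteIOPs", "VolumeReadIOPs",
    ],
    "Storage volume consumption": ["VolumeBytesUsed"],
    "Network traffic": ["NetworkThroughput"],
    "Database connections and cursors": [
        "DatabaseConnections", "DatabaseConnectionsMax",
        "DatabaseCursors", "DatabaseCursorsMax",
    ],
    "Others": [
        "ReadLatency", "WriteLatency", "EngineUptime", "OpcountersCommand",
        "BufferCacheHitRatio", "IndexBufferCacheHitRatio",
    ],
}


def all_metric_names(groups=PRIMARY_METRICS):
    """Flatten the grouped metrics into a single list, preserving order."""
    return [name for names in groups.values() for name in names]


if __name__ == "__main__":
    for category, names in PRIMARY_METRICS.items():
        print(f"{category}: {', '.join(names)}")
```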

Using the built-in dashboard

On the Amazon DocumentDB console, you will find a built-in monitoring dashboard for each cluster by navigating to the Clusters section of the console and clicking on the cluster identifier.

Amazon DocumentDB cluster dashboard

This built-in cluster dashboard shows several metrics in these categories:

  • Resource utilization
  • Throughput
  • Latency
  • Operations
  • System

The screenshot below shows the first two rows of the Resource utilization section for a cluster.

Amazon DocumentDB cluster dashboard

Similarly, you will find an instance monitoring dashboard by navigating to the Instances section of the console and clicking on an instance identifier. The instance metrics fall under the same categories: Resource utilization, Throughput, Latency, Operations, and System.

Amazon DocumentDB instance dashboard

Setting up our first custom dashboard

You will now set up an Amazon CloudWatch dashboard for the primary node manually. A later chapter shows how to automate the process.

First, go to the Amazon CloudWatch console and make sure you are in the correct Region. Then open the Dashboards section.

Amazon CloudWatch Console

On the Dashboards page, click Create dashboard.

Amazon CloudWatch Dashboards

Give your dashboard a name.

Amazon CloudWatch Dashboard Name

Select a type of visualization (widget). For most metrics, the Line widget is a good place to start.

Amazon CloudWatch Dashboard Widget

Select Metrics as the data source.

Amazon CloudWatch Dashboard Widget Source

Now you can add a graph to the dashboard. On the next page, start by looking at All metrics, then enter DocDB in the search field to find all of the metrics available in the DocDB namespace.

Amazon CloudWatch Dashboard Graph Namespace

Select DocDB > Cluster Metrics by Role. On the next page, find the CPUUtilization metric for your cluster in the row where the Role dimension is WRITER.

Amazon CloudWatch Dashboard Graph Namespace

On the next page, give the graph a title by clicking on the pencil icon near the top of the page. Then select the Graphed metrics tab and review the options for our metric. By default the widget will display an average over a 5-minute period, but you can choose a different time frame or a different aggregation.

Amazon CloudWatch Dashboard Graph Metric

Finally, click Create Widget.
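For comparison, the widget you just created in the console can also be described as a dashboard body in JSON. The following is a minimal sketch, not the automation covered later in the workshop: the helper names, cluster identifier, dashboard name, and Region are placeholders, and the `DBClusterIdentifier` and `Role` dimension names are assumptions that should match what the console shows for your cluster.

```python
import json


def writer_cpu_widget(cluster_id, region):
    """Describe a line widget for CPUUtilization on the writer node."""
    return {
        "type": "metric",
        "x": 0, "y": 0, "width": 12, "height": 6,
        "properties": {
            "title": "Writer CPUUtilization",
            "view": "timeSeries",      # the Line widget
            "region": region,
            "stat": "Average",         # default aggregation
            "period": 300,             # 5-minute period, in seconds
            "metrics": [[
                "AWS/DocDB", "CPUUtilization",
                "DBClusterIdentifier", cluster_id,
                "Role", "WRITER",
            ]],
        },
    }


def dashboard_body(widgets):
    """Serialize widgets into the JSON string CloudWatch expects."""
    return json.dumps({"widgets": widgets})


def publish_dashboard(name, body):
    """Create or overwrite the dashboard (requires AWS credentials)."""
    import boto3  # imported lazily so the body can be built without the SDK
    boto3.client("cloudwatch").put_dashboard(DashboardName=name,
                                             DashboardBody=body)


if __name__ == "__main__":
    print(dashboard_body([writer_cpu_widget("my-cluster", "us-east-1")]))
```

Note that publishing a body under an existing dashboard name replaces that dashboard's contents, so use a fresh name when experimenting.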

Adding more widgets

Go back to the list of all dashboards and select the dashboard you just created.

Amazon CloudWatch Dashboard Writer Node

You can adjust the time frame the dashboard shows, which defaults to 3 hours, by accessing the options menu in the top right corner. Feel free to explore the other options.

At this point you can click the Add Widget button and add graphs for the other metrics discussed earlier. You will see how to automate that process in a later chapter; for now, try adding just one or two additional metrics.
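If you are curious what several widgets look like side by side in a dashboard definition, the sketch below places one widget per metric on the dashboard grid, which is 24 columns wide. The helper names and identifiers are hypothetical, and it assumes each metric is available with the `DBClusterIdentifier` and `Role` dimensions; verify that in the console before relying on it.

```python
import json

GRID_COLUMNS = 24  # width of the CloudWatch dashboard grid
WIDGET_HEIGHT = 6


def metric_widget(metric, cluster_id, region, x, y, width):
    """Describe one line widget for a writer-node metric."""
    return {
        "type": "metric",
        "x": x, "y": y, "width": width, "height": WIDGET_HEIGHT,
        "properties": {
            "title": f"Writer {metric}",
            "view": "timeSeries",
            "region": region,
            "stat": "Average",
            "period": 300,
            "metrics": [["AWS/DocDB", metric,
                         "DBClusterIdentifier", cluster_id,
                         "Role", "WRITER"]],
        },
    }


def layout_widgets(metrics, cluster_id, region, per_row=2):
    """Place widgets left to right, wrapping to a new row when full."""
    width = GRID_COLUMNS // per_row
    widgets = []
    for index, metric in enumerate(metrics):
        row, col = divmod(index, per_row)
        widgets.append(metric_widget(metric, cluster_id, region,
                                     x=col * width,
                                     y=row * WIDGET_HEIGHT,
                                     width=width))
    return widgets


if __name__ == "__main__":
    chosen = ["CPUUtilization", "FreeableMemory", "DatabaseConnections"]
    body = {"widgets": layout_widgets(chosen, "my-cluster", "us-east-1")}
    print(json.dumps(body, indent=2))
```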

You can review the Amazon CloudWatch Metrics documentation to learn about other concepts such as metric math, which lets you produce metrics that are a combination of other metrics.
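As an illustration of metric math, the sketch below describes a widget that plots the sum of VolumeReadIOPs and VolumeWriteIOPs as a single line. It is only an example under stated assumptions: the helper name and identifiers are hypothetical, and it assumes the two volume metrics are published with the `DBClusterIdentifier` dimension alone.

```python
import json


def total_iops_widget(cluster_id, region):
    """Describe a widget that graphs read + write IOPS via metric math."""
    return {
        "type": "metric",
        "x": 0, "y": 0, "width": 12, "height": 6,
        "properties": {
            "title": "Total volume IOPS",
            "view": "timeSeries",
            "region": region,
            "stat": "Average",
            "period": 300,
            "metrics": [
                # The expression combines the two metrics identified below.
                [{"expression": "m1 + m2", "label": "Total IOPS", "id": "e1"}],
                # Source metrics are hidden so only the combined line is drawn.
                ["AWS/DocDB", "VolumeReadIOPs",
                 "DBClusterIdentifier", cluster_id,
                 {"id": "m1", "visible": False}],
                ["AWS/DocDB", "VolumeWriteIOPs",
                 "DBClusterIdentifier", cluster_id,
                 {"id": "m2", "visible": False}],
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps({"widgets": [total_iops_widget("my-cluster", "us-east-1")]},
                     indent=2))
```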