Alarms

Dashboards are useful tools, but for production use cases you want alarms to trigger automatically when your metrics cross certain thresholds. That way you can alert your operations team, open tickets in your operations systems, or even start an automated response in some cases.

Amazon CloudWatch alarms

As a quick review, Amazon CloudWatch alarms trigger based on individual metrics or combinations of metrics. Alarms send notifications to Amazon SNS topics. An SNS topic can notify recipients by email, by SMS, or through an HTTP endpoint. You can also have the topic send the notification to an Amazon SQS queue or invoke an AWS Lambda function for additional automation. For example, a Lambda function could initiate a scaling action. (Amazon SQS is a fully managed message queuing service, and AWS Lambda is a serverless computing service.)
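To make the alarm-to-SNS-to-Lambda path concrete, here is a minimal sketch of a Lambda handler that reads an alarm notification delivered through SNS. It assumes the standard SNS event shape, where the alarm details arrive as a JSON string in `Sns.Message`; the function names and the print statement standing in for the automated response are illustrative only.

```python
import json


def summarize_alarm(event):
    """Return (alarm name, new state, reason) for each SNS record in the event."""
    summaries = []
    for record in event.get("Records", []):
        # CloudWatch publishes the alarm details as a JSON string in Sns.Message.
        message = json.loads(record["Sns"]["Message"])
        summaries.append(
            (
                message["AlarmName"],
                message["NewStateValue"],
                message.get("NewStateReason", ""),
            )
        )
    return summaries


def lambda_handler(event, context):
    """Entry point: react only to records that report the ALARM state."""
    for name, state, reason in summarize_alarm(event):
        if state == "ALARM":
            # Placeholder for the automated response, such as a scaling action.
            print(f"{name} entered ALARM: {reason}")
    return {"processed": len(event.get("Records", []))}
```

Keeping the parsing in its own function makes it easy to test locally with a hand-built event before deploying.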

Setting thresholds

You can set an alarm trigger threshold for any Amazon CloudWatch metric. The right thresholds depend on your use case. For example, a read-heavy workload may see consistently high load on the read replicas and a lightly used primary node. Write-heavy workloads would see the opposite.

Here are a few suggestions for initial alarm thresholds, but be sure to adjust these thresholds based on your application.

  • CPUUtilization over 80%
  • DatabaseConnectionsMax over 90% of the limit
  • DatabaseCursorsMax over 90% of the limit
  • VolumeBytesUsed over 85% of the 64 TiB limit
  • NetworkThroughput over 90% of the sustainable limit
  • DBClusterReplicaLagMaximum consistently over 10 seconds
  • Buffer cache hit ratio under 90%
  • Index buffer cache hit ratio under 90%
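Several of the suggestions above are percentages of a limit, so it can help to turn them into concrete numbers before creating alarms. The sketch below does that arithmetic. The connection and cursor limits are parameters because they depend on your instance class, and the values used in the usage note are hypothetical. It also assumes that `DBClusterReplicaLagMaximum` is reported in milliseconds; check the metric's unit in your account. `NetworkThroughput` is omitted because its sustainable limit is workload and instance specific.

```python
TIB = 2**40
VOLUME_LIMIT_BYTES = 64 * TIB  # storage limit quoted in the list above


def percent_of(limit, percent):
    """Integer threshold at the given percentage of a limit."""
    return limit * percent // 100


def initial_thresholds(max_connections, max_cursors):
    """Map metric name -> (comparison operator, threshold) for a first set of alarms."""
    return {
        "CPUUtilization": ("GreaterThanThreshold", 80),
        "DatabaseConnectionsMax": ("GreaterThanThreshold", percent_of(max_connections, 90)),
        "DatabaseCursorsMax": ("GreaterThanThreshold", percent_of(max_cursors, 90)),
        "VolumeBytesUsed": ("GreaterThanThreshold", percent_of(VOLUME_LIMIT_BYTES, 85)),
        # Assumption: lag is reported in milliseconds, so 10 seconds is 10_000.
        "DBClusterReplicaLagMaximum": ("GreaterThanThreshold", 10_000),
        "BufferCacheHitRatio": ("LessThanThreshold", 90),
        "IndexBufferCacheHitRatio": ("LessThanThreshold", 90),
    }
```

For example, `initial_thresholds(1000, 450)` would place the connection alarm at 900 and the cursor alarm at 405, while the storage alarm sits at 85% of 64 TiB regardless of instance class.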

Creating your first alarm

You will set up one alarm manually in this chapter and look at automated alarm creation in the next chapter.

First, go to the Amazon CloudWatch console and make sure you are in the correct region. Then go to the Alarms section and click Create alarm.

Amazon CloudWatch Alarms

Click Select metric.

Amazon CloudWatch Alarm Metric

Now, just as when you set up a dashboard, select the metric you want to alarm on. Go into the DocDB namespace and select the dimension Cluster Metrics by Role for the getting-started-with-documentdb database. In this example, choose CPUUtilization on the read replicas, as it’s fairly easy to artificially trigger this alarm for testing.

Amazon CloudWatch Alarm Metric Selection

On the next page, configure the alarm. First, select the metric aggregation and period, which defaults to the average over 5 minutes. Change the period to 1 minute.

Amazon CloudWatch Alarm Configuration

On the second half of the screen, you specify how the alarm triggers. A static value triggers on a specific limit, such as CPU utilization exceeding 80%. An anomaly detection band triggers when the metric leaves its expected range. For this example, set the alarm to trigger when the reader CPU utilization exceeds 10% for a single one-minute period (10% is an artificially low threshold, used only to generate an alarm).

Amazon CloudWatch Alarm Configuration
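As a rough illustration of how a static threshold is evaluated, the sketch below checks whether enough of the most recent datapoints exceed the threshold. It is a simplified model, not CloudWatch's actual evaluation logic: it ignores missing data and alarm state transitions, and the function and parameter names are only loosely modeled on the console settings.

```python
def breaches(datapoints, threshold, evaluation_periods=1, datapoints_to_alarm=1):
    """Return True if enough of the most recent datapoints exceed the threshold.

    datapoints is ordered oldest to newest, one value per period.
    """
    recent = datapoints[-evaluation_periods:]
    breaching = sum(1 for value in recent if value > threshold)
    return breaching >= datapoints_to_alarm
```

With the settings used in this example (one datapoint in one period, threshold 10), `breaches([3.0, 4.5, 12.1], 10)` is true because the latest one-minute average is above 10%. Requiring two of the last three datapoints instead would make the alarm less sensitive to short spikes.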

On the next screen, you determine what happens when the alarm triggers. For the sake of a quick test, choose to have a new SNS topic configured and specify a valid email address to receive the notification.

Amazon CloudWatch Alarm SNS

Click Create Topic and then Next. Finally, give the alarm a name.

Amazon CloudWatch Alarm Finish

On the final preview page, click Create alarm. After a few minutes you will receive an SNS confirmation email. You need to confirm the subscription in order to receive the alarm emails.
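The console steps above map onto a single CloudWatch API call. As a sketch, the function below builds the arguments for `put_metric_alarm` in boto3 with the same settings: average CPU over one minute on the reader role, one evaluation period, threshold of 10%. The alarm naming scheme is hypothetical, and the dimension names (`DBClusterIdentifier`, `Role`) are an assumption about how the Cluster Metrics by Role dimension is exposed, so verify them against the metric shown in your console.

```python
def reader_cpu_alarm(cluster_id, topic_arn, threshold=10.0):
    """Build keyword arguments for CloudWatch's PutMetricAlarm API."""
    return {
        "AlarmName": f"{cluster_id}-reader-cpu",  # hypothetical naming scheme
        "Namespace": "AWS/DocDB",
        "MetricName": "CPUUtilization",
        # Assumed dimensions for "Cluster Metrics by Role".
        "Dimensions": [
            {"Name": "DBClusterIdentifier", "Value": cluster_id},
            {"Name": "Role", "Value": "READER"},
        ],
        "Statistic": "Average",
        "Period": 60,  # seconds
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],  # SNS topic that receives the notification
    }


def create_alarm(params):
    """Create the alarm; requires AWS credentials and the boto3 package."""
    import boto3  # imported lazily so the builder above works without the SDK

    boto3.client("cloudwatch").put_metric_alarm(**params)
```

Calling `create_alarm(reader_cpu_alarm("getting-started-with-documentdb", topic_arn))` with the ARN of your SNS topic would create an alarm comparable to the one built in the console.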

Triggering the alarm

In order to see the alarm in action, connect to the Cloud9 IDE and introduce some read-heavy load. You will use the Yahoo! Cloud Serving Benchmark (YCSB), a database benchmarking framework, to run workloads on the database cluster.

Navigate to the AWS Cloud9 management console and choose Open IDE to launch the AWS Cloud9 environment.

Prepare the IDE

You will need to resize the Cloud9 volume. Run the following commands to resize it:

wget https://s3.amazonaws.com/ee-assets-prod-us-east-1/modules/c55fc8f9e8cf4231b0c09a7a493fdf78/v1/nested/resize.sh
chmod +x resize.sh
sh resize.sh 40

Next, upgrade to JDK 8 (required by YCSB):

sudo yum install java-1.8.0-openjdk-devel -y
sudo alternatives --config java

When prompted, select JDK 8. Then remove JDK 7:

sudo yum remove -y java-1.7.0-openjdk-devel

Prepare the certificate store

Since TLS is used for the Amazon DocumentDB connection, you need to prepare a Java keystore for YCSB. Run the following commands:

mkdir /tmp/certs
wget https://s3.amazonaws.com/ee-assets-prod-us-east-1/modules/c55fc8f9e8cf4231b0c09a7a493fdf78/v1/nested/cert.sh
chmod +x cert.sh
./cert.sh

Configure YCSB

Now install YCSB.

curl -O --location https://github.com/brianfrankcooper/YCSB/releases/download/0.17.0/ycsb-0.17.0.tar.gz
tar xfvz ycsb-0.17.0.tar.gz
cd ycsb-0.17.0

Next, edit the file bin/ycsb to set the Java keystore properties. Open this file in an editor on Cloud9, and change the ycsb_command on line 331:

ycsb_command = ([java] + args.jvm_args +
                ['-Djavax.net.ssl.trustStore=/tmp/certs/rds-truststore.jks','-Djavax.net.ssl.trustStorePassword=changeit'] +
                ["-cp", classpath,
                 main_classname, "-db", db_classname] + remaining)

Run load test

Run the YCSB load test. Use workloadb, a read-heavy workload (95% reads, 5% updates).

# Quote the URL so the shell does not interpret the & and ? characters.
python2 ./bin/ycsb load mongodb -s -P workloads/workloadb -p recordcount=100000 -p mongodb.url="mongodb://$docdbUser:$docdbPass@$docdbEndpoint:27017/?ssl=true&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false" > load.dat

After several minutes the alarm will trigger and you will receive an email. You should either delete the alarm or raise the threshold so you won’t get repeated alarms; 10% is a very low threshold for CPU utilization.
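One practical detail about the connection URL passed to YCSB: if the user name or password contains characters such as `@`, `/`, or `:`, they must be percent-encoded or the URL will not parse. The sketch below builds the same URL with encoded credentials. The helper name and the sample values are hypothetical; the query options are copied from the command above.

```python
from urllib.parse import quote_plus


def docdb_uri(user, password, endpoint, port=27017):
    """Build a MongoDB-style connection URI with percent-encoded credentials."""
    options = (
        "ssl=true&replicaSet=rs0"
        "&readPreference=secondaryPreferred&retryWrites=false"
    )
    return (
        f"mongodb://{quote_plus(user)}:{quote_plus(password)}"
        f"@{endpoint}:{port}/?{options}"
    )
```

You could print the result and paste it, quoted, as the value of `mongodb.url`.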

Additional alarms

You can add more alarms for other metrics. Beyond alarms on single metrics, you can use anomaly detection to trigger whenever a metric falls outside its normal range. Or, you can set a composite alarm that triggers only when multiple conditions are met. For example, you may want to send an alarm email when both CPU and memory usage are high, but not when only one is high.
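A CloudWatch composite alarm combines existing alarms with a rule expression such as `ALARM("a") AND ALARM("b")`. As a minimal sketch, the helper below builds such a rule from a list of alarm names. The alarm names in the usage note are hypothetical and would need to exist already as regular metric alarms.

```python
def composite_rule(alarm_names, operator="AND"):
    """Build a composite alarm rule that combines alarms with AND or OR."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    if not alarm_names:
        raise ValueError("at least one alarm name is required")
    return f" {operator} ".join(f'ALARM("{name}")' for name in alarm_names)
```

For example, `composite_rule(["docdb-cpu-high", "docdb-memory-low"])` produces a rule that fires only when both alarms are in the ALARM state. The resulting string can be passed as the `AlarmRule` argument of boto3's `put_composite_alarm`.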