Dashboards are useful tools, but for production use cases you want alarms triggered automatically if your metrics cross certain thresholds. That way you can alert your operations team, open tickets in your operations systems, or even start an automated response in some cases.
As a quick review, Amazon CloudWatch alarms trigger based on individual metrics or combinations of metrics. Alarms send notifications to SNS topics. The SNS topic will notify recipients via email, SMS, or an HTTP endpoint. You can also have the topic send the notification to an Amazon SQS queue or invoke an AWS Lambda function for additional automation. For example, a Lambda function could initiate a scaling action. (Amazon SQS is a fully managed message queuing service, and AWS Lambda is a serverless computing service.)
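To make the SNS-to-Lambda path concrete, here is a minimal sketch of a Lambda function that reads an alarm notification delivered through an SNS topic. The function names `summarize_alarm` and `lambda_handler` are illustrative, and the automated response is left as a placeholder; the sketch assumes the standard CloudWatch alarm message fields `AlarmName`, `NewStateValue`, and `NewStateReason`.

```python
import json


def summarize_alarm(event):
    """Extract key fields from CloudWatch alarm messages delivered via SNS."""
    summaries = []
    for record in event.get("Records", []):
        # SNS wraps the alarm notification as a JSON string in Sns.Message.
        message = json.loads(record["Sns"]["Message"])
        summaries.append({
            "alarm": message["AlarmName"],
            "state": message["NewStateValue"],
            "reason": message.get("NewStateReason", ""),
        })
    return summaries


def lambda_handler(event, context):
    """Hypothetical handler: react only when an alarm enters the ALARM state."""
    fired = [s for s in summarize_alarm(event) if s["state"] == "ALARM"]
    for summary in fired:
        # Placeholder for an automated response, such as a scaling action.
        print(f"Alarm {summary['alarm']} fired: {summary['reason']}")
    return {"handled": len(fired)}
```

Keeping the parsing in its own function makes it possible to test the logic locally with a hand-built event before deploying anything.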
You can set an alarm trigger threshold for any Amazon CloudWatch metric. The right thresholds depend on your use case. For example, a read-heavy workload may see consistently high load on the read replicas and a lightly used primary node. Write-heavy workloads would see the opposite.
Here are a few suggestions for initial alarm thresholds, but be sure to adjust these thresholds based on your application.
You will set up one alarm manually in this chapter and look at automated alarm creation in the next chapter.
First, go to the Amazon CloudWatch console and make sure you are in the correct region. Then go to the Alarms section and start creating a new alarm.
Now, just as when you set up a dashboard, select the metric you want to alarm on. Go into the DocDB namespace and select the dimension Cluster Metrics by Role for the getting-started-with-documentdb database. In this example choose CPUUtilization on the read replicas, as it’s fairly easy to artificially trigger this alarm for testing.
On the next page, configure the alarm. First, select the metric aggregation and period, which defaults to the average over 5 minutes. Change the period to 1 minute.
On the second half of the screen, you specify how the alarm triggers. A static value triggers on a specific limit, such as CPU utilization exceeding 80%. An anomaly detection band triggers based on the expected range of the metric. For this example, set the alarm to trigger when the reader CPU utilization exceeds 10% for a single 1-minute period (10% is an artificially low threshold, used to generate an alarm).
On the next screen, you determine what happens when the alarm triggers. For the sake of a quick test, choose to have a new SNS topic configured and specify a valid email address to receive the notification.
Click Create Topic and then Next. Finally, give the alarm a name.
On the final preview page, click Create alarm. After a few minutes you will receive an SNS confirmation email. You need to confirm the subscription in order to receive the alarm emails.
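The console steps above can also be expressed programmatically. The following is a minimal sketch, not a definitive implementation: it builds the parameters for the CloudWatch `put_metric_alarm` call using the values chosen in this walkthrough (average over a 1-minute period, threshold of 10%). The helper names, the alarm name, and the dimension names `DBClusterIdentifier` and `Role` are assumptions; verify the dimensions against what the CloudWatch console shows for your cluster, and supply your own SNS topic ARN.

```python
def build_reader_cpu_alarm(cluster_id, topic_arn, threshold=10.0):
    """Build put_metric_alarm parameters for reader CPU utilization."""
    return {
        "AlarmName": f"{cluster_id}-reader-cpu-high",  # hypothetical name
        "Namespace": "AWS/DocDB",
        "MetricName": "CPUUtilization",
        # Assumed dimensions behind "Cluster Metrics by Role"; confirm in the console.
        "Dimensions": [
            {"Name": "DBClusterIdentifier", "Value": cluster_id},
            {"Name": "Role", "Value": "READER"},
        ],
        "Statistic": "Average",
        "Period": 60,              # seconds, i.e. a 1-minute period
        "EvaluationPeriods": 1,    # a single breaching period triggers the alarm
        "Threshold": threshold,    # percent CPU; 10 is artificially low
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],
    }


def create_alarm(params, region_name):
    """Create the alarm; needs the boto3 package and AWS credentials."""
    import boto3  # imported lazily so the builder works without the AWS SDK

    cloudwatch = boto3.client("cloudwatch", region_name=region_name)
    cloudwatch.put_metric_alarm(**params)
```

Separating the parameter builder from the API call lets you review or version the alarm definition before anything is created in your account.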
In order to see the alarm in action, connect to the Cloud9 IDE and introduce some read-heavy load. You will use the Yahoo! Cloud Serving Benchmark (YCSB), a database benchmarking framework, to run workloads on the database cluster.
Navigate to the AWS Cloud9 management console and choose Open IDE to launch the AWS Cloud9 environment.
You will need to resize the Cloud9 volume. Run the following commands to complete the resize:
wget https://s3.amazonaws.com/ee-assets-prod-us-east-1/modules/c55fc8f9e8cf4231b0c09a7a493fdf78/v1/nested/resize.sh
chmod +x resize.sh
sh resize.sh 40
Next, upgrade to JDK 8 (required by YCSB):
sudo yum install java-1.8.0-openjdk-devel -y
sudo alternatives --config java
# When prompted, select the JDK 8 entry
sudo yum remove -y java-1.7.0-openjdk-devel
Since TLS is used for the Amazon DocumentDB connection, you need to prepare a Java keystore for YCSB. Run the following commands:
mkdir /tmp/certs
wget https://s3.amazonaws.com/ee-assets-prod-us-east-1/modules/c55fc8f9e8cf4231b0c09a7a493fdf78/v1/nested/cert.sh
chmod +x cert.sh
./cert.sh
Now install YCSB.
curl -O --location https://github.com/brianfrankcooper/YCSB/releases/download/0.17.0/ycsb-0.17.0.tar.gz
tar xfvz ycsb-0.17.0.tar.gz
cd ycsb-0.17.0
Next, edit the file bin/ycsb to set the Java keystore properties. Open this file in an editor on Cloud9, and change the ycsb_command on line 331:
ycsb_command = ([java] + args.jvm_args + ['-Djavax.net.ssl.trustStore=/tmp/certs/rds-truststore.jks','-Djavax.net.ssl.trustStorePassword=changeit'] + ["-cp", classpath, main_classname, "-db", db_classname] + remaining)
Run the YCSB load test. Use the workloadb workload, which is a read-heavy workload.
python2 ./bin/ycsb load mongodb -s -P workloads/workloadb -p recordcount=100000 -p mongodb.url="mongodb://$docdbUser:$docdbPass@$docdbEndpoint:27017/?ssl=true&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false" > load.dat
After several minutes the alarm will trigger and you will receive an email. You should either delete the alarm or raise the threshold so you won’t get repeated alarms; 10% is a very low threshold for CPU utilization.
You can add more alarms for other metrics. Beyond alarms on single metrics, you can use the anomaly detection capability to trigger whenever a metric falls outside its normal range. Or, you can set a composite alarm that triggers when multiple conditions are met. For example, you may want to send an alarm email when both CPU and memory usage are high, but not when only one is high.
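As an illustration of the multi-condition case (CloudWatch calls this a composite alarm), the sketch below builds a rule expression that is true only when every listed alarm is in the ALARM state, then passes it to the `put_composite_alarm` API. The helper names and the alarm names `docdb-cpu-high` and `docdb-memory-high` are hypothetical; the underlying single-metric alarms are assumed to exist already.

```python
def all_in_alarm(alarm_names):
    """Build a composite alarm rule that requires every alarm to be in ALARM."""
    if not alarm_names:
        raise ValueError("at least one alarm name is required")
    return " AND ".join(f'ALARM("{name}")' for name in alarm_names)


def create_composite_alarm(name, rule, topic_arn, region_name):
    """Create the composite alarm; needs the boto3 package and AWS credentials."""
    import boto3  # imported lazily so the rule builder works without the AWS SDK

    cloudwatch = boto3.client("cloudwatch", region_name=region_name)
    cloudwatch.put_composite_alarm(
        AlarmName=name,
        AlarmRule=rule,
        AlarmActions=[topic_arn],
    )


# Hypothetical alarm names; both must already exist as metric alarms.
rule = all_in_alarm(["docdb-cpu-high", "docdb-memory-high"])
print(rule)
```

With this rule, a notification is sent only when both underlying alarms fire, which matches the CPU-and-memory example above.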