Dashboards are useful tools, but for production use cases we want alarms triggered automatically if our metrics cross certain thresholds. That way we can alert our operations team, open tickets in our operations systems, or even start an automated response in some cases.
As a quick review, Amazon CloudWatch alarms trigger based on individual metrics or combinations of metrics. Alarms send notifications to SNS topics. The SNS topic can notify recipients by email, SMS, or an HTTP endpoint. You can also have the topic send the notification to an Amazon SQS queue or invoke an AWS Lambda function for additional automation. For example, a Lambda function could initiate a scaling action. (Amazon SQS is a fully managed message queuing service, and AWS Lambda is a serverless computing service.)
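As a rough illustration of the Lambda path, here is a minimal Python sketch of a function subscribed to the alarm's SNS topic. It assumes the standard SNS event envelope (Records, Sns, Message) and the AlarmName, NewStateValue, and NewStateReason fields that CloudWatch includes in its alarm notifications. The function names are hypothetical, and the scaling action is left as a placeholder.

```python
import json


def summarize_alarm(sns_message):
    """Pull the alarm name, new state, and reason out of one notification body."""
    alarm = json.loads(sns_message)
    return {
        "name": alarm["AlarmName"],
        "state": alarm["NewStateValue"],
        "reason": alarm.get("NewStateReason", ""),
    }


def lambda_handler(event, context):
    """Entry point for a Lambda function subscribed to the alarm's SNS topic."""
    summaries = [summarize_alarm(record["Sns"]["Message"]) for record in event["Records"]]
    for summary in summaries:
        if summary["state"] == "ALARM":
            # Placeholder: a real function might start a scaling action here.
            print(f"{summary['name']} is in ALARM: {summary['reason']}")
    return summaries
```

Returning the parsed summaries is only a convenience for testing; Lambda ignores the return value for SNS-triggered invocations.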
We can set an alarm trigger threshold for any Amazon CloudWatch metric. The right thresholds depend on your use case. For example, a read-heavy workload may see consistently high load on the read replicas and a lightly used primary node. Write-heavy workloads would see the opposite.
Here are a few suggestions for initial alarm thresholds, but be sure to adjust these thresholds based on your application.
Let’s set up one alarm manually in this chapter. We’ll look at automated alarm creation in the next chapter.
First, go to the Amazon CloudWatch console and make sure you are in the correct region. Now go to the Alarms section and start creating a new alarm.
Now, just as when we set up a dashboard, select the metric you want to alarm on. Go into the DocDB namespace and select the dimension Cluster metrics by role for the cluster you want to monitor; if you created your cluster in the first chapter, the cluster identifier is getting-started-with-documentdb. In this example we can choose CPUUtilization on the read replicas, as it’s fairly easy to artificially trigger this alarm for testing.
On the next page, we’ll configure the alarm. First we’ll select the metric aggregation and period, which defaults to the average over 5 minutes. Change the period to 1 minute.
On the second half of the screen, we specify how the alarm triggers. A static value triggers on a specific limit, such as CPU utilization exceeding 80%. An anomaly detection band will trigger based on expected ranges of the metric. For this example, let’s set it to trigger when the reader CPU utilization exceeds 3% once over a minute. (3% is an artificially low threshold, used so we can generate an alarm.)
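For comparison, the same alarm settings can be written as parameters for the CloudWatch PutMetricAlarm API. The Python sketch below is illustrative only. It assumes the AWS/DocDB namespace with the DBClusterIdentifier and Role dimensions (the console's "Cluster metrics by role" view), and the alarm name and helper functions are hypothetical. The client is passed in so the code can be tried without AWS credentials.

```python
def reader_cpu_alarm_params(cluster_id, threshold_percent=3.0, topic_arn=None):
    """Build PutMetricAlarm parameters for reader CPU utilization on one cluster."""
    params = {
        "AlarmName": f"{cluster_id}-reader-cpu",  # hypothetical naming scheme
        "Namespace": "AWS/DocDB",                 # assumed DocumentDB namespace
        "MetricName": "CPUUtilization",
        "Dimensions": [
            {"Name": "DBClusterIdentifier", "Value": cluster_id},
            {"Name": "Role", "Value": "READER"},  # read replicas only
        ],
        "Statistic": "Average",
        "Period": 60,              # aggregate over 1 minute
        "EvaluationPeriods": 1,    # one breaching period is enough
        "Threshold": float(threshold_percent),
        "ComparisonOperator": "GreaterThanThreshold",
    }
    if topic_arn:
        # Notify this SNS topic when the alarm enters the ALARM state.
        params["AlarmActions"] = [topic_arn]
    return params


def create_reader_cpu_alarm(cloudwatch, cluster_id, **options):
    """Create the alarm using a CloudWatch client supplied by the caller."""
    params = reader_cpu_alarm_params(cluster_id, **options)
    cloudwatch.put_metric_alarm(**params)
    return params["AlarmName"]
```

With boto3 installed, the client would typically come from boto3.client("cloudwatch"), for example create_reader_cpu_alarm(client, "getting-started-with-documentdb").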
On the next screen, we determine what happens when the alarm triggers. For the sake of a quick test, choose to have a new SNS topic configured and specify an email address to receive the notification. Click Create Topic and then Next. Finally, give the alarm a name.
On the final preview page, click Create alarm. Check your email; you will receive an SNS confirmation email in a few minutes. You need to confirm the subscription before you’ll receive any alarm emails.
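The topic and email subscription that the console creates here can also be sketched in Python. This is a minimal example, not the console's actual implementation: the helper name is hypothetical, and the SNS client is passed in as a parameter. It uses the SNS CreateTopic and Subscribe operations; an email subscription stays pending until the recipient confirms it, which matches the confirmation step above.

```python
def create_email_topic(sns, topic_name, email):
    """Create an SNS topic and subscribe one email address to it.

    Returns the topic ARN, which can then be used as an alarm action.
    """
    topic_arn = sns.create_topic(Name=topic_name)["TopicArn"]
    # The subscription remains pending until the recipient confirms by email.
    sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint=email)
    return topic_arn
```

With boto3 installed, the sns argument would typically be boto3.client("sns").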
In order to see the alarm in action, we’ll connect to the Cloud9 IDE and introduce some read-heavy load. We’ll use the Yahoo! Cloud Serving Benchmark (YCSB), a database benchmarking framework, to run workloads on the database cluster.
Navigate to the AWS Cloud9 management console and choose Open IDE to launch the AWS Cloud9 environment.
We’ll need to resize the EBS volume used for Cloud9. Follow these steps:
chmod +x resize.sh
sh resize.sh 40
We also need to upgrade to JDK 8:
sudo yum install java-1.8.0-openjdk-devel -y
sudo alternatives --config java  # select the JDK 8 entry when prompted
sudo yum remove -y java-1.7.0-openjdk-devel
Since we are using TLS for the Amazon DocumentDB connection, we have to prepare a Java keystore for YCSB. First, download this script and upload it to Cloud9, or just paste it into an editor in Cloud9. Then run:
mkdir /tmp/certs
chmod +x cert.sh
./cert.sh
Now install YCSB.
curl -O --location https://github.com/brianfrankcooper/YCSB/releases/download/0.17.0/ycsb-0.17.0.tar.gz
tar xfvz ycsb-0.17.0.tar.gz
cd ycsb-0.17.0
We need to edit the file bin/ycsb to set the Java keystore properties. Open this file in an editor on Cloud9, and change the ycsb_command on line 331:
ycsb_command = ([java] + args.jvm_args + ['-Djavax.net.ssl.trustStore=/tmp/certs/rds-truststore.jks','-Djavax.net.ssl.trustStorePassword=changeit'] + ["-cp", classpath, main_classname, "-db", db_classname] + remaining)
Run the load tester. We’ll use workloadb, which is a read-heavy workload. In these commands, note that DatabaseUser and DatabasePassword are the values you specified for your Amazon DocumentDB database when you created it. DbEndpoint is available on the console, as described in the Query Cluster section.
python2 ./bin/ycsb load mongodb -s -P workloads/workloadb -p recordcount=100000 -p "mongodb.url=mongodb://<DatabaseUser>:<DatabasePassword>@<DbEndpoint>:27017/?ssl=true&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false" > load.dat
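The connection string contains characters such as & that a shell treats specially, and passwords often contain characters that must be percent-encoded inside a URL. As a hedged sketch, the helpers below (hypothetical names, not part of YCSB) build the same URL with encoded credentials and assemble the YCSB arguments as a list, so no shell quoting is involved.

```python
from urllib.parse import quote_plus, urlencode


def documentdb_url(user, password, endpoint, port=27017):
    """Build the MongoDB connection string used above, percent-encoding credentials."""
    options = urlencode({
        "ssl": "true",
        "replicaSet": "rs0",
        "readPreference": "secondaryPreferred",
        "retryWrites": "false",
    })
    return f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{endpoint}:{port}/?{options}"


def ycsb_args(phase, url, workload="workloads/workloadb", record_count=100000):
    """Return the YCSB command line as an argument list (phase is 'load' or 'run')."""
    return [
        "python2", "./bin/ycsb", phase, "mongodb", "-s",
        "-P", workload,
        "-p", f"recordcount={record_count}",
        "-p", f"mongodb.url={url}",
    ]
```

The list can be handed to subprocess.run from the ycsb-0.17.0 directory, with stdout redirected to a file such as load.dat. YCSB also has a run phase, which executes the workload's operation mix against the loaded data; passing "run" as the phase builds that command.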
After several minutes the alarm will trigger and you’ll receive an email. You should either delete the alarm or raise the threshold so you won’t get repeated alarms; 3% is a very low threshold for CPU utilization.
You can add more alarms for other metrics. Beyond alarms on single metrics, you can use anomaly detection to trigger whenever the metric is outside of a normal range. Or, you can set a composite alarm that triggers when multiple conditions are met. For example, you may want to send an alarm email when both CPU and memory usage are high, but not when only one is high.
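To make the combined case concrete, here is a minimal Python sketch of a composite alarm that fires only when every child alarm is in the ALARM state. It assumes the child alarms already exist; their names and the helper functions are hypothetical. The rule string follows CloudWatch's composite alarm rule syntax, and the client is passed in by the caller.

```python
def all_in_alarm(*alarm_names):
    """Build a composite alarm rule that is true only when every child alarm is in ALARM."""
    if not alarm_names:
        raise ValueError("at least one child alarm name is required")
    return " AND ".join(f'ALARM("{name}")' for name in alarm_names)


def create_composite_alarm(cloudwatch, name, child_alarm_names, topic_arn):
    """Create a composite alarm that notifies an SNS topic when all children are in ALARM."""
    rule = all_in_alarm(*child_alarm_names)
    cloudwatch.put_composite_alarm(
        AlarmName=name,
        AlarmRule=rule,
        AlarmActions=[topic_arn],
    )
    return rule
```

For example, all_in_alarm("reader-cpu-high", "reader-memory-low") yields a rule that stays quiet when only one of the two alarms fires.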