CloudWatch

CloudWatch Overview:

  • Performance monitoring, alarms, log collection and automated remediation.
  • Used to collect performance metrics (AWS and on-prem), drive event-driven responses, optimise performance and resource usage, and gain operational visibility through logs.

CloudWatch Features:

  • Metrics
  • Alarm
  • Logs
  • Events

CloudWatch Metrics:

  • Metrics are sent from many services.
  • EC2, for example, sends metrics every 5 minutes by default (basic monitoring, free).
  • Detailed monitoring sends them every minute at additional cost.
  • CloudWatch Agent required for system level metrics (Memory/Disk).
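  • Custom metrics can also be published at standard resolution (1-minute granularity) or high resolution (1-second granularity).
  • Metrics exist within a Region and cannot be deleted; they expire automatically once no new data has been published to them for 15 months.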
  • Metrics are uniquely defined by: Name, Namespace and zero or more dimensions.
  • Timestamps on published data points can be up to two weeks in the past and up to two hours in the future.
  • Metric retention depends on the data point period:
    • Under 60 seconds: retained for 3 hours
    • 60 seconds: retained for 15 days
    • 300 seconds: retained for 63 days
    • 3600 seconds: retained for 455 days (15 months)
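
The retention tiers above can be sketched as a small lookup. This is only an illustration: the function name is hypothetical, and treating periods that fall between the listed values as belonging to the next-shorter tier is an assumption, not something the notes state.

```python
from datetime import timedelta


def retention_for_period(period_seconds: int) -> timedelta:
    """Return how long data points of a given period are retained.

    Assumption: periods between the listed tiers (e.g. 120 s) are
    treated like the nearest listed period below them.
    """
    if period_seconds <= 0:
        raise ValueError("period must be positive")
    if period_seconds < 60:
        return timedelta(hours=3)    # high-resolution data
    if period_seconds < 300:
        return timedelta(days=15)    # 1-minute data
    if period_seconds < 3600:
        return timedelta(days=63)    # 5-minute data
    return timedelta(days=455)       # 1-hour data (about 15 months)


for period in (1, 60, 300, 3600):
    print(period, retention_for_period(period))
```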

CloudWatch Alarm:

  • A Metric Alarm performs an action based on a single metric.
  • A Composite Alarm uses a rule expression that combines the states of multiple alarms.
  • An alarm's state can be set temporarily (e.g. to test its actions) using the 'set-alarm-state' API/CLI command.
  • Metric Alarm States: OK, ALARM and INSUFFICIENT_DATA.
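
A rough sketch of how the three alarm states relate to incoming data points. This is a deliberately simplified model, not CloudWatch's actual evaluation logic: the function name, the "greater than threshold" comparison and the handling of missing data (None) are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def evaluate_alarm(
    datapoints: Sequence[Optional[float]],
    threshold: float,
    datapoints_to_alarm: int,
) -> str:
    """Return OK, ALARM or INSUFFICIENT_DATA for one evaluation window."""
    present = [value for value in datapoints if value is not None]
    if not present:
        # Nothing was reported in the window, so no verdict is possible.
        return "INSUFFICIENT_DATA"
    breaching = sum(1 for value in present if value > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


cpu_state = evaluate_alarm([50.0, 91.0, 95.0], threshold=90.0, datapoints_to_alarm=2)
mem_state = evaluate_alarm([40.0, 45.0, 42.0], threshold=80.0, datapoints_to_alarm=2)

# A composite alarm combines the states of other alarms in a rule,
# here the equivalent of "cpu in ALARM AND memory in ALARM".
composite_in_alarm = cpu_state == "ALARM" and mem_state == "ALARM"
print(cpu_state, mem_state, composite_in_alarm)
```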

Namespaces:

  • A namespace is a container for metrics (e.g. AWS/EC2 holds the EC2 metrics).
  • Metrics in different namespaces are isolated from each other.
  • A namespace must be specified for every data point you publish.

Dimensions:

  • A way to further categorise a metric within a namespace.
  • Key-Value pairs.
  • Up to 30 dimensions per metric.
  • Dimensions are categories that define characteristics of each metric.
  • They let you organise and filter metric data based on specific attributes.
  • Attributes could be resource ID, service type etc.
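
To illustrate how namespace, name and dimensions together identify a metric, here is a minimal sketch that builds a hashable key from the three parts. The helper name and the instance IDs are hypothetical; the point is only that changing any part, including the set of dimensions, yields a different metric.

```python
from typing import Dict, FrozenSet, Optional, Tuple

MetricKey = Tuple[str, str, FrozenSet[Tuple[str, str]]]


def metric_key(
    namespace: str,
    name: str,
    dimensions: Optional[Dict[str, str]] = None,
) -> MetricKey:
    """Build an identity for a metric from namespace, name and dimensions."""
    # frozenset makes the key independent of the order of the dimensions.
    return (namespace, name, frozenset((dimensions or {}).items()))


a = metric_key("AWS/EC2", "CPUUtilization", {"InstanceId": "i-0001"})
b = metric_key("AWS/EC2", "CPUUtilization", {"InstanceId": "i-0002"})
c = metric_key("AWS/EC2", "CPUUtilization")  # zero dimensions

print(a == b)  # different dimension value, so a different metric
print(a == c)  # with vs. without dimensions, also different
```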

CloudWatch Statistics:

  • Statistics are aggregations of a metric's data points over a specified period (e.g. for EC2 metrics).
  • Examples include: Average, Minimum, Maximum, Sum and SampleCount.
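
As a small worked example, the common statistics can be computed from a list of raw data points like this. The function name and sample values are made up for illustration; CloudWatch performs this aggregation itself per period.

```python
from typing import Dict, Sequence


def summarise(values: Sequence[float]) -> Dict[str, float]:
    """Aggregate raw data points into the common statistics."""
    if not values:
        raise ValueError("at least one data point is required")
    total = sum(values)
    return {
        "SampleCount": len(values),
        "Sum": total,
        "Minimum": min(values),
        "Maximum": max(values),
        "Average": total / len(values),
    }


stats = summarise([10.0, 20.0, 30.0])
print(stats)
```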

CloudWatch API Metrics:

  • 'GetMetricData' = Retrieve up to 500 Metrics in a single call.
  • 'PutMetricData' = Publishes metric data points to CloudWatch. If the metric doesn't exist, it is created.
  • 'GetMetricStatistics' = Gets statistics for a specified metric, aggregating data points over the period you specify (a single call returns at most 1,440 data points).
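
The two limits above can be respected on the client side with a couple of helpers. This is a sketch only: the function names are hypothetical, no AWS call is made, and the query dictionaries stand in for whatever request structures you would actually send.

```python
from datetime import datetime, timedelta
from typing import Iterator, List, Sequence

MAX_QUERIES_PER_CALL = 500      # GetMetricData limit from the notes
MAX_DATAPOINTS_PER_CALL = 1440  # GetMetricStatistics limit from the notes


def batch_queries(queries: Sequence[dict], size: int = MAX_QUERIES_PER_CALL) -> Iterator[List[dict]]:
    """Split metric queries into chunks that fit in one GetMetricData call."""
    for start in range(0, len(queries), size):
        yield list(queries[start:start + size])


def fits_in_one_call(start: datetime, end: datetime, period_seconds: int) -> bool:
    """Check whether a time range at a given period stays within 1,440 data points."""
    expected_points = (end - start).total_seconds() / period_seconds
    return expected_points <= MAX_DATAPOINTS_PER_CALL


queries = [{"Id": f"m{i}"} for i in range(1200)]
batch_sizes = [len(batch) for batch in batch_queries(queries)]
print(batch_sizes)

now = datetime(2024, 1, 2)
print(fits_in_one_call(now - timedelta(hours=24), now, 60))
print(fits_in_one_call(now - timedelta(hours=48), now, 60))
```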

Unified CloudWatch Agent:

  • Used to collect internal system-level metrics for AWS and On-Prem devices.
  • Can collect custom app metrics.
  • Can collect logs for Windows/Linux.
  • Installation process is as follows:
  1. Create an IAM role to enable metric collection.
  2. Download and install the agent package.
  3. Update the CloudWatch agent config file.
  4. Start the agent.

When starting the agent on EC2, attach an instance profile (IAM role); for on-prem servers, specify a named profile that holds the credentials. Alternatively, the agent can be installed and managed through Systems Manager.
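
Step 3 above edits a JSON config file. A minimal sketch of such a file, generated from Python, might look like the following. The namespace, log file path and log group name are hypothetical placeholders, and only a few of the available settings are shown.

```python
import json

# Hypothetical values: adjust the namespace, log path and group name.
agent_config = {
    "agent": {"metrics_collection_interval": 60},
    "metrics": {
        "namespace": "MyApp/System",
        "metrics_collected": {
            # Memory and disk are not reported without the agent.
            "mem": {"measurement": ["mem_used_percent"]},
            "disk": {"measurement": ["used_percent"], "resources": ["/"]},
        },
    },
    "logs": {
        "logs_collected": {
            "files": {
                "collect_list": [
                    {
                        "file_path": "/var/log/myapp/app.log",
                        "log_group_name": "myapp-logs",
                        "log_stream_name": "{instance_id}",
                    }
                ]
            }
        }
    },
}

rendered = json.dumps(agent_config, indent=2)
print(rendered)
```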

CloudWatch Logs:

  • Gather and centralise logs for apps and system logs.
  • Define expiration policies and KMS Encryption.
  • Send the logs to: S3, Kinesis Data Streams or Kinesis Data Firehose.
  • CloudWatch Logs Insights is used to interactively search and analyse log data.
  • Create alarms based on API activity captured by CloudTrail.
  • Logs are kept indefinitely by default.
  • CloudWatch Logs can store Route 53 DNS query logs.

Log Components:

  • Events: A record of activity logged by a monitored resource.
  • Stream: A sequence of log events that share the same source.
  • Groups: Groups of log streams that share retention/monitoring settings.
  • Metric Filters: Used to extract metric observations from ingested events and transform them into data points in a CloudWatch metric.
  • Retention Setting: How long log events are kept in CloudWatch Logs.
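
The idea behind a metric filter can be sketched as counting the log events that match a term and reporting that count as a data point. This is a toy model: real filter patterns support a richer syntax than the plain substring match assumed here, and the function name and sample messages are invented.

```python
from typing import Iterable


def count_matches(messages: Iterable[str], term: str) -> int:
    """Count log events whose message contains the term (case-sensitive)."""
    return sum(1 for message in messages if term in message)


log_events = [
    "2024-01-01T00:00:01Z INFO request served in 12 ms",
    "2024-01-01T00:00:02Z ERROR database connection refused",
    "2024-01-01T00:00:03Z ERROR request timed out",
]

# The resulting count is what would be emitted as a metric data point.
error_count = count_matches(log_events, "ERROR")
print(error_count)
```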

EventBridge (Formerly CloudWatch Events):

  • It is a serverless service.
  • It helps to create Event-Driven architecture.
  • Services and applications send events to an EventBridge event bus.
  • User-defined rules on the event bus match incoming events.
  • A matching rule routes the event to one or more 'Targets', e.g. to trigger a remediation action.
  • Common targets include Lambda functions and Kinesis streams.
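
Rule matching can be pictured with a small sketch. It assumes only the simplest form of event pattern, where each field lists the values it accepts and nested objects are matched recursively; the matcher itself is a hypothetical stand-in, not EventBridge's implementation, and the instance ID is made up.

```python
from typing import Any, Dict


def matches(pattern: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Return True if every field in the pattern is satisfied by the event."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: descend into the matching part of the event.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf pattern: a list of accepted values.
            return False
    return True


rule_pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped", "terminated"]},
}

event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0001", "state": "stopped"},
}

# When the rule matches, the event would be delivered to its targets.
print(matches(rule_pattern, event))
```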