Amazon API Gateway Observability Best Practices with Datadog



This content originally appeared on DEV Community and was authored by Indika_Wimalasuriya

Amazon API Gateway is a fully managed AWS service that lets you create, publish, and maintain APIs at any scale. It acts as the front door to your application’s backend services, including AWS Lambda, EKS, ECS, EC2, and more.

You can explore the full documentation here:
🔗 API Gateway Developer Guide – the reference for everything related to API Gateway

To make sure we’re aligned on the fundamentals, I’ve created an API Gateway Essentials summary below. It gives you a quick overview of the core capabilities this service offers.

API Gateway Essentials

The main objective of this post is to walk through how to monitor and observe Amazon API Gateway using Datadog, an observability platform that provides full-stack visibility into AWS environments.

Before diving in, a quick refresher:
Observability is the practice of using telemetry data (logs, metrics, and traces) to understand a system’s internal state. In this case, we’ll leverage API Gateway’s logs, metrics, and traces to gain insights into what’s really happening under the hood.

API Gateway Logs
AWS provides built-in support for enabling logs. You can enable them under API Gateway → Stages, where logging options are available for both access logs and execution logs.

API Gateway Logs Configuration
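The same settings can also be applied programmatically. The snippet below is a minimal sketch for a REST API stage, assuming boto3 is installed and AWS credentials are configured. The patch paths are the ones `update_stage` accepts for access and execution logging. The choice of fields in `ACCESS_LOG_FORMAT` and the helper names are assumptions made for illustration.

```python
import json

# Assumed access-log format: the $context variables are API Gateway's own,
# but which fields to include is a choice made for this sketch.
ACCESS_LOG_FORMAT = json.dumps({
    "requestId": "$context.requestId",
    "httpMethod": "$context.httpMethod",
    "resourcePath": "$context.resourcePath",
    "status": "$context.status",
    "integrationStatus": "$context.integrationStatus",
    "responseLatency": "$context.responseLatency",
    "integrationLatency": "$context.integrationLatency",
})


def build_logging_patch_ops(log_group_arn, log_level="INFO"):
    """Return update_stage patch operations that turn on both log types."""
    return [
        # Access logs: destination log group and line format.
        {"op": "replace", "path": "/accessLogSettings/destinationArn",
         "value": log_group_arn},
        {"op": "replace", "path": "/accessLogSettings/format",
         "value": ACCESS_LOG_FORMAT},
        # Execution logs for every resource and method in the stage.
        {"op": "replace", "path": "/*/*/logging/loglevel",
         "value": log_level},
    ]


def enable_stage_logging(rest_api_id, stage_name, log_group_arn):
    """Apply the patch operations with boto3 (needs AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("apigateway")
    return client.update_stage(
        restApiId=rest_api_id,
        stageName=stage_name,
        patchOperations=build_logging_patch_ops(log_group_arn),
    )
```

Calling `enable_stage_logging("<rest-api-id>", "<stage>", "<log-group-arn>")` with your own identifiers would apply the change. Keeping the patch builder separate makes it easy to review the operations before they are sent.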

Once logging is enabled, you can configure API Gateway to send logs to Datadog.

Configuration guide: Datadog + API Gateway Integration

API Gateway Logs in Datadog

Why Logs Matter
Logs are essential for troubleshooting issues in API Gateway. In most cases, failures fall into one of two categories:

  • Backend-related issues: unresponsive services (e.g., Lambda, EC2, EKS) or misconfigurations such as timeouts or incorrect integration responses.
  • AWS infrastructure-level issues (rare): internal AWS errors or regional service disruptions.

Common Causes of API Gateway Failures

  • Misconfigured integrations (e.g., VPC links, request/response mapping templates)
  • Backend timeouts
  • Incorrect or missing HTTP status code mappings
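To make these categories concrete, here is a small sketch that buckets JSON access-log lines by failure type. It assumes each log line is a JSON object with a `status` field, which depends entirely on the access-log format you configured. The mapping of 504 to a backend timeout and 502 to a bad integration response reflects how API Gateway commonly reports those conditions, so treat it as a heuristic rather than a rule.

```python
import json
from collections import Counter


def classify_access_log(line):
    """Return a coarse failure category for one JSON access-log line."""
    entry = json.loads(line)
    status = int(entry["status"])  # "status" is an assumed field name
    if status < 400:
        return "ok"
    if status < 500:
        return "client_error"
    if status == 504:
        # Commonly returned when the integration does not answer in time.
        return "backend_timeout"
    if status == 502:
        # Commonly returned when the backend response cannot be mapped.
        return "bad_integration_response"
    return "server_error"


def summarize(lines):
    """Count how many log lines fall into each category."""
    return Counter(classify_access_log(line) for line in lines)
```

Running `summarize` over a batch of log lines gives a quick view of whether failures are mostly client errors, timeouts, or mapping problems before you dig into individual requests.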

API Gateway Metrics

AWS provides a rich set of metrics for API Gateway that cover three of the four golden signals of monitoring: traffic, errors, and latency (saturation is the fourth). These metrics are essential for monitoring the health, performance, and reliability of your APIs, helping you detect issues early and respond proactively.

API Gateway Metrics – Grouped Summary

| Type | Metric | Description |
|---|---|---|
| Traffic | aws.apigateway.count | Total number of API requests received |
| Traffic | aws.apigateway.count.p50.p99 | Percentile distribution of request count |
| Traffic | trace.aws.apigateway.hits | Total hits from traces |
| Traffic | trace.aws.apigateway.hits.by_http_status | Hits grouped by HTTP status code |
| Traffic | trace.aws.apigateway.stage.hits | Hits per deployment stage |
| Traffic | trace.aws.apigateway.stage.hits.by_http_status | Stage-level hits by HTTP status |
| Errors | aws.apigateway.4xxerror | Client-side errors (e.g., invalid request, unauthorized) |
| Errors | aws.apigateway.4xxerror.p50.p99 | Percentiles of 4xx error rates |
| Errors | aws.apigateway.5xxerror | Server-side/API errors (e.g., backend failure) |
| Errors | aws.apigateway.5xxerror.p50.p99 | Percentiles of 5xx error rates |
| Latency | aws.apigateway.latency | Total time from request to response (includes backend) |
| Latency | aws.apigateway.latency.p50.p99 | Percentile breakdown of total latency |
| Latency | aws.apigateway.latency.minimum / .maximum | Min and max observed latency values |
| Integration Latency | aws.apigateway.integration_latency | Time spent in the backend integration only |
| Integration Latency | aws.apigateway.integration_latency.p50.p99 | Percentile breakdown of backend latency |
| Integration Latency | aws.apigateway.integration_latency.minimum / .maximum | Min and max integration latency |
| Tracing / Duration | trace.aws.apigateway.duration | Trace-based total API duration |
| Tracing / Duration | trace.aws.apigateway.duration.by_http_status | Duration per status code |
| Tracing / Duration | trace.aws.apigateway.stage.duration | Duration per stage |
| Tracing / Duration | trace.aws.apigateway.stage.duration.by_http_status | Stage duration by status code |
| Tracing / Apdex | trace.aws.apigateway.stage.apdex | User satisfaction score (Apdex) per stage |
| Meta | trace.aws.apigateway | Base trace for API Gateway |
| Meta | trace.aws.apigateway.stage | Trace identifier for specific stage |
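Two derived values are often more useful than the raw metrics: the error rate, and the time API Gateway itself adds on top of the backend. The sketch below shows the arithmetic, assuming the inputs are error and request counts over the same time window and latency values in milliseconds. The function names are hypothetical.

```python
def error_rate(error_count, request_count):
    """Fraction of requests that failed; 0.0 when there was no traffic."""
    if request_count == 0:
        return 0.0
    return error_count / request_count


def gateway_overhead_ms(latency_ms, integration_latency_ms):
    """Time spent in API Gateway itself.

    Total latency includes the backend, while integration latency covers
    the backend only, so the difference is the gateway's own share.
    """
    return latency_ms - integration_latency_ms
```

For example, 5 server errors out of 200 requests is an error rate of 2.5%, and a total latency of 120 ms with an integration latency of 95 ms leaves 25 ms of gateway overhead. A high overhead points at API Gateway features such as authorizers or mapping templates, while a high integration latency points at the backend.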

API Gateway Tracing

A best practice is to enable tracing for Application Performance Monitoring (APM) on your backend services, such as AWS Lambda or microservices running on ECS, EKS, or EC2. Once the backend is traced, Datadog also provides the API Gateway trace view, giving detailed insights into the flow and performance of your APIs.

In the example below, I have enabled tracing for an AWS Lambda backend, which allows me to view the API Gateway trace data.

API Gateway Trace View

The example below shows a trace starting from API Gateway, capturing the end-to-end flow through the backend Lambda function and any other integrated services.

A Trace starting from API Gateway

Service Level Indicator (SLI) Dashboard for API Gateway

Finally, bring everything together in a single source-of-truth dashboard for API Gateway that covers traffic, errors, and latency. It should include request volume and trends so that potential issues can be identified promptly.

The dashboard should also highlight:

  • Failed traces
  • Traces taking more than x seconds, which are useful for identifying slow requests passing through API Gateway that require further investigation
  • Relevant logs for deeper analysis

A combination of all these elements will give you a comprehensive view of your API Gateway, enabling effective monitoring and faster troubleshooting of any potential failures or performance issues.

API Gateway Dashboard
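A dashboard like this can also be kept as code. The sketch below builds a payload in the shape of Datadog's dashboard API, with one timeseries widget per signal. It assumes the metrics carry an `apiname` tag and uses the classic `q` query form. The widget titles, the function name, and the API name are illustrative, so verify the payload against your own account before relying on it.

```python
import json


def build_dashboard(api_name):
    """Build a dashboard payload with traffic, error, and latency widgets."""
    scope = "{apiname:%s}" % api_name  # assumed tag name

    def timeseries(title, query):
        # One timeseries widget with a single line-chart request.
        return {
            "definition": {
                "type": "timeseries",
                "title": title,
                "requests": [{"q": query, "display_type": "line"}],
            }
        }

    return {
        "title": f"API Gateway SLIs: {api_name}",
        "layout_type": "ordered",
        "widgets": [
            timeseries("Traffic", f"sum:aws.apigateway.count{scope}.as_count()"),
            timeseries("4xx errors", f"sum:aws.apigateway.4xxerror{scope}.as_count()"),
            timeseries("5xx errors", f"sum:aws.apigateway.5xxerror{scope}.as_count()"),
            timeseries("Latency", f"avg:aws.apigateway.latency{scope}"),
        ],
    }


if __name__ == "__main__":
    # Print the payload so it can be reviewed or submitted with an API client.
    print(json.dumps(build_dashboard("orders"), indent=2))
```

Failed traces, slow traces, and log streams would be added as further widgets. They are left out here to keep the sketch limited to the metrics listed earlier.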

And that wraps up a complete guide to achieving observability for Amazon API Gateway using Datadog.

