This content originally appeared on DEV Community and was authored by taman9333
Tracing at Scale Isn’t Free
In the previous article, we got everything working – traces flowed between services, we visualized them in Jaeger, and we saw end-to-end visibility in action.
But here’s the catch:
In production, things look very different.
Your system might generate millions of traces every day. I’ve seen companies run with 100% sampling, storing every single trace but keeping it only for a very short period, like 1 to 2 days.
Even if a company can afford the storage cost for longer than 1 or 2 days, this setup is inefficient:
- Developers can’t investigate incidents that happened more than a couple of days ago
- A lot of traces are just noise: health checks, fast 200 OKs, and routine traffic
- The traces that actually matter (slow requests, failures, edge cases) get lost in the crowd
- High cost for exporting and storing all spans – especially when using hosted platforms
Let me quote this visual from the OpenTelemetry documentation:
This image perfectly illustrates the problem, and it is exactly where sampling comes in.
Sampling helps you reduce volume while still capturing the traces that matter, the ones that help you debug real problems and improve performance.
In this article, we’ll cover:
- The difference between head-based and tail-based sampling
- How to configure tail-based sampling using OpenTelemetry
- And how to scale your collector setup to handle production traffic
Let’s get into it.
Head-based Sampling
Head-based sampling means deciding whether to keep or drop a trace right at the start, as soon as the first span is created. The decision is made without knowing how the full trace will look.
A common example of this is probability sampling. It uses the trace ID and a set percentage to decide which traces to keep. For example, you might keep 30 percent of all traces. If a trace is selected, all its spans are kept together, so you do not end up with missing spans.
In OpenTelemetry, we usually combine a parent-based sampler with a probability-based sampler. This means if the parent span was sampled, all child spans will be sampled as well. If not, the entire trace will be dropped.
How to Configure Head-based Sampling
Head-based sampling is simple to set up. You do not need to change your system architecture or add extra components.
You can configure it directly in your application’s code using the OpenTelemetry SDK.
In our case, we are going to use a 30 percent sampling rate in all of the services.
For example, in our Go service, here is how you can enable head-based sampling by combining a parent-based sampler with a trace ID ratio-based sampler:
+ sampler := sdktrace.ParentBased(
+     sdktrace.TraceIDRatioBased(0.3),
+ )
tp := sdktrace.NewTracerProvider(
sdktrace.WithBatcher(exp),
+ sdktrace.WithSampler(sampler),
sdktrace.WithResource(resource.NewWithAttributes(
semconv.SchemaURL,
semconv.ServiceName("service-x"),
In the Ruby service, you will need to add a single line:
OpenTelemetry::SDK.configure do |c|
c.service_name = 'service-y'
c.use 'OpenTelemetry::Instrumentation::Sinatra'
c.use 'OpenTelemetry::Instrumentation::Faraday'
end
+ OpenTelemetry.tracer_provider.sampler = OpenTelemetry::SDK::Trace::Samplers::TraceIdRatioBased.new(0.3)
In the Node service, you will need to add the following:
+ const { ParentBasedSampler, TraceIdRatioBasedSampler } = require('@opentelemetry/sdk-trace-base');
const provider = new NodeTracerProvider({
resource: new resourceFromAttributes({
[ATTR_SERVICE_NAME]: "service-z",
}),
spanProcessors: [new SimpleSpanProcessor(exporter)],
+ sampler: new ParentBasedSampler({
+ root: new TraceIdRatioBasedSampler(0.3),
+ })
});
With these changes in place, all three services now apply head-based sampling with a 30 percent probability sampler (combined with a parent-based strategy in the Go and Node services).
Time to test:
I will run the hit_x_service.sh script three times, which will generate 30 requests, so we should see around 9 traces in the Jaeger UI (it’s a statistical estimate; the actual count may vary, but it tracks the configured percentage as the number of requests grows).
Yaaay, it is working. As you can see in the screenshot, we have 9 traces sampled out of 30 requests.
The Catch with Head-based Sampling
If you look again at the screenshot, you will notice the problem with head-based sampling. The nine sampled traces do not include any of the error requests. You can also confirm this by looking at the scatter plot at the top. The 9 dots represent the 9 sampled traces and the time they were captured. All of them are blue, which means they are successful requests. If an error trace had been captured, it would appear as a red dot.
This shows a major limitation. Although head-based sampling is simple to understand and configure, it makes the sampling decision before the request is fully processed. That means it can miss important spans such as failures or high-latency cases.
In our case, all three errors were dropped. This makes head-based sampling unreliable when your goal is to capture anomalies or debug edge cases.
Tail-based Sampling
Tail-based sampling works differently from head-based sampling. Instead of making a decision right when a trace starts, the sampling decision is made by considering all or most of the spans within the trace.
This means you can make smarter choices by looking at the full picture, like checking whether any span had an error or high latency.
Let me show you a visual from the OpenTelemetry docs that explains it well:
Tail-based Sampling Rules
With tail based sampling, you can define smart rules to decide which traces to keep. For example:
- Always keep traces that include an error
- Keep traces with high latency (see the sketch after this list)
- Sample based on specific span attributes – like keeping more traces from a newly deployed service
- Drop traces that match certain paths – like health check endpoints
And many other policies you can define based on your needs or business logic.
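To make the latency rule concrete: when we configure the collector later in this article, such a rule would sit in the tail_sampling policies list and look roughly like the sketch below (the 500 ms threshold is only an illustrative value, not part of our final config):
{
  name: slow-requests,
  type: latency,
  latency: { threshold_ms: 500 }
}
Any trace whose overall duration exceeds the threshold would then be kept, regardless of the probabilistic rate.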
How to Implement Tail-based Sampling
To use tail-based sampling, you’ll need to introduce a new component into your infrastructure: the OpenTelemetry Collector.
But wait… what the heck is that?
Let’s break it down.
What is the OpenTelemetry Collector?
The OpenTelemetry Collector is a vendor-agnostic implementation of how to receive, process, and export telemetry data (logs, metrics, and traces) to an observability backend.
It simplifies your setup by removing the need to run different agents for each type of telemetry. Instead, it acts as a single, unified point to collect and forward all your data.
Looking at the diagram makes it much clearer.
On the left side, we have a typical cluster or host with different services running. These services continuously produce logs, metrics, and traces.
All of this data is sent to the OpenTelemetry Collector. The collector performs three steps for each telemetry type:
- Receive the data
- Process it based on your pipeline configuration before it’s exported. Processors can:
  - Filter unnecessary data
  - Transform or enrich spans with additional metadata
  - Batch data to improve performance and reduce backend load
- Export it to your observability backend
On the right, you can see popular observability platforms where the data can be exported, such as Prometheus, Grafana, Datadog, Loki and others.
The real power of using the OpenTelemetry Collector is that it acts as a central hub for all your telemetry data. Instead of asking every single service in your system to know where to send logs, metrics, or traces, or how to talk to different backends like Prometheus, Grafana, or Datadog, you let the collector handle it all in one place.
This means:
- Your services stay lightweight and simple
- You can change or add observability backends without touching the code in your services
- You gain more control over processing and filtering data before it gets stored
In short, the collector decouples your application code from your observability tooling, which makes your system more flexible, maintainable, and scalable.
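As a small illustration (a sketch, not part of this article’s setup), a service only needs to know the collector’s address. With the standard OTEL_EXPORTER_OTLP_ENDPOINT environment variable, swapping or adding backends later is purely a collector-side change; the service-x name and build path below are hypothetical:
services:
  service-x:                  # hypothetical service name
    build: ./service-x        # hypothetical build context
    environment:
      # Standard OpenTelemetry SDK variable: point the OTLP exporter at the collector.
      OTEL_EXPORTER_OTLP_ENDPOINT: http://otel-collector:4318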
Installing OpenTelemetry Collector
To keep things simple, we won’t include the changes we made earlier for head-based sampling. That code lives in the head-based-sampling branch.
Instead, we’ll use the same instrumentation setup from the first article, which is available in the main branch.
All the changes required for tail-based sampling will be done in a new branch called tail-based-sampling. These changes include:
- Updating the docker-compose.yml file to add the OpenTelemetry Collector
version: '3'
services:
+ otel-collector:
+ image: otel/opentelemetry-collector-contrib:0.130.0
+ command: ["--config=/etc/otel-collector.yaml"]
+ volumes:
+ - ./otel-collector.yaml:/etc/otel-collector.yaml
+ ports:
+ - 4317:4317
+ - 4318:4318
+ depends_on:
+ - jaeger
jaeger:
image: jaegertracing/all-in-one:1.71.0
- command:
- - "--collector.otlp.grpc.tls.enabled=false"
ports:
- "16686:16686" # Jaeger UI
- - "4317:4317" # OTLP gRPC
- - "4318:4318" # OTLP HTTP
Notice that we removed ports 4317 and 4318 from the Jaeger service. That’s because we won’t send traces from our services directly to Jaeger anymore.
Instead, we’ll route all trace data to the otel-collector service first. To enable that, we exposed ports 4317 and 4318 on the otel-collector service, and our x, y, and z services send traces via HTTP to port 4318.
Then, the collector will export traces to Jaeger internally via OTLP gRPC on port 4317 inside the Docker network.
- Creating a new file named otel-collector.yaml in the root directory for the collector configuration
This file defines how the OpenTelemetry Collector will receive, process, and export trace data using tail-based sampling.
Let’s break down what each section in the otel-collector.yaml file is doing.
Receivers
receivers:
otlp:
protocols:
http:
endpoint: 0.0.0.0:4318
grpc:
endpoint: 0.0.0.0:4317
Receivers are the entry point to the OpenTelemetry Collector. They collect telemetry data from your services and pass it into the processing pipeline.
In our case, we are only dealing with traces. All three services, Go, Ruby, and Node.js, are configured to send traces via the OTLP HTTP protocol.
The OTLP receiver starts both HTTP and gRPC servers, listening on ports 4318 and 4317 respectively. Since all our services send traces over HTTP, we do not strictly need the gRPC server, but we will keep the gRPC endpoint in the config because we will use it later when scaling the collector.
The HTTP server on port 4318 is the one receiving traces; we configured all three of our services to send data over HTTP to this port.
Receivers are mandatory in every collector configuration; without at least one, the collector will not function.
Processors
processors:
tail_sampling:
decision_wait: 10s
num_traces: 100
expected_new_traces_per_sec: 10
policies:
[
{
name: errors,
type: status_code,
status_code: { status_codes: [ERROR] }
},
{
name: drop-health-checks,
type: drop,
drop: {
drop_sub_policy:
[
{
name: drop-health-paths,
type: string_attribute,
string_attribute: {key: url.path, values: [\/health], enabled_regex_matching: true}
}
]
}
},
{
name: probabilistic_30_percent,
type: probabilistic,
probabilistic: { sampling_percentage: 30 }
}
]
Processors sit in the middle of the pipeline between collecting data and exporting it. They handle tasks like filtering out noise, enriching spans with more context, transforming data formats, or batching data together to improve performance. This step ensures your telemetry data is optimized before being sent to your backend.
Understanding the Processor Configuration
Let’s go through the processors section of the otel-collector.yaml file.
We’re using the tail_sampling processor, one of many available in the OpenTelemetry Collector.
decision_wait
The decision_wait option sets how long the collector should wait (starting from the first span of a trace) before making a sampling decision.
By default, it’s set to 30s. In our config, we’ve reduced it to 10s to speed things up for local development.
When to increase decision_wait:
- Long-running traces – If your system involves operations that take time to complete (e.g., async workflows, retries, or background jobs), a longer wait ensures all spans are included.
- Retry and backoff logic – If some spans are delayed due to retries, a short wait might cause them to be missed in the sampling decision.
Potential downsides of increasing decision_wait:
- Increased memory usage – The collector needs to buffer spans in memory for a longer period.
- More latency – Traces will be processed and exported later, as the collector waits before deciding.
num_traces
- Default: 50000
- Purpose: Controls how many traces are kept in memory at a time.
- If your services generate a high number of traces, you might need to increase this to avoid dropping spans before a sampling decision is made.
expected_new_traces_per_sec
- Default: 0
- Purpose: An estimate of how many new traces the collector expects per second.
- This helps the collector allocate memory and data structures more efficiently.
Why it matters:
- Too low → Frequent reallocations, hurting performance.
- Too high → Wasted memory.
decision_cache
Even though we haven’t included decision_cache in our config, it’s a useful tuning parameter for systems with delayed or long-lived traces (async workflows, retry and backoff logic).
- Purpose: Controls the number of traces for which a sampling decision is cached.
- Even after a sampling decision is made for a trace, spans might continue to arrive. This cache ensures those late spans are handled properly.
- Use case: In systems where spans can arrive out of order or with delays (e.g., async processing or retries), setting this appropriately helps avoid dropping important spans that arrive after the decision.
It keeps a short-term memory of sampling decisions:
- sampled_cache_size: remembers trace IDs that were sampled → accepts late spans
- non_sampled_cache_size: remembers trace IDs that were dropped → drops late spans
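Although we don’t enable it in this article, a minimal sketch of what adding decision_cache to our tail_sampling processor could look like is shown below (the cache sizes are illustrative values, not recommendations):
processors:
  tail_sampling:
    decision_wait: 10s
    decision_cache:
      sampled_cache_size: 100000      # trace IDs remembered as sampled
      non_sampled_cache_size: 100000  # trace IDs remembered as dropped
    # policies: [ ... same policies as shown below ... ]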
Sampling Policies
The policies section defines how we want to sample traces based on specific criteria. Here’s what each policy does:
- errors – Samples any trace that contains a span with status.code = ERROR. This ensures we always keep traces that highlight problems.
{
name: errors,
type: status_code,
status_code: { status_codes: [ERROR] }
},
- drop-health-checks – Drops traces where the span path matches /health. Health checks are frequent and usually not helpful for debugging, so we exclude them to reduce noise.
{
name: drop-health-checks,
type: drop,
drop: {
drop_sub_policy:
[
{
name: drop-health-paths,
type: string_attribute,
string_attribute: {key: url.path, values: [\/health], enabled_regex_matching: true}
}
]
}
}
- probabilistic_30_percent – Samples 30% of the remaining traces. This helps retain a representative view of normal, successful traffic. Even though these traces aren’t errors, they’re useful for:
- Monitoring overall system behavior
- Identifying performance trends
- Analyzing latency patterns
{
name: probabilistic_30_percent,
type: probabilistic,
probabilistic: { sampling_percentage: 30 }
}
You can fine-tune these policies based on your system’s needs and business logic. You can find more examples of available policies here.
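For instance, if you ever need a hard cap on trace volume, the tail_sampling processor also offers a rate_limiting policy. A sketch (the 100 spans-per-second figure is arbitrary, and this policy is not part of our config) could look like this:
{
  name: rate-limited,
  type: rate_limiting,
  rate_limiting: { spans_per_second: 100 }
}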
Exporters
Exporters are the final step in the Collector pipeline. They send the processed telemetry data like traces, metrics, and logs to a backend system where the data can be stored, visualized, and analyzed.
At least one exporter must be defined for the Collector to function.
In our setup, we use the OTLP exporter to send trace data to Jaeger via gRPC on port 4317, since Jaeger supports OTLP (the OpenTelemetry Protocol):
exporters:
otlp:
endpoint: jaeger:4317
tls:
insecure: true
- This exporter sends data to the jaeger service over the Docker network.
- We use insecure: true because this is a local dev setup. Avoid this in production (see the sketch below).
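In a production setup, you would typically enable TLS on this exporter instead. A minimal sketch, assuming certificates are mounted into the container at hypothetical paths:
exporters:
  otlp:
    endpoint: jaeger:4317
    tls:
      ca_file: /etc/certs/ca.pem          # hypothetical certificate paths
      cert_file: /etc/certs/client.pem
      key_file: /etc/certs/client-key.pem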
Service
This section defines how everything in the collector connects together.
service:
pipelines:
traces:
receivers: [otlp]
processors: [tail_sampling]
exporters: [otlp]
- The service block brings all the configured pieces – receivers, processors, and exporters – into action.
- If you define a component but forget to include it here, it won’t be used.
Pipelines
Under pipelines, you configure how the data flows through the system. In our case, we only define a traces pipeline. Other types like metrics and logs are possible too.
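As an illustration only (not part of this article’s setup), a metrics pipeline could sit next to the traces pipeline. The sketch below assumes you also define a batch processor and a Prometheus exporter, both of which ship with the contrib image we are using:
processors:
  batch: {}                      # batches metrics before export

exporters:
  prometheus:
    endpoint: 0.0.0.0:8889       # Prometheus scrapes metrics from this port

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]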
Each pipeline must include:
- At least one receiver (to accept data)
- Zero or more processors (to modify or filter it)
- At least one exporter (to send it somewhere)
Make sure each part used in the pipeline is properly defined in its corresponding top-level section.
Time to Test
To try everything out:
- Run the collector and Jaeger with:
docker-compose up --build
- Start all three services:
- Go:
go run main.go
- Ruby:
bundle exec ruby server.rb
- Node.js:
node index.js
Then let’s generate some traffic:
Run the hit_x_service.sh script three times to send 30 requests, just like we did with head-based sampling.
In addition, call the health check endpoint several times:
curl localhost:3000/health
This will help us verify if health checks are correctly excluded from sampling.
We’ll probably see around 9 traces in the Jaeger UI (again, it’s a statistical estimate).
Yaaay, we can see 10 traces. If you look at the scatter plot at the top, you’ll notice 3 red dots, which indicate the error traces. This means:
- Our probabilistic sampling of 30% is working
- The 3 errors are all present, so error sampling is working
- No health check traces appear; they’ve been dropped as expected
Scaling the Collector – Why and How
As our system grows and the number of instrumented services increases, the volume of telemetry data can quickly overwhelm a single collector instance.
Whether the collector is processing traces only or handling traces, metrics, and logs, running everything through one instance can lead to:
- Bottlenecks in processing and exporting data
- Increased latency in trace availability
- Risk of dropped data during traffic spikes
- Limited fault tolerance if the single collector fails
To handle larger volumes reliably, we need to scale the collector horizontally by running multiple instances and distributing the load across them — while ensuring spans from the same trace go to the same collector.
Deployment patterns
No Collector
When we used head-based sampling, services sent traces straight to Jaeger. That is a direct integration – no collector in the path.
Visual from the OpenTelemetry docs
Agent
With tail-based sampling, we introduced the collector next to each service. This is the agent deployment pattern.
Applications instrumented with OTLP send telemetry to a collector running with the app or on the same host.
Visual from the OpenTelemetry docs
Pros
- Simple to get started
- Clear 1-1 mapping between application and collector
Cons
- Limited scalability, especially if the collector must handle traces, logs, and metrics
- Harder to manage at scale across many hosts
Gateway
The solution is the third pattern – the gateway.
In the gateway deployment, apps or sidecar collectors send telemetry to one OTLP endpoint that fronts a pool of collectors running as a standalone service – per cluster, per data center, or per region.
Visual from the OpenTelemetry docs

Important note – collectors are stateful 
Collectors hold data in memory. Tail sampling buffers spans until a decision is made.
If you scale collectors horizontally without coordination, different replicas may receive spans from the same trace. Each replica will decide on sampling independently. Results can diverge. You may end up with traces missing spans, which misrepresents what happened.
How to scale correctly
Place a load balancing layer of collectors in front of the tail sampling collectors.
Use the load-balancing exporter to route all spans of the same trace to the same backend collector.
It does this by hashing the trace id, or the service name, and consistently sending related spans to the same target.
OpenTelemetry provides this load-balancing exporter out of the box. Next, we will see how to configure it in code.
Docker changes for scaling with a gateway
- We renamed the otel-collector service to otel-collector-1, and duplicated it as otel-collector-2 and otel-collector-3
- These three collectors run tail-based sampling and do not expose host ports, since services will not talk to them directly
- We added an otel-gateway service that runs the load balancing exporter and exposes ports 4317 and 4318
- All app services send OTLP traffic to the gateway, and the gateway consistently routes each trace to one of the collectors
services:
otel-collector-1:
image: otel/opentelemetry-collector-contrib:0.130.0
command: ["--config=/etc/otel-collector.yaml"]
volumes:
- ./otel-collector.yaml:/etc/otel-collector.yaml
ports:
- "4317" # OTLP gRPC receiver
otel-collector-2:
image: otel/opentelemetry-collector-contrib:0.130.0
command: ["--config=/etc/otel-collector.yaml"]
volumes:
- ./otel-collector.yaml:/etc/otel-collector.yaml
ports:
- "4317" # OTLP gRPC receiver
otel-collector-3:
image: otel/opentelemetry-collector-contrib:0.130.0
command: ["--config=/etc/otel-collector.yaml"]
volumes:
- ./otel-collector.yaml:/etc/otel-collector.yaml
ports:
- "4317" # OTLP gRPC receiver
# Otel gateway running load balancing exporter
otel-gateway:
image: otel/opentelemetry-collector-contrib:0.130.0
command: ["--config=/etc/otel-gateway.yaml"]
volumes:
- ./otel-gateway.yaml:/etc/otel-gateway.yaml
ports:
- 4317:4317 # OTLP gRPC
- 4318:4318 # OTLP HTTP
depends_on:
- otel-collector-1
- otel-collector-2
- otel-collector-3
otel-gateway.yaml – what it does
This file defines a gateway collector that accepts OTLP traffic from services and forwards spans to a pool of tail sampling collectors using the load balancing exporter.
receivers:
otlp:
protocols:
http:
endpoint: 0.0.0.0:4318
exporters:
loadbalancing:
routing_key: traceID
protocol:
otlp:
tls:
insecure: true
resolver:
static:
hostnames:
- otel-collector-1:4317
- otel-collector-2:4317
- otel-collector-3:4317
service:
telemetry:
logs:
level: debug
pipelines:
traces:
receivers: [otlp]
exporters: [loadbalancing]
Receivers
- otlp listens on port 4318 for HTTP, since our services send traces via HTTP to 0.0.0.0:4318
Exporters – loadbalancing
- resolver.static.hostnames lists the downstream collectors to send to
- We use Docker Compose service names, since within the Docker Compose network service names act as DNS hostnames: otel-collector-1:4317, otel-collector-2:4317, and otel-collector-3:4317
- routing_key: traceID means all spans that share the same trace ID are routed to the same downstream collector, avoiding cases where different collectors sample parts of the same trace and produce incomplete or misleading results
Service
- telemetry.logs.level: debug helps with debugging the gateway’s behavior. We’ve also added the same telemetry configuration to otel-collector.yaml so that all three collectors produce debug-level logs, making it easier to verify that everything is working correctly.
- pipelines.traces wires the otlp receiver to the loadbalancing exporter
Time to Test
If you still have Docker running, run:
docker-compose down
docker-compose up --build
Then execute the ./hit_x_service.sh script three times, just like we did when testing tail-based sampling without scaling the collector. This will generate 30 requests, and we’d expect to see around 9 traces in Jaeger.
After checking the Jaeger UI, we see 10 traces (by coincidence, the same as the last test): 3 error traces and 7 normal traces. All traces show complete spans, meaning nothing was dropped. This confirms that spans from the same trace were routed to the same collector.
Now we need to confirm that traces were actually distributed across collectors, and not all sent to a single one. Since we enabled telemetry debug logs in every collector, we can run the following command for each collector service to filter useful logs:
docker-compose logs {{SERVICE_NAME}} 2>&1 | grep '"batch.len": [1-9]'
This filters out noisy logs, showing only batches where spans were sent.
Log results:
- otel-collector-1 → 2 logs – total traces: 10 (sampled: 5, notSampled: 5). Here, total traces = sum of the "batch.len" values (2 from the first log + 8 from the second log).
- otel-collector-2 → 3 logs – total traces: 11 (sampled: 2, notSampled: 9)
- otel-collector-3 → 2 logs – total traces: 9 (sampled: 3, notSampled: 6)
This means all collectors together received exactly 30 traces, matching the requests sent.
Total sampled traces = 10, which matches what we see in Jaeger UI.
Total not-sampled traces = 20, as expected.
Final Architecture
And here is our final architecture:
This setup allows us to build a robust distributed tracing system that can absorb millions of traces efficiently, while keeping costs lower and reducing noise as much as possible.
By combining tail-based sampling, load balancing across multiple collectors, and selective sampling policies, we ensure that we capture the most valuable traces without overloading our backend.