This content originally appeared on DEV Community and was authored by taman9333
Tracing at Scale Isn’t Free
In the previous article, we got everything working – traces flowed between services, we visualized them in Jaeger, and we saw end-to-end visibility in action.
But here’s the catch:
In production, things look very different.
Your system might generate millions of traces every day. I’ve seen companies run with 100% sampling, storing every single trace but keeping it only for a very short period, like 1 to 2 days.
Even if a company can afford the storage cost for longer than 1 or 2 days, this setup is inefficient:
- Developers can’t investigate incidents that happened more than a couple of days ago
- A lot of traces are just noise: health checks, fast 200 OKs, and routine traffic
- The traces that actually matter (slow requests, failures, edge cases) get lost in the crowd
- High cost for exporting and storing all spans – especially when using hosted platforms
Let me quote this visual from the OpenTelemetry documentation:
This image perfectly illustrates the problem, and it is exactly where sampling comes in.
Sampling helps you reduce volume while still capturing the traces that matter, the ones that help you debug real problems and improve performance.
In this article, we’ll cover:
- The difference between head-based and tail-based sampling
- How to configure tail-based sampling using OpenTelemetry
- And how to scale your collector setup to handle production traffic
Let’s get into it.
Head-based Sampling
Head-based sampling means deciding whether to keep or drop a trace right at the start, as soon as the first span is created. The decision is made without knowing how the full trace will look.
A common example of this is probability sampling. It uses the trace ID and a set percentage to decide which traces to keep. For example, you might keep 30 percent of all traces. If a trace is selected, all its spans are kept together, so you do not end up with missing spans.
In OpenTelemetry, we usually combine a parent-based sampler with a probability-based sampler. This means if the parent span was sampled, all child spans will be sampled as well. If not, the entire trace will be dropped.
How to Configure Head-based Sampling
Head-based sampling is simple to set up. You do not need to change your system architecture or add extra components.
You can configure it directly in your application’s code using the OpenTelemetry SDK.
In our case, we are going to use a 30 percent sampling rate in all of the services.
For example, in our Go service, here is how you can enable head-based sampling by combining a parent-based sampler with a trace ID ratio-based sampler:
+ sampler := sdktrace.ParentBased(
+     sdktrace.TraceIDRatioBased(0.3),
+ )
tp := sdktrace.NewTracerProvider(
sdktrace.WithBatcher(exp),
+ sdktrace.WithSampler(sampler),
sdktrace.WithResource(resource.NewWithAttributes(
semconv.SchemaURL,
semconv.ServiceName("service-x"),
In the Ruby service, you will need to add a single line:
OpenTelemetry::SDK.configure do |c|
c.service_name = 'service-y'
c.use 'OpenTelemetry::Instrumentation::Sinatra'
c.use 'OpenTelemetry::Instrumentation::Faraday'
end
+ OpenTelemetry.tracer_provider.sampler = OpenTelemetry::SDK::Trace::Samplers::TraceIdRatioBased.new(0.3)
In the Node service, you will need to add the following:
+ const { ParentBasedSampler, TraceIdRatioBasedSampler } = require('@opentelemetry/sdk-trace-base');
const provider = new NodeTracerProvider({
resource: new resourceFromAttributes({
[ATTR_SERVICE_NAME]: "service-z",
}),
spanProcessors: [new SimpleSpanProcessor(exporter)],
+ sampler: new ParentBasedSampler({
+ root: new TraceIdRatioBasedSampler(0.3),
+ })
});
With these changes in place, all three services now apply head-based sampling with a 30 percent probability sampler (combined with a parent-based strategy in the Go and Node services).
Time to test:
I will run the hit_x_service.sh script three times, which will generate 30 requests, so we should see around 9 traces in the Jaeger UI (it’s a statistical estimate; the actual count may vary, but it tracks the configured percentage as the number of requests grows).
Yaaay, it is working. As you can see in the screenshot, we have 9 traces sampled out of 30 requests.
The Catch with Head-based Sampling
If you look again at the screenshot, you will notice the problem with head-based sampling. The nine sampled traces do not include any of the error requests. You can also confirm this by looking at the scatter plot at the top. The 9 dots represent the 9 sampled traces and the time they were captured. All of them are blue, which means they are successful requests. If an error trace had been captured, it would appear as a red dot.
This shows a major limitation. Although head-based sampling is simple to understand and configure, it makes the sampling decision before the request is fully processed. That means it can miss important spans such as failures or high-latency cases.
In our case, all three errors were dropped. This makes head-based sampling unreliable when your goal is to capture anomalies or debug edge cases.
Tail-based Sampling
Tail-based sampling works differently from head-based sampling. Instead of making a decision right when a trace starts, the sampling decision is made by considering all or most of the spans within the trace.
This means you can make smarter choices by looking at the full picture, like checking whether any span had an error or high latency.
Let me show you a visual from the OpenTelemetry docs that explains it well:
Tail-based Sampling Rules
With tail based sampling, you can define smart rules to decide which traces to keep. For example:
- Always keep traces that include an error
- Keep traces with high latency (see the sketch after this list)
- Sample based on specific span attributes – like keeping more traces from a newly deployed service
- Drop traces that match certain paths – like health check endpoints
And many other policies you can define based on your needs or business logic.
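To make the latency rule concrete: when we configure the collector later in this article, such a rule would sit in the tail_sampling policies list and look roughly like the sketch below (the 500 ms threshold is only an illustrative value, not part of our final config):
{
  name: slow-requests,
  type: latency,
  latency: { threshold_ms: 500 }
}
Any trace whose overall duration exceeds the threshold would then be kept, regardless of the probabilistic rate.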
How to Implement Tail-based Sampling
To use tail-based sampling, you’ll need to introduce a new component into your infrastructure: the OpenTelemetry Collector.
But wait… what the heck is that?
Let’s break it down.
What is the OpenTelemetry Collector?
The OpenTelemetry Collector is a vendor-agnostic implementation of how to receive, process, and export telemetry data (logs, metrics, and traces) to an observability backend.
It simplifies your setup by removing the need to run different agents for each type of telemetry. Instead, it acts as a single, unified point to collect and forward all your data.
Looking at the diagram makes it much clearer.
On the left side, we have a typical cluster or host with different services running. These services continuously produce logs, metrics, and traces.
All of this data is sent to the OpenTelemetry Collector. The collector performs three steps for each telemetry type:
- Receive the data
- Process it based on your pipeline configuration before it’s exported. Processors can:
  - Filter unnecessary data
  - Transform or enrich spans with additional metadata
  - Batch data to improve performance and reduce backend load
- Export it to your observability backend
On the right, you can see popular observability platforms where the data can be exported, such as Prometheus, Grafana, Datadog, Loki and others.
The real power of using the OpenTelemetry Collector is that it acts as a central hub for all your telemetry data. Instead of asking every single service in your system to know where to send logs, metrics, or traces, or how to talk to different backends like Prometheus, Grafana, or Datadog, you let the collector handle it all in one place.
This means:
- Your services stay lightweight and simple
- You can change or add observability backends without touching the code in your services
- You gain more control over processing and filtering data before it gets stored
In short, the collector decouples your application code from your observability tooling, which makes your system more flexible, maintainable, and scalable.
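As a small illustration (a sketch, not part of this article’s setup), a service only needs to know the collector’s address. With the standard OTEL_EXPORTER_OTLP_ENDPOINT environment variable, swapping or adding backends later is purely a collector-side change; the service-x name and build path below are hypothetical:
services:
  service-x:                  # hypothetical service name
    build: ./service-x        # hypothetical build context
    environment:
      # Standard OpenTelemetry SDK variable: point the OTLP exporter at the collector.
      OTEL_EXPORTER_OTLP_ENDPOINT: http://otel-collector:4318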
Installing OpenTelemetry Collector
To keep things simple, we won’t include the changes we made earlier for head-based sampling. That code lives in the head-based-sampling branch.
Instead, we’ll use the same instrumentation setup from the first article, which is available in the main branch.
All the changes required for tail-based sampling will be done in a new branch called tail-based-sampling. These changes include:
- Updating the docker-compose.yml file to add the OpenTelemetry Collector
version: '3'
services:
+ otel-collector:
+ image: otel/opentelemetry-collector-contrib:0.130.0
+ command: ["--config=/etc/otel-collector.yaml"]
+ volumes:
+ - ./otel-collector.yaml:/etc/otel-collector.yaml
+ ports:
+ - 4317:4317
+ - 4318:4318
+ depends_on:
+ - jaeger
jaeger:
image: jaegertracing/all-in-one:1.71.0
- command:
- - "--collector.otlp.grpc.tls.enabled=false"
ports:
- "16686:16686" # Jaeger UI
- - "4317:4317" # OTLP gRPC
- - "4318:4318" # OTLP HTTP
Notice that we removed ports 4317 and 4318 from the Jaeger service. That’s because we won’t send traces from our services directly to Jaeger anymore.
Instead, we’ll route all trace data to the otel-collector service first. To enable that, we exposed ports 4317 and 4318 on the otel-collector service, and our x, y, and z services send traces via HTTP to port 4318.
Then, the collector will export traces to Jaeger internally via OTLP gRPC on port 4317 inside the Docker network.
- Creating a new file named otel-collector.yaml in the root directory for the collector configuration
This file defines how the OpenTelemetry Collector will receive, process, and export trace data using tail-based sampling.
Let’s break down what each section in the otel-collector.yaml file is doing.
Receivers
receivers:
otlp:
protocols:
http:
endpoint: 0.0.0.0:4318
grpc:
endpoint: 0.0.0.0:4317
Receivers are the entry point to the OpenTelemetry Collector. They collect telemetry data from your services and pass it into the processing pipeline.
In our case, we are only dealing with traces. All three services, Go, Ruby, and Node.js, are configured to send traces via the OTLP HTTP protocol.
The OTLP receiver starts both HTTP and gRPC servers, listening on ports 4318 and 4317 respectively. Since all our services send traces over HTTP, we do not strictly need the gRPC server, but we will keep the gRPC endpoint in the config because we will use it later when scaling the collector.
The HTTP server on port 4318 is the one receiving traces; we configured all three of our services to send data over HTTP to this port.
Receivers are mandatory in every collector configuration; without at least one, the collector will not function.
Processors
processors:
tail_sampling:
decision_wait: 10s
num_traces: 100
expected_new_traces_per_sec: 10
policies:
[
{
name: errors,
type: status_code,
status_code: { status_codes: [ERROR] }
},
{
name: drop-health-checks,
type: drop,
drop: {
drop_sub_policy:
[
{
name: drop-health-paths,
type: string_attribute,
string_attribute: {key: url.path, values: [\/health], enabled_regex_matching: true}
}
]
}
},
{
name: probabilistic_30_percent,
type: probabilistic,
probabilistic: { sampling_percentage: 30 }
}
]
Processors sit in the middle of the pipeline between collecting data and exporting it. They handle tasks like filtering out noise, enriching spans with more context, transforming data formats, or batching data together to improve performance. This step ensures your telemetry data is optimized before being sent to your backend.
Understanding the Processor Configuration
Let’s go through the processors section of the otel-collector.yaml file.
We’re using the tail_sampling processor, one of many available in the OpenTelemetry Collector.
decision_wait
The decision_wait option sets how long the collector should wait (starting from the first span of a trace) before making a sampling decision.
By default, it’s set to 30s. In our config, we’ve reduced it to 10s to speed things up for local development.
When to increase decision_wait:
- Long-running traces – If your system involves operations that take time to complete (e.g., async workflows, retries, or background jobs), a longer wait ensures all spans are included.
- Retry and backoff logic – If some spans are delayed due to retries, a short wait might cause them to be missed in the sampling decision.
Potential downsides of increasing decision_wait:
- Increased memory usage – The collector needs to buffer spans in memory for a longer period.
- More latency – Traces will be processed and exported later, as the collector waits before deciding.
num_traces
- Default: 50000
- Purpose: Controls how many traces are kept in memory at a time.
- If your services generate a high number of traces, you might need to increase this to avoid dropping spans before a sampling decision is made.
expected_new_traces_per_sec
- Default: 0
- Purpose: An estimate of how many new traces the collector expects per second.
- This helps the collector allocate memory and data structures more efficiently.
Why it matters:
- Too low → Frequent reallocations, hurting performance.
- Too high → Wasted memory.
decision_cache
Even though we haven’t included decision_cache in our config, it’s a useful tuning parameter for systems with delayed or long-lived traces (async workflows, retry and backoff logic).
- Purpose: Controls the number of traces for which a sampling decision is cached.
- Even after a sampling decision is made for a trace, spans might continue to arrive. This cache ensures those late spans are handled properly.
- Use case: In systems where spans can arrive out of order or with delays (e.g., async processing or retries), setting this appropriately helps avoid dropping important spans that arrive after the decision.
It keeps a short-term memory of sampling decisions:
- sampled_cache_size: remembers trace IDs that were sampled → accepts late spans
- non_sampled_cache_size: remembers trace IDs that were dropped → drops late spans
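Although we don’t enable it in this article, a minimal sketch of what adding decision_cache to our tail_sampling processor could look like is shown below (the cache sizes are illustrative values, not recommendations):
processors:
  tail_sampling:
    decision_wait: 10s
    decision_cache:
      sampled_cache_size: 100000      # trace IDs remembered as sampled
      non_sampled_cache_size: 100000  # trace IDs remembered as dropped
    # policies: [ ... same policies as shown below ... ]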
Sampling Policies
The policies section defines how we want to sample traces based on specific criteria. Here’s what each policy does:
- errors – Samples any trace that contains a span with status.code = ERROR. This ensures we always keep traces that highlight problems.
{
name: errors,
type: status_code,
status_code: { status_codes: [ERROR] }
},
- drop-health-checks – Drops traces where the span path matches /health. Health checks are frequent and usually not helpful for debugging, so we exclude them to reduce noise.
{
name: drop-health-checks,
type: drop,
drop: {
drop_sub_policy:
[
{
name: drop-health-paths,
type: string_attribute,
string_attribute: {key: url.path, values: [\/health], enabled_regex_matching: true}
}
]
}
}
- probabilistic_30_percent – Samples 30% of the remaining traces. This helps retain a representative view of normal, successful traffic. Even though these traces aren’t errors, they’re useful for:
- Monitoring overall system behavior
- Identifying performance trends
- Analyzing latency patterns
{
name: probabilistic_30_percent,
type: probabilistic,
probabilistic: { sampling_percentage: 30 }
}
You can fine-tune these policies based on your system’s needs and business logic. You can find more examples of available policies here.
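For instance, if you ever need a hard cap on trace volume, the tail_sampling processor also offers a rate_limiting policy. A sketch (the 100 spans-per-second figure is arbitrary, and this policy is not part of our config) could look like this:
{
  name: rate-limited,
  type: rate_limiting,
  rate_limiting: { spans_per_second: 100 }
}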
Exporters
Exporters are the final step in the Collector pipeline. They send the processed telemetry data like traces, metrics, and logs to a backend system where the data can be stored, visualized, and analyzed.
At least one exporter must be defined for the Collector to function.
In our setup, we use the OTLP exporter to send trace data to Jaeger via gRPC on port 4317, since Jaeger supports OTLP (the OpenTelemetry Protocol):
exporters:
otlp:
endpoint: jaeger:4317
tls:
insecure: true
- This exporter sends data to the jaeger service over the Docker network.
- We use insecure: true because this is a local dev setup. Avoid this in production (see the sketch below).
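In a production setup, you would typically enable TLS on this exporter instead. A minimal sketch, assuming certificates are mounted into the container at hypothetical paths:
exporters:
  otlp:
    endpoint: jaeger:4317
    tls:
      ca_file: /etc/certs/ca.pem          # hypothetical certificate paths
      cert_file: /etc/certs/client.pem
      key_file: /etc/certs/client-key.pem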
Service
This section defines how everything in the collector connects together.
service:
pipelines:
traces:
receivers: [otlp]
processors: [tail_sampling]
exporters: [otlp]
- The service block brings all the configured pieces – receivers, processors, and exporters – into action.
- If you define a component but forget to include it here, it won’t be used.
Pipelines
Under pipelines, you configure how the data flows through the system. In our case, we only define a traces pipeline. Other types like metrics and logs are possible too.
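As an illustration only (not part of this article’s setup), a metrics pipeline could sit next to the traces pipeline. The sketch below assumes you also define a batch processor and a Prometheus exporter, both of which ship with the contrib image we are using:
processors:
  batch: {}                      # batches metrics before export

exporters:
  prometheus:
    endpoint: 0.0.0.0:8889       # Prometheus scrapes metrics from this port

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]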
Each pipeline must include:
- At least one receiver (to accept data)
- Zero or more processors (to modify or filter it)
- At least one exporter (to send it somewhere)
Make sure each part used in the pipeline is properly defined in its corresponding top-level section.
Time to Test
To try everything out:
- Run the collector and Jaeger with:
docker-compose up --build
- Start all three services:
- Go:
go run main.go
- Ruby:
bundle exec ruby server.rb
- Node.js:
node index.js
Then let’s generate some traffic:
Run the hit_x_service.sh script three times to send 30 requests, just like we did with head-based sampling.
In addition, call the health check endpoint several times:
curl localhost:3000/health
This will help us verify if health checks are correctly excluded from sampling.
We’ll probably see around 9 traces in the Jaeger UI (again, it’s a statistical estimate).
Yaaay, we can see 10 traces. If you look at the scatter plot at the top, you’ll notice 3 red dots, which indicate the error traces. This means:
- Our probabilistic sampling of 30% is working
- The 3 errors are all present, so error sampling is working
- No health check traces appear; they’ve been dropped as expected
Scaling the Collector – Why and How
As our system grows and the number of instrumented services increases, the volume of telemetry data can quickly overwhelm a single collector instance.
Whether the collector is processing traces only or handling traces, metrics, and logs, running everything through one instance can lead to:
- Bottlenecks in processing and exporting data
- Increased latency in trace availability
- Risk of dropped data during traffic spikes
- Limited fault tolerance if the single collector fails
To handle larger volumes reliably, we need to scale the collector horizontally by running multiple instances and distributing the load across them — while ensuring spans from the same trace go to the same collector.
Deployment patterns
No Collector
When we used head-based sampling, services sent traces straight to Jaeger. That is a direct integration – no collector in the path.
Visual from the OpenTelemetry docs
Agent
With tail-based sampling, we introduced the collector next to each service. This is the agent deployment pattern.
Applications instrumented with OTLP send telemetry to a collector running with the app or on the same host.
Visual from the OpenTelemetry docs
Pros
- Simple to get started
- Clear 1-1 mapping between application and collector
Cons
- Limited scalability, especially if the collector must handle traces, logs, and metrics
- Harder to manage at scale across many hosts
Gateway
The solution is the third pattern – the gateway.
In the gateway deployment, apps or sidecar collectors send telemetry to one OTLP endpoint that fronts a pool of collectors running as a standalone service – per cluster, per data center, or per region.
Visual from the OpenTelemetry docs

Important note – collectors are stateful 
Collectors hold data in memory. Tail sampling buffers spans until a decision is made.
If you scale collectors horizontally without coordination, different replicas may receive spans from the same trace. Each replica will decide on sampling independently. Results can diverge. You may end up with traces missing spans, which misrepresents what happened.
How to scale correctly
Place a load balancing layer of collectors in front of the tail sampling collectors.
Use the load-balancing exporter to route all spans of the same trace to the same backend collector.
It does this by hashing the trace id, or the service name, and consistently sending related spans to the same target.
OpenTelemetry provides this load-balancing exporter out of the box. Next, we will see how to configure it in code.
Docker changes for scaling with a gateway
- We renamed the otel-collector service to otel-collector-1, and duplicated it as otel-collector-2 and otel-collector-3
- These three collectors run tail-based sampling and do not expose host ports, since services will not talk to them directly
- We added an otel-gateway service that runs the load balancing exporter and exposes ports 4317 and 4318
- All app services send OTLP traffic to the gateway, and the gateway consistently routes each trace to one of the collectors
services:
otel-collector-1:
image: otel/opentelemetry-collector-contrib:0.130.0
command: ["--config=/etc/otel-collector.yaml"]
volumes:
- ./otel-collector.yaml:/etc/otel-collector.yaml
ports:
- "4317" # OTLP gRPC receiver
otel-collector-2:
image: otel/opentelemetry-collector-contrib:0.130.0
command: ["--config=/etc/otel-collector.yaml"]
volumes:
- ./otel-collector.yaml:/etc/otel-collector.yaml
ports:
- "4317" # OTLP gRPC receiver
otel-collector-3:
image: otel/opentelemetry-collector-contrib:0.130.0
command: ["--config=/etc/otel-collector.yaml"]
volumes:
- ./otel-collector.yaml:/etc/otel-collector.yaml
ports:
- "4317" # OTLP gRPC receiver
# Otel gateway running load balancing exporter
otel-gateway:
image: otel/opentelemetry-collector-contrib:0.130.0
command: ["--config=/etc/otel-gateway.yaml"]
volumes:
- ./otel-gateway.yaml:/etc/otel-gateway.yaml
ports:
- 4317:4317 # OTLP gRPC
- 4318:4318 # OTLP HTTP
depends_on:
- otel-collector-1
- otel-collector-2
- otel-collector-3
otel-gateway.yaml – what it does
This file defines a gateway collector that accepts OTLP traffic from services and forwards spans to a pool of tail sampling collectors using the load balancing exporter.
receivers:
otlp:
protocols:
http:
endpoint: 0.0.0.0:4318
exporters:
loadbalancing:
routing_key: traceID
protocol:
otlp:
tls:
insecure: true
resolver:
static:
hostnames:
- otel-collector-1:4317
- otel-collector-2:4317
- otel-collector-3:4317
service:
telemetry:
logs:
level: debug
pipelines:
traces:
receivers: [otlp]
exporters: [loadbalancing]
Receivers
- otlp listens on port 4318 for HTTP, since our services send traces via HTTP to 0.0.0.0:4318
Exporters – loadbalancing
- resolver.static.hostnames lists the downstream collectors to send to
- We use Docker Compose service names, since within the Docker Compose network service names act as DNS hostnames: otel-collector-1:4317, otel-collector-2:4317, and otel-collector-3:4317
- routing_key: traceID means all spans that share the same trace ID are routed to the same downstream collector, avoiding cases where different collectors sample parts of the same trace and produce incomplete or misleading results
Service
- telemetry.logs.level: debug helps with debugging the gateway’s behavior. We’ve also added the same telemetry configuration to otel-collector.yaml so that all three collectors produce debug-level logs, making it easier to verify that everything is working correctly.
- pipelines.traces wires the otlp receiver to the loadbalancing exporter
Time to Test
If you still have Docker running, run:
docker-compose down
docker-compose up --build
Then execute the ./hit_x_service.sh script three times, just like we did when testing tail-based sampling without scaling the collector. This will generate 30 requests, and we’d expect to see around 9 traces in Jaeger.
After checking the Jaeger UI, we see 10 traces (by coincidence, the same as the last test): 3 error traces and 7 normal traces. All traces show complete spans, meaning nothing was dropped. This confirms that spans from the same trace were routed to the same collector.
Now we need to confirm that traces were actually distributed across collectors, and not all sent to a single one. Since we enabled telemetry debug logs in every collector, we can run the following command for each collector service to filter useful logs:
docker-compose logs {{SERVICE_NAME}} 2>&1 | grep '"batch.len": [1-9]'
This filters out noisy logs, showing only batches where spans were sent.
Log results:
- otel-collector-1 → 2 logs – total traces: 10 (sampled: 5, notSampled: 5). Here, total traces = sum of the "batch.len" values (2 from the first log + 8 from the second log).
- otel-collector-2 → 3 logs – total traces: 11 (sampled: 2, notSampled: 9)
- otel-collector-3 → 2 logs – total traces: 9 (sampled: 3, notSampled: 6)
This means all collectors together received exactly 30 traces, matching the requests sent.
Total sampled traces = 10, which matches what we see in Jaeger UI.
Total not-sampled traces = 20, as expected.
Final Architecture
And here is our final architecture:
This setup allows us to build a robust distributed tracing system that can absorb millions of traces efficiently, while keeping costs lower and reducing noise as much as possible.
By combining tail-based sampling, load balancing across multiple collectors, and selective sampling policies, we ensure that we capture the most valuable traces without overloading our backend.