Ephemeral Autoscaling Runners in GitLab CI/CD



This content originally appeared on Level Up Coding – Medium and was authored by Dobri Kostadinov

A deep theoretical guide to how GitLab Runner, AWS EC2, and ephemeral autoscaling truly work behind the scenes.

Modern DevOps teams need pipelines that are fast, scalable, and cost-efficient. Traditional CI/CD setups rely on always-on runners — servers that stay alive 24/7 even when no code is being built. This leads to:

  • wasted compute
  • long job queues
  • unpredictable performance
  • high AWS bills

The solution?
Ephemeral autoscaling runners — temporary compute machines that exist only while a CI job is running.

This article explains all the theory behind ephemeral GitLab runners, including how autoscaling works, what the GitLab Runner service actually is, why it needs its own EC2 instance, and how the entire lifecycle is orchestrated.

No configs.
No YAML.
Just pure theory.

🧩 1. So What Are Ephemeral Runners?

An ephemeral runner is an independent, temporary compute environment — usually an EC2 instance on AWS — created just-in-time to run a CI job, and destroyed immediately after.

Characteristics:

  • It exists only during job execution
  • Runs exactly one CI job
  • Contains no shared state
  • Leaves no leftovers (clean environment every time)
  • Is billed only for the time it runs (EC2 bills Linux instances per second, with a 60-second minimum)
  • Provides massive parallelism

Ephemeral runners are the opposite of always-on runners, which stay alive 24/7 whether they run CI jobs or not.

🧱 2. The Three Key Components in GitLab Autoscaling

To understand autoscaling, you need to know the three building blocks:

🔹 1. GitLab CI Coordinator (GitLab.com)

GitLab schedules jobs and places them in a queue.

But GitLab does not create EC2 instances.
It only hands out CI jobs to available runners.

🔹 2. GitLab Runner Service (Controller Node)

This is the brain of the entire system.

It is a standalone program that:

  • is written in Go
  • is developed by GitLab
  • is installed by YOU
  • runs 24/7 on a small EC2 instance
  • communicates with GitLab
  • launches and destroys ephemeral EC2 instances
  • tracks job status
  • performs cleanup
  • manages autoscaling logic

This runner service is the central controller.
All ephemeral EC2 instances are “workers” it supervises.

🔹 3. Ephemeral EC2 Instances (Job Runners)

These are disposable VMs that:

  • boot → run job → shut down
  • exist only for minutes
  • are created and destroyed by the GitLab Runner service
  • cost only for the time they run

🖥 3. Why the GitLab Runner Service Needs Its Own EC2 Instance

Many developers misunderstand this point — so let’s be absolutely clear:

The GitLab Runner service must run on a permanent server that never shuts down.

Why?

Because this service is responsible for:

  • polling GitLab for new jobs
  • deciding when to create EC2 instances
  • calling AWS APIs to launch/terminate VMs
  • tracking which VM belongs to which job
  • streaming logs back to GitLab
  • cleaning up idle or stuck instances
  • enforcing autoscaling rules
  • managing concurrency
  • handling artifacts and caching

If the runner service disappears — even for a minute:

  • no new EC2 instances can be created
  • jobs get stuck in “pending”
  • orphan EC2 instances might remain running
  • autoscaling collapses entirely

That’s why ephemeral runners still require one permanent controller runner.
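To make the orphan problem concrete, here is a minimal sketch of a safety-net check you might run separately from the runner. Everything in it is an assumption for illustration: the `find_orphans` name, the shape of the instance dictionaries, and the idea that each worker VM carries the ID of the job it serves. It is not something GitLab Runner ships.

```python
from datetime import datetime, timedelta, timezone


def find_orphans(instances: list[dict], active_job_ids: set[str],
                 now: datetime, max_age: timedelta) -> list[str]:
    """Return IDs of worker VMs that serve no active job or have outlived max_age."""
    orphans = []
    for inst in instances:
        no_owner = inst["job_id"] not in active_job_ids
        too_old = now - inst["launched_at"] > max_age
        if no_owner or too_old:
            orphans.append(inst["id"])
    return orphans


# Hypothetical inventory; in practice this would come from a cloud API listing.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
instances = [
    {"id": "i-a", "job_id": "101", "launched_at": now - timedelta(minutes=5)},
    {"id": "i-b", "job_id": "102", "launched_at": now - timedelta(minutes=10)},
    {"id": "i-c", "job_id": "103", "launched_at": now - timedelta(hours=3)},
]
orphans = find_orphans(instances, active_job_ids={"101", "103"},
                       now=now, max_age=timedelta(hours=2))
print(orphans)
```

Here `i-b` is flagged because its job is no longer active, and `i-c` because it has been alive longer than the two-hour limit.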

🔍 4. How the GitLab Runner Service Works Internally

The GitLab Runner is a long-running daemon.
When started, it enters an infinite loop:

1⃣ Poll GitLab for jobs

It calls the GitLab API at a regular interval (the check_interval setting, 3 seconds by default):

POST /api/v4/jobs/request

If a job matches its tags, GitLab assigns it to that runner.
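The polling step can be sketched as a small loop. This is only an illustration, not GitLab Runner's actual Go code: `FakeCoordinator`, `request_job`, and `poll_for_jobs` are hypothetical names, and the coordinator is an in-memory stand-in so the example runs without a network.

```python
import time
from collections import deque
from typing import Callable, Optional


class FakeCoordinator:
    """In-memory stand-in for GitLab's job queue (hypothetical, for illustration)."""

    def __init__(self, jobs: list[dict]) -> None:
        self._queue = deque(jobs)

    def request_job(self, runner_tags: set[str]) -> Optional[dict]:
        # Hand out the first queued job whose tags are all offered by the runner.
        for job in list(self._queue):
            if set(job["tags"]) <= runner_tags:
                self._queue.remove(job)
                return job
        return None  # nothing suitable is queued right now


def poll_for_jobs(coordinator: FakeCoordinator, runner_tags: set[str],
                  handle_job: Callable[[dict], None], max_polls: int,
                  check_interval: float = 0.0) -> int:
    """Ask for work max_polls times; return how many jobs were handled."""
    handled = 0
    for _ in range(max_polls):  # a real daemon would loop until it is stopped
        job = coordinator.request_job(runner_tags)
        if job is not None:
            handle_job(job)
            handled += 1
        time.sleep(check_interval)
    return handled


coordinator = FakeCoordinator([
    {"id": 1, "tags": ["aws"]},
    {"id": 2, "tags": ["gpu"]},  # this runner does not offer "gpu"
    {"id": 3, "tags": ["aws", "linux"]},
])
picked: list[int] = []
count = poll_for_jobs(coordinator, {"aws", "linux"},
                      lambda job: picked.append(job["id"]), max_polls=5)
print(count, picked)
```

The job tagged `gpu` stays in the queue because this runner never advertises that tag, which is the tag-matching behavior described above.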

2⃣ Decide how to execute the job

Depending on the executor:

  • docker+machine → launch EC2
  • kubernetes → create pod
  • docker → create container
  • custom → execute prepare script

For AWS autoscaling:

The runner directly calls AWS via Docker Machine or the EC2 API.

But wait. How does that statement actually work? Let me open a bracket here and quickly explain:

{

🧠 “The runner directly calls AWS via Docker Machine or the EC2 API” — What does this actually mean?

When people hear “GitLab Runner creates an EC2 instance,” they often imagine:

  • GitLab.com is calling AWS, or
  • AWS is somehow automatically creating machines, or
  • Some magic autoscaling group is involved

None of this is true.

The truth is:

The GitLab Runner service itself (the controller) is the component that talks to AWS and creates/destroys EC2 instances.

It does this using one of two mechanisms:

  1. via Docker Machine (most common)
  2. directly via AWS EC2 API (if using custom executors or certain setups)

Let’s explain both.

🔹 1. Using Docker Machine (MOST COMMON in GitLab autoscaling)

This is the default way GitLab autoscaling works with AWS.

✔ What is Docker Machine?

Docker Machine is an open-source CLI tool (originally from Docker, which has since deprecated it; GitLab maintains its own fork for the runner) that can:

  • create virtual machines on cloud providers
  • install Docker on those machines
  • manage their lifecycle (start/stop/terminate)

GitLab Runner has a built-in integration with Docker Machine.

✔ How the Runner Uses Docker Machine

When a new job arrives, the runner executes commands like:

docker-machine create --driver amazonec2 ...

This command internally:

  1. Talks to the AWS EC2 API
  2. Launches an EC2 instance
  3. Attaches disks
  4. Configures SSH keys
  5. Installs Docker on the instance
  6. Prepares the VM for CI execution

When the job finishes, GitLab Runner runs:

docker-machine rm runner-abc123

And Docker Machine calls:

TerminateInstances

through the AWS API and deletes the VM.
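As a rough sketch of what those two commands look like when assembled, the helper below builds the argument lists without running anything. The function names, machine name, region, and tag values are made up for illustration. The flags shown belong to the amazonec2 driver, but a real setup usually needs more of them (VPC, subnet, credentials).

```python
import shlex


def docker_machine_create_cmd(name: str, region: str, instance_type: str,
                              tags: dict[str, str]) -> list[str]:
    """Build (but do not run) a docker-machine create command for the amazonec2 driver."""
    # The driver takes tags as one comma-separated key,value,key,value string.
    flat_tags = ",".join(f"{key},{value}" for key, value in tags.items())
    return [
        "docker-machine", "create",
        "--driver", "amazonec2",
        "--amazonec2-region", region,
        "--amazonec2-instance-type", instance_type,
        "--amazonec2-tags", flat_tags,
        name,
    ]


def docker_machine_rm_cmd(name: str) -> list[str]:
    """Build the matching removal command; -y skips the confirmation prompt."""
    return ["docker-machine", "rm", "-y", name]


create_cmd = docker_machine_create_cmd(
    "runner-abc123", "eu-west-1", "t3.medium", {"role": "ci-worker", "job": "42"}
)
rm_cmd = docker_machine_rm_cmd("runner-abc123")
print(shlex.join(create_cmd))
print(shlex.join(rm_cmd))
```

Building the command as a list rather than a single string avoids shell-quoting surprises if you later pass it to `subprocess.run`.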

✔ Important takeaway

Docker Machine is the middle layer between GitLab Runner and AWS.
GitLab Runner → Docker Machine → AWS EC2

GitLab Runner doesn’t call EC2 directly in this mode.
It delegates all VM management to Docker Machine.

🔹 2. Calling AWS EC2 API Directly (Custom/Advanced Setups)

In more advanced setups (like custom executors), you can bypass Docker Machine entirely.

This is when:

  • GitLab Runner triggers your own script
  • Your script calls AWS APIs (e.g., using AWS CLI, Terraform, or SDK)
  • Your script creates the EC2 instance manually

For example, your script might run:

aws ec2 run-instances \
--instance-type t3.medium \
--image-id ami-123456789 \
--tag-specifications ...

And when the job finishes:

aws ec2 terminate-instances --instance-ids i-123456789

GitLab Runner in this setup doesn’t manage EC2 directly — it just tells your script to do it.

But the key idea is still:

The runner is responsible for triggering EC2 creation and destruction, not GitLab.com and not AWS itself.
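The same idea can be sketched in Python. The custom executor lets you plug in your own prepare and cleanup steps, so the example below models those two steps as functions. `FakeEC2` is an in-memory stand-in written for this example. Its two methods borrow the names and keyword arguments of boto3's EC2 client (`run_instances`, `terminate_instances`) so that swapping in a real client would be straightforward, but nothing here talks to AWS. The `ci-job-id` tag and the function names are assumptions.

```python
import itertools
from typing import Any


class FakeEC2:
    """In-memory stand-in; method names mirror the boto3 EC2 client calls."""

    def __init__(self) -> None:
        self.running: dict[str, dict[str, Any]] = {}
        self._ids = itertools.count(1)

    def run_instances(self, **params: Any) -> dict:
        instance_id = f"i-fake{next(self._ids):04d}"
        self.running[instance_id] = params
        return {"Instances": [{"InstanceId": instance_id}]}

    def terminate_instances(self, InstanceIds: list[str]) -> dict:
        for instance_id in InstanceIds:
            self.running.pop(instance_id, None)
        return {"TerminatingInstances": [{"InstanceId": i} for i in InstanceIds]}


def prepare_stage(ec2: FakeEC2, job_id: str, image_id: str) -> str:
    """What a prepare script might do: launch one VM tagged with the job it serves."""
    response = ec2.run_instances(
        ImageId=image_id,
        InstanceType="t3.medium",
        MinCount=1,
        MaxCount=1,
        TagSpecifications=[{
            "ResourceType": "instance",
            "Tags": [{"Key": "ci-job-id", "Value": job_id}],
        }],
    )
    return response["Instances"][0]["InstanceId"]


def cleanup_stage(ec2: FakeEC2, instance_id: str) -> None:
    """What a cleanup script might do: terminate the VM created in prepare."""
    ec2.terminate_instances(InstanceIds=[instance_id])


ec2 = FakeEC2()
vm = prepare_stage(ec2, job_id="1234", image_id="ami-123456789")
running_during_job = len(ec2.running)
cleanup_stage(ec2, vm)
print(vm, running_during_job, len(ec2.running))
```

Tagging the instance with the job ID at launch is a design choice that makes it possible to tell later which VM belongs to which job.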

🔹 3. Why This Matters

Because it clarifies the architecture:

  • GitLab.com does not manage AWS
  • AWS does not auto-create these instances
  • The GitLab Runner service — running on YOUR machine — handles everything

Therefore the autoscaling logic is:

  • configured by YOU
  • executed by YOUR runner
  • powered by Docker Machine or AWS CLI
  • fully maintained on YOUR infrastructure

This makes GitLab CI extremely flexible — it can scale across any cloud you have credentials for.

I am closing the bracket here.

}

3⃣ Launch an Ephemeral EC2 Instance

This includes:

  • choosing an AMI
  • launching the instance
  • attaching the root disk
  • installing the GitLab build script
  • connecting via SSH or cloud-init

The runner waits until the VM is ready.

4⃣ Execute the CI Job

Inside the ephemeral VM:

  • project is cloned
  • scripts run
  • tests are executed
  • logs stream back to GitLab
  • artifacts are uploaded

The GitLab Runner bridges communication between GitLab and the VM.

5⃣ Job Finishes — Runner Cleans Up

After success/failure:

  • runner tells GitLab the job is complete
  • runner calls AWS API to terminate EC2
  • EBS volumes are removed (as long as they are set to delete on termination, which is the default for root volumes)
  • internal state is cleaned

This happens automatically.
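Steps 3 to 5 boil down to "create, run, always destroy". A minimal sketch of that guarantee, assuming a hypothetical `FakeProvisioner` in place of Docker Machine or AWS, could look like this. The point is the `finally` block: the VM is removed whether the job passes or fails.

```python
from typing import Callable


class FakeProvisioner:
    """Hypothetical stand-in for whatever creates and destroys VMs."""

    def __init__(self) -> None:
        self.alive: set[str] = set()
        self.events: list[str] = []

    def create(self, name: str) -> None:
        self.alive.add(name)
        self.events.append(f"create {name}")

    def destroy(self, name: str) -> None:
        self.alive.discard(name)
        self.events.append(f"destroy {name}")


def run_on_ephemeral_vm(provisioner: FakeProvisioner, job_id: int,
                        script: Callable[[], None]) -> str:
    """Create a VM, run one job on it, and always destroy the VM afterwards."""
    name = f"runner-job-{job_id}"
    provisioner.create(name)
    try:
        script()
        status = "success"
    except Exception:
        status = "failed"
    finally:
        provisioner.destroy(name)  # runs on success and on failure
    return status


def failing_script() -> None:
    raise RuntimeError("tests failed")


provisioner = FakeProvisioner()
first = run_on_ephemeral_vm(provisioner, 1, lambda: None)
second = run_on_ephemeral_vm(provisioner, 2, failing_script)
print(first, second, provisioner.alive)
```

After both jobs, no VM is left alive, even though the second job raised an error.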

🔄 5. Full Lifecycle of an Ephemeral Runner

Below is the cleanest conceptual flow.

Lifecycle Diagram

┌─────────────────────────────────────────┐
│               GitLab.com                │
│      (Job Queue / CI Coordinator)       │
└────────────────────▲────────────────────┘
                     │ polls API
┌────────────────────┴────────────────────┐
│          GitLab Runner Service          │
│   (Permanent Controller Node on EC2)    │
└────────────────────┬────────────────────┘
                     │ launches EC2 instance
                     ▼
┌─────────────────────────────────────────┐
│         Ephemeral EC2 Instance          │
│  (Runs ONE job → terminated by runner)  │
└─────────────────────────────────────────┘

📘 6. Why Ephemeral Runners Are Superior

✔ Cost Efficiency

No idle machines = no wasted money.
Teams often save 50–80% over always-on runners.
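To see where savings like that can come from, here is a back-of-the-envelope calculation. All numbers are assumptions chosen for illustration, not AWS quotes or measurements: a worker price of $0.04 per hour, a controller price of $0.01 per hour, three always-on machines, and 3,000 jobs per month at 8 minutes each. It also ignores storage, data transfer, and boot time.

```python
HOURS_PER_MONTH = 730  # average month length in hours


def always_on_cost(hourly_price: float, machines: int) -> float:
    """Monthly cost of machines that never shut down."""
    return hourly_price * HOURS_PER_MONTH * machines


def ephemeral_cost(hourly_price: float, jobs_per_month: int, minutes_per_job: float,
                   controller_hourly_price: float) -> float:
    """Monthly cost of per-job workers plus the permanent controller."""
    job_hours = jobs_per_month * minutes_per_job / 60
    return hourly_price * job_hours + controller_hourly_price * HOURS_PER_MONTH


fixed = always_on_cost(hourly_price=0.04, machines=3)
dynamic = ephemeral_cost(hourly_price=0.04, jobs_per_month=3000,
                         minutes_per_job=8, controller_hourly_price=0.01)
saving = 1 - dynamic / fixed
print(f"always-on: ${fixed:.2f}  ephemeral: ${dynamic:.2f}  saving: {saving:.0%}")
```

With these inputs the always-on fleet costs about $87.60 a month and the ephemeral setup about $23.30. The result is very sensitive to how many job-hours you actually use, so plug in your own numbers before drawing conclusions.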

✔ Unlimited Scalability

If 20 developers push code at the same time,
you can have 20 EC2 instances running in parallel.

✔ Perfect Isolation

Every job starts on a clean VM.

No:

  • polluted caches
  • leftover dependencies
  • hidden file conflicts

✔ Secure

Secrets don’t leak between jobs because no environment persists.

✔ Predictable Performance

No noisy neighbors sharing the same runner.

💸 7. Autoscaling vs Always-On Runners (Theory)

Feature    Ephemeral Autoscaling      Always-On Runners
--------   -----------------------    -----------------------
Billing    Pay only for usage         Pay 24/7
Security   Perfect isolation          Shared state
Scaling    Infinite                   Fixed # of machines
Speed      No job queue               Queue delays under load
Cost       ~$30–50/month (typical)    $100–300/month

🔮 8. What Actually Triggers Autoscaling?

Autoscaling is triggered by:

  • number of pending jobs
  • concurrency value (concurrent = N)
  • IdleCount (how many idle machines are kept warm, ready for the next job)
  • IdleTime (how long a machine may sit idle before it is terminated)

The GitLab Runner service evaluates the load and makes decisions every few seconds.
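A deliberately simplified model of that decision is sketched below. It is not GitLab Runner's real algorithm. The `AutoscaleSettings` class and both functions are hypothetical, and the rules (use idle machines first, then refill the warm pool) are assumptions made to show how the four inputs interact.

```python
from dataclasses import dataclass


@dataclass
class AutoscaleSettings:
    concurrent: int  # max jobs processed at the same time
    idle_count: int  # spare machines to keep warm
    idle_time: int   # seconds a machine may idle before removal


def machines_to_create(pending_jobs: int, busy: int, idle: int,
                       settings: AutoscaleSettings) -> int:
    """How many new machines this simplified controller would request."""
    # Jobs that can start now are capped by the remaining concurrency budget.
    startable = min(pending_jobs, max(settings.concurrent - busy, 0))
    # Idle machines absorb startable jobs first; the rest need new machines.
    for_jobs = max(startable - idle, 0)
    idle_left = max(idle - startable, 0)
    # Top the warm pool back up to idle_count.
    for_pool = max(settings.idle_count - idle_left, 0)
    return for_jobs + for_pool


def should_remove(idle_seconds: int, idle_now: int, settings: AutoscaleSettings) -> bool:
    """Remove a machine that idled too long, unless the warm pool still needs it."""
    return idle_seconds >= settings.idle_time and idle_now > settings.idle_count


settings = AutoscaleSettings(concurrent=10, idle_count=1, idle_time=600)
burst = machines_to_create(pending_jobs=5, busy=2, idle=1, settings=settings)
quiet = machines_to_create(pending_jobs=0, busy=0, idle=1, settings=settings)
print(burst, quiet, should_remove(700, 2, settings), should_remove(700, 1, settings))
```

In the burst case, five pending jobs with one idle machine lead to five new machines: four for the jobs the idle machine cannot take, plus one to refill the warm pool. In the quiet case nothing is created, and the last idle machine is kept even after its idle time has passed.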

🧬 9. The Golden Rule of Autoscaling

Your ephemeral EC2 instances are disposable workers.
Your GitLab Runner service is the permanent boss.

Jobs come and go.
The EC2 workers come and go.
The runner service coordinates everything.

🧨 10. Final Summary

Ephemeral autoscaling transforms GitLab CI/CD into a highly efficient, scalable, and secure system. Instead of running fixed runners that are online 24/7, GitLab Runner dynamically creates EC2 instances only when jobs need to run, and destroys them immediately afterward. This approach gives you perfect isolation, unlimited parallelism, predictable performance, and drastically lower AWS costs.

The key idea is that everything is coordinated by the GitLab Runner service — a permanent controller node you install on its own EC2 instance. This service polls GitLab for jobs, provisions ephemeral machines, handles execution, streams logs, and tears down compute resources as soon as jobs complete.

Ephemeral runners are not just an optimization — they are the modern standard for large-scale CI/CD on AWS.


