This content originally appeared on DEV Community and was authored by Sushant Gaurav
What is the GIL?
GIL (Global Interpreter Lock) is a mutex (mutual exclusion lock) in CPython (the default Python implementation). It ensures that only one thread executes Python bytecode at a time (even if there are multiple CPU cores). So, at any moment, only one thread holds the GIL and runs the Python code.
Why does it exist?
Memory management in CPython is not thread-safe (Python uses reference counting to manage memory). GIL makes it simpler and faster to manage memory without complex thread synchronisation.
Note:
- GIL is not part of the Python language; it is specific to CPython.
- Other Python interpreters like Jython, IronPython, PyPy may not have a GIL (or have a different design).
- At any instant, one OS thread holds the GIL and runs Python bytecode. Threads time-slice (the interpreter periodically releases the GIL so another thread can run), but not in parallel.
Memory management in CPython is not thread-safe
CPython uses reference counting (plus a cyclic garbage collector). Every Python object has a refcount: the number of references pointing to it. Incrementing or decrementing this counter must be atomic. If two threads modify the same refcount incorrectly, problems arise: if both decrement it from 1 to 0 at the same time, the object can be freed twice (a double free, leading to memory corruption); if an increment is lost, the object may be released while it is still in use (a classic use-after-free).
With the GIL held, CPython can perform `Py_INCREF` / `Py_DECREF` without per-field locks, keeping the interpreter simple and fast for the common single-threaded case.
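You can watch refcounts change from Python itself with `sys.getrefcount` (note that it reports one extra reference, for its own argument):

```python
import sys

x = []
before = sys.getrefcount(x)  # includes the temporary reference made for the call
y = x                        # a second name now refers to the same list
after = sys.getrefcount(x)
print(after - before)        # 1: the alias added exactly one reference
```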
Pseudocode for the race

```c
// Two threads both decrement the same object's refcount concurrently.
void thread1() {
    // The read-modify-write is not atomic:
    int tmp = obj->ob_refcnt;            // suppose this reads 1
    obj->ob_refcnt = tmp - 1;            // becomes 0
    if (obj->ob_refcnt == 0) free(obj);  // frees the object
}

void thread2() {
    int tmp = obj->ob_refcnt;            // also reads 1 (raced)
    obj->ob_refcnt = tmp - 1;            // becomes 0 again
    if (obj->ob_refcnt == 0) free(obj);  // double free -> corruption
}
```
How does the GIL affect multithreading and multiprocessing?
Multithreading
In CPython, multiple threads cannot run Python bytecode in parallel on multiple CPU cores because of the GIL, which allows only one thread to run at a time. So if 10 threads are spawned to do CPU-heavy work (for example, calculating factorials), only one thread runs at any moment while the others wait.
Impact on multithreading
Since GIL prevents multiple threads from executing Python code in parallel, threads cannot fully utilise multiple CPU cores for CPU-heavy tasks.
The result: No speedup, or even slower performance due to context-switching overhead.
Example
A typical run proceeds like this:
- Main program spawns 4 threads
- Each thread requests the GIL
- Only one thread runs at a time
- After finishing, it releases the GIL
- Finally, all threads are done sequentially, not in parallel
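The time slice behind this handoff is configurable: `sys.getswitchinterval` reports how often (in seconds) the interpreter asks the running thread to release the GIL, and `sys.setswitchinterval` changes it.

```python
import sys

print(sys.getswitchinterval())  # default is 0.005 (5 ms)
sys.setswitchinterval(0.001)    # ask for more frequent GIL handoffs
print(sys.getswitchinterval())
sys.setswitchinterval(0.005)    # restore the default
```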
Multiprocessing
Each process has its own GIL. So, in the case of multiprocessing, each process runs independently on separate CPU cores (no GIL contention).
That’s why multiprocessing is used for CPU-bound tasks in Python.
Note: multiprocessing uses more memory, and inter-process communication (IPC) is slower than sharing data between threads.
Example
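A minimal sketch (not from the original article): two worker processes compute independently, each inside its own interpreter with its own GIL, and report results back over a `multiprocessing.Queue`.

```python
from multiprocessing import Process, Queue

def worker(q, n):
    # Runs in a separate process: its own interpreter, its own GIL
    q.put(sum(range(n)))

if __name__ == "__main__":
    q = Queue()
    procs = [Process(target=worker, args=(q, 1_000_000)) for _ in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print([q.get() for _ in procs])  # two independent results
```

The `if __name__ == "__main__"` guard matters here: on platforms that spawn rather than fork, child processes re-import the main module, and the guard prevents them from spawning children of their own.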
Difference in behavior for I/O-bound vs CPU-bound code
| Type | Description | Effect of the GIL |
|---|---|---|
| I/O-bound | Waiting for I/O (disk, network, etc.) | GIL is released during I/O → other threads can run in between |
| CPU-bound | Heavy computation (math, parsing, image processing, etc.) | GIL blocks other threads → no true parallelism |
What to do?
- For CPU-bound tasks: use multiprocessing.
- For I/O-bound tasks: threads (or asyncio) are fine.
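In practice you rarely manage raw threads or processes yourself: `concurrent.futures` gives the same interface for both. A small sketch with a thread pool (for CPU-bound work you would swap in `ProcessPoolExecutor`); the `measure` function here is a stand-in for a real I/O-bound task:

```python
from concurrent.futures import ThreadPoolExecutor

def measure(text):
    # Stand-in for an I/O-bound task such as an HTTP request
    return len(text)

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(measure, ["a", "bb", "ccc"]))
print(results)  # [1, 2, 3]
```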
Alternatives to the GIL
Multiprocessing
Instead of threads, Python can use multiple processes, each with its own Python interpreter and GIL.
Python implementations without GIL
- Jython (Python on the JVM): no GIL. Threads map to JVM threads and can run in true parallel.
- IronPython (Python on the .NET CLR): no GIL; it uses the .NET threading model.
- PyPy (alternative Python implementation): still has a GIL (currently), but is faster thanks to its JIT compiler. Some experimental branches have tried removing the GIL.
Examples
1. CPU-bound Task (Prime Numbers)
No speedup with threads, because of the GIL.
```python
import threading
import time

# A CPU-heavy function: check if a number is prime
def is_prime(n):
    if n < 2:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

def count_primes(limit):
    count = 0
    for num in range(1, limit):
        if is_prime(num):
            count += 1
    return count

def worker(limit):
    start = time.time()
    count_primes(limit)
    print(f"Done in {time.time() - start:.2f} sec")

if __name__ == "__main__":
    LIMIT = 20000

    # Run sequentially
    start = time.time()
    count_primes(LIMIT)
    count_primes(LIMIT)
    print("Sequential:", time.time() - start, "sec")

    # Run with 2 threads
    start = time.time()
    t1 = threading.Thread(target=worker, args=(LIMIT,))
    t2 = threading.Thread(target=worker, args=(LIMIT,))
    t1.start(); t2.start()
    t1.join(); t2.join()
    print("With threads:", time.time() - start, "sec")
```
Output:
```
Sequential: 0.022531747817993164 sec
Done in 0.01 sec
Done in 0.01 sec
With threads: 0.02266693115234375 sec
```
2. I/O-bound Task (Downloading URLs)
Threads give a speedup here.
```python
import threading
import time

import requests

# I/O-heavy function: download a URL
def download(url):
    resp = requests.get(url)
    print(f"{url} done, size={len(resp.content)}")

if __name__ == "__main__":
    urls = [
        "https://httpbin.org/delay/2",  # waits 2 sec before responding
        "https://httpbin.org/delay/2",
        "https://httpbin.org/delay/2",
        "https://httpbin.org/delay/2",
        "https://httpbin.org/delay/2",
    ]

    # Sequential downloads
    start = time.time()
    for url in urls:
        download(url)
    print("Sequential:", time.time() - start, "sec")

    # Parallel with threads
    start = time.time()
    threads = []
    for url in urls:
        t = threading.Thread(target=download, args=(url,))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    print("With threads:", time.time() - start, "sec")
```
Output:
```
https://httpbin.org/delay/2 done, size=357
https://httpbin.org/delay/2 done, size=357
https://httpbin.org/delay/2 done, size=357
https://httpbin.org/delay/2 done, size=357
https://httpbin.org/delay/2 done, size=357
Sequential: 43.5544159412384 sec
https://httpbin.org/delay/2 done, size=357
https://httpbin.org/delay/2 done, size=357
https://httpbin.org/delay/2 done, size=357
https://httpbin.org/delay/2 done, size=357
https://httpbin.org/delay/2 done, size=357
With threads: 14.384175062179565 sec
```
3. I/O-bound task example with threads (downloading multiple URLs)
This will help you see how threads can actually speed things up when the task is I/O-heavy.
```python
import threading
import time

import requests

# List of URLs to download (example sites)
urls = [
    "https://www.example.com",
    "https://www.python.org",
    "https://httpbin.org/delay/2",  # artificial delay
    "https://www.github.com",
    "https://www.wikipedia.org",
]

# Function to download content from a URL
def download_url(url):
    print(f"Starting download: {url}")
    response = requests.get(url)
    print(f"Finished downloading {url} (size={len(response.text)} characters)")

# Run sequentially (one after another)
def run_sequential():
    start = time.time()
    for url in urls:
        download_url(url)
    end = time.time()
    print(f"\nSequential time: {end - start:.2f} seconds\n")

# Run with threads (concurrently)
def run_with_threads():
    start = time.time()
    threads = []
    for url in urls:
        thread = threading.Thread(target=download_url, args=(url,))
        threads.append(thread)
        thread.start()
    # Wait for all threads to finish
    for thread in threads:
        thread.join()
    end = time.time()
    print(f"\nThreaded time: {end - start:.2f} seconds\n")

# Run the two scenarios
if __name__ == "__main__":
    print("=== Sequential ===")
    run_sequential()
    print("=== With Threads ===")
    run_with_threads()
```
Output:
```
=== Sequential ===
Starting download: https://www.example.com
Finished downloading https://www.example.com (size=1256 characters)
Starting download: https://www.python.org
Finished downloading https://www.python.org (size=50199 characters)
Starting download: https://httpbin.org/delay/2
Finished downloading https://httpbin.org/delay/2 (size=357 characters)
Starting download: https://www.github.com
Finished downloading https://www.github.com (size=554068 characters)
Starting download: https://www.wikipedia.org
Finished downloading https://www.wikipedia.org (size=92 characters)

Sequential time: 40.75 seconds

=== With Threads ===
Starting download: https://www.example.comStarting download: https://www.python.org
Starting download: https://httpbin.org/delay/2
Starting download: https://www.github.com
Starting download: https://www.wikipedia.org
Finished downloading https://www.python.org (size=50199 characters)
Finished downloading https://www.example.com (size=1256 characters)
Finished downloading https://www.wikipedia.org (size=92 characters)
Finished downloading https://httpbin.org/delay/2 (size=357 characters)
Finished downloading https://www.github.com (size=553994 characters)

Threaded time: 6.18 seconds
```

(Note the garbled first line under "With Threads": the threads' `print` calls interleave, which is itself evidence they run concurrently.)
4. CPU-bound Task in Processes (true parallelism)
Processes can scale across CPU cores, although for a workload this small the process-startup overhead hides the speedup.
```python
from multiprocessing import Pool, cpu_count
import time

N = 5_000_000
P = min(4, cpu_count())

def cpu_task(n):
    s = 0
    for i in range(n):
        s += i % 5
    return s

if __name__ == "__main__":
    # Baseline: single process
    start = time.perf_counter()
    cpu_task(N)
    print(f"Single process took: {time.perf_counter() - start:.2f}s")

    # Parallel across processes: split the same total work into P chunks
    start = time.perf_counter()
    chunk = N // P
    with Pool(P) as pool:
        pool.map(cpu_task, [chunk] * P)
    print(f"Processes ({P}) took: {time.perf_counter() - start:.2f}s")
```
Output:

```
Single process took: 0.14s
Processes (4) took: 0.15s
```

Here the per-process work is so small that starting the pool and shipping results back eats the parallel gain; increase `N` and the process version pulls ahead on a multicore machine.
Summary (to remember it all easily)
- GIL = a big lock around Python bytecode execution in CPython.
- CPU-bound + threads: serialized – no multicore speedup.
- I/O-bound + threads: GIL released while waiting – great concurrency.
- Multiprocessing: separate processes, separate GILs – true multicore.
- C extensions that release the GIL: threads can achieve true parallelism inside native code.
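You can observe that last point with `time.sleep`, a C-level call that releases the GIL while it waits: four half-second sleeps on four threads finish in roughly half a second, not two.

```python
import threading
import time

def wait():
    time.sleep(0.5)  # releases the GIL while blocked

start = time.perf_counter()
threads = [threading.Thread(target=wait) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"{time.perf_counter() - start:.2f}s")  # ~0.5s, because the sleeps overlap
```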