This content originally appeared on DEV Community and was authored by Sushant Gaurav
What is the GIL?
GIL (Global Interpreter Lock) is a mutex (mutual exclusion lock) in CPython (the default Python implementation). It ensures that only one thread executes Python bytecode at a time (even if there are multiple CPU cores). So, at any moment, only one thread holds the GIL and runs the Python code.
Why does it exist?
Memory management in CPython is not thread-safe (Python uses reference counting to manage memory). GIL makes it simpler and faster to manage memory without complex thread synchronisation.
Note:
- GIL is not part of the Python language; it is specific to CPython.
- Other Python interpreters like Jython, IronPython, PyPy may not have a GIL (or have a different design).
- At any instant, one OS thread holds the GIL and runs Python bytecode. Threads time-slice (the interpreter periodically releases the GIL so another thread can run), but not in parallel.
Memory management in CPython is not thread-safe
CPython uses reference counting (plus a cyclic garbage collector). Every Python object has a refcount: the number of references pointing to it. Incrementing or decrementing this counter must be atomic. If two threads modify the same refcount incorrectly, problems arise: if both decrement it from 1 to 0 at the same time, the object can be freed twice (a double free, leading to memory corruption); if an increment is lost, the object may be released while it is still in use (a classic use-after-free).
With the GIL held, CPython can perform `Py_INCREF` / `Py_DECREF` without per-field locks, keeping the interpreter simple and fast for the common single-threaded case.
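You can watch refcounts change from Python itself with `sys.getrefcount` (note that it reports one extra reference, for its own argument):

```python
import sys

x = []
before = sys.getrefcount(x)  # includes the temporary reference made for the call
y = x                        # a second name now refers to the same list
after = sys.getrefcount(x)
print(after - before)        # 1: the alias added exactly one reference
```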
Pseudocode for the race

```c
// Two threads both decrement the same object's refcount concurrently.
void thread1() {
    // The read-modify-write is not atomic:
    int tmp = obj->ob_refcnt;            // suppose this reads 1
    obj->ob_refcnt = tmp - 1;            // becomes 0
    if (obj->ob_refcnt == 0) free(obj);  // frees the object
}

void thread2() {
    int tmp = obj->ob_refcnt;            // also reads 1 (raced)
    obj->ob_refcnt = tmp - 1;            // becomes 0 again
    if (obj->ob_refcnt == 0) free(obj);  // double free -> corruption
}
```
How does the GIL affect multithreading and multiprocessing?
Multithreading
In CPython, multiple threads cannot run Python bytecode in parallel on multiple CPU cores because of the GIL, which allows only one thread to run at a time. So if 10 threads are spawned to do CPU-heavy work (for example, calculating factorials), only one thread runs at any moment while the others wait.
Impact on multithreading
Since GIL prevents multiple threads from executing Python code in parallel, threads cannot fully utilise multiple CPU cores for CPU-heavy tasks.
The result: No speedup, or even slower performance due to context-switching overhead.
Example
A typical run proceeds like this:
- Main program spawns 4 threads
- Each thread requests the GIL
- Only one thread runs at a time
- After finishing, it releases the GIL
- Finally, all threads are done sequentially, not in parallel
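The time slice behind this handoff is configurable: `sys.getswitchinterval` reports how often (in seconds) the interpreter asks the running thread to release the GIL, and `sys.setswitchinterval` changes it.

```python
import sys

print(sys.getswitchinterval())  # default is 0.005 (5 ms)
sys.setswitchinterval(0.001)    # ask for more frequent GIL handoffs
print(sys.getswitchinterval())
sys.setswitchinterval(0.005)    # restore the default
```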
Multiprocessing
Each process has its own GIL. So, in the case of multiprocessing, each process runs independently on separate CPU cores (no GIL contention).
That’s why multiprocessing is used for CPU-bound tasks in Python.
Note: multiprocessing uses more memory, and inter-process communication (IPC) is slower than sharing data between threads.
Example
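A minimal sketch (not from the original article): two worker processes compute independently, each inside its own interpreter with its own GIL, and report results back over a `multiprocessing.Queue`.

```python
from multiprocessing import Process, Queue

def worker(q, n):
    # Runs in a separate process: its own interpreter, its own GIL
    q.put(sum(range(n)))

if __name__ == "__main__":
    q = Queue()
    procs = [Process(target=worker, args=(q, 1_000_000)) for _ in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print([q.get() for _ in procs])  # two independent results
```

The `if __name__ == "__main__"` guard matters here: on platforms that spawn rather than fork, child processes re-import the main module, and the guard prevents them from spawning children of their own.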
Difference in behavior for I/O-bound vs CPU-bound code
| Type | Description | Effect of the GIL |
|---|---|---|
| I/O-bound | Waiting for I/O (disk, network, etc.) | GIL is released during I/O → other threads can run in between |
| CPU-bound | Heavy computation (math, parsing, image processing, etc.) | GIL blocks other threads → no true parallelism |
What to do?
- For CPU-bound tasks: use multiprocessing.
- For I/O-bound tasks: threads (or asyncio) are fine.
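In practice you rarely manage raw threads or processes yourself: `concurrent.futures` gives the same interface for both. A small sketch with a thread pool (for CPU-bound work you would swap in `ProcessPoolExecutor`); the `measure` function here is a stand-in for a real I/O-bound task:

```python
from concurrent.futures import ThreadPoolExecutor

def measure(text):
    # Stand-in for an I/O-bound task such as an HTTP request
    return len(text)

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(measure, ["a", "bb", "ccc"]))
print(results)  # [1, 2, 3]
```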
Alternatives to the GIL
Multiprocessing
Instead of threads, Python can use multiple processes, each with its own Python interpreter and GIL.
Python implementations without GIL
- Jython (Python on the JVM): no GIL. Threads map to JVM threads and can run in true parallel.
- IronPython (Python on the .NET CLR): no GIL; it uses the .NET threading model.
- PyPy (alternative Python implementation): still has a GIL (currently), but is faster thanks to its JIT compiler. Some experimental branches have tried removing the GIL.
Examples
1. CPU-bound Task (Prime Numbers)
No speedup with threads, because of the GIL.
```python
import threading
import time

# A CPU-heavy function: check if a number is prime
def is_prime(n):
    if n < 2:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

def count_primes(limit):
    count = 0
    for num in range(1, limit):
        if is_prime(num):
            count += 1
    return count

def worker(limit):
    start = time.time()
    count_primes(limit)
    print(f"Done in {time.time() - start:.2f} sec")

if __name__ == "__main__":
    LIMIT = 20000

    # Run sequentially
    start = time.time()
    count_primes(LIMIT)
    count_primes(LIMIT)
    print("Sequential:", time.time() - start, "sec")

    # Run with 2 threads
    start = time.time()
    t1 = threading.Thread(target=worker, args=(LIMIT,))
    t2 = threading.Thread(target=worker, args=(LIMIT,))
    t1.start(); t2.start()
    t1.join(); t2.join()
    print("With threads:", time.time() - start, "sec")
```
Output:
```
Sequential: 0.022531747817993164 sec
Done in 0.01 sec
Done in 0.01 sec
With threads: 0.02266693115234375 sec
```
2. I/O-bound Task (Downloading URLs)
Threads give a speedup here.
```python
import threading
import time

import requests

# I/O-heavy function: download a URL
def download(url):
    resp = requests.get(url)
    print(f"{url} done, size={len(resp.content)}")

if __name__ == "__main__":
    urls = [
        "https://httpbin.org/delay/2",  # waits 2 sec before responding
        "https://httpbin.org/delay/2",
        "https://httpbin.org/delay/2",
        "https://httpbin.org/delay/2",
        "https://httpbin.org/delay/2",
    ]

    # Sequential downloads
    start = time.time()
    for url in urls:
        download(url)
    print("Sequential:", time.time() - start, "sec")

    # Parallel with threads
    start = time.time()
    threads = []
    for url in urls:
        t = threading.Thread(target=download, args=(url,))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    print("With threads:", time.time() - start, "sec")
```
Output:
```
https://httpbin.org/delay/2 done, size=357
https://httpbin.org/delay/2 done, size=357
https://httpbin.org/delay/2 done, size=357
https://httpbin.org/delay/2 done, size=357
https://httpbin.org/delay/2 done, size=357
Sequential: 43.5544159412384 sec
https://httpbin.org/delay/2 done, size=357
https://httpbin.org/delay/2 done, size=357
https://httpbin.org/delay/2 done, size=357
https://httpbin.org/delay/2 done, size=357
https://httpbin.org/delay/2 done, size=357
With threads: 14.384175062179565 sec
```
3. I/O-bound task example with threads (downloading multiple URLs)
This will help you see how threads can actually speed things up when the task is I/O-heavy.
```python
import threading
import time

import requests

# List of URLs to download (example sites)
urls = [
    "https://www.example.com",
    "https://www.python.org",
    "https://httpbin.org/delay/2",  # artificial delay
    "https://www.github.com",
    "https://www.wikipedia.org",
]

# Function to download content from a URL
def download_url(url):
    print(f"Starting download: {url}")
    response = requests.get(url)
    print(f"Finished downloading {url} (size={len(response.text)} characters)")

# Run sequentially (one after another)
def run_sequential():
    start = time.time()
    for url in urls:
        download_url(url)
    end = time.time()
    print(f"\nSequential time: {end - start:.2f} seconds\n")

# Run with threads (concurrently)
def run_with_threads():
    start = time.time()
    threads = []
    for url in urls:
        thread = threading.Thread(target=download_url, args=(url,))
        threads.append(thread)
        thread.start()
    # Wait for all threads to finish
    for thread in threads:
        thread.join()
    end = time.time()
    print(f"\nThreaded time: {end - start:.2f} seconds\n")

# Run the two scenarios
if __name__ == "__main__":
    print("=== Sequential ===")
    run_sequential()
    print("=== With Threads ===")
    run_with_threads()
```
Output:
```
=== Sequential ===
Starting download: https://www.example.com
Finished downloading https://www.example.com (size=1256 characters)
Starting download: https://www.python.org
Finished downloading https://www.python.org (size=50199 characters)
Starting download: https://httpbin.org/delay/2
Finished downloading https://httpbin.org/delay/2 (size=357 characters)
Starting download: https://www.github.com
Finished downloading https://www.github.com (size=554068 characters)
Starting download: https://www.wikipedia.org
Finished downloading https://www.wikipedia.org (size=92 characters)

Sequential time: 40.75 seconds

=== With Threads ===
Starting download: https://www.example.comStarting download: https://www.python.org
Starting download: https://httpbin.org/delay/2
Starting download: https://www.github.com
Starting download: https://www.wikipedia.org
Finished downloading https://www.python.org (size=50199 characters)
Finished downloading https://www.example.com (size=1256 characters)
Finished downloading https://www.wikipedia.org (size=92 characters)
Finished downloading https://httpbin.org/delay/2 (size=357 characters)
Finished downloading https://www.github.com (size=553994 characters)

Threaded time: 6.18 seconds
```

(Note the garbled first line under "With Threads": the threads' `print` calls interleave, which is itself evidence they run concurrently.)
4. CPU-bound Task in Processes (true parallelism)
Processes can scale across CPU cores, although for a workload this small the process-startup overhead hides the speedup.
```python
from multiprocessing import Pool, cpu_count
import time

N = 5_000_000
P = min(4, cpu_count())

def cpu_task(n):
    s = 0
    for i in range(n):
        s += i % 5
    return s

if __name__ == "__main__":
    # Baseline: single process
    start = time.perf_counter()
    cpu_task(N)
    print(f"Single process took: {time.perf_counter() - start:.2f}s")

    # Parallel across processes: split the same total work into P chunks
    start = time.perf_counter()
    chunk = N // P
    with Pool(P) as pool:
        pool.map(cpu_task, [chunk] * P)
    print(f"Processes ({P}) took: {time.perf_counter() - start:.2f}s")
```
Output:

```
Single process took: 0.14s
Processes (4) took: 0.15s
```

Here the per-process work is so small that starting the pool and shipping results back eats the parallel gain; increase `N` and the process version pulls ahead on a multicore machine.
Summary (to remember it all easily)
- GIL = a big lock around Python bytecode execution in CPython.
- CPU-bound + threads: serialized – no multicore speedup.
- I/O-bound + threads: GIL released while waiting – great concurrency.
- Multiprocessing: separate processes, separate GILs – true multicore.
- C extensions that release the GIL: threads can achieve true parallelism inside native code.
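You can observe that last point with `time.sleep`, a C-level call that releases the GIL while it waits: four half-second sleeps on four threads finish in roughly half a second, not two.

```python
import threading
import time

def wait():
    time.sleep(0.5)  # releases the GIL while blocked

start = time.perf_counter()
threads = [threading.Thread(target=wait) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"{time.perf_counter() - start:.2f}s")  # ~0.5s, because the sleeps overlap
```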