This content originally appeared on DEV Community and was authored by Ashutosh Sarangi
Multithreading and Multiprocessing in Python
In Python, multithreading and multiprocessing are two ways to achieve concurrency, which is the ability to handle multiple tasks at the same time. While both can run multiple tasks, they do so in fundamentally different ways. The key difference lies in how they handle processes and threads.
Multithreading: In Python, multithreading involves a single process with multiple threads. Threads within the same process share the same memory space, which makes communication between threads fast and efficient. However, Python’s Global Interpreter Lock (GIL) allows only one thread to execute Python bytecode at a time. This means multithreading is great for I/O-bound tasks (e.g., network requests or file reads) but not for CPU-bound tasks (tasks that require heavy computation).
Multiprocessing: Multiprocessing, on the other hand, involves multiple independent processes, each with its own memory space and its own Python interpreter. Because they are separate processes, they can run on different CPU cores simultaneously, bypassing the GIL. This makes multiprocessing ideal for CPU-bound tasks that can be split into independent sub-tasks, as it can leverage multiple CPU cores to speed up execution.
Example: Multithreading vs. Multiprocessing
Let’s use a simple example to illustrate the difference.
Multithreading Example (I/O-bound): A program that downloads multiple files from the internet. While one thread is waiting for a file to download, another can start downloading another file. This doesn’t involve heavy computation, so multithreading is a good fit. The program’s overall execution time is reduced because it’s not waiting for one download to finish before starting the next.
Multiprocessing Example (CPU-bound): A program that performs complex mathematical calculations on a large dataset. By using multiprocessing, you can split the dataset and have a separate process perform calculations on each part. This allows the program to utilize multiple CPU cores, dramatically speeding up the total calculation time.
Synchronous vs. Asynchronous Programming
Synchronous programming is the default, sequential way of executing code. Each task must wait for the previous one to complete before it can begin. It’s like a single-lane road where cars must follow one after another. If a task is slow, the entire program is blocked, and other tasks cannot proceed. This is simple and predictable but can be inefficient for tasks that involve waiting.
Asynchronous programming is a non-blocking approach that allows a program to initiate a task and then move on to other tasks without waiting for the first one to complete. It’s like a chef taking an order, starting the dish, and then starting the next dish while the first one is cooking. When the first dish is ready, the chef can return to it. In Python, this is often implemented using the asyncio library, which uses an event loop to manage and schedule tasks.
Example: Sync vs. Async
Let’s consider a web scraper that needs to fetch data from multiple websites.
Synchronous Example: The program fetches data from website A. It waits for the download to complete. Then it fetches data from website B. It waits for that download to complete, and so on. This process is sequential and can be very slow if one of the websites is slow to respond.
Asynchronous Example: The program initiates a request to website A, and while it’s waiting for the response, it immediately sends a request to website B. It continues this process for all the websites. When a response from any of the websites arrives, the program can handle it. This approach is much more efficient because it uses the “waiting time” to do other work, allowing the program to fetch data from multiple sources concurrently without blocking.
Event loop in Python
In Python’s asyncio, the event loop manages a single queue of tasks (coroutines) and callbacks. When a coroutine awaits an I/O operation (e.g., a network request), it yields control back to the event loop. The event loop then checks for completed I/O events and moves on to the next available task in its queue. The concept of “microtasks” is handled implicitly: when a task completes and schedules a callback (for example, resuming the coroutine that was awaiting it), that callback is added directly to the event loop’s queue and is processed soon after.
- There isn’t a separate, higher-priority “micro” queue that gets fully drained before the main queue is checked again; instead, the asyncio loop is designed to be highly responsive to events and to resume tasks as soon as they become ready.
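A small sketch of that callback scheduling, using only standard asyncio: loop.call_soon places a plain callback on the loop’s queue, and it runs as soon as the current coroutine yields control.

```python
import asyncio

order = []

async def main():
    loop = asyncio.get_running_loop()
    # Put a plain callback on the event loop's queue.
    loop.call_soon(lambda: order.append("callback"))
    order.append("before await")
    await asyncio.sleep(0)  # Yield control; the loop now runs queued callbacks.
    order.append("after await")

asyncio.run(main())
print(order)  # ['before await', 'callback', 'after await']
```

The callback scheduled with call_soon does not run immediately; it runs only once the coroutine yields at the await, which is exactly the “added to the queue and processed very soon” behavior described above.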
Where Python’s Event Loop is Located
Unlike JavaScript’s event loop, which is an integral part of the browser or Node.js runtime, Python’s event loop is part of the asyncio library itself. It is a key component of the asyncio module and runs single-threaded within your Python application. When you call asyncio.run(), a new event loop is created, it runs all the tasks you’ve scheduled, and it shuts down when they are complete.
The event loop is essentially a while True loop that continuously monitors and dispatches events. It’s a central hub that:
- Checks if any coroutines are ready to resume execution.
- Handles completed I/O operations (e.g., from network sockets).
- Manages scheduled callbacks and timers.
By default, there’s only one event loop per thread, and it’s generally recommended to run all your asyncio code within that one event loop in the main thread. To handle CPU-bound tasks without blocking the event loop, you would typically offload them to a separate thread or process using methods like loop.run_in_executor().
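A minimal sketch of that offloading pattern: passing None as the executor uses the default thread pool. The fib function here is purely illustrative; note that because threads still share the GIL, truly CPU-bound work is better served by passing a concurrent.futures.ProcessPoolExecutor instead.

```python
import asyncio

def fib(n):
    # Deliberately slow, CPU-bound recursion (illustrative).
    return n if n < 2 else fib(n - 1) + fib(n - 2)

async def main():
    loop = asyncio.get_running_loop()
    # None selects the default ThreadPoolExecutor; the event loop
    # stays free to run other tasks while fib blocks a worker thread.
    return await loop.run_in_executor(None, fib, 20)

result = asyncio.run(main())
print(result)  # 6765
```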
Below are simple code examples to illustrate these concepts.
Multithreading vs. Multiprocessing
Multithreading is best for I/O-bound tasks (like downloading files or network requests) where the program spends most of its time waiting. The Global Interpreter Lock (GIL) means only one thread runs Python bytecode at a time, so threads provide no speed-up for CPU-bound work.
import threading
import time

def task(name):
    print(f"Thread {name}: Starting...")
    time.sleep(2)  # Simulate an I/O-bound task (e.g., waiting for a network response)
    print(f"Thread {name}: Finishing.")

threads = []
for i in range(3):
    t = threading.Thread(target=task, args=(i,))
    threads.append(t)
    t.start()

for t in threads:
    t.join()  # Wait for all threads to complete

print("All threads have finished.")
In this example, three threads run concurrently. While one thread is sleeping (waiting), another can start, which is faster than running them sequentially.
Multiprocessing is ideal for CPU-bound tasks (like heavy mathematical calculations) because it bypasses the GIL by using separate processes, each with its own interpreter.
import multiprocessing
import time

def task(name):
    print(f"Process {name}: Starting...")
    time.sleep(2)  # Stand-in for work (a real CPU-bound task would compute, not sleep)
    print(f"Process {name}: Finishing.")

if __name__ == "__main__":
    processes = []
    for i in range(3):
        p = multiprocessing.Process(target=task, args=(i,))
        processes.append(p)
        p.start()

    for p in processes:
        p.join()  # Wait for all processes to complete

    print("All processes have finished.")
Here, three independent processes are created. They can run on different CPU cores simultaneously, offering a true speed-up for CPU-intensive work.
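The sketch above demonstrates process startup and joining, but time.sleep is not actually CPU-bound. A minimal sketch of genuinely CPU-bound work split across processes with multiprocessing.Pool (the function name and chunk sizes are illustrative):

```python
import multiprocessing

def sum_of_squares(n):
    # CPU-bound: pure Python arithmetic, no waiting on I/O.
    return sum(i * i for i in range(n))

def main():
    # Each chunk is computed in its own process, so the work
    # can use multiple CPU cores despite the GIL.
    with multiprocessing.Pool(processes=3) as pool:
        return pool.map(sum_of_squares, [10_000, 20_000, 30_000])

if __name__ == "__main__":
    print(main())
```

pool.map splits the input list across the worker processes and gathers the results in order, which is the usual shape for “split the dataset and compute each part separately.”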
Synchronous vs. Asynchronous Programming
Synchronous programming is sequential and blocking. Each task must wait for the previous one to complete.
import time

def fetch_data(url):
    print(f"Fetching data from {url}...")
    time.sleep(2)  # Simulate a network request
    print(f"Finished fetching data from {url}.")
    return "Data from " + url

start_time = time.time()
data1 = fetch_data("website A")
data2 = fetch_data("website B")
data3 = fetch_data("website C")
print(f"All data fetched in {time.time() - start_time:.2f} seconds.")
The total time for this script to run will be approximately 6 seconds (2 seconds for each function call) because each call blocks the program until it finishes.
Asynchronous programming is non-blocking and event-driven. It allows the program to switch to other tasks while waiting for a long-running operation to complete. This is typically done with Python’s asyncio library.
import asyncio
import time

async def fetch_data(url):
    print(f"Fetching data from {url}...")
    await asyncio.sleep(2)  # Simulate a network request
    print(f"Finished fetching data from {url}.")
    return "Data from " + url

async def main():
    start_time = time.time()
    tasks = [
        fetch_data("website A"),
        fetch_data("website B"),
        fetch_data("website C")
    ]
    await asyncio.gather(*tasks)  # Run tasks concurrently
    print(f"All data fetched in {time.time() - start_time:.2f} seconds.")

if __name__ == "__main__":
    asyncio.run(main())
In this asyncio example, await asyncio.sleep(2) pauses this specific task without blocking the entire program; the event loop can switch to another task in the meantime. The total execution time will be around 2 seconds, as all three tasks are initiated and run concurrently. This is highly efficient for I/O-bound operations.
- The primary benefit of asyncio is in managing concurrency, specifically for I/O-bound tasks. If you were making a hundred API calls, using asyncio would allow you to initiate all the requests concurrently, saving a significant amount of time by not waiting for each one to finish before starting the next. For a single call, there’s no concurrency to manage, so the overhead of setting up an async function and an event loop provides no performance benefit.
Here’s a comparison:
Synchronous Approach (Recommended for a Single Call)
This approach is straightforward and easy to read. The code executes one line at a time.
import requests

def get_data_sync(url):
    response = requests.get(url)
    return response.json()

data = get_data_sync("https://api.example.com/data/1")
print(data)
Asynchronous Approach (Unnecessary for a Single Call)
While you can write a function with async and await, it’s overkill for a single call. You need an async library like aiohttp and must run the function within an event loop.
import asyncio
import aiohttp

async def get_data_async(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.json()

async def main():
    data = await get_data_async("https://api.example.com/data/1")
    print(data)

if __name__ == "__main__":
    asyncio.run(main())
You don’t put async and await in the get_data_sync function because it’s a synchronous function, not an asynchronous one: it uses the requests library, which is a synchronous library.
Here’s why:
- The requests library is synchronous: the requests.get() function is designed to be blocking. When you call it, your program stops and waits for the entire HTTP request to complete (sending the request, waiting for the server’s response, and receiving the data) before it proceeds to the next line of code.
- async and await require an asyncio ecosystem: the keywords async and await are part of Python’s asyncio framework. They are used to define and manage coroutines, which are functions designed for non-blocking, asynchronous I/O operations. For these keywords to work, the function must run within an event loop and use an asynchronous library like aiohttp that is compatible with asyncio.
Since your function uses requests, which is a synchronous library that doesn’t communicate with an event loop, adding async and await would be incorrect: awaiting a non-awaitable value raises a TypeError at runtime, and the blocking call would still stall the event loop.
Similarity between multithreading and asyncio in Python
Shared Goal: Concurrency in I/O-Bound Tasks
Both multithreading and asyncio are excellent for I/O-bound tasks. This is any task that spends most of its time waiting for an external operation to complete, like:
- Making network requests: Waiting for a web server to respond.
- Reading/writing files: Waiting for data to be read from or written to a disk.
- Database queries: Waiting for a database to execute a query.
In these scenarios, the CPU is largely idle. Both multithreading and asyncio prevent your program from freezing while it waits, allowing it to work on other tasks in the meantime.
The Fundamental Difference: How They Handle Concurrency
The similarity ends at their core mechanism.
Multithreading uses preemptive multitasking. The operating system (OS) and the Python interpreter decide when to switch between threads. The threads themselves don’t give up control voluntarily. This is what makes it “preemptive”—the OS preempts a running thread to give another one a turn. Because of Python’s Global Interpreter Lock (GIL), only one thread can execute Python bytecode at a time, so it’s not true parallelism but a form of interleaved concurrency.
asyncio uses cooperative multitasking. The tasks themselves (coroutines) voluntarily give up control to the event loop when they encounter an await keyword. The event loop is a single-threaded loop that checks for completed tasks and schedules the next one. This is what makes it “cooperative”: each task must cooperate by yielding control. Because it’s single-threaded, it doesn’t face the GIL limitations that multithreading does.
Metaprogramming in Python
Metaprogramming is the creation of programs that write or manipulate other programs. In essence, it’s about making code that can inspect, generate, or modify itself at runtime. It’s a powerful technique often used to reduce code duplication, create flexible APIs, and build frameworks.
Key Concepts
Metaprogramming in Python is primarily achieved through:
- Decorators: Functions that wrap other functions or classes to extend their behavior without permanent modification. They’re a form of syntactic sugar for wrapping a callable.
- Class Decorators: Similar to function decorators, but they modify the behavior of a class.
- Metaclasses: The most advanced form of metaprogramming. A metaclass is the “class of a class.” When you define a class, its metaclass is responsible for creating it. By creating a custom metaclass, you can control how classes are defined and how they behave.
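A minimal metaclass sketch (the names RegisteredMeta and Plugin are illustrative): the metaclass’s __new__ runs whenever a class using it is defined, so it can record every such class in a registry.

```python
class RegisteredMeta(type):
    """Metaclass that records every class it creates."""
    registry = {}

    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        mcs.registry[name] = cls  # Runs at class-definition time.
        return cls

class Plugin(metaclass=RegisteredMeta):
    pass

class AudioPlugin(Plugin):  # Subclasses inherit the metaclass.
    pass

print(sorted(RegisteredMeta.registry))  # ['AudioPlugin', 'Plugin']
```

Note that AudioPlugin is registered without mentioning the metaclass at all; this is how frameworks transparently track user-defined subclasses.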
Decorators in Python
A decorator in Python is a design pattern that allows you to dynamically extend or modify the behavior of a function or class without changing its source code. It’s essentially a callable (like a function) that takes another callable as an argument and returns a new callable.
Decorators use a special syntax with the @ symbol, which is just syntactic sugar for a more verbose function call.
# This:
@my_decorator
def my_function():
    ...

# Is equivalent to this:
def my_function():
    ...
my_function = my_decorator(my_function)
The core idea is to “wrap” a function or class to add new functionality before or after the original code runs.
Function Decorators
A function decorator is a function that takes another function as an argument, adds some new functionality, and returns the modified function. This is most commonly used for tasks like logging, timing, authentication, or validation.
Example: A Simple Timer Decorator
This decorator measures how long a function takes to execute.
import time

def timing_decorator(func):
    """A decorator that prints the execution time of a function."""
    def wrapper(*args, **kwargs):
        start_time = time.time()
        result = func(*args, **kwargs)
        end_time = time.time()
        print(f"'{func.__name__}' ran in {end_time - start_time:.4f} seconds.")
        return result
    return wrapper

@timing_decorator
def complex_calculation(n):
    """Simulates a complex calculation."""
    sum_val = 0
    for i in range(n):
        sum_val += i
    return sum_val

# Calling the decorated function
result = complex_calculation(10000000)
print(f"Calculation result: {result}")
In this example, timing_decorator is the decorator function. It defines a new function, wrapper, that encapsulates the original complex_calculation function. When you call complex_calculation(10000000), you are actually calling the wrapper function. The wrapper executes its own timing logic, calls the original function, and then prints the elapsed time before returning the result.
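One practical caveat: because the wrapper replaces the original function, the decorated function’s __name__ and __doc__ become those of wrapper. The standard fix is functools.wraps; a small sketch with an illustrative add function:

```python
import functools
import time

def timing_decorator(func):
    @functools.wraps(func)  # Copy __name__, __doc__, etc. onto the wrapper.
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"'{func.__name__}' ran in {time.perf_counter() - start:.4f} seconds.")
        return result
    return wrapper

@timing_decorator
def add(a, b):
    """Add two numbers."""
    return a + b

print(add(2, 3))     # prints the timing line, then 5
print(add.__name__)  # 'add', not 'wrapper', thanks to functools.wraps
```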
Class Decorators
A class decorator is a callable that takes a class object as an argument and returns a new class object. Class decorators are used to modify or extend the behavior of an entire class. Common use cases include enforcing a class structure, adding attributes to all instances of a class, or automatically registering a class with a registry.
Example: A Class Decorator for Enforcing an Interface
This decorator checks if a class implements certain methods and raises an error if it doesn’t.
def enforce_interface(cls):
    """A class decorator to ensure 'start' and 'stop' methods exist."""
    if not hasattr(cls, 'start') or not callable(getattr(cls, 'start')):
        raise TypeError(f"Class '{cls.__name__}' must have a 'start' method.")
    if not hasattr(cls, 'stop') or not callable(getattr(cls, 'stop')):
        raise TypeError(f"Class '{cls.__name__}' must have a 'stop' method.")
    return cls

@enforce_interface
class Car:
    def __init__(self, model):
        self.model = model

    def start(self):
        print(f"Starting the {self.model}.")

    def stop(self):
        print(f"Stopping the {self.model}.")

# This class will raise a TypeError when it's defined because it lacks a 'stop' method
try:
    @enforce_interface
    class Motorcycle:
        def __init__(self, model):
            self.model = model

        def start(self):
            print(f"Starting the {self.model}.")
except TypeError as e:
    print(f"\nCaught an error: {e}")
In this example, @enforce_interface is the class decorator. When the Car class is defined, the enforce_interface function is called with Car as an argument; the decorator checks for the required methods and returns the class. When Motorcycle is defined, the check fails and the TypeError is raised immediately, catching the mistake at class-definition time rather than when the missing method is first called.
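The other use case mentioned earlier, automatically registering a class, can be sketched with a hypothetical plugin registry (all names here are illustrative):

```python
PLUGINS = {}

def register(cls):
    """Class decorator that adds the class to a global registry."""
    PLUGINS[cls.__name__] = cls
    return cls  # Return the class unchanged so normal use still works.

@register
class CsvExporter:
    def export(self, rows):
        return "\n".join(",".join(map(str, r)) for r in rows)

@register
class JsonExporter:
    def export(self, rows):
        import json
        return json.dumps(rows)

print(sorted(PLUGINS))  # ['CsvExporter', 'JsonExporter']
```

Because the decorator returns the class unmodified, registration is a pure side effect: code elsewhere can look up exporters by name without importing each one explicitly.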