This content originally appeared on DEV Community and was authored by Harshil Jani
NumPy’s SIMD-Friendly Design Boosts Performance Over Python Lists
When optimizing Python code for speed, especially in data-heavy applications like machine learning or analytics, the choice of data structure matters a lot.
Python lists are slow compared to NumPy arrays for numerical tasks. NumPy stores its data in contiguous memory, which lets the CPU use SIMD (Single Instruction, Multiple Data), a hardware feature that processes multiple data elements in parallel with a single instruction.
In this post, we’ll explore why NumPy’s design delivers massive performance improvements over Python lists, with simple explanations, a clear code example and benchmarks to prove it.
Why Memory Layout Matters
The difference between NumPy arrays and Python lists is how they store data:
- Python Lists (Scattered): Each element is a separate Python object. The list itself holds only pointers to those objects, which can live at different memory addresses.
- NumPy Arrays (Contiguous): Data is stored in a single, continuous block of memory. This allows the CPU to grab chunks of data efficiently.
SIMD benefits from contiguous memory because the CPU can load several values (e.g., 4 or 8 floats) into a vector register with one instruction. Scattered memory, like in Python lists, forces the CPU to access elements individually, preventing SIMD and causing:
- Cache Misses: Scattered data is less likely to be in the CPU’s fast cache, so it has to be fetched from slower main memory.
- No Parallelism: Accessing elements one at a time rules out SIMD, reducing throughput.
- Extra Overhead: Following a pointer to reach each element (pointer chasing) adds latency.
NumPy’s contiguous layout aligns perfectly with SIMD, enabling faster, parallel processing.
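To see the layout difference for yourself, the sketch below inspects a small array and a small list. The variable names are illustrative, and the size reported by sys.getsizeof assumes CPython and can differ between builds.

```python
import sys
import numpy as np

# A small float32 array: one buffer, elements packed back to back.
arr = np.arange(8, dtype=np.float32)
print(arr.flags["C_CONTIGUOUS"])  # whether the data sits in one C-ordered block
print(arr.itemsize, arr.strides)  # bytes per element, bytes between neighbours
print(arr.nbytes)                 # total size of the data buffer in bytes

# The equivalent list: the list holds pointers, each float is its own object.
lst = [float(i) for i in range(8)]
print(sys.getsizeof(lst[0]))      # size of ONE float object (implementation-dependent)
```

When the stride equals the item size, neighbouring elements are adjacent in memory, which is the layout vector loads need.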
A Simple Benchmark to Show the Difference
Let’s test this with a basic operation: adding a constant to 100,000 numbers. We’ll compare a NumPy array to a Python list.
import numpy as np
import time
# Setup: 100,000 elements
size = 100_000
numpy_array = np.ones(size, dtype=np.float32) # Contiguous block of float32 values
python_list = [1.0] * size # List of references to Python float objects
# Operation: Add 5 to each element
constant = 5.0
# NumPy (SIMD-friendly); perf_counter is the high-resolution clock intended for timing
start = time.perf_counter()
numpy_result = numpy_array + constant
numpy_time = time.perf_counter() - start
# Python List (Scattered)
start = time.perf_counter()
python_result = [x + constant for x in python_list]
python_time = time.perf_counter() - start
# Results
print(f"NumPy Array Time: {numpy_time:.6f} seconds")
print(f"Python List Time: {python_time:.6f} seconds")
print(f"NumPy Speedup: {python_time / numpy_time:.2f}x")
print(f"NumPy result sample: {numpy_result[:5]}")
print(f"Python result sample: {python_result[:5]}")
Sample Output (varies by system):
NumPy Array Time: 0.000306 seconds
Python List Time: 0.010526 seconds
NumPy Speedup: 34.36x
NumPy result sample: [6. 6. 6. 6. 6.]
Python result sample: [6.0, 6.0, 6.0, 6.0, 6.0]
NumPy is ~34x faster here for two related reasons. The addition runs as a compiled loop over contiguous memory, where SIMD can process multiple elements per instruction. The list comprehension, by contrast, goes through the interpreter for every element and creates a new float object for each result.
In a real-world application, like processing millions of records in a data pipeline, this speedup can mean the difference between a snappy service and one that struggles under load.
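A single timed run can be noisy, so it is worth repeating the measurement. The sketch below redoes the same comparison with the standard library's timeit module and keeps the fastest run. The helper names add_numpy and add_list and the repeat counts are illustrative choices, not part of the original benchmark, and the numbers will vary by system.

```python
import timeit
import numpy as np

size = 100_000
constant = 5.0
numpy_array = np.ones(size, dtype=np.float32)
python_list = [1.0] * size

def add_numpy():
    # Vectorized add over the whole array
    return numpy_array + constant

def add_list():
    # Element-by-element add in the interpreter
    return [x + constant for x in python_list]

calls = 10  # calls per timing run (illustrative)
# Take the minimum of several runs: it is the run least disturbed by other processes.
numpy_best = min(timeit.repeat(add_numpy, repeat=5, number=calls)) / calls
list_best = min(timeit.repeat(add_list, repeat=5, number=calls)) / calls

print(f"NumPy best per call: {numpy_best:.6f} seconds")
print(f"List best per call:  {list_best:.6f} seconds")
print(f"Speedup: {list_best / numpy_best:.2f}x")
```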
Is It Just for Integers? Supporting Other Data Types
NumPy is optimized for homogeneous data, so all elements in an array must be of the same type. This uniformity is what enables SIMD and contiguous memory benefits.
Python lists can hold mixed types (e.g., integers, strings, objects), but they give up these optimizations because their elements are separate objects scattered across memory.
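A short sketch of what homogeneous means in practice: every array has one dtype, NumPy upcasts mixed numbers to a common type, and mixed Python objects only fit if you explicitly ask for dtype=object, which stores references rather than raw values. The variable names are illustrative.

```python
import numpy as np

# One explicit numeric type for every element
floats = np.array([1, 2, 3], dtype=np.float32)
print(floats.dtype)

# Mixed ints and floats are upcast to a single common type
promoted = np.array([1, 2.5, 3])
print(promoted.dtype)

# Mixed Python objects need dtype=object: the array then holds references,
# so it behaves much more like a list than like a typed numeric buffer.
mixed = np.array([1, "text", 3.14], dtype=object)
print(mixed.dtype)
```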
Use NumPy Arrays When:
- Working with large, homogeneous numerical data (e.g., integers, floats) for performance-critical tasks.
- Needing vectorized operations (e.g., matrix multiplication, statistical computations).
- Integrating with numerical libraries (e.g., SciPy, Pandas).
Use Python Lists When:
- Dealing with small datasets or mixed data types (e.g., a list of [1, "text", 3.14]).
- Prototyping non-numerical logic or needing dynamic resizing.
- Example: Storing configuration settings or a small collection of diverse objects.
For mixed-type or small-scale tasks, Python lists are more flexible. For numerical performance, especially with SIMD, NumPy’s typed arrays are the way to go. You can always convert a list to a NumPy array with np.array(my_list, dtype=desired_type) to leverage these benefits.
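As a minimal sketch of that conversion, assuming a small list of numbers named my_list (a hypothetical example value):

```python
import numpy as np

my_list = [1, 2, 3, 4]

# Convert once, choosing the element type explicitly
arr = np.array(my_list, dtype=np.float64)

# Vectorized operation on the whole array
result = arr * 2.0

# Convert back to a plain list if other code expects one
back = result.tolist()
print(back)  # [2.0, 4.0, 6.0, 8.0]
```

The conversion itself walks the list element by element, so it pays off most when you convert once and then run several operations on the array.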
Encourage your team to experiment with NumPy in small tasks to see the benefits firsthand. A quick profiling session can highlight the performance wins. Do share your performance gains in the comments below.