Memory Management Unit (MMU) and Translation Lookaside Buffer (TLB)



This content originally appeared on DEV Community and was authored by Abdulhai Mohamed Samy

Difficulty: Advanced

Reading Time: 11 min read

Last Updated: June 30, 2025

Understanding Memory Management Unit (MMU) and Translation Lookaside Buffer (TLB)

The Memory Management Unit (MMU) is a critical hardware component in a computer system that handles memory management and address translation. It sits between the CPU and main memory (RAM) and is responsible for translating virtual addresses (used by software) into physical addresses (used by hardware). The MMU plays a key role in enabling features like virtual memory, memory protection, and paging.

1. What Does the MMU Do?

The Memory Management Unit (MMU) primarily manages primary memory (RAM) and facilitates the translation of logical/virtual addresses into physical addresses. It plays a key role in managing the following:

  • Virtual Memory: Ensures processes have their own virtual address space.
  • RAM: Controls direct access and allocation.
  • Cache Memory (indirectly): While the MMU doesn’t directly manage cache, it works with it by translating addresses used in memory accesses.

Does the MMU relate to and manage all types of memory storage, or is it specific to certain ones?

It does not directly manage secondary storage (such as an HDD or SSD), but it works in conjunction with the operating system to handle paging/swapping between RAM and secondary storage.

So the MMU is specific to primary memory and virtual memory management.

2. Key Functions of the MMU

1-Virtual-to-Physical Address Translation

  • Converts virtual addresses (used by programs) into physical addresses (used by the memory hardware) by consulting a page table, maintained by the operating system, that maps virtual pages to physical pages.
  • The MMU allows programs to use a contiguous virtual address space, even if physical memory is fragmented.
  • A small cache inside the MMU, the Translation Lookaside Buffer (TLB), speeds up these translations (see the sketch below).
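As a rough illustration of this translation step, the sketch below splits a virtual address into a virtual page number (VPN) and a page offset, looks the VPN up in a toy single-level page table, and recombines the result into a physical address. The page size, the table layout, and names such as `page_table` are assumptions made for this example (bounds checks omitted); real MMUs do this in hardware and typically walk multi-level tables.

```c
#include <stdint.h>

/* Illustrative sketch only: a toy single-level page table with 4 KiB pages.
 * PAGE_SHIFT, NUM_PAGES, and page_table are assumptions for this example. */
#define PAGE_SHIFT 12                      /* 4 KiB pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define NUM_PAGES  1024                    /* size of this toy address space */

static uint64_t page_table[NUM_PAGES];     /* VPN -> PPN, filled in by the OS */

uint64_t translate(uint64_t vaddr)
{
    uint64_t vpn    = vaddr >> PAGE_SHIFT;      /* virtual page number      */
    uint64_t offset = vaddr & (PAGE_SIZE - 1);  /* offset within the page   */
    uint64_t ppn    = page_table[vpn];          /* page-table lookup        */
    return (ppn << PAGE_SHIFT) | offset;        /* rebuilt physical address */
}
```

For 4 KiB pages, the low 12 bits of the address pass through unchanged; only the page number is translated.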

2-Memory Protection

  • Prevents programs from accessing unauthorized memory regions.
  • Ensures one program cannot corrupt another program’s or the operating system’s memory.

3-Paging and Segmentation

  • Divides memory into fixed-size blocks called pages.
  • Enables efficient memory use by swapping pages between RAM and secondary storage during paging or swapping.
  • If a program tries to access a page that is not in RAM, the MMU raises a page fault and the operating system loads the required page from secondary storage into RAM. This process is called paging or swapping.

4-Cache Control

  • Coordinates with the CPU cache to ensure that the correct data is fetched from memory.

5-Handling Page Faults

  • Detects page faults and works with the OS to load required pages from secondary storage into RAM.

3. Components of the MMU

1-Page Table Base Register (PTBR):

  • Points to the base address of the page table in memory.

2-Translation Lookaside Buffer (TLB):

  • A fast cache that stores recent virtual-to-physical address translations to speed up access.

3-Page Table Entries (PTEs):

  • The page table contains an entry for each virtual page, specifying its corresponding physical page (if it is present in RAM) or its location in secondary storage. Each entry contains (see the sketch below):
    • Physical Page Number (PPN): The number of the physical frame that backs the page.
    • Valid Bit: Indicates whether the page is currently in RAM.
    • Access Control Bits: Specify read/write/execute permissions.
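To make the entry layout concrete, here is a hypothetical 64-bit PTE expressed as a C bit-field. The field names and widths are illustrative, loosely inspired by x86-style entries, not any specific architecture's exact format.

```c
#include <stdint.h>

/* Hypothetical 64-bit page table entry; field names and widths are
 * illustrative only. */
typedef struct {
    uint64_t valid    : 1;   /* page is present in RAM       */
    uint64_t read     : 1;   /* access-control bits          */
    uint64_t write    : 1;
    uint64_t execute  : 1;
    uint64_t dirty    : 1;   /* page has been modified       */
    uint64_t accessed : 1;   /* page has been referenced     */
    uint64_t ppn      : 40;  /* physical page (frame) number */
    uint64_t reserved : 18;
} pte_t;
```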

4-Page Fault Handler:

  • Manages page faults by loading missing pages from secondary storage.

4. Translation Lookaside Buffer (TLB)

1-What is the TLB?

  • A small, high-speed cache within the MMU that stores recently used virtual-to-physical address mappings.
  • Reduces translation time by avoiding frequent page table lookups in main memory.

2-Why is the TLB Needed?

  • Virtual Memory: Modern systems use virtual memory, where programs operate in a virtual address space that is mapped to physical memory.
  • Address Translation: Each memory access requires translating a virtual address to a physical address using a page table.
  • Performance Overhead: Accessing the page table in main memory for every address translation is slow.
  • Solution: The TLB caches recently used translations to avoid frequent page table lookups.
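A back-of-the-envelope calculation shows why this matters. The numbers below (1 ns TLB lookup, 100 ns per memory access, a 4-level page table walk on a miss, 99% hit rate) are assumed purely for illustration:

```c
#include <stdio.h>

/* Back-of-the-envelope effective access time; all latencies are assumptions. */
int main(void)
{
    double t_tlb = 1.0, t_mem = 100.0, hit_rate = 0.99;
    int walk_levels = 4;

    double hit_cost  = t_tlb + t_mem;                        /* translate + access */
    double miss_cost = t_tlb + walk_levels * t_mem + t_mem;  /* walk + access      */
    double eat = hit_rate * hit_cost + (1.0 - hit_rate) * miss_cost;

    printf("effective access time: %.1f ns\n", eat);         /* about 105 ns */
    return 0;
}
```

With these assumptions, the average access costs about 105 ns, versus roughly 500 ns if every access required a full page table walk.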

3-How the TLB Works

The TLB operates as follows:

  • Virtual Address Translation:
    • When the CPU generates a virtual address, the MMU first checks the TLB for a matching translation.
    • If the translation is found in the TLB (TLB hit), the physical address is used directly.
  • TLB Miss:
    • If the translation is not found in the TLB (TLB miss), the MMU must access the page table in main memory to find the translation.
    • Once the translation is found, it is added to the TLB for future use.
  • Page Table Walk:
    • A page table walk is the process the CPU performs to translate a virtual address into a physical address by consulting the page tables in memory.
    • In case of a TLB miss, the MMU performs a page table walk to find the translation.
    • This involves traversing the page table hierarchy (e.g., multi-level page tables in modern systems).
  • TLB Update:
    • After a page table walk, the MMU updates the TLB with the new translation.
    • If the TLB is full, an existing entry is replaced using a replacement policy (e.g., LRU – Least Recently Used).

4-TLB Structure

Organized as a fully associative or set-associative cache, with entries containing the following fields (a minimal sketch follows this list):

  • Virtual Page Number (VPN): Part of the virtual address.
  • Physical Page Number (PPN): Corresponding physical address.
  • Flags:
    • Valid Bit: Indicates if the entry is valid.
    • Dirty Bit: Marks if the page has been modified.
    • Access Permissions: Read, write, and execute permissions.
  • ASID (Address Space ID): For process-specific entries.
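A minimal software model of such a TLB, assuming a 64-entry fully associative design, might look like the sketch below. The entry fields mirror the list above; the linear search is only for readability, since real hardware compares all entries in parallel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Minimal software model of a 64-entry, fully associative TLB (sizes and
 * field widths are assumptions for illustration). */
#define TLB_ENTRIES 64

typedef struct {
    uint64_t vpn;      /* virtual page number            */
    uint64_t ppn;      /* physical page number           */
    uint16_t asid;     /* address space ID (per process) */
    bool     valid;
    bool     dirty;
    uint8_t  perms;    /* read/write/execute bits        */
} tlb_entry_t;

static tlb_entry_t tlb[TLB_ENTRIES];

/* Returns true on a TLB hit and writes the translation to *ppn_out.
 * On a miss, the MMU would walk the page table and insert the result. */
bool tlb_lookup(uint64_t vpn, uint16_t asid, uint64_t *ppn_out)
{
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].vpn == vpn && tlb[i].asid == asid) {
            *ppn_out = tlb[i].ppn;   /* TLB hit */
            return true;
        }
    }
    return false;                    /* TLB miss: fall back to a page table walk */
}
```

On a miss, the resulting translation would be inserted into the TLB, evicting an existing entry if necessary (see Section 6 on replacement policies).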


5. TLB Flush

A TLB Flush refers to clearing the Translation Lookaside Buffer (TLB) — a small, fast cache that stores recent virtual-to-physical address translations.

A TLB flush clears entries from the TLB so that:

  • Stale or invalid address mappings are removed.
  • The CPU reloads mappings from the page tables when needed.

5.1 Types of TLB Flush

  • Full Flush: Clears the entire TLB (e.g., on process switch in older CPUs).
  • Partial Flush: Invalidates specific entries (e.g., using Address Space IDs (ASIDs) or Process-Context Identifiers (PCIDs) in modern CPUs).

5.2 When Does a TLB Flush Happen?

  • Context Switch
    • When the CPU switches to a new process, the virtual address space changes.
    • TLB entries from the old process are invalid for the new one.
    • So the TLB is flushed (unless ASIDs/PCIDs are used – see below).
  • Page Table Changes
    • If the OS updates or unmaps virtual memory (e.g., via mmap, munmap, fork, exec), it must flush the TLB to remove outdated entries.
  • TLB Shootdown (Multicore CPUs)
    • If one core changes a page table, the other cores’ TLBs must be flushed.
    • The OS sends an Inter-Processor Interrupt (IPI) to request a TLB flush on the other cores.
  • System Calls and Kernel Actions
    • System calls like mprotect, fork, exec, exit, or remap_file_pages may flush the TLB.
    • Such calls change memory permissions or layout (a minimal example follows this list).
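As a small Linux-flavored sketch of the "page table changes" and "system call" cases, the snippet below creates, reprotects, and unmaps an anonymous mapping. Each of these operations changes the address-space layout or permissions, so the kernel must invalidate any stale TLB entries for the affected range (on every core that may have cached them). Error handling is intentionally minimal.

```c
#include <string.h>
#include <sys/mman.h>

/* Each call below forces the kernel to invalidate stale TLB entries for the
 * affected range; this is only an illustrative sketch. */
int main(void)
{
    size_t len = 4096;
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);   /* create a mapping */
    if (p == MAP_FAILED)
        return 1;

    strcpy(p, "hello");             /* touch the page */

    mprotect(p, len, PROT_READ);    /* permission change: cached entries go stale  */
    munmap(p, len);                 /* unmapping: entries must be invalidated      */
    return 0;
}
```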

5.3 Performance Impact

  • Overhead: A TLB flush is costly (e.g., 100–1000 cycles), as subsequent memory accesses require slow page table walks.
  • Mitigation: Modern CPUs tag TLB entries with ASIDs or PCIDs (e.g., Intel’s Process-Context Identifiers) to identify the owning process, which reduces full flushes during context switches: instead of flushing the entire TLB, the CPU simply ignores entries whose ASID does not match the current one.
  • Consequence: Frequent flushes (e.g., during forking in Python or kernel page table updates) can still degrade performance due to the increased number of page table walks.

5.4 What Happens During a Flush?

  • All or part of the TLB entries are invalidated.
  • Future memory accesses → page table walk → updated entry loaded into the TLB.

⚠ Why Are TLB Flushes Expensive?

  • A flushed TLB means more page table walks, which are slower.
  • It is especially bad if flushes happen frequently (e.g., during forking in Python or during kernel page table updates).

6. TLB Replacement Policies

When the TLB becomes full and a new translation must be inserted, the MMU must choose which existing entry to evict; a toy eviction sketch appears after the note below. Common replacement policies include:

  1. LRU (Least Recently Used)
    • Evicts the entry that hasn’t been used for the longest time.
    • Balances simplicity and effectiveness, but can be complex to implement exactly in hardware.
  2. Pseudo-LRU / Approximate LRU
    • Cheaper, hardware-friendly approximations of true LRU.
  3. Random Replacement
    • Randomly selects an entry to replace; used in some CPUs because it’s very simple and fast.
  4. FIFO (First-In, First-Out)
    • Replaces the oldest entry; simple but doesn’t always match access patterns well.

🔍 Modern processors often use pseudo-LRU or hybrid policies to keep hardware complexity low while maintaining good performance.
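As a toy illustration of the eviction decision, the sketch below implements exact LRU for a small fully associative TLB by timestamping each entry on use and evicting the oldest (or any invalid) slot. The structure and sizes are assumptions for this example; real hardware usually settles for pseudo-LRU approximations rather than full timestamps.

```c
#include <stdint.h>

/* Toy exact-LRU victim selection for a small fully associative TLB. */
#define TLB_ENTRIES 16

typedef struct {
    uint64_t vpn, ppn;
    uint64_t last_used;   /* updated from a global "clock" on every hit */
    int      valid;
} lru_tlb_entry_t;

int choose_victim(const lru_tlb_entry_t tlb[TLB_ENTRIES])
{
    int victim = 0;
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (!tlb[i].valid)
            return i;                                   /* prefer a free slot */
        if (tlb[i].last_used < tlb[victim].last_used)
            victim = i;                                 /* older use wins     */
    }
    return victim;
}
```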

7. TLB Associativity and Sizes

The structure of the TLB significantly affects its performance:

  1. Fully Associative TLB
    • Any entry can go into any slot.
    • Maximizes flexibility and minimizes conflicts.
    • Expensive and complex to build in hardware.
  2. Set-Associative TLB
    • Compromise: the TLB is divided into several sets; each virtual page maps to exactly one set and can occupy any of the few slots (ways) within that set.
    • Common choice (e.g., 4-way, 8-way associativity) in modern CPUs.
  3. Direct-Mapped TLB
    • Each virtual page number maps to exactly one slot.
    • Fastest and simplest, but high risk of conflicts.

Size:

The size of a TLB refers to the number of entries it can hold.

  • A larger TLB can store more translations, which reduces the likelihood of a TLB miss and so avoids more of the costly page table walks in main memory.
  • Increasing the TLB size also increases hardware complexity, power consumption, and lookup latency.
  • Typical sizes range from a few dozen entries in L1 TLBs to a few thousand entries in L2 TLBs.
  • Separate instruction TLBs (iTLB) and data TLBs (dTLB) may each have different sizes.

Example: recent Intel CPUs (e.g., Skylake) use small L1 iTLBs and dTLBs on the order of 64–128 entries each, plus a larger unified L2 TLB (STLB) with over a thousand entries. (A sketch of how a set-associative TLB narrows its lookup follows.)
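The following sketch shows how a set-associative design narrows the search: the low bits of the VPN select one set, and only the ways in that set are compared. The sizes (64 entries, 4 ways, 16 sets) are arbitrary choices for illustration.

```c
#include <stdint.h>

/* Set selection in a hypothetical 64-entry, 4-way set-associative TLB. */
#define TLB_WAYS 4
#define TLB_SETS 16

static inline unsigned tlb_set_index(uint64_t vpn)
{
    return (unsigned)(vpn & (TLB_SETS - 1));  /* low bits of the VPN pick the set */
}
/* A lookup then compares only the TLB_WAYS entries in set tlb_set_index(vpn),
 * rather than all 64 entries as in a fully associative design. */
```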

8. Split vs Unified TLB

Modern processors may use either:

  1. Split TLB
    • Separate TLBs for instructions (iTLB) and data (dTLB).
    • Allows the CPU to look up instruction and data addresses in parallel, improving throughput.
    • Helps avoid contention between instruction and data accesses.
    • Pros: Eliminates conflicts between instruction and data accesses, potentially improving overall performance, especially in pipelined processors where instruction fetches and data accesses can occur in parallel.
    • Cons: Increases hardware complexity and power consumption. Each TLB is typically smaller than a comparable unified TLB, which can lead to more misses if the working set for either instructions or data alone is large.
  2. Unified TLB
    • A single TLB shared by both instruction fetches and data loads/stores.
    • Simpler to design and saves die area, but may create conflicts between instruction and data accesses.
    • Pros: Simpler design, potentially better utilization of TLB entries if one type of access (instruction or data) is dominant at a given time.
    • Cons: Conflicts can arise between instruction and data accesses, potentially leading to more misses if both are frequently accessing different pages.

🧩 Many modern CPUs combine both: split L1 TLBs (iTLB, dTLB) plus a shared L2 TLB.

9. TLB Hierarchy

Just like CPU caches, TLBs can be organized in a hierarchy to balance speed, size, and cost.

  • L1 TLB (Level 1 TLB):
    • Characteristics: Small, very fast, typically located very close to the CPU core. Often split into an iTLB and a dTLB.
    • Purpose: To provide the fastest possible address translation for frequently accessed pages, minimizing latency for the most critical memory operations.
    • Hit Time: Very low (e.g., 0.5-1 clock cycle).
  • L2 TLB (Level 2 TLB) / Last-Level TLB (LLTLB or STLB – Second-Level TLB):
    • Characteristics: Larger and slower than L1 TLBs, but still much faster than accessing page tables in main memory. It might be unified or shared among multiple cores.
    • Purpose: To handle TLB misses from the L1 TLB, providing a larger capacity to store translations for a wider range of pages, thus reducing the number of times the system needs to walk the page tables in main memory.
    • Miss Penalty: When an L1 TLB miss occurs, the L2 TLB is checked. If found, it’s still a TLB hit, but with slightly higher latency than an L1 hit.
    • Hit Rate: The L2 TLB aims to have a very high hit rate to prevent page table walks.

How it works (the lookup cascade is also sketched after this list):

  1. When the CPU generates a virtual address, it first checks the L1 TLB.
  2. If there’s an L1 TLB hit, the physical address is returned quickly, and the memory access proceeds.
  3. If there’s an L1 TLB miss, the request is forwarded to the L2 TLB.
  4. If there’s an L2 TLB hit, the physical address is retrieved from the L2 TLB, and the L1 TLB is updated with this translation (often, the newly accessed entry is brought into L1).
  5. If there’s an L2 TLB miss (a “full TLB miss”), then the MMU (Memory Management Unit) or the operating system (depending on whether it’s hardware-managed or software-managed) must walk the page tables in main memory to find the translation. This is the slowest scenario, incurring a significant “miss penalty” (10-100 clock cycles or more). Once the translation is found, it’s typically loaded into both the L1 and L2 TLBs for faster future access.
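The cascade above can be summarized in a short structural sketch. The helper functions (`l1_tlb_lookup`, `l2_tlb_lookup`, `page_table_walk`, and the insert routines) are hypothetical placeholders standing in for hardware behavior; they are declared but not implemented here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers standing in for hardware lookups and fills. */
bool     l1_tlb_lookup(uint64_t vpn, uint64_t *ppn);
bool     l2_tlb_lookup(uint64_t vpn, uint64_t *ppn);
uint64_t page_table_walk(uint64_t vpn);
void     l1_tlb_insert(uint64_t vpn, uint64_t ppn);
void     l2_tlb_insert(uint64_t vpn, uint64_t ppn);

uint64_t translate_vpn(uint64_t vpn)
{
    uint64_t ppn;
    if (l1_tlb_lookup(vpn, &ppn))        /* fastest path: L1 hit              */
        return ppn;
    if (l2_tlb_lookup(vpn, &ppn)) {      /* L2 hit: slightly higher latency   */
        l1_tlb_insert(vpn, ppn);         /* promote the translation into L1   */
        return ppn;
    }
    ppn = page_table_walk(vpn);          /* full miss: walk the tables in RAM */
    l2_tlb_insert(vpn, ppn);             /* fill both levels for future hits  */
    l1_tlb_insert(vpn, ppn);
    return ppn;
}
```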

10. Example of MMU and TLB Operation

  • Program Issues a Memory Access Request
    • A user-space application executes an instruction (e.g., LOAD R1, [0x1000]) to read data from the virtual address 0x1000.
    • This address is in virtual address space, not directly mapped to physical memory.
  • MMU Receives the Virtual Address
    • The Memory Management Unit (MMU) intercepts this request.
    • It is responsible for translating virtual addresses to physical addresses using paging structures (page tables).
  • TLB Lookup
    • The MMU checks the Translation Lookaside Buffer (TLB)—a small, fast cache that stores recently used virtual-to-physical address mappings.
    • If the virtual page number (VPN) of 0x1000 is found in the TLB (TLB hit), the MMU retrieves the corresponding physical frame number (PFN) immediately.
  • TLB Miss Handling
    • If the entry is not found in the TLB:
      1. The MMU must perform a page table walk:
        • It uses a base address from the CR3 register (x86) or TTBR0/TTBR1 (ARM).
        • It traverses the multi-level page table hierarchy to resolve the physical address.
      2. Once the Page Table Entry (PTE) is found:
        • The physical frame number is extracted.
        • The MMU checks:
          • Valid bit – Is the page in memory?
          • Permission bits – Does the process have read/write/execute access?
      3. The resolved translation is inserted into the TLB for faster future access.
  • Access Control Check
    • The MMU verifies that the access type (e.g., read/write/execute) is allowed by the permissions encoded in the PTE.
    • If access is not permitted, a protection fault (e.g., segmentation fault) is triggered.
  • Page Fault (if required)
    • If the PTE indicates the page is not in RAM (e.g., the valid bit is clear), a page fault interrupt is generated:
      • The OS page fault handler:
        • Locates the page on disk (e.g., swap file).
        • Allocates a free frame in RAM.
        • Loads the page into memory.
        • Updates the page table entry (sets valid bit, updates PFN).
        • Optionally flushes the TLB entry if invalidated.
  • Physical Address Construction
    • The MMU combines:
      • The physical frame number (PFN) from the TLB/page table.
      • With the page offset from the virtual address.
    • This results in a complete physical address.
  • Memory Access
    • The memory subsystem receives the physical address.
    • It completes the data fetch or write.
    • The CPU continues execution with the retrieved data.
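The costs in the walkthrough above can be observed, roughly, from user space. The toy benchmark below touches one byte in every page of a large buffer; once the buffer spans far more pages than the TLB has entries, many of these accesses miss the TLB and pay for page table walks. Buffer size, repetition count, and the timing approach are arbitrary choices for this sketch, and the first sweep also pays page-fault costs.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Touch one byte per page across ~1 GiB, repeatedly, and time it. */
#define PAGE_SIZE 4096
#define NUM_PAGES (256 * 1024)     /* about 1 GiB of address space */

int main(void)
{
    char *buf = malloc((size_t)NUM_PAGES * PAGE_SIZE);
    if (!buf)
        return 1;

    volatile unsigned sink = 0;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int rep = 0; rep < 10; rep++)
        for (size_t p = 0; p < NUM_PAGES; p++)
            sink += (unsigned char)buf[p * PAGE_SIZE];   /* one access per page */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%.3f s for 10 page-strided sweeps (sink=%u)\n", secs, sink);
    free(buf);
    return 0;
}
```

On Linux, running the program under `perf stat -e dTLB-load-misses` (where that event is available) makes the misses directly visible; enabling huge pages typically shrinks both the runtime and the miss count.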

11. Conclusion

The Memory Management Unit (MMU) and the Translation Lookaside Buffer (TLB) are at the heart of modern operating systems and processor architectures.

They make virtual memory practical by efficiently translating virtual addresses to physical addresses, enforcing memory protection, and supporting paging mechanisms.

The TLB, as a specialized cache, dramatically improves performance by avoiding frequent, slow page table walks. Advanced designs, like multi-level TLB hierarchies, split vs. unified TLBs, and intelligent replacement policies, reflect the engineering trade-offs between speed, complexity, and scalability in modern CPUs.

Understanding how the MMU and TLB operate internally not only deepens your knowledge of operating systems and low-level system design but also explains why certain performance bottlenecks (like TLB flushes or page faults) can arise in real-world applications.

12. Key Takeaways

  1. MMU translates virtual to physical addresses
    • Uses page tables to manage virtual memory.
    • Ensures memory protection and process isolation.
  2. TLB accelerates address translation
    • Caches recently used translations.
    • Avoids costly page table walks.
  3. Virtual memory enables efficient, isolated execution
    • Each process gets its own address space.
    • Swapping extends usable memory beyond RAM.
  4. TLB misses and flushes affect performance
    • TLB misses cause page table walks.
    • TLB flushes clear mappings (e.g., on context switch or memory remap).
    • Modern CPUs use ASIDs/PCIDs to minimize full flushes.
  5. Page faults trigger OS intervention
    • The OS loads missing pages into RAM from disk.
    • MMU resumes execution once translation is resolved.
  6. Page table walks are expensive
    • Multi-level page tables add overhead.
    • Mitigated by caching translations in the TLB.
  7. MMU and TLB are key to OS-level efficiency and security
    • Underpin virtual memory, sandboxing, and resource control.
  8. Replacement Policies
    • Decide which TLB entry to evict on new insertions, balancing speed and hardware complexity.
  9. Associativity & Size
    • Affect performance, hit rates, and hardware cost; modern CPUs often use set-associative TLBs.
  10. Split vs Unified
    • Trade-offs between parallelism and complexity; many CPUs use split L1 TLBs and a unified L2 TLB.
  11. TLB Hierarchy
    • Multi-level TLBs improve hit rates and reduce page table walk frequency.
  12. TLB Flushes
    • Essential to ensure correctness, but can be costly; mitigated by ASIDs/PCIDs and careful OS design.

13. References and Further Reading

  1. Abraham Silberschatz, Peter B. Galvin, Greg Gagne, Operating System Concepts (10th Edition)
  2. MMU in RISC ARM architecture
  3. TLB Wikipedia
  4. Ulrich Drepper: What Every Programmer Should Know About Memory
  5. Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3A: System Programming Guide, Part 1, 4.10.2 Translation Lookaside Buffers (TLBs)

About the Author

Abdul-Hai Mohamed | Software Engineering Geek’s.

Writes in-depth articles about Software Engineering and architecture.

Follow on GitHub and LinkedIn.

