SoC Design
Understanding Memory Hierarchy in SoCs: Caches, TLBs, and Coherence
Introduction
Modern System-on-Chips (SoCs) are designed to deliver high performance, low latency, and energy-efficient computation. One of the most important contributors to SoC performance is the memory hierarchy: a multi-level organisation that lets the processor access data quickly while keeping overall system cost and power low.
This blog explains key components of the memory hierarchy in SoCs: caches, Translation Lookaside Buffers (TLBs), and cache coherence mechanisms, along with simple example snippets.
Why the Memory Hierarchy Matters in SoCs
CPUs operate much faster than memory. A typical core may run at GHz frequencies, while accessing DRAM takes hundreds of cycles. If every memory access went to DRAM, performance would collapse.
To solve this, SoCs rely on a hierarchical memory structure:
- Registers (fastest, few cycles)
- L1/L2/L3 Caches
- On-chip SRAM / scratchpads
- Off-chip DRAM
Each level acts as a filter, holding recently accessed data close to the processor. The deeper the hierarchy, the better the performance, provided data is managed efficiently.
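To see why, consider the standard average memory access time (AMAT) calculation with illustrative, assumed numbers: a 4-cycle L1 hit, a 5% L1 miss rate, and a 200-cycle DRAM penalty give AMAT = 4 + 0.05 × 200 = 14 cycles, instead of roughly 200 cycles if every access went to DRAM.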
Cache Architecture in SoCs
Caches store small blocks of data called cache lines (typically 32–128 bytes) fetched from main memory. Most SoCs use multi-level caches:
L1 Cache
- Split into I-cache and D-cache
- Very small (16–64 KB)
- Fastest and placed close to the core
L2 / L3 Cache
- Larger and shared among cores
- Acts as a buffer between L1 and DRAM
Cache Mapping Techniques
- Direct-mapped: simple but prone to conflicts
- Set-associative: balance of speed and conflict reduction
- Fully-associative: best flexibility but expensive
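To make the mapping concrete, here is a minimal C sketch of how a set-associative cache decomposes an address into tag, set index, and line offset. The geometry (32 KB, 4-way, 64-byte lines) is an assumption chosen purely for illustration:

```c
#include <stdint.h>
#include <stdio.h>

// Hypothetical cache geometry, assumed for illustration:
// 32 KB capacity, 4-way set-associative, 64-byte lines
// => 32768 / (4 * 64) = 128 sets => 7 index bits, 6 offset bits.
#define LINE_BITS 6
#define SET_BITS  7

int main(void) {
    uint32_t addr = 0x80012345;

    uint32_t offset = addr & ((1u << LINE_BITS) - 1);               // byte within the line
    uint32_t index  = (addr >> LINE_BITS) & ((1u << SET_BITS) - 1); // which set to search
    uint32_t tag    = addr >> (LINE_BITS + SET_BITS);               // identifies the line

    printf("offset=%u index=%u tag=0x%x\n", offset, index, tag);
    return 0;
}
```

With 32 KB / (4 ways × 64 B) = 128 sets, seven index bits suffice; a direct-mapped cache is simply the 1-way special case, while a fully-associative cache has a single set and no index bits at all.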
Example: Simple Cache Access Flow (Pseudocode)
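A minimal C-style sketch of the lookup path; the helper functions (l1_lookup, l2_lookup, dram_read, and the fill routines) are hypothetical placeholders:

```c
// C-style pseudocode: all helpers below are hypothetical placeholders.
uint64_t read_word(uint64_t paddr) {
    uint64_t data;
    if (l1_lookup(paddr, &data)) return data;   // L1 hit: a few cycles
    if (l2_lookup(paddr, &data)) {              // L2 hit: tens of cycles
        l1_fill(paddr, data);                   // promote the line into L1
        return data;
    }
    data = dram_read(paddr);                    // all levels miss: hundreds of cycles
    l2_fill(paddr, data);                       // install the line on the way back
    l1_fill(paddr, data);
    return data;
}
```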
This illustrates how the hierarchy reduces average memory access latency.
Understanding TLBs (Translation Lookaside Buffers)
Virtual memory systems translate virtual addresses to physical addresses. A naive page table lookup takes several memory accesses, so SoCs introduce the Translation Lookaside Buffer (TLB): a small cache dedicated to storing recent address translations.
Key properties of TLBs
- Similar to cache, but store page table entries (PTEs)
- Typically 32–128 entries in L1 TLBs
- Multi-level: L1 TLB + shared L2 TLB
- Miss penalties are very high (walk page tables → multiple memory accesses)
TLB Hit vs Miss
- Hit: translation returned in 1 cycle
- Miss: hardware or software page-table walk
TLB Miss Example (Pseudocode)
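A minimal C-style sketch, assuming 4 KB pages and hypothetical tlb_lookup / page_table_walk / tlb_insert helpers:

```c
#include <stdint.h>

#define PAGE_SHIFT 12                          // assuming 4 KB pages
#define PAGE_MASK  ((1ULL << PAGE_SHIFT) - 1)

// C-style pseudocode: the TLB and page-table helpers are hypothetical.
uint64_t translate(uint64_t vaddr) {
    uint64_t vpn = vaddr >> PAGE_SHIFT;        // virtual page number
    uint64_t ppn;
    if (!tlb_lookup(vpn, &ppn)) {              // TLB miss...
        ppn = page_table_walk(vpn);            // ...costs several memory accesses
        tlb_insert(vpn, ppn);                  // cache the translation for next time
    }
    return (ppn << PAGE_SHIFT) | (vaddr & PAGE_MASK);
}
```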
Efficient TLB design is critical in SoCs running Linux or real-time OSes, where context switching happens frequently.
Cache Coherence in Multi-Core SoCs
Modern SoCs house multiple CPU cores sharing memory, and each core has its own cache. This creates a problem: if multiple caches hold copies of the same memory location, those copies become inconsistent unless the system maintains cache coherence.
Common Coherence Protocols
- MESI – Modified, Exclusive, Shared, Invalid
- MOESI – adds Owned state
- MESIF – Intel variant with Forward state
- Directory-based coherence – scalable for many cores
Why Coherence Matters
Without coherence:
- One core may read stale data
- Writes may not propagate
- Parallel programs behave unpredictably
MESI Protocol Example
- Core0 writes X → its cache line moves to the M (Modified) state
- Core1 reads X → Core0 supplies the updated data, and Core1 receives the line in the S state
- Core0 reads X again → it hits locally; whether the earlier remote read left its line in S (MESI) or O (MOESI) depends on the protocol
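The same transitions can be captured in a heavily simplified state function. This sketch covers only the events in the example above; a real protocol also tracks bus messages, write-backs, and the E-state fill path:

```c
// Heavily simplified MESI sketch: covers only the example's transitions.
typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_state;
typedef enum { LOCAL_READ, LOCAL_WRITE, REMOTE_READ, REMOTE_WRITE } bus_event;

mesi_state mesi_next(mesi_state s, bus_event e) {
    switch (e) {
    case LOCAL_WRITE:  return MODIFIED;                     // write makes the copy dirty
    case REMOTE_READ:  return (s == MODIFIED || s == EXCLUSIVE)
                              ? SHARED : s;                 // supply data, downgrade to S
    case REMOTE_WRITE: return INVALID;                      // another writer invalidates us
    case LOCAL_READ:   return (s == INVALID) ? SHARED : s;  // refill; E-state path omitted
    }
    return s;
}
```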
Putting It All Together: End-to-End Memory Access
Below is a simplified view of what happens when a core reads memory in a modern SoC:
- TLB checks whether a translation for the virtual address is cached
- If miss → page-table walk
- Physical address sent to L1 cache
- If miss → check L2 cache
- If other cores may hold the line → coherence protocol ensures correct version
- If all miss → fetch from DRAM
- Return data up the hierarchy and update caches
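Stitching the earlier sketches together, the whole path is just translation followed by the cache lookup (same hypothetical helpers as before):

```c
// End-to-end load: virtual address -> physical address -> data.
uint64_t load(uint64_t vaddr) {
    uint64_t paddr = translate(vaddr);   // TLB hit, or page-table walk on a miss
    return read_word(paddr);             // L1 -> L2 -> coherence check -> DRAM
}
```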
Each layer either accelerates access or ensures correctness.
Practical Example: RISC-V SoC Memory Access Test
A small C snippet that demonstrates how memory locality impacts cache/TLB behavior:
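The snippet below is a minimal sketch of such a test (the array dimensions and timing method are assumptions): it sums a large 2D array row-by-row, then column-by-column.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define ROWS 4096
#define COLS 4096          // 4096 x 4096 ints = 64 MB: much larger than the caches

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void) {
    int (*m)[COLS] = malloc(sizeof(int[ROWS][COLS]));
    long sum = 0;

    for (int r = 0; r < ROWS; r++)          // touch every page up front so the
        for (int c = 0; c < COLS; c++)      // timed loops measure access, not faults
            m[r][c] = r + c;

    double t0 = now_sec();
    for (int r = 0; r < ROWS; r++)          // row-major: consecutive addresses,
        for (int c = 0; c < COLS; c++)      // excellent cache-line and TLB locality
            sum += m[r][c];
    double t1 = now_sec();

    for (int c = 0; c < COLS; c++)          // column-major: 16 KB jumps, so each
        for (int r = 0; r < ROWS; r++)      // access lands on a different cache line
            sum += m[r][c];                 // (and frequently a different page)
    double t2 = now_sec();

    printf("row-major: %.2fs  column-major: %.2fs  (sum=%ld)\n",
           t1 - t0, t2 - t1, sum);
    free(m);
    return 0;
}
```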
Sequential (row-major) access runs significantly faster thanks to cache-line and TLB locality.
Conclusion
The memory hierarchy in SoCs plays a pivotal role in enabling modern processors to achieve high performance while keeping power consumption reasonable. Caches reduce the time to fetch data, TLBs accelerate address translation, and coherence protocols maintain consistency across multiple cores.
Together, they form the backbone of efficient system design. Engineers building SoCs, firmware, or device drivers must deeply understand these concepts to optimize performance and ensure correctness.