Understanding Memory Hierarchy in SoCs: Caches, TLBs, and Coherence

Introduction

Modern System-on-Chips (SoCs) are designed to deliver high performance, low latency, and energy-efficient computation. One of the most important contributors to SoC performance is the memory hierarchy: a multi-level organisation that lets the processor access data quickly while keeping overall system cost and power low.

This blog explains key components of the memory hierarchy in SoCs: caches, Translation Lookaside Buffers (TLBs), and cache coherence mechanisms, along with simple example snippets.

Why the Memory Hierarchy Matters in SoCs

CPUs operate much faster than memory. A typical core may run at GHz frequencies, while accessing DRAM takes hundreds of cycles. If every memory access went to DRAM, performance would collapse.

To solve this, SoCs rely on a hierarchical memory structure:

  • Registers (fastest, accessed within a cycle)
  • L1/L2/L3 caches (a few to tens of cycles)
  • On-chip SRAM / scratchpads
  • Off-chip DRAM (hundreds of cycles)

Each level acts as a filter, holding recently accessed data close to the processor. A deeper hierarchy improves performance, provided data is managed efficiently across the levels.

Cache Architecture in SoCs

Caches store small chunks of data called cache lines (typically 32–128 bytes) fetched from the main memory. Most SoCs use multi-level caches:

L1 Cache

  • Split into I-cache and D-cache
  • Very small (16–64 KB)
  • Fastest and placed close to the core

L2 / L3 Cache

  • Larger and shared among cores
  • Acts as a buffer between L1 and DRAM

Cache Mapping Techniques

  • Direct-mapped: simple but prone to conflicts
  • Set-associative: balance of speed and conflict reduction
  • Fully-associative: best flexibility but expensive
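
To make the mapping concrete, here is a minimal sketch of how an address splits into tag, set index, and byte offset, assuming a hypothetical 32 KB, 4-way set-associative cache with 64-byte lines (the geometry is chosen purely for illustration, not taken from any particular SoC):

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical cache geometry: 32 KB, 4-way set-associative, 64-byte lines */
#define LINE_SIZE   64                                     /* bytes per line */
#define NUM_WAYS    4
#define CACHE_SIZE  (32 * 1024)
#define NUM_SETS    (CACHE_SIZE / (LINE_SIZE * NUM_WAYS))  /* 128 sets */

int main(void)
{
    uint32_t addr = 0x80014A6C;                      /* example address       */

    uint32_t offset = addr % LINE_SIZE;              /* byte within the line  */
    uint32_t index  = (addr / LINE_SIZE) % NUM_SETS; /* which set to search   */
    uint32_t tag    = addr / (LINE_SIZE * NUM_SETS); /* identifies the line   */

    printf("tag=0x%x index=%u offset=%u\n",
           (unsigned)tag, (unsigned)index, (unsigned)offset);
    return 0;
}
```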

Example: Simple Cache Access Flow (Pseudocode)

The sketch below illustrates how the hierarchy reduces average memory access latency.
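
This is a minimal C sketch of the flow; the hit checks and cycle counts are illustrative stand-ins, not a model of any specific core:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative latencies in cycles (typical orders of magnitude only). */
#define L1_LATENCY    4
#define L2_LATENCY   12
#define DRAM_LATENCY 200

/* Stubs standing in for real tag-compare logic. */
static bool l1_hit(uint64_t addr) { return (addr & 0xFF) < 0xC0; }
static bool l2_hit(uint64_t addr) { return (addr & 0xFF) < 0xF0; }

/* Walk the hierarchy and return the access latency in cycles. */
static int memory_access(uint64_t addr)
{
    if (l1_hit(addr))
        return L1_LATENCY;                 /* served from L1            */
    if (l2_hit(addr))
        return L1_LATENCY + L2_LATENCY;    /* L1 miss, L2 hit           */
    /* Both caches miss: fetch from DRAM, filling the caches on return. */
    return L1_LATENCY + L2_LATENCY + DRAM_LATENCY;
}

int main(void)
{
    printf("latency = %d cycles\n", memory_access(0x1000));
    return 0;
}
```

As a rule of thumb, average access time = hit time + miss rate × miss penalty, applied level by level; the code simply walks that chain.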

Understanding TLBs (Translation Lookaside Buffers)

Virtual memory systems translate virtual addresses to physical addresses. A naive page table lookup takes several memory accesses, so SoCs introduce the Translation Lookaside Buffer (TLB): a small cache dedicated to storing recent address translations.

Key properties of TLBs

  • Similar to caches, but store page table entries (PTEs)
  • Typically 32–128 entries in L1 TLBs
  • Multi-level: L1 TLB + shared L2 TLB
  • Miss penalties are very high (walk page tables → multiple memory accesses)

TLB Hit vs Miss

  • Hit: the translation is returned in about a cycle
  • Miss: a hardware or software page-table walk is triggered

TLB Miss Example (Pseudocode)
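
Here is a minimal C sketch of a direct-mapped TLB with a page-table-walk fallback; page_table_walk is a hypothetical stub (a real walk takes several dependent memory accesses):

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT  12           /* 4 KB pages */
#define TLB_ENTRIES 64

/* One TLB entry: a cached virtual-to-physical page mapping. */
struct tlb_entry {
    uint64_t vpn;                /* virtual page number   */
    uint64_t pfn;                /* physical frame number */
    int      valid;
};

static struct tlb_entry tlb[TLB_ENTRIES];

/* Stub for the (slow) multi-level page-table walk. For illustration
 * we simply pretend the mapping is identity. */
static uint64_t page_table_walk(uint64_t vpn)
{
    return vpn;
}

/* Translate a virtual address, refilling the TLB on a miss. */
static uint64_t translate(uint64_t vaddr)
{
    uint64_t vpn    = vaddr >> PAGE_SHIFT;
    uint64_t offset = vaddr & ((1 << PAGE_SHIFT) - 1);
    unsigned slot   = vpn % TLB_ENTRIES;       /* direct-mapped TLB */

    if (tlb[slot].valid && tlb[slot].vpn == vpn)   /* TLB hit: fast path */
        return (tlb[slot].pfn << PAGE_SHIFT) | offset;

    /* TLB miss: walk the page tables, then refill the entry. */
    uint64_t pfn = page_table_walk(vpn);
    tlb[slot] = (struct tlb_entry){ .vpn = vpn, .pfn = pfn, .valid = 1 };
    return (pfn << PAGE_SHIFT) | offset;
}

int main(void)
{
    printf("PA = 0x%llx\n", (unsigned long long)translate(0x7fff1234));
    return 0;
}
```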

Efficient TLB design is critical in SoCs running Linux or real-time OSes, where context switching happens frequently.

Cache Coherence in Multi-Core SoCs

Modern SoCs house multiple CPU cores sharing memory. When each core has its own cache, a problem arises:
if multiple caches hold copies of the same memory location, they can become inconsistent unless the system maintains cache coherence.

Common Coherence Protocols

  • MESI – Modified, Exclusive, Shared, Invalid
  • MOESI – adds Owned state
  • MESIF – Intel variant with Forward state
  • Directory-based coherence – scalable for many cores

Why Coherence Matters

Without coherence:

  • One core may read stale data
  • Writes may not propagate
  • Parallel programs behave unpredictably

MESI Protocol Example

Core0 writes X → its cache line moves to the M (Modified) state

Core1 reads X → Core0 supplies the updated data; Core1's copy enters the S (Shared) state

Core0 reads X again → a cache hit; the line stays in S under MESI, or in O (Owned) under MOESI, depending on the protocol
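
The same transitions can be written as a tiny state machine. This sketch models a single line in one core's cache; the event names are invented for the example, and the "no other sharers" simplification on a read miss is an assumption:

```c
#include <stdio.h>

/* MESI states for one cache line in one core's cache. */
typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_t;

/* Events: the local core's accesses, plus requests snooped from other
 * cores on the interconnect. */
typedef enum { LOCAL_READ, LOCAL_WRITE, REMOTE_READ, REMOTE_WRITE } event_t;

static mesi_t next_state(mesi_t s, event_t e)
{
    switch (e) {
    case LOCAL_READ:
        /* A read miss in I fetches the line; with no other sharers it
         * arrives Exclusive (simplifying assumption). */
        return (s == INVALID) ? EXCLUSIVE : s;
    case LOCAL_WRITE:
        /* A write always ends in M; other copies get invalidated. */
        return MODIFIED;
    case REMOTE_READ:
        /* Another core reads: we supply dirty data and demote to S. */
        return (s == INVALID) ? INVALID : SHARED;
    case REMOTE_WRITE:
        /* Another core writes: our copy is stale, so invalidate it. */
        return INVALID;
    }
    return s;
}

int main(void)
{
    mesi_t core0 = INVALID;
    core0 = next_state(core0, LOCAL_WRITE);  /* Core0 writes X -> M          */
    core0 = next_state(core0, REMOTE_READ);  /* Core1 reads X  -> demote to S */
    printf("Core0 line state: %d (1 == SHARED)\n", core0);
    return 0;
}
```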

Putting It All Together: End-to-End Memory Access

Below is a simplified view of what happens when a core reads memory in a modern SoC:

  1. The TLB checks whether a translation for the virtual address is cached
  2. If miss → page-table walk
  3. Physical address sent to L1 cache
  4. If miss → check L2 cache
  5. If other cores may hold the line → coherence protocol ensures correct version
  6. If all miss → fetch from DRAM
  7. Return data up the hierarchy and update caches

Each layer either accelerates access or ensures correctness.
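
In code form, the whole path is just the composition of the earlier sketches. All helper functions below are hypothetical one-line stubs standing in for hardware blocks, shown only to make the ordering explicit:

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

/* Invented stubs: TLB translation, cache lookups, coherence snoop, DRAM. */
static uint64_t translate(uint64_t va)           { return va; }
static bool     l1_hit(uint64_t pa)              { (void)pa; return false; }
static bool     l2_hit(uint64_t pa)              { (void)pa; return false; }
static bool     other_core_has_line(uint64_t pa) { (void)pa; return false; }
static uint64_t dram_fetch(uint64_t pa)          { (void)pa; return 42; }

static uint64_t soc_read(uint64_t vaddr)
{
    uint64_t paddr = translate(vaddr);           /* steps 1-2: TLB / walk   */
    if (l1_hit(paddr)) return 0;                 /* step 3: L1 lookup       */
    if (l2_hit(paddr)) return 0;                 /* step 4: L2 lookup       */
    if (other_core_has_line(paddr)) return 0;    /* step 5: coherence snoop */
    return dram_fetch(paddr);                    /* steps 6-7: DRAM + fill  */
}

int main(void)
{
    printf("data = %llu\n", (unsigned long long)soc_read(0x1000));
    return 0;
}
```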

Practical Example: RISC-V SoC Memory Access Test

A small C snippet along these lines demonstrates how memory locality impacts cache/TLB behavior; the buffer size and stride below are arbitrary, and absolute timings will vary by platform:
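
```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N      (64 * 1024 * 1024) /* 64 MB: large enough to spill the caches  */
#define STRIDE 4096               /* one page: defeats line and TLB locality  */

/* Touch every byte either sequentially or with a page-sized stride,
 * and time the difference. Both variants perform exactly N updates. */
static double walk(volatile char *buf, size_t stride)
{
    clock_t t0 = clock();
    for (size_t start = 0; start < stride; start++)
        for (size_t i = start; i < N; i += stride)
            buf[i]++;
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

int main(void)
{
    char *buf = malloc(N);
    if (!buf) return 1;
    printf("sequential: %.3f s\n", walk(buf, 1));
    printf("strided:    %.3f s\n", walk(buf, STRIDE));
    free(buf);
    return 0;
}
```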

Sequential access runs significantly faster due to cache-line and TLB locality.

Conclusion

The memory hierarchy in SoCs plays a pivotal role in enabling modern processors to achieve high performance while keeping power consumption reasonable. Caches reduce the time to fetch data, TLBs accelerate address translation, and coherence protocols maintain consistency across multiple cores.

Together, they form the backbone of efficient system design. Engineers building SoCs, firmware, or device drivers must deeply understand these concepts to optimize performance and ensure correctness.

About the Author: Raghavendra H

Raghavendra Havaldar focuses on delivering high-quality training in VLSI design and RTL development at Maven Silicon. He has over 18 years of combined industry and academic experience and strong expertise in Verilog, RISC-V architecture, FPGA, GPIO, and AHB-APB protocols. He has played a key role in developing RTL for RISC-V cores and building self-checking testbenches, while also training hundreds of engineering graduates and professionals in frontend VLSI technologies.
