Common CDC Bugs That Pass Simulation but Fail on Silicon

Introduction

Clock Domain Crossing (CDC) is one of the most critical challenges in digital hardware design, especially in modern SoCs, ASICs, and FPGA-based systems. Many complex systems contain multiple clock domains to support different subsystems such as processors, memory controllers, DMA engines, and peripheral interfaces.

A design with CDC issues may behave correctly in RTL simulation and still fail on silicon because of metastability, timing uncertainty, and asynchronous signal behaviour. This happens because RTL simulation assumes ideal, zero-delay timing and does not model what a flip-flop does when its setup or hold window is violated.

In this blog, we discuss two common CDC issues that frequently escape simulation but cause failures in silicon:

  • Reset Crossing Problems
  • Pulse Stretching Issues

Understanding these problems helps RTL designers build more reliable multi-clock domain systems.

Reset Crossing Issues

What is Reset Crossing?

A reset crossing occurs when a reset signal generated in one clock domain affects logic operating in another clock domain.

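In many designs, a global reset signal is distributed to multiple blocks running on different clocks. If this reset is released asynchronously, its rising edge can land arbitrarily close to an active clock edge in any of those domains. Flip-flops in the same domain may then leave reset on different clock cycles, or go metastable, which leaves the logic in an inconsistent state.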

Why Reset Crossing Bugs Pass Simulation

In RTL simulation:

  • Reset signals often change perfectly aligned with clock edges
  • Flip-flops respond instantaneously
  • No metastability is modelled

However, in real hardware:

  • Reset release may occur between clock edges
  • Different flip-flops observe reset release at slightly different times
  • This can cause illegal FSM states or corrupted registers

Example Problem

Consider a design with two clock domains:

  • clk_A domain → generates the reset
  • clk_B domain → consumes the reset

If reset_n feeds the clk_B logic directly (reset_n → clk_B logic), its deassertion is asynchronous to clk_B.

Some flip-flops may then leave reset one clk_B cycle earlier than others.

Possible consequences:

  • FSM enters unknown state
  • Counters start with invalid values
  • Handshake protocols break

Example RTL (Problematic)

If reset_n is asynchronous to clk_B, the deassertion may cause metastability.
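
A minimal sketch of the kind of RTL this describes, assuming a hypothetical 2-bit FSM in the clk_B domain. The module name, the start input, and the state encoding are illustrative assumptions, not taken from a specific design:

```verilog
// Problematic: reset_n originates in the clk_A domain and drives the
// asynchronous reset pins of clk_B flops without its release being
// synchronized to clk_B. All names here are hypothetical.
module fsm_b_unsafe (
    input  wire       clk_B,
    input  wire       reset_n,   // asynchronous to clk_B
    input  wire       start,
    output reg  [1:0] state
);
    localparam IDLE = 2'b00,
               RUN  = 2'b01,
               DONE = 2'b10;

    // If reset_n rises close to a clk_B edge, each state flop can violate
    // its recovery/removal timing and may go metastable or leave reset
    // one cycle apart from its neighbours.
    always @(posedge clk_B or negedge reset_n) begin
        if (!reset_n)
            state <= IDLE;
        else begin
            case (state)
                IDLE:    state <= start ? RUN : IDLE;
                RUN:     state <= DONE;
                DONE:    state <= IDLE;
                default: state <= IDLE;
            endcase
        end
    end
endmodule
```

In a zero-delay RTL simulation this module behaves correctly, which is why the problem tends to surface only in static CDC checks or on hardware.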

Correct Reset Synchronization

A common solution is synchronous reset release using a two-flop synchronizer.
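
One possible sketch of such a reset synchronizer for the clk_B domain is shown here. The module and signal names (reset_sync_b, reset_n_B, sync_ff) are assumptions chosen for illustration:

```verilog
// Reset synchronizer sketch: asynchronous assert, synchronous deassert.
// Names are hypothetical.
module reset_sync_b (
    input  wire clk_B,
    input  wire reset_n,     // raw asynchronous active-low reset
    output wire reset_n_B    // reset with release synchronized to clk_B
);
    reg [1:0] sync_ff;

    always @(posedge clk_B or negedge reset_n) begin
        if (!reset_n)
            sync_ff <= 2'b00;                // assert immediately, no clock needed
        else
            sync_ff <= {sync_ff[0], 1'b1};   // shift in 1s: release after two clk_B edges
    end

    assign reset_n_B = sync_ff[1];
endmodule
```

Logic in the clk_B domain would then use reset_n_B in place of the raw reset_n, with one such synchronizer instantiated per clock domain.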

Advantages:

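  • Confines any metastability to the synchronizer flops, so it does not reach downstream logic
  • Aligns reset release with the destination clock, so the flops in that domain leave reset together (provided the synchronized reset meets recovery/removal timing)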

Best Practices for Reset CDC

  • Use a reset that asserts asynchronously and deasserts synchronously.
  • Synchronize reset release separately in every clock domain.
  • Avoid distributing raw reset signals across domains.
  • Verify using CDC tools (SpyGlass CDC, Questa CDC).

Pulse Stretching Issues

What is Pulse Crossing?

A pulse crossing occurs when a single-cycle pulse from one clock domain must be detected in another clock domain.

Example:

clk_A domain → generates an interrupt pulse
clk_B domain → must detect it

Why This Fails on Silicon

Suppose:

  • clk_A is fast
  • clk_B is slow

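If the pulse width is shorter than one clk_B cycle, the receiving domain may completely miss the pulse. For example, a 5 ns pulse from a 200 MHz clk_A overlaps a rising edge of a 50 MHz clk_B (20 ns period) only about one time in four when the two clocks are unrelated.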

Simulation may still pass because:

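  • testbench clocks usually start with a fixed, idealized phase relationship
  • with that fixed phase, the pulse may happen to overlap a clk_B edge in every test

In real silicon:

  • the phase between clk_A and clk_B is arbitrary and can drift over time
  • a pulse may fall entirely between two clk_B edges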

Example Problem

Source domain (generates the pulse): clk_A
Destination domain: clk_B

If the whole pulse occurs between two clk_B edges, it is never sampled and will not be captured.

Result:

  • event lost
  • interrupt missed
  • data transfer failure

Example RTL (Problematic)

Directly synchronizing this pulse using a two-flop synchronizer does not guarantee detection.
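
A minimal sketch of this problematic pattern, assuming hypothetical names (pulse_sync_unsafe, pulse_A, pulse_B) and an active-low reset in the clk_B domain:

```verilog
// Problematic: a single-cycle clk_A pulse is fed straight into a
// two-flop synchronizer in the clk_B domain. Names are hypothetical.
module pulse_sync_unsafe (
    input  wire clk_B,
    input  wire rst_n_B,
    input  wire pulse_A,    // one clk_A cycle wide
    output wire pulse_B
);
    reg sync_ff1, sync_ff2;

    always @(posedge clk_B or negedge rst_n_B) begin
        if (!rst_n_B) begin
            sync_ff1 <= 1'b0;
            sync_ff2 <= 1'b0;
        end else begin
            sync_ff1 <= pulse_A;   // pulse_A is sampled only on clk_B rising edges
            sync_ff2 <= sync_ff1;
        end
    end

    assign pulse_B = sync_ff2;
endmodule
```

The synchronizer handles metastability, but it cannot recover an event that sync_ff1 never sampled. When clk_A is faster than clk_B, pulse_A can rise and fall between two clk_B edges and leave no trace in the destination domain.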

Correct Solution: Pulse Stretching

The pulse must be extended long enough so that the destination clock can detect it.

Pulse Stretching Method

Convert the pulse into a level signal or toggle signal.

Example using toggle synchronizer:

Source domain:
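
A sketch of the source side, assuming a single-cycle input pulse_A and an output level named toggle_A that flips once per event (all names are hypothetical):

```verilog
// Source domain (clk_A): convert each single-cycle pulse into a level change.
// Names are hypothetical.
module pulse_to_toggle (
    input  wire clk_A,
    input  wire rst_n_A,
    input  wire pulse_A,    // one clk_A cycle wide
    output reg  toggle_A    // changes level once per pulse_A event
);
    always @(posedge clk_A or negedge rst_n_A) begin
        if (!rst_n_A)
            toggle_A <= 1'b0;
        else if (pulse_A)
            toggle_A <= ~toggle_A;   // holds its new level until the next event
    end
endmodule
```

Because toggle_A is driven directly by a flop and stays stable until the next event, it is safe to pass into a synchronizer in the other domain.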

Destination domain:
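
A sketch of the destination side, assuming the incoming signal is a level named toggle_A that flips once per source event and that both domains come out of reset with the toggle at 0 (names are hypothetical):

```verilog
// Destination domain (clk_B): synchronize the toggle, then detect any edge.
// Names are hypothetical.
module toggle_to_pulse (
    input  wire clk_B,
    input  wire rst_n_B,
    input  wire toggle_A,   // level from the clk_A domain
    output wire pulse_B     // one clk_B cycle wide per source event
);
    reg [2:0] sync_ff;      // [0],[1]: two-flop synchronizer, [2]: edge-detect delay

    always @(posedge clk_B or negedge rst_n_B) begin
        if (!rst_n_B)
            sync_ff <= 3'b000;
        else
            sync_ff <= {sync_ff[1:0], toggle_A};
    end

    // A difference between the last two stages means toggle_A changed level.
    assign pulse_B = sync_ff[2] ^ sync_ff[1];
endmodule
```

The XOR fires on both rising and falling transitions of the synchronized toggle, so each source event produces exactly one clk_B-wide pulse.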

Advantages:

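  • Detects every pulse, provided successive source pulses are spaced at least a few clk_B cycles apart
  • Works across any clock ratio, fast-to-slow or slow-to-fast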

Why CDC Bugs Are Dangerous

CDC bugs are particularly problematic because they:

  • Pass RTL simulation
  • May pass basic verification
  • Fail only on silicon or on an FPGA prototype

These bugs are difficult and expensive to debug because they depend on:

  • clock skew
  • process variation
  • temperature
  • metastability resolution time

Conclusion

Clock Domain Crossing errors remain one of the most frequent causes of post-silicon failures in digital systems. Among them, reset crossing and pulse stretching issues are particularly dangerous because they often escape traditional simulation.

Key lessons for designers:

  • Always synchronize reset release in each clock domain.
  • Never transfer single-cycle pulses directly across clock domains.
  • Use CDC-safe structures like synchronizers, toggles, or FIFOs.
  • Perform static CDC verification during the design phase.

By applying these techniques, designers can significantly improve the reliability of complex multi-clock systems and avoid costly silicon re-spins.

  • Raghavendra H

    Raghavendra Havaldar focuses on delivering high-quality training in VLSI design and RTL development at Maven Silicon. He has over 18 years of combined industry and academic experience and strong expertise in Verilog, RISC-V architecture, FPGA, GPIO, and AHB-APB protocols. He has played a key role in developing RTL for RISC-V cores and building self-checking testbenches, while also training hundreds of engineering graduates and professionals in frontend VLSI technologies.
