⏸🔁 Pause / Disturb Failures in PSRAM

This document describes how Pause and Disturb failures manifested in PSRAM, and why Disturb-related degradation became a dominant reliability concern only when the operating envelope was exceeded.

PSRAM is a representative boundary-case technology: mass-producible up to the guaranteed limit, yet rapidly exposing physical limits beyond it.


🧠 Background: PSRAM Operating Characteristics

Compared to conventional DRAM, PSRAM exhibited:

As a result, cell stress accumulated under real usage conditions, not only during explicit wafer-level test modes.

Critically, these stresses were suppressed within the guaranteed region, but became visible once temperature and usage exceeded design assumptions.


⏸ Pause Stress in PSRAM

Characteristics

Pause stress alone weakened marginal cells, but it did not cause failures within the guaranteed operating range.


🔁 Disturb Stress as a Dominant Failure Accelerator

Usage-Driven Disturb

In PSRAM, disturb stress accumulated during normal operation:

📌 Disturb was no longer a test condition
→ it became a usage condition.

However, disturb-induced degradation remained fully suppressed until the operating envelope was exceeded.


⚛️ Device-Level Origin of Disturb Degradation

Fig. Device-level leakage paths and disturb mechanisms observed in PSRAM cell structures

Disturb stress caused:

These effects were negligible in standard DRAM usage and remained suppressed in PSRAM up to the guaranteed region.


🌡 Temperature Expansion Impact

PSRAM required an expanded operating guarantee:

To ensure manufacturability and field reliability, a guardband region beyond the shipment specification was also evaluated.

At excessive temperature:

📌 Internal refresh alone was insufficient to compensate for combined pause + disturb stress once the guaranteed envelope was exceeded.


📈 Temperature vs Fail Bit Count (Conceptual)

The following conceptual representation summarizes fail bit population behavior before and after countermeasures.

⚠️ Values are illustrative, intended to show manufacturing logic and boundaries,
not exact measurement data.

Fail Bit Count (conceptual) — BEFORE countermeasures

25℃    | 0
50℃    | 0
80℃    | 0
85℃    | ***
90℃*   | ********        ← FAIL at guarantee temp → NOT shippable
95℃    | *****************
100℃   | ************************************

Fail Bit Count (conceptual) — AFTER countermeasures (mass-production level)

25℃    | 0
50℃    | 0
80℃    | 0
85℃    | 0
90℃*   | 0                ← Shipment guarantee
95℃    | **               ← Guardband screening
100℃   | ***********      ← Physical limit begins to appear
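
The shippability logic implied by the two charts can be sketched in a few lines of Python. This is only an illustration: the counts loosely mirror the bar lengths above (they are not measurement data), and the names `first_failing_temp` and `is_shippable` are hypothetical helpers introduced here, not part of any real test flow.

```python
GUARANTEE_TEMP_C = 90  # shipment guarantee temperature stated in the text

# Illustrative counts that loosely mirror the bar lengths in the charts above.
# These are assumptions for the sketch, not measured fail bit counts.
before = {25: 0, 50: 0, 80: 0, 85: 3, 90: 8, 95: 17, 100: 36}
after = {25: 0, 50: 0, 80: 0, 85: 0, 90: 0, 95: 2, 100: 11}


def first_failing_temp(fail_bits):
    """Return the lowest temperature with a non-zero fail bit count, or None."""
    failing = [t for t, n in sorted(fail_bits.items()) if n > 0]
    return failing[0] if failing else None


def is_shippable(fail_bits, guarantee_temp_c=GUARANTEE_TEMP_C):
    """Shippable only if every point at or below the guarantee temperature shows zero fails."""
    return all(n == 0 for t, n in fail_bits.items() if t <= guarantee_temp_c)


print(first_failing_temp(before), is_shippable(before))  # 85 False
print(first_failing_temp(after), is_shippable(after))    # 95 True
```

In this sketch the countermeasures do not remove the fail population; they move the first failing temperature from below the guarantee point to above it, which is what flips the shippability result.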

🧭 Interpretation (Manufacturing-Consistent)

Temperature | Interpretation
≤ 85 °C     | Fail Bit Count = 0 (fully suppressed, safe margin)
90 °C       | Guaranteed operating point for shipment
95 °C       | Guardband region (Fail = 0 after countermeasures)
≥ 100 °C    | Failure population becomes observable (physical boundary)
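
As a minimal sketch, the interpretation above can be written as a lookup from temperature to operating region. The function name `classify_region` and the region labels are hypothetical. The handling of temperatures between the listed points (for example 87 °C or 97 °C) is an assumption made here, since the table only defines 85, 90, 95, and 100 °C.

```python
def classify_region(temp_c: float) -> str:
    """Map a temperature in °C to the operating region from the interpretation table.

    Assumption: temperatures between the listed points fall into the next
    higher region; the source table does not define them.
    """
    if temp_c <= 85:
        return "safe margin"         # fail bit count fully suppressed
    if temp_c <= 90:
        return "shipment guarantee"  # guaranteed operating point
    if temp_c <= 95:
        return "guardband"           # evaluated beyond the shipment specification
    return "beyond envelope"         # failure population observable near 100 °C


for t in (25, 85, 90, 95, 100):
    print(t, classify_region(t))
```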

📌 Key point:
Fail bits did not appear gradually from low temperature.

They were completely suppressed up to the guaranteed and guardbanded region,
and became observable only after the operating envelope was exceeded.

This behavior indicates a boundary-driven failure mechanism,
not a continuous wear-out process.


🔗 Combined Effect: Pause × Disturb Coupling

Beyond the guaranteed region:

This explains why intermittent failures were observed only beyond specification, for example during stress evaluation near 100 °C.


🛠 Countermeasures Implemented

Achieving mass production required process and design co-optimization.

Plasma Damage Reduction

Bias Engineering

Device & Layout Optimization

These measures shifted the failure boundary upward, enabling both the 90 °C shipment guarantee and the 95 °C guardband margin.


🧠 Key Insight (Legacy)

PSRAM was mass-produced not because failures were absent,
but because failures were successfully pushed
beyond the guaranteed operating envelope.

Pause / Disturb failures were not test artifacts, but boundary phenomena revealed only when the physical limits of the device were exceeded.

This lesson directly informed later system-aware reliability design, including always-on domains and low-power SoC architectures.