⏸🔁 Pause / Disturb Failures in PSRAM
This document describes how Pause and Disturb failures manifested in PSRAM, and why Disturb-related degradation became a dominant reliability concern only when the operating envelope was exceeded.
PSRAM is a representative boundary-case technology: mass-producible up to the guaranteed limit, yet rapidly exposing physical limits beyond it.
🧠 Background: PSRAM Operating Characteristics
Compared to conventional DRAM, PSRAM exhibited:
- Internally managed refresh with limited charge recovery margin
- Long effective pause time during system standby
- High-frequency access patterns during active operation
As a result, cell stress accumulated under real usage conditions, not only during explicit wafer-level test modes.
Critically, the effects of these stresses were suppressed within the guaranteed region, but became visible once temperature and usage exceeded design assumptions.
⏸ Pause Stress in PSRAM
Characteristics
- Long standby periods mapped directly to intrinsic retention stress
- Junction leakage accelerated at elevated temperature
- Residual plasma damage inherited from DRAM-era process flow
Pause stress weakened marginal cells but did not, by itself, cause failures within the guaranteed operating range.
🔁 Disturb Stress as a Dominant Failure Accelerator
Usage-Driven Disturb
In PSRAM, disturb stress accumulated during normal operation:
- Repeated word-line activation driven by application behavior
- No explicit disturb test pattern was needed to generate the stress
- Failures emerged after resume from standby
📌 Disturb was no longer a test condition
→ it became a usage condition.
However, disturb-induced degradation remained fully suppressed until the operating envelope was exceeded.
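To make the idea of disturb as a usage condition concrete, the following minimal sketch counts how many times the word lines adjacent to a victim row are activated within one refresh window. The trace format, the row numbers, the 64 µs window, and the assumption that the victim row is fully restored at each window boundary are all hypothetical and chosen only for illustration; they do not describe the actual PSRAM refresh scheme.

```python
from collections import Counter


def max_neighbor_activations(trace, victim_row, refresh_interval_us):
    """Largest number of adjacent word-line activations seen by the victim
    row within any single refresh window.

    trace: iterable of (time_us, row) activation events.
    Assumption: the victim row is refreshed at multiples of
    refresh_interval_us, so each window starts from a restored cell.
    """
    aggressors = {victim_row - 1, victim_row + 1}
    per_window = Counter()
    for time_us, row in trace:
        if row in aggressors:
            per_window[int(time_us // refresh_interval_us)] += 1
    return max(per_window.values(), default=0)


# Hypothetical application trace: a tight loop activating row 41 every 2 us,
# plus occasional accesses to an unrelated row.
trace = [(t, 41) for t in range(0, 200, 2)] + [(t, 7) for t in range(1, 200, 50)]

print(max_neighbor_activations(trace, victim_row=42, refresh_interval_us=64))
print(max_neighbor_activations(trace, victim_row=10, refresh_interval_us=64))
```

The point of the sketch is that the disturb count is set entirely by application behavior: the same device sees very different stress depending on which rows the software happens to access repeatedly.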
⚛️ Device-Level Origin of Disturb Degradation
Fig. Device-level leakage paths and disturb mechanisms observed in PSRAM cell structures
Disturb stress caused:
- Increased cell transistor subthreshold leakage
- Enhanced n⁺ / p⁻ junction leakage
- Leakage across isolation regions
- Progressive charge loss from storage nodes
These effects were negligible in standard DRAM usage and remained suppressed in PSRAM within the guaranteed region.
🌡 Temperature Expansion Impact
PSRAM required an expanded operating guarantee:
- Conventional DRAM limit: 80 °C
- PSRAM shipment guarantee: 90 °C
To ensure manufacturability and field reliability, a guardband region beyond the shipment specification was also evaluated.
At temperatures beyond the guaranteed envelope:
- Junction leakage increased exponentially
- Transistor Ioff rose sharply
- Isolation leakage between adjacent cells increased
- Disturb-assisted charge loss became observable
📌 Internal refresh alone was insufficient to compensate for combined pause + disturb stress once the guaranteed envelope was exceeded.
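The exponential temperature dependence of leakage can be illustrated with a simple Arrhenius-type scaling. This is a sketch under stated assumptions, not measured behavior: the 0.6 eV activation energy, the 6000 ms retention time at 25 °C, and the 64 ms refresh interval are hypothetical values, and the model covers only temperature-driven leakage (no disturb contribution).

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def leakage_scale(temp_c, ref_temp_c=25.0, activation_energy_ev=0.6):
    """Leakage at temp_c relative to leakage at ref_temp_c (Arrhenius form).

    activation_energy_ev is an assumed value for illustration.
    """
    t = temp_c + 273.15
    t_ref = ref_temp_c + 273.15
    return math.exp(activation_energy_ev / BOLTZMANN_EV_PER_K * (1.0 / t_ref - 1.0 / t))


def retention_margin(temp_c, retention_at_ref_ms=6000.0, refresh_interval_ms=64.0):
    """Estimated retention time divided by the refresh interval.

    A value below 1 means refresh alone cannot hold the cell.
    Both default values are hypothetical.
    """
    retention_ms = retention_at_ref_ms / leakage_scale(temp_c)
    return retention_ms / refresh_interval_ms


for temp_c in (25, 80, 90, 95, 100):
    print(temp_c, round(leakage_scale(temp_c), 1), round(retention_margin(temp_c), 2))
```

With these assumed numbers the margin stays above 1 through 95 °C and drops below 1 at 100 °C, which mirrors the boundary-type behavior described here. Any additional disturb-assisted charge loss would reduce the margin further.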
📈 Temperature vs Fail Bit Count (Conceptual)
The following conceptual representation summarizes fail bit population behavior before and after countermeasures.
⚠️ Values are illustrative, intended to show manufacturing logic and boundaries,
not exact measurement data.
Fail Bit Count (conceptual) — BEFORE countermeasures
25℃ | 0
50℃ | 0
80℃ | 0
85℃ | ***
90℃* | ******** ← FAIL at guarantee temp → NOT shippable
95℃ | *****************
100℃ | ************************************
Fail Bit Count (conceptual) — AFTER countermeasures (mass-production level)
25℃ | 0
50℃ | 0
80℃ | 0
85℃ | 0
90℃* | 0 ← Shipment guarantee
95℃ | ** ← Guardband screening
100℃ | *********** ← Physical limit begins to appear
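The shipment logic implied by the two charts can be sketched in a few lines. The fail bit counts below are illustrative and only loosely follow the conceptual charts; the function names and the 90 °C default are assumptions introduced for this example.

```python
def first_failing_temperature(fail_bits_by_temp):
    """Lowest temperature (deg C) with a nonzero fail bit count, or None."""
    failing = [t for t, count in sorted(fail_bits_by_temp.items()) if count > 0]
    return failing[0] if failing else None


def is_shippable(fail_bits_by_temp, guarantee_temp_c=90):
    """True when no fail bits appear at or below the guarantee temperature."""
    return all(
        count == 0
        for t, count in fail_bits_by_temp.items()
        if t <= guarantee_temp_c
    )


# Illustrative counts only, not measurement data.
before = {25: 0, 50: 0, 80: 0, 85: 3, 90: 8, 95: 17, 100: 36}
after = {25: 0, 50: 0, 80: 0, 85: 0, 90: 0, 95: 2, 100: 11}

for name, data in (("before", before), ("after", after)):
    print(name, first_failing_temperature(data), is_shippable(data))
```

The countermeasures do not remove the failure population; they move the first failing temperature above the guarantee point, which is what changes the shippability result.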
🧭 Interpretation (Manufacturing-Consistent)
| Temperature | Interpretation |
|---|---|
| ≤ 85 °C | Fail Bit Count = 0 (fully suppressed, safe margin) |
| 90 °C | Guaranteed operating point for shipment |
| 95 °C | Guardband region (Fail = 0 after countermeasures) |
| ≥ 100 °C | Failure population becomes observable (physical boundary) |
📌 Key point:
After countermeasures, fail bits did not appear gradually from low temperature.
They were completely suppressed up to the guaranteed and guardbanded region,
and became observable only after the operating envelope was exceeded.
This behavior indicates a boundary-driven failure mechanism,
not a continuous wear-out process.
🔗 Combined Effect: Pause × Disturb Coupling
Beyond the guaranteed region:
- Pause stress weakened cells through leakage
- Subsequent disturb accesses accelerated charge loss
- Failures became system-visible, not just test-visible
This explains the intermittent failures observed only beyond specification, such as during stress evaluation near 100 °C.
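The coupling can be illustrated with a toy charge-loss model in which a pause drains the storage node exponentially and each subsequent disturb access removes a small additional fraction. Everything here is an assumption for illustration: the time constant, the per-disturb loss, the sense threshold, and the multiplicative form itself are hypothetical and not derived from device data.

```python
import math

SENSE_THRESHOLD = 0.5  # hypothetical fraction of full charge needed for a correct read


def remaining_charge(pause_ms, disturb_events, tau_ms=200.0, loss_per_disturb=4e-4):
    """Fraction of initial storage-node charge left after a pause followed by
    a number of disturb accesses (toy model, assumed parameters)."""
    after_pause = math.exp(-pause_ms / tau_ms)
    return after_pause * (1.0 - loss_per_disturb) ** disturb_events


def reads_correctly(pause_ms, disturb_events):
    """True when the remaining charge is still at or above the sense threshold."""
    return remaining_charge(pause_ms, disturb_events) >= SENSE_THRESHOLD


print(reads_correctly(64, 0))     # pause only
print(reads_correctly(0, 1000))   # disturb only
print(reads_correctly(64, 1000))  # pause followed by disturb
```

In this sketch neither stress alone pushes the cell below the threshold, while the combination does, which is the sense in which pause and disturb are coupled rather than independent.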
🛠 Countermeasures Implemented
Achieving mass production required process and design co-optimization.
Plasma Damage Reduction
- Plasma condition softening
- Ashing damage suppression
- Junction defect density reduction
Bias Engineering
- Back-bias adjustment: −1 V → −3 V
- Improved suppression of subthreshold leakage
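Why a deeper back-bias suppresses subthreshold leakage can be sketched with the textbook body-effect expression, Vth = Vth0 + γ(√(2φF + Vsb) − √(2φF)), combined with a fixed subthreshold swing. The parameter values (Vth0, γ, 2φF, and a 100 mV/decade swing) are assumptions chosen for illustration, not the actual PSRAM cell transistor parameters, and the sketch ignores any change in swing or junction leakage with bias.

```python
import math


def vth_with_back_bias(vsb, vth0=0.7, gamma=0.3, two_phi_f=0.7):
    """Threshold voltage (V) under source-to-body reverse bias vsb (V),
    using the classic body-effect formula with assumed parameters."""
    return vth0 + gamma * (math.sqrt(two_phi_f + vsb) - math.sqrt(two_phi_f))


def subthreshold_leakage_reduction(vsb_old, vsb_new, swing_v_per_decade=0.1):
    """Factor by which off-state subthreshold current drops when the reverse
    body bias changes from vsb_old to vsb_new (swing assumed constant)."""
    delta_vth = vth_with_back_bias(vsb_new) - vth_with_back_bias(vsb_old)
    return 10.0 ** (delta_vth / swing_v_per_decade)


# Back-bias of -1 V and -3 V correspond to vsb = 1 V and vsb = 3 V here.
print(round(vth_with_back_bias(1.0), 3), round(vth_with_back_bias(3.0), 3))
print(round(subthreshold_leakage_reduction(1.0, 3.0), 1))
```

Under these assumed parameters the threshold shift is a fraction of a volt, yet the leakage reduction is more than an order of magnitude, because subthreshold current depends exponentially on Vth.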
Device & Layout Optimization
- Memory cell transistor Vth increase
- Gate length center-value management
- CD shrink variability reduction
- Enhanced isolation robustness
These measures shifted the failure boundary upward, enabling both the 90 °C shipment guarantee and the 95 °C guardband margin.
🧠 Key Insight (Legacy)
PSRAM was mass-produced not because failures were absent,
but because failures were successfully pushed
beyond the guaranteed operating envelope.
Pause / Disturb failures were not test artifacts, but boundary phenomena revealed only when the physical limits of the cell were exceeded.
This lesson directly informed later system-aware reliability design, including always-on domains and low-power SoC architectures.