Real-time voice applications face a fundamental challenge: IP networks drop packets, yet human conversation demands continuity. When a 20ms audio frame vanishes in transit, the receiving endpoint must decide in microseconds whether to introduce silence, synthesize plausible audio, or rely on redundant data transmitted earlier. This decision shapes perceived call quality more than codec bit rate or sample rate.
Two dominant strategies exist: Forward Error Correction (FEC) transmits redundant payload preemptively, while Packet Loss Concealment (PLC) synthesizes missing audio using signal processing. Each approach carries distinct architectural consequences for bandwidth, latency, CPU utilization, and subjective quality under varying loss patterns.
Forward Error Correction: Bandwidth for Resilience
FEC encodes redundant audio data within the RTP stream itself. The most common implementation in VoIP is RFC 2198 RED (Redundant Audio Data), where each packet carries the current frame plus one or more previous frames at lower fidelity. For example, a packet might contain:
- Primary payload: Opus at 32kbps, frame N
- Redundant payload: Opus at 16kbps, frame N-1
- Redundant payload: Opus at 8kbps, frame N-2
When packet N+1 arrives but packet N was lost, the receiver extracts the 16kbps copy of frame N from packet N+1. Recovery is deterministic and free of synthesis artifacts, although the recovered frame has lower fidelity than the primary encoding. The cost is bandwidth: payload size grows by 40-80% depending on redundancy depth (75% for the three-payload example above).
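The recovery step can be sketched in a few lines of Python. This is a simplified model, not the RFC 2198 wire format: `RedPacket`, `recover_frames`, and `payload_overhead` are hypothetical names introduced for illustration, and payloads are opaque byte strings.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RedPacket:
    """Simplified stand-in for a RED packet (not the RFC 2198 wire format)."""
    seq: int                     # index of the primary frame
    primary: bytes               # full-rate encoding of frame `seq`
    # redundant[k] is a lower-rate encoding of frame seq - (k + 1)
    redundant: List[bytes] = field(default_factory=list)


def recover_frames(received: List[RedPacket]) -> Dict[int, bytes]:
    """Map frame index -> best available payload among the received packets."""
    frames: Dict[int, bytes] = {p.seq: p.primary for p in received}
    # Walk packets in sequence order so the nearest (highest-rate) redundant
    # copy of a missing frame is chosen before a more distant one.
    for pkt in sorted(received, key=lambda p: p.seq):
        for k, payload in enumerate(pkt.redundant):
            idx = pkt.seq - (k + 1)
            if idx >= 0 and idx not in frames:
                frames[idx] = payload
    return frames


def payload_overhead(primary_kbps: float, redundant_kbps: List[float]) -> float:
    """Extra payload bit rate as a fraction of the primary rate."""
    return sum(redundant_kbps) / primary_kbps
```

With packets 0 through 4 sent and packet 2 dropped, `recover_frames` fills frame 2 from the first redundant slot of packet 3. `payload_overhead(32, [16, 8])` returns 0.75 for the example rates; RTP and RED header bytes are ignored in that figure.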
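The latency impact is subtle but critical. FEC adds no decoding delay of its own, because the redundant frame is already inside a packet the receiver has. It does, however, constrain jitter buffer strategy. The redundant copy of frame N only arrives with packet N+1 or N+2, so frame N's playout slot must still be pending when that packet shows up. With two-frame redundancy at a 20ms frame size, that means at least 40ms of buffer depth. An adaptive jitter buffer targeting 60ms has room to spare, but it cannot shrink below 40ms without negating the value of the second redundant frame.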
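The constraint amounts to a floor on the adaptive target. A minimal sketch, assuming the buffer depth is expressed in milliseconds and that the floor is simply redundancy depth times frame duration (both function names are hypothetical):

```python
def min_depth_for_redundancy_ms(redundancy_frames: int, frame_ms: int = 20) -> int:
    """Smallest buffer depth at which the oldest redundant copy is still usable."""
    return redundancy_frames * frame_ms


def clamp_jitter_target_ms(adaptive_target_ms: int,
                           redundancy_frames: int,
                           frame_ms: int = 20) -> int:
    """Keep the adaptive target from shrinking below the redundancy window."""
    floor_ms = min_depth_for_redundancy_ms(redundancy_frames, frame_ms)
    return max(adaptive_target_ms, floor_ms)
```

A 60ms target with two-frame redundancy passes through unchanged, while a 20ms target is raised to 40ms.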
Bandwidth Overhead and Adaptive FEC
Static FEC wastes bits during stable network conditions. Adaptive FEC implementations monitor packet loss ratio over sliding windows (typically 2-5 seconds) and modulate redundancy depth. A practical heuristic from production WebRTC deployments:
- Loss < 1%: No redundancy
- Loss 1-3%: Single frame at -6dB quality
- Loss 3-7%: Two frames at -6dB and -12dB
- Loss > 7%: Fall back to PLC, FEC overhead unsustainable
This adaptive layer requires careful hysteresis to avoid oscillation. Ramping up redundancy too aggressively after a single loss burst wastes bandwidth; ramping down too quickly leaves the stream vulnerable to subsequent bursts. A two-threshold system with 3-second averaging windows provides stable behavior in most cellular scenarios.
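One way to sketch the tiers and the two-threshold hysteresis is shown below. The ramp-up thresholds come from the list above; everything else is an assumption made for illustration: the class name `AdaptiveRedController`, the 150-packet window (3 seconds at 20ms frames), and the `down_factor` of 0.8 that sets each ramp-down threshold below its ramp-up counterpart.

```python
from collections import deque
from typing import Deque


class AdaptiveRedController:
    """Chooses redundancy depth from a sliding-window loss ratio."""

    UP_THRESHOLDS = (0.01, 0.03, 0.07)   # tier boundaries from the heuristic
    DEPTH_FOR_LEVEL = (0, 1, 2, 0)       # level 3 means "PLC only"

    def __init__(self, window_packets: int = 150, down_factor: float = 0.8) -> None:
        # down_factor is a hypothetical tuning knob: ramp down only once
        # loss falls clearly below the threshold used to ramp up.
        self.window: Deque[bool] = deque(maxlen=window_packets)
        self.down_thresholds = tuple(t * down_factor for t in self.UP_THRESHOLDS)
        self.level = 0

    def loss_ratio(self) -> float:
        return sum(self.window) / len(self.window) if self.window else 0.0

    def on_packet(self, lost: bool) -> int:
        """Record one packet outcome and return the current level (0-3)."""
        self.window.append(lost)
        # Wait for a full window so one early loss cannot trigger a ramp-up.
        if len(self.window) < self.window.maxlen:
            return self.level
        loss = self.loss_ratio()
        while self.level < 3 and loss >= self.UP_THRESHOLDS[self.level]:
            self.level += 1
        while self.level > 0 and loss < self.down_thresholds[self.level - 1]:
            self.level -= 1
        return self.level

    def redundancy_depth(self) -> int:
        """Number of redundant frames to attach to each outgoing packet."""
        return self.DEPTH_FOR_LEVEL[self.level]
```

With these assumed values, a stream that climbs to 1% loss enters the single-frame tier and stays there until loss falls below 0.8%, so a measurement hovering around the boundary does not toggle redundancy on and off.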
Packet Loss Concealment: Synthesis Under Constraint
PLC algorithms reconstruct missing frames using only previously decoded audio. The challenge: synthesize 20ms of speech that maintains pitch continuity, spectral envelope, and energy trajectory without access to future context. Modern codecs embed PLC directly—Opus, for instance, uses a combination of LPC extrapolation and pitch-based waveform repetition.
At a high level, the Opus PLC implementation operates in three stages:
- Pitch detection on the last 40ms of decoded audio using autocorrelation
- LPC analysis (order 16) to capture spectral envelope
- Pitch-synchronous overlap-add to generate the concealment frame
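To make the first and last stages concrete, here is a deliberately reduced sketch: autocorrelation pitch detection over the last 40ms, followed by plain repetition of the final pitch cycle with a linear gain decay. It is not the Opus implementation. The LPC stage and the overlap-add smoothing are omitted, and the function names, the 8 kHz sample rate, the 60-400 Hz search range, and the `fade` parameter are all assumptions chosen for illustration.

```python
import math
from typing import List


def estimate_pitch_lag(history: List[float], sample_rate: int,
                       min_hz: float = 60.0, max_hz: float = 400.0) -> int:
    """Return the lag in samples that maximizes normalized autocorrelation."""
    min_lag = int(sample_rate / max_hz)
    max_lag = min(int(sample_rate / min_hz), len(history) // 2)
    if max_lag < min_lag:
        raise ValueError("history too short for the requested pitch range")
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        num = e1 = e2 = 0.0
        for i in range(lag, len(history)):
            num += history[i] * history[i - lag]
            e1 += history[i] ** 2
            e2 += history[i - lag] ** 2
        denom = math.sqrt(e1 * e2)
        score = num / denom if denom > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def conceal_frame(history: List[float], sample_rate: int = 8000,
                  frame_ms: int = 20, fade: float = 0.5) -> List[float]:
    """Synthesize one missing frame by repeating the last pitch cycle."""
    window = history[-int(sample_rate * 0.040):]   # last 40ms of decoded audio
    lag = estimate_pitch_lag(window, sample_rate)
    cycle = window[-lag:]                          # most recent pitch period
    n = sample_rate * frame_ms // 1000
    out = []
    for i in range(n):
        gain = 1.0 - (1.0 - fade) * i / n          # decay from 1.0 toward `fade`
        out.append(cycle[i % lag] * gain)
    return out
```

For a steady 100 Hz tone sampled at 8 kHz, the detected lag is 80 samples and the concealed frame continues the waveform in phase. The gain decay mirrors the usual practice of attenuating concealed audio so that longer gaps fade out instead of sustaining a tone.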
For a single lost frame, Opus PLC achieves PESQ scores within 0.2 MOS of the original for most speech content. The CPU cost is approximately 0.8ms on a modern ARM core—negligible compared to decoding overhead. However, PLC degrades rapidly with consecutive losses. Three consecutive frames (60ms) produce audible artifacts; five frames result in near-silence or tonal artifacts.
Hybrid PLC: When Codec State Helps
Stateful codecs like Opus maintain internal predictive models. When a packet arrives after a loss, the decoder can leverage the next frame's initial state to refine the concealment retroactively. This "look-ahead" PLC improves quality by 0.3-0.5 MOS for isolated losses but requires buffering the subsequent frame—adding 20ms to mouth-to-ear latency.
In applications where latency budgets exceed 150ms (most conference calling, gaming voice chat), this trade-off is worthwhile. For sub-100ms use cases like music collaboration or remote procedure guidance, the latency penalty outweighs the quality gain.
Bursty Loss: Where FEC Dominates
Packet loss rarely arrives uniformly distributed. Cellular handoffs, Wi-Fi contention, and congestion produce bursts—consecutive losses spanning 60-200ms. PLC fails catastrophically here; even sophisticated waveform synthesis cannot bridge 100ms gaps without audible discontinuities.
FEC with sufficient redundancy depth handles bursts gracefully up to its coverage window. Two-frame redundancy (40ms) conceals most cellular handoff events. Three-frame redundancy (60ms) survives Wi-Fi channel switches. Beyond that, the bandwidth cost becomes prohibitive—a 60ms FEC window at reasonable quality consumes 2.5x the baseline codec rate.
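The coverage window can be checked with a small model. The sketch below assumes RED-style redundancy in which frame f is repeated in packets f+1 through f+depth, and a single burst of consecutive packet losses; `unrecovered_frames` is a hypothetical helper, not part of any library.

```python
from typing import List


def unrecovered_frames(burst_start: int, burst_len: int,
                       depth: int, total_packets: int) -> List[int]:
    """Frames that stay missing after one burst, given `depth` redundant frames."""
    lost = set(range(burst_start, burst_start + burst_len))
    missing = []
    for frame in sorted(lost):
        # Frame `frame` is carried as redundancy by packets frame+1 .. frame+depth.
        carriers = range(frame + 1, frame + depth + 1)
        if not any(c < total_packets and c not in lost for c in carriers):
            missing.append(frame)
    return missing
```

At 20ms per frame, a 40ms burst with two-frame redundancy leaves nothing missing, a 60ms burst leaves its first frame for PLC to conceal, and a 100ms burst with three-frame redundancy leaves the first two. In general, a burst of k packets with depth d leaves max(0, k - d) frames uncovered, provided the packets following the burst arrive.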
Empirical data from a WebRTC deployment handling 2M+ daily calls shows FEC reduces user-reported audio quality issues by 40% in LTE environments, where loss bursts dominate. In fiber/cable scenarios with uniform low loss (