The Overrun Problem in Real-Time Audio
Circular buffers are the backbone of every real-time audio pipeline—from hearing aids to voice chat to music production. A circular buffer is a fixed-size memory region where a writer thread (audio input callback) deposits samples and a reader thread (processing or output callback) consumes them. The write and read pointers wrap around, creating the illusion of infinite storage.
But when the writer overtakes the reader—an overrun—you face a hard choice: drop frames, glitch audibly, or desynchronize your pipeline. In HearingAid Pro, we saw overruns spike to 0.3% of frames during thermal throttling on iPhone 12, producing 50ms pops that users described as "painful." Standard mitigation—increasing buffer size—adds latency, which is unacceptable in hearing aids where 20ms end-to-end is the clinical threshold.
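To make the failure mode concrete, here is a minimal single-producer/single-consumer ring buffer sketch that counts an overrun whenever a write would clobber unread samples. The class and member names (`SampleRing`, `push`, `pop`, `overruns`) are illustrative assumptions, not the code used in HearingAid Pro, and the sketch handles one sample at a time for clarity rather than speed.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Minimal single-producer/single-consumer ring buffer sketch.
// All names here are illustrative, not taken from the article.
class SampleRing {
public:
    explicit SampleRing(std::size_t capacity)
        : buf_(capacity), capacity_(capacity) {}

    // Writer side: returns false (and counts an overrun) when the
    // write would overwrite samples the reader has not consumed yet.
    bool push(float sample) {
        const std::size_t w = write_.load(std::memory_order_relaxed);
        const std::size_t r = read_.load(std::memory_order_acquire);
        if (w - r == capacity_) {  // buffer full: writer has lapped the reader
            overruns_.fetch_add(1, std::memory_order_relaxed);
            return false;
        }
        buf_[w % capacity_] = sample;
        write_.store(w + 1, std::memory_order_release);
        return true;
    }

    // Reader side: returns false when no unread samples remain.
    bool pop(float& out) {
        const std::size_t r = read_.load(std::memory_order_relaxed);
        const std::size_t w = write_.load(std::memory_order_acquire);
        if (r == w) return false;  // buffer empty
        out = buf_[r % capacity_];
        read_.store(r + 1, std::memory_order_release);
        return true;
    }

    std::size_t overruns() const {
        return overruns_.load(std::memory_order_relaxed);
    }

private:
    std::vector<float> buf_;
    const std::size_t capacity_;
    std::atomic<std::size_t> write_{0};     // total samples written so far
    std::atomic<std::size_t> read_{0};      // total samples read so far
    std::atomic<std::size_t> overruns_{0};  // rejected writes
};
```

Keeping the positions as running totals and reducing them modulo the capacity only when indexing makes "full" (`w - r == capacity`) and "empty" (`r == w`) unambiguous, which is the distinction the strategies below depend on.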
Why Overruns Happen
Three common causes:
- Priority inversion: The OS schedules a lower-priority thread while your audio callback waits for a mutex. On Android, we measured 12ms stalls when the garbage collector ran during AudioTrack callbacks.
- Thermal throttling: CPU frequency drops from 2.8 GHz to 1.4 GHz, doubling your processing time. A 512-sample FFT that took 0.9ms now takes 1.8ms, exceeding the 1.33ms budget of a 64-frame callback at 48 kHz.
- Cache eviction: A background app flushes your DSP coefficients from L2 cache. Your first callback after returning from background takes 4× longer as cache lines reload.
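The throttling arithmetic above can be checked with two small helpers. This is a sketch under two assumptions: the 1.33ms budget corresponds to a 64-frame callback at 48 kHz (inferred from the numbers, since 64 / 48000 s is 1.33ms), and processing time scales inversely with clock frequency, which only holds for CPU-bound work. The function names are hypothetical.

```cpp
// Time available to process one callback, in milliseconds.
constexpr double callbackBudgetMs(int framesPerCallback, double sampleRateHz) {
    return 1000.0 * framesPerCallback / sampleRateHz;
}

// Processing time after a clock change, assuming the work is CPU-bound
// and scales inversely with frequency (a simplification).
constexpr double scaledProcessingMs(double baselineMs,
                                    double baselineGHz,
                                    double throttledGHz) {
    return baselineMs * baselineGHz / throttledGHz;
}
```

With the figures from the list, `callbackBudgetMs(64, 48000.0)` is about 1.33 and `scaledProcessingMs(0.9, 2.8, 1.4)` is about 1.8, so the throttled FFT no longer fits in the callback.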
Strategy 1: Silence Insertion with Fade Envelope
When an overrun is detected (in the check below, the read pointer has caught up with the write pointer, so the reader has no fresh samples), insert silence equal to the deficit. Naive zero-stuffing produces a sharp discontinuity, exciting high-frequency content that sounds like a click. Instead, apply a 2ms Hann fade-out (96 samples at 48 kHz) to the last valid samples and a matching fade-in to the next valid block.
    #include <string.h>  // memset

    if (readPtr == writePtr) {
        // Fade the tail of the last good block to zero: 96 samples = 2ms at 48kHz.
        applyHannFadeOut(lastValidBlock, 96);
        // Zero-fill the gap in place. The size is in bytes, not samples,
        // and nothing is allocated inside the audio callback.
        memset(outputBuffer, 0, deficitSamples * sizeof(float));
        // Ramp the next good block back up from zero.
        applyHannFadeIn(nextValidBlock, 96);
        stats.overrunCount++;
    }

This reduces the perceptual impact from a sharp pop to a brief "breath" artifact. In A/B testing with 40 hearing aid users, 87% preferred windowed silence over raw zero-insertion. The tradeoff: you lose 2ms of signal continuity, which matters for speech intelligibility. We found this acceptable for overruns below 1% incidence.
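The fade helpers are not shown in the snippet above, so here is one plausible way to write them, using half of a Hann window as the gain ramp. The names `hannRamp`, `hannFadeOutTail`, and `hannFadeInHead` and their signatures are assumptions for illustration; the real `applyHannFadeOut` and `applyHannFadeIn` may differ.

```cpp
#include <cmath>
#include <cstddef>

// Gain for sample i of an n-sample fade-in: 0.5 * (1 - cos(pi * i / (n - 1))).
// This is the rising half of a Hann window, going smoothly from 0 to 1.
inline float hannRamp(std::size_t i, std::size_t n) {
    if (n < 2) return 1.0f;  // too short to fade; leave the sample untouched
    const double pi = 3.14159265358979323846;
    return static_cast<float>(0.5 * (1.0 - std::cos(pi * i / (n - 1))));
}

// Fade the LAST fadeLen samples of a block down to zero.
inline void hannFadeOutTail(float* block, std::size_t blockLen, std::size_t fadeLen) {
    if (fadeLen > blockLen) fadeLen = blockLen;
    float* tail = block + (blockLen - fadeLen);
    for (std::size_t i = 0; i < fadeLen; ++i) {
        tail[i] *= hannRamp(fadeLen - 1 - i, fadeLen);  // ramp runs in reverse
    }
}

// Fade the FIRST fadeLen samples of a block up from zero.
inline void hannFadeInHead(float* block, std::size_t blockLen, std::size_t fadeLen) {
    if (fadeLen > blockLen) fadeLen = blockLen;
    for (std::size_t i = 0; i < fadeLen; ++i) {
        block[i] *= hannRamp(i, fadeLen);
    }
}
```

Because the ramp has zero slope at both ends, the signal meets the inserted silence without the step that causes the click.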
Adaptive Window Sizing
For consecutive overruns—common during sustained CPU load—scale the fade window logarithmically. If overruns occur within 50ms of each other, reduce the fade to 0.5ms to preserve more signal. This heuristic cut our "muffled" user complaints by 60% during background app activity.
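As a rough sketch of that heuristic, the helper below picks the fade length from the time since the previous overrun. Only the two operating points named above are encoded (0.5ms inside the 50ms window, 2ms otherwise); the logarithmic scaling between them is not specified here, so it is deliberately left out. The function name is hypothetical.

```cpp
// Fade length in samples for the current overrun. Encodes only the two
// operating points from the text; any intermediate scaling is omitted.
constexpr int fadeSamplesFor(double msSinceLastOverrun, double sampleRateHz) {
    const double fadeMs = (msSinceLastOverrun < 50.0) ? 0.5 : 2.0;
    return static_cast<int>(fadeMs * sampleRateHz / 1000.0 + 0.5);  // round to nearest
}
```

At 48 kHz this yields 24 samples for back-to-back overruns and 96 samples for isolated ones.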
Strategy 2: Phase-Locked Resampling
Silence insertion disrupts phase continuity, which is critical for binaural hearing aids using interaural time difference (ITD) cues. A 0.1ms phase shift between left and right channels collapses the stereo image. Instead of inserting silence, resample the existing buffer content to stretch or compress time.
When an overrun is imminent—read pointer within 64 samples of write pointer—apply a 1.02× resampling factor using a polyphase FIR filter. This stretches 512 samples into 522 samples, buying 10 samples (0.2ms) of headroom. The pitch shift is imperceptible: 1.02× corresponds to 34 cents, below the 50-cent just-noticeable difference for broadband noise.
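For reference, the figures in that paragraph work out as follows, assuming the 48 kHz sample rate used throughout:

```latex
\[
\text{cents} = 1200 \log_2(1.02) \approx 1200 \times 0.02857 \approx 34.3
\]
\[
512 \times 1.02 = 522.24 \approx 522, \qquad 522 - 512 = 10 \text{ samples}, \qquad
\frac{10}{48000\ \text{Hz}} \approx 0.21\ \text{ms}
\]
```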
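    if (bufferFillLevel() < OVERRUN_THRESHOLD) {
        // Stretch factor rises from 1.00x to 1.02x as urgency goes from 0 to 1.
        float resampleRatio = 1.0f + (0.02f * urgency);
        // Consumes 512 input frames and produces about 512 * resampleRatio output frames.
        polyphaseResample(inputBlock, outputBlock, resampleRatio);
        // Advance by the frames consumed, not the frames produced, and wrap.
        // bufferSize (ring capacity in frames) is an assumed name.
        readPtr = (readPtr + 512) % bufferSize;
    }

The urgency factor (0 to 1) is derived from fill level: at 10% headroom, urgency is 0; at 2%, it's 1. This creates a smooth adaptation curve. We implemented this in KidzCare's speech therapy module, where preserving prosody is essential. Overrun recovery became inaudible in 94% of test cases.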
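One way to compute the urgency factor is a clamped linear map between the two headroom points. The linear shape is an assumption, since only the endpoints are given above, and the function names are hypothetical.

```cpp
#include <algorithm>

// headroom: fraction of the buffer still separating the two pointers (0.0 to 1.0).
// Returns 0 at or above 10% headroom and 1 at or below 2%, linear in between
// (the linear shape is an assumption; only the endpoints come from the text).
inline float urgencyFromHeadroom(float headroom) {
    const float kRelaxed = 0.10f;
    const float kCritical = 0.02f;
    const float u = (kRelaxed - headroom) / (kRelaxed - kCritical);
    return std::min(1.0f, std::max(0.0f, u));
}

// Resampling ratio for the next block: 1.00x when relaxed, 1.02x when critical.
inline float resampleRatioFor(float headroom) {
    return 1.0f + 0.02f * urgencyFromHeadroom(headroom);
}
```

Clamping keeps the ratio inside the 1.00x to 1.02x range even when the fill level briefly reads outside the expected band.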
Phase-Locked Loop (PLL) Tracking
Resampling introduces drift: stretching 1000 frames by 1.02× accumulates a 20-sample offset. To prevent long-term desync, implement a PLL that adjusts the resampling ratio based on the average buffer fill over a 5-second window. If fill level trends upward, reduce the ratio to 0.98×; if downward, increase to 1.04×. Note that 1.04× is roughly 68 cents, above the 50-cent threshold cited earlier, so keep excursions to that limit brief. This closed-loop control keeps drift under 1ms over 10-minute sessions.
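A minimal sketch of such a control loop is shown below. The 5-second averaging horizon and the 0.98× and 1.04× limits come from the paragraph above; the exponential moving average, the 50% target fill, and the loop gain are assumptions chosen for illustration, and `FillTracker` is a hypothetical name.

```cpp
#include <algorithm>

// Slow ratio controller: steers the long-term average fill toward a target.
class FillTracker {
public:
    explicit FillTracker(double callbacksPerSecond, double targetFill = 0.5)
        : alpha_(1.0 / (5.0 * callbacksPerSecond)),  // ~5 s time constant
          target_(targetFill),
          avg_(targetFill) {}

    // fill: current buffer fill as a fraction (0.0 to 1.0).
    // Returns the resampling ratio to use for the next block.
    double update(double fill) {
        avg_ += alpha_ * (fill - avg_);            // exponential moving average
        const double error = target_ - avg_;       // positive when fill runs low
        const double ratio = 1.0 + kGain * error;  // low fill -> stretch output
        return std::min(1.04, std::max(0.98, ratio));
    }

private:
    static constexpr double kGain = 0.1;  // assumed loop gain
    double alpha_;
    double target_;
    double avg_;
};
```

Calling `update()` once per callback gives a ratio that stays at 1.0 while the average fill sits on target and saturates at the two limits only under sustained imbalance.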
Strategy 3: Adaptive Latency Compensation
The nuclear option: dynamically adjust your end-to-end latency budget. When overruns exceed 2% over a 10-second window, increase buffer size from 512 to 768 samples (10.7ms → 16ms at 48 kHz). This trades latency for stability.
The trick is doing this without a glitch. Naively reallocating the buffer mid-stream corrupts state. Instead, allocate a secondary 768-sample buffer, cross-fade over 256 samples (about 5.3ms at 48 kHz), then swap the active buffer pointer. The cross-fade masks the transition; users report it as "a brief echo" rather than a pop.
    if (overrunRate > 0.02f && !inTransition) {
        // Ideally this buffer is pre-allocated off the audio thread,
        // since allocating inside a callback can block.
        allocSecondaryBuffer(768);
        startCrossfade(256); // 256 samples, about 5.3ms at 48kHz
        inTransition = true;
    }
    if (inTransition && crossfadeComplete()) {
        swapBuffers();
        inTransition = false;
    }

After 30 seconds of stable operation, reverse the process to restore low latency. This "elastic latency" pattern reduced overrun-related app exits by 73% in HearingAid Pro's production telemetry.
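The cross-fade itself can be as simple as a linear blend between the output read through the old buffer and the output read through the new one. The sketch below assumes a linear gain curve and a hypothetical `crossfade` helper; the shape used behind `startCrossfade` is not stated above.

```cpp
#include <cstddef>

// Linear cross-fade: out[i] moves from 100% oldSig at i = 0
// to 100% newSig at i = n - 1.
inline void crossfade(const float* oldSig, const float* newSig,
                      float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        const float t = (n > 1) ? static_cast<float>(i) / (n - 1) : 1.0f;
        out[i] = (1.0f - t) * oldSig[i] + t * newSig[i];
    }
}
```

An equal-power curve is a common alternative when the two signals are uncorrelated; for two copies of the same stream offset by a few milliseconds, a linear blend is usually sufficient.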
User Communication
When latency increases, display a transient toast: "Audio processing paused briefly." This manages expectations and reduces support tickets. We found users tolerate 50ms latency spikes if they understand why, but 20ms of unexplained glitching triggers uninstalls.
Measuring and Monitoring
Instrument your audio callback with lock-free counters:
- Overrun count: Increment atomically on each detected overrun.
- Headroom histogram: Track buffer fill level in 10% buckets. If you're spending 30% of time above 80% fill, you're one thermal event from disaster.
- Callback duration: Use mach_absolute_time() (iOS) or clock_gettime(CLOCK_MONOTONIC) (Android) to measure processing time. Alert if p99 exceeds 80% of your budget.
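The first two counters in the list can be sketched with `std::atomic`, which keeps the audio callback free of locks. The struct and method names are hypothetical, and callback-duration timing is left out to keep the example platform-neutral.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>

struct AudioMetrics {
    std::atomic<std::uint64_t> overrunCount{0};
    std::array<std::atomic<std::uint64_t>, 10> fillHistogram{};  // 10% buckets

    // Called from the audio thread on each detected overrun.
    void recordOverrun() {
        overrunCount.fetch_add(1, std::memory_order_relaxed);
    }

    // fill: buffer fill as a fraction; values outside 0.0 to 1.0 are clamped.
    void recordFill(double fill) {
        int bucket = static_cast<int>(fill * 10.0);
        if (bucket < 0) bucket = 0;
        if (bucket > 9) bucket = 9;  // 100% lands in the top bucket
        fillHistogram[static_cast<std::size_t>(bucket)]
            .fetch_add(1, std::memory_order_relaxed);
    }

    // Fraction of recorded callbacks with fill at or above 80%.
    double fractionAbove80() const {
        std::uint64_t total = 0;
        std::uint64_t high = 0;
        for (std::size_t i = 0; i < fillHistogram.size(); ++i) {
            const std::uint64_t n = fillHistogram[i].load(std::memory_order_relaxed);
            total += n;
            if (i >= 8) high += n;
        }
        return total == 0 ? 0.0 : static_cast<double>(high) / total;
    }
};
```

Relaxed ordering is enough here because the counters are independent statistics read by a reporting thread, not synchronization points.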
In HearingAid Pro, we log these metrics to Firebase every 60 seconds. When overrun rates spiked on iPhone 13 Pro after iOS 16.3, we correlated it with a system audio framework change and shipped a mitigation within 48 hours.
Hardware-Specific Tuning
iPhone 14 Pro's A16 Bionic has a 6-core CPU with 2× performance cores and 4× efficiency cores. If your audio thread lands on an efficiency core during thermal throttling, work that fits a 1ms budget can take 2.5ms. Call pthread_set_qos_class_self_np(QOS_CLASS_USER_INTERACTIVE, 0) from the audio thread to hint the scheduler toward performance cores; the second argument is the relative priority within the QoS class.
On Android, the heterogeneous CPU landscape is worse. A Snapdragon 8 Gen 2 has 1× prime core (3.2 GHz), 4× performance (2.8 GHz), and 3× efficiency (2.0 GHz). Pin your audio thread to the prime core using sched_setaffinity, but monitor for thermal throttling—prime cores throttle first. We built a runtime policy that migrates to performance cores at 80°C.
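The migration decision can be sketched as a small state machine. The 80°C trigger is from the paragraph above; the 75°C return threshold is an assumed hysteresis margin to stop the thread from bouncing between core classes, and all names are hypothetical. Applying the decision (for example through `sched_setaffinity` with the matching CPU mask) is left to the caller.

```cpp
enum class CoreClass { Prime, Performance };

// Decides which core class the audio thread should target.
class ThermalCorePolicy {
public:
    CoreClass update(double socTempC) {
        if (current_ == CoreClass::Prime && socTempC >= 80.0) {
            current_ = CoreClass::Performance;  // leave the prime core when hot
        } else if (current_ == CoreClass::Performance && socTempC <= 75.0) {
            current_ = CoreClass::Prime;        // assumed return threshold
        }
        return current_;
    }

private:
    CoreClass current_ = CoreClass::Prime;
};
```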
Tradeoffs and When to Use Each Strategy
Silence insertion: simplest, works for non-critical audio (podcasts, music). Unacceptable for real-time communication or hearing aids.
Phase-locked resampling: best for maintaining phase coherence, but adds 10-15% CPU overhead. Use when latency is fixed and audio quality is paramount.
Adaptive latency: user-hostile (adds lag) but bulletproof. Reserve for high-overrun scenarios or low-end devices where you can't guarantee performance.
In production, we layer all three: resampling for