Mobile audio applications—hearing aids, speech therapy tools, real-time translation—often require sample rate conversion (SRC) between hardware I/O rates and processing pipelines. A hearing aid app might capture at 48kHz but apply DSP at 16kHz for power efficiency, then upsample back to 48kHz for output. Naive resampling introduces aliasing, imaging artifacts, and phase distortion that degrade speech intelligibility. This article walks through polyphase FIR filter design and fractional delay interpolation techniques that achieve transparent resampling within the 5–10ms latency budgets of interactive audio apps.
Why Resampling Matters in Clinical Audio
In HearingAid Pro, we route AirPods microphone input through a cascade of filters: noise reduction at 16kHz, frequency shaping at 24kHz, and spatial audio synthesis at 48kHz. Each stage runs at its optimal rate: lower rates save CPU and battery, while higher rates preserve the high-frequency harmonics that give speech its natural timbre. The challenge is moving between rates without introducing perceptible artifacts. A poorly designed resampler causes metallic ringing, loss of sibilants, or rhythmic pumping that users immediately notice in speech.
Traditional approaches fall short. Zero-order hold (nearest-neighbor) creates staircase waveforms: when upsampling it leaves strong spectral images above the original Nyquist frequency, and when downsampling it folds out-of-band energy back into the passband as aliasing. Linear interpolation smooths the staircase but still passes images and aliases, and its sinc²-shaped magnitude response rolls off the upper passband. For speech, where formant transitions happen in 20–40ms windows, these artifacts blur phoneme boundaries. Clinical-grade audio demands better.
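To make the comparison concrete, here is a minimal sketch of both naive upsamplers applied to a 6kHz tone going from 16kHz to 48kHz. The tone frequency, block length, and function names are illustrative assumptions, not taken from HearingAid Pro. The RMS error against an ideally sampled tone is only a crude time-domain proxy, and it also penalizes the zero-order hold for its inherent delay.

```python
import math

def zoh_upsample(x, factor):
    """Zero-order hold: repeat each input sample `factor` times."""
    return [s for s in x for _ in range(factor)]

def linear_upsample(x, factor):
    """Linear interpolation between neighbouring input samples."""
    out = []
    for n in range(len(x) - 1):
        for k in range(factor):
            t = k / factor
            out.append((1.0 - t) * x[n] + t * x[n + 1])
    out.append(x[-1])  # last input sample has no right-hand neighbour
    return out

def rms_error(a, b):
    """RMS difference over the overlapping part of two sequences."""
    n = min(len(a), len(b))
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in range(n)) / n)

# Illustrative values (assumptions): a 6 kHz tone, 16 kHz -> 48 kHz.
FS_IN, FACTOR, TONE_HZ = 16_000, 3, 6_000
FS_OUT = FS_IN * FACTOR
N_IN = 160

x = [math.sin(2 * math.pi * TONE_HZ * n / FS_IN) for n in range(N_IN)]
ideal = [math.sin(2 * math.pi * TONE_HZ * m / FS_OUT) for m in range(N_IN * FACTOR)]

err_zoh = rms_error(zoh_upsample(x, FACTOR), ideal)
err_lin = rms_error(linear_upsample(x, FACTOR), ideal)
print(f"ZOH RMS error:    {err_zoh:.3f}")
print(f"Linear RMS error: {err_lin:.3f}")
```

Linear interpolation comes out ahead of the hold, but both errors are large relative to a unit-amplitude tone, which is the gap a properly designed lowpass is meant to close.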
Polyphase FIR Architecture
A polyphase filter bank decomposes a long FIR filter into P parallel subfilters, where P is the upsampling factor. For 16→48kHz (3× upsampling), we split a 96-tap lowpass FIR into three 32-tap phases. Each output sample uses exactly one phase, eliminating redundant computation. The algorithm:
- Design a prototype lowpass FIR with cutoff at min(f_in, f_out)/2 and stopband attenuation >80dB to suppress aliasing.
- Partition coefficients: phase 0 gets taps [0,3,6,...], phase 1 gets [1,4,7,...], phase 2 gets [2,5,8,...].
- For each output sample at rate f_out, select the phase corresponding to the fractional sample position, convolve with input history, emit result.
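The three steps above can be sketched in plain Python. This is a readable reference under stated assumptions, not the production code path: the Kaiser-windowed sinc design, the beta value (chosen from Kaiser's formula for roughly 80dB), and every function name are illustrative. The zero-stuffing reference is included only to show that the polyphase form produces the same output while skipping the multiplications by zero.

```python
import math

def bessel_i0(x):
    """Modified Bessel function I0 via its power series (used by the Kaiser window)."""
    total, term, k = 1.0, 1.0, 1
    while term > 1e-12 * total:
        term *= (x / (2.0 * k)) ** 2
        total += term
        k += 1
    return total

def design_prototype(num_taps, up, beta=7.857):
    """Step 1: Kaiser-windowed sinc lowpass, designed at the output rate.

    The cutoff sits at the input Nyquist frequency, i.e. 0.5 / up cycles
    per output sample. beta=7.857 is an assumed value targeting ~80 dB.
    """
    fc = 0.5 / up
    mid = (num_taps - 1) / 2.0
    h = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2.0 * fc if t == 0 else math.sin(2.0 * math.pi * fc * t) / (math.pi * t)
        r = t / mid
        window = bessel_i0(beta * math.sqrt(max(0.0, 1.0 - r * r))) / bessel_i0(beta)
        h.append(ideal * window)
    scale = up / sum(h)  # DC gain of `up` offsets the amplitude lost to zero-stuffing
    return [c * scale for c in h]

def split_phases(h, up):
    """Step 2: phase p takes taps [p, p + up, p + 2*up, ...]."""
    return [h[p::up] for p in range(up)]

def polyphase_upsample(x, phases):
    """Step 3: each input sample yields one output sample per phase."""
    history = [0.0] * len(phases[0])  # history[0] is the newest input sample
    out = []
    for sample in x:
        history = [sample] + history[:-1]
        for phase in phases:
            out.append(sum(c * s for c, s in zip(phase, history)))
    return out

def zero_stuff_reference(x, h, up):
    """Naive equivalent: insert zeros, then run the full-length FIR."""
    stuffed = []
    for s in x:
        stuffed.append(s)
        stuffed.extend([0.0] * (up - 1))
    out = []
    for m in range(len(stuffed)):
        acc = 0.0
        for j, c in enumerate(h):
            if m - j >= 0:
                acc += c * stuffed[m - j]
        out.append(acc)
    return out

# 16 kHz -> 48 kHz with a 96-tap prototype, as in the text; the 1 kHz test tone is an assumption.
UP, NUM_TAPS = 3, 96
FS_IN, TONE_HZ = 16_000, 1_000
h = design_prototype(NUM_TAPS, UP)
phases = split_phases(h, UP)
x = [math.sin(2 * math.pi * TONE_HZ * n / FS_IN) for n in range(200)]
y = polyphase_upsample(x, phases)
y_ref = zero_stuff_reference(x, h, UP)
```

Once the start-up transient has passed, `y` tracks the 1kHz tone sampled at 48kHz, delayed by the prototype's (96-1)/2 = 47.5 output samples. A real-time version would replace the list-based history with a ring buffer to avoid reallocating on every input sample.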
This structure maps cleanly to SIMD. On ARM NEON, we load four input samples and four coefficients per phase into 128-bit registers, multiply-accumulate in parallel, then horizontal-add. Because each output sample touches only one 32-tap phase, the 96-tap polyphase filter produces 48kHz output in ~2.1µs per sample on an A15 Bionic, leaving 18.7µs of the 20.8µs sample period at 48kHz for downstream DSP.
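The budget arithmetic and the four-lane accumulation pattern can be checked with a short sketch. The 2.1µs figure is simply the number quoted above, not something measured here, and `dot_4lane` is a hypothetical scalar emulation of the NEON loop rather than actual intrinsics.

```python
FS_OUT = 48_000
PROTOTYPE_TAPS = 96
NUM_PHASES = 3
FILTER_COST_US = 2.1  # per-sample cost quoted in the text (assumed, not measured)

period_us = 1e6 / FS_OUT                        # time available per output sample
headroom_us = period_us - FILTER_COST_US        # what is left for downstream DSP
macs_per_output = PROTOTYPE_TAPS // NUM_PHASES  # taps in the single active phase
simd_iterations = macs_per_output // 4          # four float32 lanes per 128-bit register

def dot_4lane(coeffs, history):
    """Emulate four parallel accumulators followed by a horizontal add."""
    lanes = [0.0, 0.0, 0.0, 0.0]
    for i in range(0, len(coeffs), 4):
        for lane in range(4):
            lanes[lane] += coeffs[i + lane] * history[i + lane]
    return sum(lanes)  # the horizontal add

print(f"sample period: {period_us:.1f} us, headroom: {headroom_us:.1f} us")
print(f"{macs_per_output} MACs per output sample, {simd_iterations} SIMD iterations")
```

Keeping four independent accumulators and combining them once at the end is what lets the multiply-accumulates run in parallel. It also changes the floating-point summation order slightly compared with a scalar loop, so bit-exact agreement between the two is not guaranteed.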
Coefficient Design Trade-offs
Filter length trades off transition bandwidth, stopband attenuation, and latency. A 64-tap filter achieves 0.05× transition width (e.g., passband to 7.2kHz, stopband from 8.8kHz, straddling the 8kHz Nyquist frequency of a 16kHz stream) with 75dB attenuation, introducing 2ms group delay, which corresponds to the (N-1)/2 = 31.5 samples of a linear-phase FIR counted at 16kHz. Doubling to 128 taps narrows transition to 0.025× and pushes attenuation to 90dB, but group delay doubles to 4ms (63.5 samples). For hearing aids targeting