The Mobile LLM Logging Problem
When you ship an on-device LLM—whether it's a 3B-parameter Llama variant or a custom speech model—you need telemetry. Crash logs, inference latency histograms, prompt patterns, token throughput, thermal events. But mobile apps face a brutal constraint: every byte of log data competes with model weights for RAM, and every network flush drains battery.
A typical production LLM app generates 200–400 log events per session. Most are duplicates: the same error message, the same slow-path warning, the same user prompt pattern seen 50 times. Naïve implementations store every event in an in-memory buffer, then batch-upload to Firebase or Mixpanel. On a mid-range Android device with 4GB RAM and a 1.5GB model loaded, that log buffer can balloon to 8–12MB before the next flush cycle. The device kills your app. The user loses context mid-inference.
We need deduplication. But exact-match hash tables require 16–24 bytes per entry (key hash + metadata + pointer overhead). For 10,000 unique events across a user's lifetime, that's 160–240KB, which is still too expensive when you're fighting for every kilobyte near the model's memory ceiling.
Bloom Filters: Probabilistic Membership Testing
A Bloom filter is a bit array paired with k independent hash functions. To insert an element, hash it k times and set those bit positions to 1. To test membership, hash again; if all k bits are set, the element probably exists. If any bit is 0, it definitely does not exist.
The key insight: false positives are acceptable for deduplication. If the filter incorrectly claims we've seen a log event before, we skip uploading it. That causes no user harm, just slightly incomplete telemetry. False negatives are impossible by construction, so an event we have already recorded is never mistaken for a new one and uploaded twice.
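The insert-and-test behaviour described above can be sketched in a few lines of Swift. This is a minimal illustration rather than a production implementation: the type name SketchBloomFilter, the shouldUpload helper, and the seeded FNV-1a hash with double hashing are all assumptions, chosen only to keep the example self-contained and deterministic.

```swift
// Minimal sketch of a Bloom-filter dedup gate.
// Assumption: FNV-1a plus double hashing stands in for the k hash functions.
struct SketchBloomFilter {
    private var words: [UInt64]      // packed bit array, 64 bits per word
    private let bitCount: UInt64     // total number of bits (m)
    private let hashCount: Int       // number of hash functions (k)

    init(bitCount: Int, hashCount: Int) {
        precondition(bitCount > 0 && hashCount > 0)
        self.bitCount = UInt64(bitCount)
        self.hashCount = hashCount
        self.words = Array(repeating: 0, count: (bitCount + 63) / 64)
    }

    // 64-bit FNV-1a over the UTF-8 bytes, started from a caller-chosen basis.
    private static func fnv1a(_ text: String, basis: UInt64) -> UInt64 {
        var hash = basis
        for byte in text.utf8 {
            hash ^= UInt64(byte)
            hash = hash &* 0x100000001b3
        }
        return hash
    }

    // Derive k bit positions as h1 + i * h2 (double hashing).
    private func positions(for event: String) -> [Int] {
        let h1 = Self.fnv1a(event, basis: 0xcbf29ce484222325)
        let h2 = Self.fnv1a(event, basis: 0x84222325cbf29ce4) | 1 // forced odd, never zero
        return (0..<hashCount).map { i in
            Int((h1 &+ UInt64(i) &* h2) % bitCount)
        }
    }

    func mightContain(_ event: String) -> Bool {
        for p in positions(for: event) {
            let mask: UInt64 = 1 << UInt64(p % 64)
            if words[p / 64] & mask == 0 { return false } // a clear bit proves absence
        }
        return true // all k bits set: probably seen before
    }

    mutating func insert(_ event: String) {
        for p in positions(for: event) {
            let mask: UInt64 = 1 << UInt64(p % 64)
            words[p / 64] |= mask
        }
    }

    // Returns true if the event looks new (and records it), false if it
    // is a duplicate or, rarely, a false positive.
    mutating func shouldUpload(_ event: String) -> Bool {
        if mightContain(event) { return false }
        insert(event)
        return true
    }
}

var filter = SketchBloomFilter(bitCount: 64 * 1024 * 8, hashCount: 7)
print(filter.shouldUpload("thermal_throttle")) // true: first sighting
print(filter.shouldUpload("thermal_throttle")) // false: suppressed as a duplicate
```

The gate checks before it inserts, so a repeated event always finds its own bits already set. The only way a new event is dropped is when other events happen to have set all k of its positions.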
For a 64KB filter (524,288 bits) with 7 hash functions and an expected 50,000 insertions, the standard approximation (1 - e^(-kn/m))^k, with m bits and n insertions, puts the false positive rate at about 0.65% once the filter is full. Because there are no false negatives, every repeat of an already-recorded event is suppressed, while fewer than 1% of genuinely new events are mistakenly dropped. In practice, across 200,000 sessions in a hearing-aid app shipping a 2.1GB Whisper model, this translated to an 83% reduction in uploaded telemetry bytes and a 40% drop in battery drain during background sync.
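To sanity-check a filter size before shipping, the textbook approximation for the false positive rate, (1 - e^(-kn/m))^k for m bits, k hash functions, and n insertions, can be evaluated directly. The sketch below is illustrative: the function names are hypothetical, and the formula assumes ideal, independent hash functions. For the parameters above it comes out at roughly 0.65% once all 50,000 entries are in, and lower while the filter is still filling.

```swift
import Foundation

// Approximate false positive rate of a Bloom filter with m bits,
// k hash functions, and n inserted elements: (1 - e^(-k*n/m))^k.
func falsePositiveRate(bits m: Int, hashes k: Int, insertions n: Int) -> Double {
    let fractionOfBitsSet = 1.0 - exp(-Double(k) * Double(n) / Double(m))
    return pow(fractionOfBitsSet, Double(k))
}

// Hash count that minimizes the rate for a given m and n: k = (m / n) * ln 2.
func optimalHashCount(bits m: Int, insertions n: Int) -> Double {
    return Double(m) / Double(n) * log(2.0)
}

let filterBits = 64 * 1024 * 8 // 64KB = 524,288 bits
let rate = falsePositiveRate(bits: filterBits, hashes: 7, insertions: 50_000)
print(String(format: "%.2f%%", rate * 100)) // prints "0.65%"
```

The optimal hash count for this size and load is about 7.3, so k = 7 is a sensible choice. Halving the expected insertions or doubling the filter size pushes the rate down sharply, which makes this a quick way to explore the memory-versus-accuracy trade-off.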
Implementation on iOS and Android
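On iOS, allocate the bit array as a contiguous UInt64 buffer in Swift; for a 64KB filter, that's 8,192 UInt64 words. For the k hash functions, use SipHash with a different seed per function: it's fast and, as a keyed hash, designed to resist adversarial collisions. Swift's standard-library Hasher is currently built on SipHash, but it is randomly seeded on every process launch and its public API does not accept a caller-supplied key. That makes it suitable only for a filter that lives in memory for a single run; a filter persisted across launches needs a SipHash implementation with fixed keys. Wrap the filter in a class and guard it with os_unfair_lock to handle concurrent logging from inference threads and UI threads. Give the lock a stable heap address (or use OSAllocatedUnfairLock on iOS 16 and later), because Swift does not guarantee a fixed address for a lock stored as an ordinary property.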
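import Foundation  // also exposes os_unfair_lock on Apple platforms

final class BloomFilter {
    private var bits: [UInt64]   // packed bit array, 64 bits per word
    private let k: Int = 7       // number of hash functions
    private let size: Int        // total number of bits
    // Heap-allocated so the lock keeps a stable address; Swift does not
    // guarantee one for `&lock` on a stored os_unfair_lock property.
    private let lock: UnsafeMutablePointer<os_unfair_lock>

    init(sizeInBytes: Int) {
        precondition(sizeInBytes > 0 && sizeInBytes % 8 == 0, "size must be a positive multiple of 8 bytes")
        self.size = sizeInBytes * 8
        self.bits = Array(repeating: 0, count: sizeInBytes / 8)
        self.lock = .allocate(capacity: 1)
        self.lock.initialize(to: os_unfair_lock())
    }

    deinit { lock.deinitialize(count: 1); lock.deallocate() }

    func insert(_ event: String) {
        os_unfair_lock_lock(lock)
        defer { os_unfair_lock_unlock(lock) }
        for seed in 0..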