The Mobile LLM Logging Problem

When you ship an on-device LLM—whether it's a 3B-parameter Llama variant or a custom speech model—you need telemetry. Crash logs, inference latency histograms, prompt patterns, token throughput, thermal events. But mobile apps face a brutal constraint: every byte of log data competes with model weights for RAM, and every network flush drains battery.

A typical production LLM app generates 200–400 log events per session. Most are duplicates: the same error message, the same slow-path warning, the same user prompt pattern seen 50 times. Naïve implementations store every event in an in-memory buffer, then batch-upload to Firebase or Mixpanel. On a mid-range Android device with 4GB RAM and a 1.5GB model loaded, that log buffer can balloon to 8–12MB before the next flush cycle. Under memory pressure, the OS kills your app, and the user loses context mid-inference.

We need deduplication. But exact-match hash tables require 16–24 bytes per entry (key hash + metadata + pointer overhead). For 10,000 unique events across a user's lifetime, that's 160–240KB, which is still too expensive when you're fighting for every kilobyte near the model's memory ceiling.

Bloom Filters: Probabilistic Membership Testing

A Bloom filter is a bit array paired with k independent hash functions. To insert an element, hash it k times and set those bit positions to 1. To test membership, hash again; if all k bits are set, the element probably exists. If any bit is 0, it definitely does not exist.

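The genius: false positives are acceptable for deduplication. If the filter incorrectly claims we've seen a log event before, we skip uploading it. There is no user harm, just slightly incomplete telemetry, because a small fraction of genuinely new events is dropped. False negatives are impossible by construction: once an event has been inserted, the filter never reports it as unseen, so we never upload an event it has already recorded.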
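The insert and test steps, plus the upload decision built on them, can be sketched in a few lines of Swift. This is a minimal illustration, not the production class: the names SketchBloomFilter and shouldUpload are hypothetical, the hash is FNV-1a (picked here only because it is short and stable across launches), and the k positions are derived from two base hashes by double hashing rather than k fully independent functions. It is also not thread-safe.

```swift
// Minimal Bloom filter sketch. All names are illustrative, not from a library.
struct SketchBloomFilter {
    private var words: [UInt64]      // bit array packed into 64-bit words
    private let bitCount: UInt64     // m: total number of bits
    private let hashCount: Int       // k: bit positions set per element

    init(bitCount: Int, hashCount: Int) {
        precondition(bitCount > 0 && hashCount > 0)
        self.bitCount = UInt64(bitCount)
        self.hashCount = hashCount
        self.words = Array(repeating: 0, count: (bitCount + 63) / 64)
    }

    // 64-bit FNV-1a with the seed mixed into the offset basis.
    // Assumption: chosen for brevity; it offers no adversarial resistance.
    private static func fnv1a(_ bytes: String.UTF8View, seed: UInt64) -> UInt64 {
        var h: UInt64 = 0xcbf29ce484222325 ^ seed
        for b in bytes {
            h ^= UInt64(b)
            h = h &* 0x100000001b3
        }
        return h
    }

    // Derive k positions as h1 + i * h2 (mod m). h2 is forced odd so it is never zero.
    private func positions(for item: String) -> [Int] {
        let h1 = Self.fnv1a(item.utf8, seed: 0)
        let h2 = Self.fnv1a(item.utf8, seed: 0x9e3779b97f4a7c15) | 1
        return (0..<hashCount).map { i in
            Int((h1 &+ UInt64(i) &* h2) % bitCount)
        }
    }

    mutating func insert(_ item: String) {
        for p in positions(for: item) {
            let mask: UInt64 = 1 << UInt64(p % 64)
            words[p / 64] |= mask
        }
    }

    // true means "probably seen"; false means "definitely not seen".
    func mightContain(_ item: String) -> Bool {
        positions(for: item).allSatisfy { p in
            let mask: UInt64 = 1 << UInt64(p % 64)
            return words[p / 64] & mask != 0
        }
    }
}

// Dedup gate: upload only events the filter has definitely not seen, then record them.
func shouldUpload(_ event: String, filter: inout SketchBloomFilter) -> Bool {
    if filter.mightContain(event) { return false }
    filter.insert(event)
    return true
}

var filter = SketchBloomFilter(bitCount: 524_288, hashCount: 7)
let first = shouldUpload("inference timeout", filter: &filter)   // new event: upload
let second = shouldUpload("inference timeout", filter: &filter)  // repeat: suppressed
```

The first call returns true because an empty filter cannot report anything as present. The second returns false because the same string always maps to the same bits, which is exactly the no-false-negatives property.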

For a 64KB filter (524,288 bits) with 7 hash functions and an expected 50,000 insertions, the false positive rate is about 0.65% once the filter holds all 50,000 entries, using the standard approximation (1 - e^(-kn/m))^k. Because a Bloom filter has no false negatives, every repeat of an event already in the filter is suppressed; the cost is that fewer than 1% of genuinely new events are mistaken for duplicates and dropped. In practice, across 200,000 sessions in a hearing-aid app shipping a 2.1GB Whisper model, this translated to an 83% reduction in uploaded telemetry bytes and a 40% drop in battery drain during background sync.
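For readers who want to check the sizing, here is the usual back-of-the-envelope calculation. It assumes the textbook approximation for a filter with m bits, n inserted elements, and k hash functions that behave independently; real hash functions only approximate this, so treat the result as an estimate.

```latex
p \approx \left(1 - e^{-kn/m}\right)^{k},
\qquad m = 524{,}288,\quad n = 50{,}000,\quad k = 7

\frac{kn}{m} = \frac{350{,}000}{524{,}288} \approx 0.668,
\qquad 1 - e^{-0.668} \approx 0.487,
\qquad p \approx 0.487^{7} \approx 0.0065

k_{\text{opt}} = \frac{m}{n}\ln 2 \approx 10.49 \times 0.693 \approx 7.3
```

So the estimate is a false positive rate of roughly 0.65% at full load, comfortably below 1%, and k = 7 is the nearest integer to the optimum for this ratio of bits to entries. The rate is lower while the filter holds fewer than 50,000 entries and rises if it is overfilled.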

Implementation on iOS and Android

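On iOS, allocate the bit array as a contiguous UInt64 buffer in Swift. For a 64KB filter, that's 8,192 UInt64 words. For the k hash functions, use a keyed hash such as SipHash with k different keys: it's fast and designed to resist adversarial collisions (hash flooding), although it is a keyed pseudorandom function rather than a general-purpose cryptographic hash. One caveat: the standard library's Hasher, which is currently SipHash-based, is randomly seeded per process and exposes no public seed parameter, so its output is not stable across launches. If the filter is persisted, use a hash implementation with explicit, stable seeds. Wrap the buffer in a class and guard it with os_unfair_lock to handle concurrent logging from inference threads and UI threads.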

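import os

final class BloomFilter {
    private var bits: [UInt64]   // bit array packed into 64-bit words
    private let k: Int = 7       // number of hash functions
    private let size: Int        // filter size in bits
    // Heap-allocate the lock: Swift does not guarantee a stable address
    // when a stored os_unfair_lock property is passed with `&`.
    private let lock: UnsafeMutablePointer<os_unfair_lock>

    init(sizeInBytes: Int) {
        precondition(sizeInBytes > 0 && sizeInBytes % 8 == 0, "size must be a multiple of 8 bytes")
        self.size = sizeInBytes * 8
        self.bits = Array(repeating: 0, count: sizeInBytes / 8)
        self.lock = .allocate(capacity: 1)
        self.lock.initialize(to: os_unfair_lock())
    }

    deinit {
        lock.deinitialize(count: 1)
        lock.deallocate()
    }

    func insert(_ event: String) {
        os_unfair_lock_lock(lock)
        defer { os_unfair_lock_unlock(lock) }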
        for seed in 0..