Trie-Based Autocomplete: 10ms P99 on 100K Entries

The Autocomplete Problem at Scale

Autocomplete is table stakes in modern apps—users expect instant feedback as they type product names, addresses, or medical terms. But naive implementations fall apart past 10,000 entries. Linear scans are too slow. Hash tables can't do prefix matching. Binary search on sorted arrays works but requires custom implementations and careful memory layout.

The trie (prefix tree) is the classic solution, but textbook tries are memory hogs. A naive trie for 100,000 English words can consume 50+ MB due to pointer overhead. On mobile, where memory is contested and garbage collection pauses matter, we need better.

This article walks through building a production autocomplete system that handles 100,000+ entries with sub-10ms P99 latency on mid-tier Android and iOS devices. We'll cover compressed tries, pruning strategies, and the surprising cases where simpler structures win.

Trie Fundamentals: Why Prefix Trees

A trie stores strings by sharing common prefixes. The word "car" and "cart" share the first three characters, so they share nodes in the tree. Each node holds a character and pointers to children. Leaf nodes (or marked nodes) indicate complete words.

Search complexity is O(m) where m is the query length—independent of dataset size. This makes tries ideal for autocomplete: typing "glu" walks three nodes to find "glucose", "gluten", "glue".

But the memory cost is brutal. Each node needs:

Character storage (1-4 bytes depending on encoding)
Child pointers (typically 8 bytes × branching factor)
Metadata: end-of-word flag, frequency score, payload pointer

For English text with 26 letters, that's 26 × 8 = 208 bytes per node in the worst case. A 100K-word dictionary might have 500K nodes = 100 MB before compression.

Compressed Trie: Radix Tree Optimization

The first optimization is collapsing single-child chains. In a standard trie, "algorithm" creates nine nodes: a→l→g→o→r→i→t→h→m. A radix tree (compressed trie) stores "algorithm" as a single node with the full string.

This cuts node count by 60-80% for natural language. Implementation requires storing variable-length strings per node, but modern allocators handle this well. In Dart/Flutter:

class RadixNode {
  String edge;  // "algo" instead of 'a'
  Map<String, RadixNode> children;
  bool isTerminal;
  double score;  // for ranking results
}

The Map uses the first character as key for O(1) child lookup. For high branching factors, a HashMap beats array-based storage. For low branching (≤4 children), a flat List with linear scan is faster due to cache locality.

Memory Layout: Struct-of-Arrays Pattern

Object overhead in managed runtimes is significant. Each Dart object has a header (16 bytes on 64-bit). A naive node with five fields = 16 + (5 × 8) = 56 bytes before payload.

Struct-of-arrays inverts the layout: instead of 100K node objects, allocate parallel typed arrays:

class CompactTrie {
  List<String> edges;        // edge labels
  List<int> childStart;      // index into children array
  List<int> childEnd;
  List<int> childIndices;    // flat child list
  List<bool> terminals;
  List<double> scores;
}

This reduces per-node overhead to ~32 bytes and improves cache performance—walking the trie reads contiguous memory. The cost is index arithmetic and less ergonomic code. For 100K entries, this saves 20-30 MB.

Ranking and Pruning: Top-K Results

Autocomplete must rank results. Frequency-based scoring is simplest: store usage counts during indexing, normalize to 0-1 scores. During search, collect all matching entries and sort by score.

But returning all matches is wasteful. Users see 5-10 results. Pruning during traversal is key:

Track a min-heap of size K (e.g., 10)
During DFS traversal, compare node score to heap minimum
If score ≤ min, prune entire subtree (no descendants can rank higher if scores are monotonic)
If score > min, insert and continue

This cuts search time by 40-60% for typical queries. The trick is ensuring scores are monotonic: child scores must be ≤ parent scores. For frequency-based scoring, this holds if children inherit parent frequency as an upper bound.

Fuzzy Matching: Levenshtein Distance

Exact prefix matching fails on typos. "glukose" should suggest "glucose". Levenshtein distance (edit distance) allows 1-2 character differences.

Naive approach: compute edit distance for every entry. Too slow. Better: bound the search space using trie structure. At each node, track remaining edit budget. If budget exhausted, prune.

void fuzzySearch(RadixNode node, String query, int pos, int budget, List<Result> out) {
  if (budget < 0) return;
  if (node.isTerminal) out.add(...);
  
  for (var child in node.children.values) {
    // exact match: no cost
    if (query[pos] == child.edge[0]) {
      fuzzySearch(child, query, pos + 1, budget, out);
    }
    // substitution: cost 1
    fuzzySearch(child, query, pos + 1, budget - 1, out);
    // insertion: cost 1
    fuzzySearch(child, query, pos, budget - 1, out);
  }
}

This is still exponential in edit distance, but practical for budget ≤ 2 and pruning low-score branches early. For 100K entries, fuzzy search with budget=1 adds ~3ms to P99 latency.

Serialization: Protobuf for Cold Start

Building the trie at app launch is too slow (200-500ms for 100K entries). Pre-build and serialize.

JSON is bulky and slow to parse. Protobuf or FlatBuffers are better. FlatBuffers has an edge: zero-copy deserialization. The serialized buffer is mmap'd and accessed in-place—no parsing step. For a 5 MB trie, this cuts load time from 150ms (JSON) to 8ms (FlatBuffers).

Schema:

table TrieNode {
  edge: string;
  children: [int];  // indices into node array
  terminal: bool;
  score: float;
}
table Trie {
  nodes: [TrieNode];
}

The flat node array enables struct-of-arrays layout naturally. On iOS, use mmap() with MAP_PRIVATE for copy-on-write safety.

When Hash Tables Win

Tries aren't always the answer. For exact-match lookups (no prefix search), a hash table is simpler and faster. For datasets under 1,000 entries, a sorted array with binary search is competitive—simpler code, better cache behavior, and most mobile CPUs have optimized strcmp.

For multilingual datasets (Arabic, Chinese), the branching factor explodes. A trie node might have 10,000 children. Hash tables or ternary search trees (TST) scale better. TSTs are a hybrid: each node has three children (less-than, equal, greater-than), reducing branching to 3 while keeping prefix-search capability.

Real-World Numbers

Benchmarks on a mid-tier Android device (Snapdragon 680) with 100K medical terms:

Naive trie: 62 MB, 18ms P99 search
Radix trie: 38 MB, 12ms P99
Struct-of-arrays radix: 22 MB, 9ms P99
With top-10 pruning: 18 MB, 6ms P99
FlatBuffers cold start: 8ms vs 180ms JSON

For comparison, a HashMap with linear scan over matches: 15 MB, 42ms P99. The trie's advantage grows with dataset size and query complexity.

Practical Considerations

In production apps, autocomplete often pulls from multiple sources: local cache, server API, recent searches. Merging results requires a priority queue across sources. Network results arrive async—use a debounced stream that cancels in-flight requests on new keystrokes.

For large datasets (1M+ entries), consider a two-tier design: a small in-memory trie for frequent terms, and a larger on-disk trie (SQLite FTS or custom B-tree) for long-tail queries. The in-memory tier handles 90% of queries; disk tier adds 20-50ms for rare terms.

Conclusion

Autocomplete is a solved problem in theory but full of trade-offs in practice. Compressed tries offer the best balance of speed, memory, and flexibility for 10K-1M entry datasets on mobile. Struct-of-arrays layout and top-K pruning are force multipliers. For smaller datasets or exact-match needs, simpler structures often win.

The key is measuring. Profile with realistic data and device tiers. P99 latency matters more than averages—one slow query ruins perceived performance. Tools like Flutter DevTools Timeline and Android Profiler show where time goes. Optimize the hot path, accept complexity where it pays off, and keep fallback strategies for edge cases.