The Problem with Buffered JSON Parsing

Most mobile HTTP clients buffer the entire response body before parsing JSON. For a 2MB product catalog or 5MB user feed, that means allocating megabytes of RAM, blocking the main thread for 100–400ms, and delaying first paint. In production apps serving millions of users, this pattern creates measurable churn: users abandon screens that take >3 seconds to show content.

Traditional JSON parsers (JSON.parse() in JavaScript, json.decode() in Dart, JSONDecoder in Swift) are tree-building, all-at-once parsers, not push-based or pull-based ones. They consume the entire byte stream, build a full object graph in memory, then hand you a Map or Dictionary. For large responses, peak memory usage is typically at least 2–3× the wire size due to intermediate allocations. On a 2GB Android device running multiple apps, this can trigger GC pauses and frame drops.

Incremental parsing solves this by emitting events as tokens arrive: StartObject, Key, String, EndArray. You process each event immediately, discard what you don't need, and never hold the full document. Memory usage becomes O(depth) instead of O(size). Latency drops because you start rendering partial data while the network is still streaming bytes.

Pull-Based Parser Architecture

A pull parser is an iterator over JSON tokens. Instead of callbacks (SAX-style), you call next() in a loop. This gives you control flow: you can skip subtrees, bail early, or delegate parsing to child objects. The core interface looks like this:

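abstract class JsonPullParser {
  Token next();        // advance to the next token and return its type
  void skipValue();    // skip the upcoming value, including nested objects and arrays
  String readString(); // text of the current Key or String token
  int readInt();       // current Number token as an int
  bool readBool();     // current Bool token
}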

Each Token is an enum: StartObject, EndObject, StartArray, EndArray, Key, String, Number, Bool, Null. You advance the parser, inspect the token, and decide how to handle it. For example, parsing a product list:

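// Assumes the opening StartArray has already been consumed, and that
// readString() returns the text of the current Key or String token.
while (parser.next() != Token.EndArray) {      // consumes each product's StartObject
  while (parser.next() != Token.EndObject) {   // consumes each Key
    if (parser.readString() == "id") {
      parser.next();                           // advance to the value
      productId = parser.readString();
    } else {
      parser.skipValue(); // discard unused fields
    }
  }
}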
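
To make the token flow concrete, here is a minimal sketch of a pull parser over an in-memory string. It is an illustration under stated assumptions, not a production parser. The class name StringPullParser, the helper readIds, the lowerCamelCase token names (Dart convention; the text above writes StartObject and so on), and the extra endDocument token are all choices made for this example. It assumes well-formed input and does no validation.

```dart
import 'dart:convert';

// Token names follow Dart's lowerCamelCase convention; endDocument is an
// addition for this sketch.
enum Token {
  startObject, endObject, startArray, endArray,
  key, string, number, boolean, nullValue, endDocument,
}

/// Hypothetical in-memory pull parser. Assumes well-formed JSON.
class StringPullParser {
  StringPullParser(this._src);

  final String _src;
  int _pos = 0;
  bool _afterColon = false; // true when the next token is an object value
  final List<bool> _isObject = []; // container stack; true = object
  String _raw = ''; // raw text of the current key or scalar token

  /// Advances to the next token and returns its type.
  Token next() {
    // Skip whitespace, commas, and colons between tokens.
    while (_pos < _src.length) {
      final ch = _src[_pos];
      if (ch == ':') {
        _afterColon = true;
      } else if (ch != ',' && ch.trim().isNotEmpty) {
        break;
      }
      _pos++;
    }
    if (_pos >= _src.length) return Token.endDocument;
    final isKey = _isObject.isNotEmpty && _isObject.last && !_afterColon;
    _afterColon = false;
    final c = _src[_pos];
    if (c == '{' || c == '[') {
      _pos++;
      _isObject.add(c == '{');
      return c == '{' ? Token.startObject : Token.startArray;
    }
    if (c == '}' || c == ']') {
      _pos++;
      _isObject.removeLast();
      return c == '}' ? Token.endObject : Token.endArray;
    }
    final start = _pos;
    if (c == '"') {
      _pos++;
      while (_src[_pos] != '"') {
        _pos += _src[_pos] == '\\' ? 2 : 1; // step over escaped characters
      }
      _pos++; // closing quote
      _raw = _src.substring(start, _pos);
      return isKey ? Token.key : Token.string;
    }
    // Number, true, false, or null: read up to the next delimiter.
    while (_pos < _src.length && !',}] \n\r\t'.contains(_src[_pos])) {
      _pos++;
    }
    _raw = _src.substring(start, _pos);
    if (_raw == 'null') return Token.nullValue;
    if (_raw == 'true' || _raw == 'false') return Token.boolean;
    return Token.number;
  }

  String readString() => jsonDecode(_raw) as String; // decodes escapes
  int readInt() => int.parse(_raw);
  bool readBool() => _raw == 'true';

  /// Skips the next value, including any nested objects or arrays.
  void skipValue() {
    var depth = 0;
    do {
      final t = next();
      if (t == Token.startObject || t == Token.startArray) depth++;
      if (t == Token.endObject || t == Token.endArray) depth--;
    } while (depth > 0);
  }
}

/// Collects the "id" of every object in a top-level array.
List<int> readIds(String json) {
  final parser = StringPullParser(json);
  final ids = <int>[];
  parser.next(); // opening StartArray
  while (parser.next() != Token.endArray) { // each element's StartObject
    while (parser.next() != Token.endObject) { // each Key
      if (parser.readString() == 'id') {
        parser.next(); // advance to the value
        ids.add(parser.readInt());
      } else {
        parser.skipValue();
      }
    }
  }
  return ids;
}
```

The only state the parser keeps is a position, one flag, and a stack with one entry per open container, which is where the O(depth) memory bound comes from. This sketch still creates a substring for every scalar it passes over; a production skipValue() would scan the bytes without materializing them.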

The skipValue() method is critical. It fast-forwards over entire objects or arrays without allocating memory. For a 200-field API response where you need 10 fields, this cuts parse time by 60%.
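
One way to skip without allocating is to scan the raw bytes and track only nesting depth and string state. The function below is a sketch of that idea, assuming the whole value is already in a byte buffer; skipValueAt is a hypothetical name, not part of any library, and it does not validate the JSON it passes over.

```dart
/// Returns the index just past the JSON value that starts at or after [start].
/// Only brackets, quotes, and escapes are inspected; nothing is decoded.
int skipValueAt(List<int> bytes, int start) {
  const quote = 0x22, backslash = 0x5C, comma = 0x2C;
  const openBrace = 0x7B, closeBrace = 0x7D;
  const openBracket = 0x5B, closeBracket = 0x5D;
  bool isSpace(int b) => b == 0x20 || b == 0x09 || b == 0x0A || b == 0x0D;

  var i = start;
  while (i < bytes.length && isSpace(bytes[i])) {
    i++; // leading whitespace
  }
  var depth = 0;
  var inString = false;
  for (; i < bytes.length; i++) {
    final b = bytes[i];
    if (inString) {
      if (b == backslash) {
        i++; // skip the escaped byte as well
      } else if (b == quote) {
        inString = false;
        if (depth == 0) return i + 1; // the value was a bare string
      }
    } else if (b == quote) {
      inString = true;
    } else if (b == openBrace || b == openBracket) {
      depth++;
    } else if (b == closeBrace || b == closeBracket) {
      if (depth == 0) return i; // scalar ended by the enclosing container
      if (--depth == 0) return i + 1; // the container being skipped closed
    } else if (depth == 0 && (b == comma || isSpace(b))) {
      return i; // end of a number, true, false, or null
    }
  }
  if (inString || depth > 0) {
    throw const FormatException('Unterminated JSON value');
  }
  return i;
}
```

Tracking string state matters: a brace or bracket inside a string literal must not change the depth, and an escaped quote must not end the string. Skipping is cheap, but every skipped byte is still read once.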

Streaming from HTTP Response Bodies

Most HTTP clients can expose the response body as a stream of byte chunks: a Stream<List<int>> in Dart (the http package's ByteStream) or URLSession.AsyncBytes in Swift. You feed these chunks into a streaming JSON lexer. The lexer maintains a small buffer (4KB–16KB) and emits tokens as soon as it has enough bytes. For a 2MB response arriving at 1 Mbps, you get the first tokens in ~50ms instead of waiting 16 seconds for the full download.
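
A lighter-weight variant of the same idea needs no custom lexer. The sketch below tracks only nesting depth and string state across chunk boundaries, cuts a top-level array into complete elements, and hands each one to the stock jsonDecode. This is element-level streaming, not token-level: memory is bounded by the largest single element. The function name decodeArrayElements is hypothetical, and the code assumes every array element is an object or an array.

```dart
import 'dart:convert';

/// Decodes the elements of a top-level JSON array one at a time as byte
/// chunks arrive. Assumes every element is an object or an array.
Stream<Object?> decodeArrayElements(Stream<List<int>> chunks) async* {
  const quote = 0x22, backslash = 0x5C;
  final element = <int>[]; // bytes of the element being assembled
  var depth = 0; // 1 = directly inside the top-level array
  var inString = false;
  var escaped = false;

  await for (final chunk in chunks) {
    for (final b in chunk) {
      if (inString) {
        if (depth >= 2) element.add(b);
        if (escaped) {
          escaped = false; // this byte was escaped; it cannot end the string
        } else if (b == backslash) {
          escaped = true;
        } else if (b == quote) {
          inString = false;
        }
        continue;
      }
      final opens = b == 0x7B || b == 0x5B; // { or [
      final closes = b == 0x7D || b == 0x5D; // } or ]
      if (opens) depth++;
      if (depth >= 2) element.add(b); // keep only bytes inside an element
      if (b == quote) inString = true;
      if (closes) {
        depth--;
        if (depth == 1) {
          // An element just closed: decode it and release its bytes.
          yield jsonDecode(utf8.decode(element));
          element.clear();
        }
      }
    }
  }
}
```

Because bytes are decoded only once a whole element has been collected, a multi-byte UTF-8 character split across two chunks is handled correctly. With the http package, response.stream could be passed in directly, since it is a Stream<List<int>>.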

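In Flutter, using the http package:

final response = await client.send(request);
// JsonPullParser here is an async variant: next() returns Future<Token>,
// because the bytes for the next token may not have arrived yet.
final parser = JsonPullParser(response.stream);

while (await parser.next() != Token.EndArray) {
  final product = await parseProduct(parser);
  setState(() => products.add(product)); // update UI incrementally
}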

This pattern lets you show a loading skeleton that fills in as data arrives. Users see the first 5 products in 200ms, the next 10 in 400ms, etc. Perceived latency drops by 300–500ms compared to buffering.

Memory Profiling: Before and After

I instrumented a Flutter e-commerce app (similar to Khosomati, a price aggregator built for Palestine's market) parsing a 1.8MB JSON response with 500 products. Each product had 40 fields; the app used 8.

Buffered parser (jsonDecode):

  • Peak memory: 12.4 MB (6.9× wire size due to Dart's UTF-16 strings and Map overhead)
  • Parse time: 340 ms on Snapdragon 665
  • Time to first product rendered: 850 ms

Incremental parser:

  • Peak memory: 3.1 MB (1.7× wire size, mostly network buffers)
  • Parse time: 180 ms (only parsing needed fields)
  • Time to first product rendered: 520 ms

Memory savings: 75%. Latency improvement: 330 ms. On low-end devices with 2GB RAM, this prevented OOM crashes during background app switches.

Handling Errors Mid-Stream

Streaming parsers fail fast. If the server sends malformed JSON at byte 50,000, you detect it immediately instead of after downloading 2MB. But you must handle partial state: if you've already rendered 20 products and the parse fails, do you show an error or keep the partial list?

Best practice: use optimistic rendering with a banner. Render what you've parsed, show a dismissible error message, and let the user decide whether to retry. This beats a blank screen with a generic error.

Implement a try-catch around the parse loop:

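try {
  // With a stream-backed parser, next() is asynchronous and must be awaited.
  while (await parser.next() != Token.EndArray) {
    products.add(await parseProduct(parser));
  }
} on JsonParseException catch (e) {
  showErrorBanner("Partial data loaded");
  logError(e, products.length); // telemetry: items parsed before the failure
}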

Telemetry is key. Track how often partial failures happen, at what byte offset, and whether users retry. In production, 2–3% of streams fail mid-parse due to network hiccups or server bugs. Optimistic rendering keeps those users engaged.
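
The keep-what-you-parsed policy can be factored into a small helper that is independent of any particular parser. The sketch below is one possible shape, with hypothetical names (PartialResult, collectUntilError). It catches FormatException, which is what dart:convert throws on malformed input; a custom parser's own exception type, such as the JsonParseException used above, would be caught the same way.

```dart
/// Items received before the stream ended, plus the error that ended it
/// early (null when the stream completed normally).
class PartialResult<T> {
  PartialResult(this.items, this.error);

  final List<T> items;
  final Object? error;

  bool get isComplete => error == null;
}

/// Drains [source], keeping every item that arrived before a parse error.
/// [onItem] is called per item, for example to update the UI incrementally.
Future<PartialResult<T>> collectUntilError<T>(
  Stream<T> source, {
  void Function(T item)? onItem,
}) async {
  final items = <T>[];
  try {
    await for (final item in source) {
      items.add(item);
      onItem?.call(item);
    }
    return PartialResult(items, null);
  } on FormatException catch (e) {
    // Malformed JSON mid-stream: report it, but keep what was parsed.
    return PartialResult(items, e);
  }
}
```

The caller can then show the error banner when isComplete is false and log items.length alongside the error. Only parse errors are caught here; a dropped connection would surface as a different exception type and needs its own handler.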

Schema Evolution and Field Skipping

APIs evolve. New fields appear; old fields get deprecated. A pull parser makes schema changes cheap. If the server adds 10 new fields you don't need, skipValue() passes over them without allocating anything; the only cost is scanning their bytes. No version negotiation, no breaking changes.

For backward compatibility, check if a key exists before reading:

if (parser.next() == Token.Key) {
  final key = parser.readString();
  if (key == "newField") {
    // handle it
  } else {
    parser.skipValue();
  }
}

This pattern is common in long-lived mobile apps where you can't force users to update. A 2-year-old client can still parse responses from a 2024 API as long as core fields remain stable.

Trade-Offs: When Not to Use Streaming Parsers

Incremental parsing adds complexity. You write more code than json.decode(response.body). For small responses (