Viewport Culling for Mobile LLM Token Streams

Streaming LLM interfaces—chatbots, document generators, code assistants—emit tokens at 20–60 per second. On desktop, browsers handle thousands of DOM nodes without breaking a sweat. On mobile, appending 2,000+ tokens to a ScrollView while maintaining 60fps becomes a CPU bottleneck. Users report scroll jank, thermal throttling, and battery drain during long generations. The culprit: rendering everything, even offscreen content.

Viewport culling—rendering only visible UI elements—is standard in game engines and virtual lists. Applying it to token streams requires rethinking how React, Flutter, and SwiftUI handle incremental text. This article dissects the problem, walks through a production implementation from a multi-platform LLM chat app, and shares measured performance wins.

The Problem: Unbounded DOM Growth

A typical streaming LLM UI appends each token to a TextView or Text component. After 10 seconds at 40 tokens/sec, you have 400 nodes. After two minutes, 4,800. Mobile layout engines recompute styles, reflow ancestors, and repaint on every append. Profiling a React Native chat app generating a 5,000-token response showed:

CPU: 78% sustained on iPhone 13, triggering thermal throttle at 90 seconds
Frame drops: 18fps average during scroll + generation
Battery: 12% drain over 3-minute session

The app was rendering 4,200 offscreen tokens. Users scrolling to review earlier messages experienced stuttering because React reconciled the entire subtree on every state update.

Virtualization Meets Streaming

Virtual lists (react-window, Flutter's ListView.builder) render a sliding window of items. But LLM tokens arrive one at a time, not as a fixed array. We need:

A buffer holding all tokens (for search, copy, scroll-to-top)
A viewport slice rendered to DOM/widget tree
Automatic expansion as new tokens stream in
Scroll position preservation when culling/unculling chunks

Standard virtual list libraries are built around item sizes that are known up front and updates that arrive in batches. Tokens vary in width (wrapping affects height), and updates happen at 20–60Hz, so we built a custom solution.

Architecture: Token Buffer + Render Window

We keep the full token buffer and the render window state together in one structure:

interface TokenBuffer {
  tokens: string[];
  startIndex: number;  // first rendered token index (inclusive)
  endIndex: number;    // one past the last rendered token (exclusive, as in slice)
  viewportHeight: number;
  scrollTop: number;
}

On each token arrival, append to tokens. Compute which slice is visible based on scrollTop and estimated row height. Render only tokens.slice(startIndex, endIndex). Use CSS transforms or Flutter offsets to position the visible chunk correctly within a scrollable container.

Estimating Token Heights

We can't measure every token's rendered height without defeating the purpose. Instead:

Sample 100 tokens during first render, measure average line height (typically 18–24px)
Track wrap events: if a token contains spaces and viewport width is known, estimate wrapping via character count and average char width
Store a sparse height map for chunks of 50 tokens, updating on scroll

This gives ±10% accuracy, acceptable because we over-render by 50 tokens above/below viewport as a buffer.
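
The estimation steps above can be sketched roughly as follows. This is a simplified illustration rather than the app's actual code: `estimateLines`, `ChunkHeightMap`, and the `avgCharWidth` and `chunkSize` parameters are hypothetical names, and the sketch treats each chunk's text as a single wrapped paragraph.

```typescript
// Hypothetical sketch of chunked height estimation; all names are assumptions.

/** Estimate how many wrapped lines a run of text occupies. */
function estimateLines(
  text: string,
  viewportWidth: number,
  avgCharWidth: number
): number {
  const charsPerLine = Math.max(1, Math.floor(viewportWidth / avgCharWidth));
  return Math.max(1, Math.ceil(text.length / charsPerLine));
}

class ChunkHeightMap {
  // Sparse: only chunks that were actually laid out get a measured entry.
  private measured = new Map<number, number>();
  private readonly chunkSize: number;
  private readonly lineHeight: number;
  private readonly viewportWidth: number;
  private readonly avgCharWidth: number;

  constructor(
    chunkSize: number,
    lineHeight: number,
    viewportWidth: number,
    avgCharWidth: number
  ) {
    this.chunkSize = chunkSize;
    this.lineHeight = lineHeight;
    this.viewportWidth = viewportWidth;
    this.avgCharWidth = avgCharWidth;
  }

  /** Record the real rendered height of a chunk once it has been on screen. */
  setMeasured(chunkIndex: number, height: number): void {
    this.measured.set(chunkIndex, height);
  }

  /** Height of one chunk: the measured value if known, otherwise an estimate. */
  chunkHeight(tokens: string[], chunkIndex: number): number {
    const known = this.measured.get(chunkIndex);
    if (known !== undefined) return known;
    const start = chunkIndex * this.chunkSize;
    const text = tokens.slice(start, start + this.chunkSize).join("");
    if (text.length === 0) return 0;
    const lines = estimateLines(text, this.viewportWidth, this.avgCharWidth);
    return lines * this.lineHeight;
  }

  /** Estimated pixel offset of the top of a chunk. */
  offsetOfChunk(tokens: string[], chunkIndex: number): number {
    let offset = 0;
    for (let i = 0; i < chunkIndex; i++) {
      offset += this.chunkHeight(tokens, i);
    }
    return offset;
  }
}
```

Measured heights replace estimates as chunks scroll into view, so the error shrinks in the regions the user has actually visited.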

Scroll Anchoring

When tokens stream in while the user is scrolled up, the viewport shouldn't jump. We calculate the pixel offset of the first visible token before adding new tokens, then adjust scrollTop to maintain that anchor. In React:

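// Distance between scrollTop and the top of the first rendered token
const anchorOffset =
  scrollTop - (startIndex * avgLineHeight);
// ...append tokens and recompute newStartIndex here...
setScrollTop(
  (newStartIndex * avgLineHeight) + anchorOffset
);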

Flutter's ScrollController and SwiftUI's ScrollViewReader provide similar primitives. The key is separating logical token index from pixel position.
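
To make the index/pixel separation concrete, here is a minimal sketch that assumes a uniform row height. `captureAnchor` and `restoreScrollTop` are hypothetical helpers, not part of any framework: the anchor is stored as a logical index plus a pixel remainder, then converted back to pixels after the layout changes (for example, when the average line height is re-measured).

```typescript
// Hypothetical helpers; assumes every row has the same estimated height.
interface ScrollAnchor {
  index: number;        // logical row at the top of the viewport
  offsetWithin: number; // pixels scrolled past the top of that row
}

/** Capture the anchor BEFORE the update, using the current row height. */
function captureAnchor(scrollTop: number, rowHeight: number): ScrollAnchor {
  const index = Math.floor(scrollTop / rowHeight);
  return { index, offsetWithin: scrollTop - index * rowHeight };
}

/** Convert the anchor back to pixels AFTER the update, using the new row height. */
function restoreScrollTop(anchor: ScrollAnchor, rowHeight: number): number {
  return anchor.index * rowHeight + anchor.offsetWithin;
}
```

If the row height is unchanged, the restored value equals the original scrollTop, so appending tokens below the viewport leaves the view in place.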

React Implementation

We use a custom hook whose core computes the visible slice from the scroll position (the useState and useEffect wiring around it is omitted here):

function useViewportTokens(
  allTokens: string[],
  scrollTop: number,
  viewportHeight: number
) {
  const avgLineHeight = 22; // measured
  const buffer = 50;
  const startIdx = Math.max(
    0,
    Math.floor(scrollTop / avgLineHeight) - buffer
  );
  const endIdx = Math.min(
    allTokens.length,
    Math.ceil(
      (scrollTop + viewportHeight) / avgLineHeight
    ) + buffer
  );
  return {
    visibleTokens: allTokens.slice(startIdx, endIdx),
    offsetY: startIdx * avgLineHeight
  };
}

The parent component renders a container with height: totalTokens * avgLineHeight, then positions the visible slice at transform: translateY(offsetY). This creates a virtual scrollable area without rendering thousands of nodes.
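
As a rough illustration of that layout, the two style objects could be computed like this. `virtualLayout` is a hypothetical helper, and the styles are written in web CSS terms as in the description above.

```typescript
// Hypothetical helper: builds web-style CSS values for the virtual scroll area.
interface VirtualLayout {
  containerStyle: { position: "relative"; height: string };
  sliceStyle: { position: "absolute"; top: 0; left: 0; right: 0; transform: string };
}

function virtualLayout(
  totalTokens: number,
  offsetY: number,
  avgLineHeight: number
): VirtualLayout {
  return {
    // The tall, empty container gives the scrollbar its full range.
    containerStyle: {
      position: "relative",
      height: `${totalTokens * avgLineHeight}px`,
    },
    // The rendered slice is shifted down to where it would sit in the full list.
    sliceStyle: {
      position: "absolute",
      top: 0,
      left: 0,
      right: 0,
      transform: `translateY(${offsetY}px)`,
    },
  };
}
```

In React Native, `transform` takes an array such as `[{ translateY: offsetY }]` rather than a CSS string, so the slice style would need adapting there.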

Handling Token Arrival

On each WebSocket message, append the token to the allTokens array. If the user is scrolled to the bottom (within 50px of the end), auto-scroll; otherwise, maintain the anchor. Batch rapid token bursts with requestAnimationFrame so that re-renders happen at most once per frame (about 16ms at 60fps).
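
One way to sketch the batching and the bottom-pinning check is shown below. `TokenBatcher` and `isPinnedToBottom` are hypothetical names; the scheduler is injected so the same class can be driven by requestAnimationFrame in an app or by a manual queue in tests.

```typescript
// Hypothetical sketch: coalesce token bursts into one update per scheduled frame.
type Scheduler = (flush: () => void) => void;

class TokenBatcher {
  private pending: string[] = [];
  private scheduled = false;
  private readonly onFlush: (batch: string[]) => void;
  private readonly schedule: Scheduler;

  constructor(onFlush: (batch: string[]) => void, schedule: Scheduler) {
    this.onFlush = onFlush;
    this.schedule = schedule;
  }

  /** Queue a token; at most one flush is scheduled at any time. */
  push(token: string): void {
    this.pending.push(token);
    if (this.scheduled) return;
    this.scheduled = true;
    this.schedule(() => {
      this.scheduled = false;
      const batch = this.pending;
      this.pending = [];
      this.onFlush(batch);
    });
  }
}

/** True when the viewport's bottom edge is within `threshold` px of the content end. */
function isPinnedToBottom(
  scrollTop: number,
  viewportHeight: number,
  contentHeight: number,
  threshold: number = 50
): boolean {
  return contentHeight - (scrollTop + viewportHeight) <= threshold;
}
```

In a browser this might be wired up as `new TokenBatcher(applyBatch, (flush) => requestAnimationFrame(flush))`, where `applyBatch` is whatever function appends the batch to state and then either auto-scrolls or restores the anchor.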

Flutter Implementation

Flutter's ListView.builder is lazy but doesn't handle streaming well. We use a CustomScrollView with a SliverList and a SliverChildBuilderDelegate whose indices are offset by the window start:

SliverChildBuilderDelegate(
  (context, index) {
    final adjustedIndex = startIndex + index;
    return Text(tokens[adjustedIndex]);
  },
  childCount: endIndex - startIndex,
)

Wrap in a SliverPadding with top padding of startIndex * lineHeight to offset the visible slice. Update indices in a StreamBuilder listening to token events. Flutter's rendering pipeline handles the rest.

Measured Impact

Testing on iPhone 13 Pro, Galaxy S22, and Pixel 7 with a 6,000-token generation:

CPU usage: 78% → 26% average (67% reduction)
Frame rate during scroll: 18fps → 58fps
Memory: 340MB → 180MB (fewer layout nodes)
Battery drain: 12% → 5% over 3 minutes

Scroll jank eliminated. Thermal throttling delayed by 4× (360 seconds vs 90). Users could now generate 10,000+ token documents without performance degradation.

Edge Cases and Tradeoffs

Search: Ctrl+F or in-app search must scan the full token buffer, not just the visible slice. We maintain a separate index (trie or simple string array) updated on token append.
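
A simple version of such an index, sketched here with the hypothetical name `TokenSearchIndex`, keeps the concatenated text plus the character offset at which each token starts, so a match position can be mapped back to a token index and scrolled into view. This sketch is case-sensitive and uses a plain substring scan rather than a trie.

```typescript
// Hypothetical sketch: full-buffer search that maps matches back to token indices.
class TokenSearchIndex {
  private text = "";
  // starts[i] is the character offset at which token i begins in `text`.
  private starts: number[] = [];

  /** Call once per streamed token. */
  append(token: string): void {
    this.starts.push(this.text.length);
    this.text += token;
  }

  /** Index of the token containing the given character offset (binary search). */
  private tokenAt(charOffset: number): number {
    let lo = 0;
    let hi = this.starts.length - 1;
    while (lo < hi) {
      const mid = (lo + hi + 1) >> 1;
      if (this.starts[mid] <= charOffset) {
        lo = mid;
      } else {
        hi = mid - 1;
      }
    }
    return lo;
  }

  /** Token indices at which each (case-sensitive) match of `query` begins. */
  find(query: string): number[] {
    if (query.length === 0) return [];
    const hits: number[] = [];
    let from = this.text.indexOf(query);
    while (from !== -1) {
      hits.push(this.tokenAt(from));
      from = this.text.indexOf(query, from + 1);
    }
    return hits;
  }
}
```

Because matches are found in the concatenated text, a query that spans a token boundary is still found, and the returned index points at the token where the match starts.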

Copy/paste: Selection must work across culled boundaries. We render invisible anchor nodes at cull points or use a hidden textarea with full text for clipboard operations.

Markdown rendering: If tokens contain markdown, parsing happens per visible token. We cache parsed AST chunks keyed by token range to avoid re-parsing on scroll.
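
The range-keyed cache might look roughly like the sketch below. `RangeCache` is a hypothetical name, the least-recently-used eviction policy is an assumption on our part for illustration, and the parser itself is left as a caller-supplied function.

```typescript
// Hypothetical sketch: small LRU cache keyed by [start, end) token range.
class RangeCache<T> {
  // Map iterates in insertion order, so the first key is the least recently used.
  private entries = new Map<string, T>();
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  /** Return the cached value for a range, computing and storing it on a miss. */
  getOrCompute(start: number, end: number, compute: () => T): T {
    const key = `${start}:${end}`;
    if (this.entries.has(key)) {
      const hit = this.entries.get(key) as T;
      // Re-insert to mark this entry as most recently used.
      this.entries.delete(key);
      this.entries.set(key, hit);
      return hit;
    }
    const value = compute();
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
    return value;
  }

  /** Drop cached ranges containing tokenIndex, e.g. the chunk still being streamed. */
  invalidateContaining(tokenIndex: number): void {
    for (const key of Array.from(this.entries.keys())) {
      const [start, end] = key.split(":").map(Number);
      if (tokenIndex >= start && tokenIndex < end) {
        this.entries.delete(key);
      }
    }
  }
}
```

A caller would wrap its markdown parser, for example `cache.getOrCompute(0, 50, () => parse(tokens.slice(0, 50).join("")))`, where `parse` stands in for whichever parser the app uses. The tail chunk should be invalidated as new tokens arrive, since its parsed result changes until the chunk is complete.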

Accessibility: Screen readers need the full content. We provide an aria-live region with the last 200 tokens and a "read full response" button that expands everything.

When Not to Use This

If your LLM responses are