Monotonic Timestamps: LLM Streaming UI Jitter Fix

The Problem: Wall Clock Lies

When streaming LLM tokens to a mobile UI at 60fps, visual jitter can appear unpredictably. Users see tokens pause mid-sentence, then rush to catch up. One culprit that is easy to overlook is neither network latency nor model speed: it is Date.now() or System.currentTimeMillis() jumping backward when NTP corrects the system clock.

Wall clocks measure real-world time. They're subject to leap-second handling, manual clock changes, and network time synchronization (and, for code that works in local time, DST transitions). An NTP step correction might shift the clock backward by 200ms. If your frame scheduler uses wall time to compute deltas, the delta that spans the correction comes out negative. If your animation logic clamps it to zero, the animation makes no progress for that frame. The result: visible stutter in what should be fluid text rendering.
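To make the failure concrete, here is a minimal sketch in JavaScript. The readings are made-up numbers that simulate a 200ms backward step, and clampedDelta is a hypothetical helper standing in for whatever a naive scheduler does with a negative delta.

```javascript
// Hypothetical helper: frame delta as a naive scheduler might compute it.
function clampedDelta(prevMs, nowMs) {
  // A negative delta (clock stepped backward) is clamped to zero.
  return Math.max(0, nowMs - prevMs);
}

// Simulated wall-clock readings at ~16.67ms spacing, with a 200ms backward
// step applied before the fourth reading (illustrative values only).
const wallReadings = [1000.0, 1016.67, 1033.34, 850.01, 866.68];

const rawDeltas = [];
const clampedDeltas = [];
for (let i = 1; i < wallReadings.length; i++) {
  rawDeltas.push(wallReadings[i] - wallReadings[i - 1]);
  clampedDeltas.push(clampedDelta(wallReadings[i - 1], wallReadings[i]));
}

// The third delta is negative in rawDeltas and zero in clampedDeltas:
// the animation makes no progress for that frame.
console.log(rawDeltas.map((d) => d.toFixed(2)));
console.log(clampedDeltas.map((d) => d.toFixed(2)));
```

Clamping hides the negative value but not the stall, which is why the fix belongs in the clock source rather than in the delta handling.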

This matters acutely in LLM streaming interfaces where tokens arrive at 15-40 tokens/second. Each token triggers a layout pass, scroll adjustment, and potential syntax highlighting update. Frame budgets are tight—16.67ms at 60fps, 8.33ms at 120fps. A single bad timestamp delta can cascade into dropped frames and perceived lag.

Monotonic Clocks: Time That Never Rewinds

Monotonic clocks measure elapsed time since an arbitrary epoch, typically system boot. They're immune to clock adjustments. On iOS, mach_absolute_time() returns monotonic ticks, which convert to nanoseconds via the ratio reported by mach_timebase_info(). Android offers SystemClock.elapsedRealtimeNanos(). Web platforms expose performance.now(), which is monotonic and returns fractional milliseconds, although browsers may coarsen its resolution.

The guarantee: monotonic time never decreases. It may pause during system sleep on some platforms, but it never jumps backward. This makes it ideal for measuring intervals, scheduling animations, and computing frame deltas.

In practice, switching from wall time to monotonic time eliminates an entire class of timing bugs. Frame deltas stop going negative. Profiling intervals are no longer distorted by clock corrections. Rate limiters keep working across clock adjustments.
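Rate limiting is one place where the clock choice shows up directly. The following is a minimal token-bucket sketch, assuming a caller-supplied monotonic clock; createTokenBucket and its parameters are hypothetical names, not part of any library.

```javascript
// Hypothetical token bucket. `now` is any function returning monotonic
// milliseconds; performance.now() is used when none is supplied.
function createTokenBucket(capacity, refillPerSecond, now = () => performance.now()) {
  let available = capacity;
  let lastRefillMs = now();

  return function tryTake() {
    const t = now();
    // On a monotonic clock elapsedMs is never negative, so a refill can
    // only add tokens. With a wall clock, a backward step would subtract.
    const elapsedMs = t - lastRefillMs;
    available = Math.min(capacity, available + (elapsedMs / 1000) * refillPerSecond);
    lastRefillMs = t;

    if (available < 1) return false;
    available -= 1;
    return true;
  };
}

// Usage with a fake clock so the behaviour is reproducible.
let fakeNowMs = 0;
const tryTake = createTokenBucket(2, 8, () => fakeNowMs); // 2 tokens, 8 per second
const results = [];
for (const t of [0, 0, 0, 125, 125]) {
  fakeNowMs = t;
  results.push(tryTake());
}
// Two takes succeed immediately, the third is refused, and one more
// succeeds after 125ms has refilled a single token.
console.log(results);
```

Injecting the clock keeps the limiter testable and makes the dependency on monotonic time explicit at the call site.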

Platform-Specific Implementations

In Swift for iOS, use DispatchTime.now() or raw Mach APIs:

let start = DispatchTime.now()
let elapsed = DispatchTime.now().uptimeNanoseconds - start.uptimeNanoseconds
let ms = Double(elapsed) / 1_000_000.0

In Kotlin for Android:

val startNs = SystemClock.elapsedRealtimeNanos()
val elapsedNs = SystemClock.elapsedRealtimeNanos() - startNs
val ms = elapsedNs / 1_000_000.0

In JavaScript for web or React Native:

const startMs = performance.now();
const elapsedMs = performance.now() - startMs;

Flutter exposes monotonic time via Stopwatch, which internally uses platform monotonic APIs. For frame callbacks, SchedulerBinding.instance.currentFrameTimeStamp is already monotonic.

Frame-Perfect Token Rendering

Consider a typical LLM streaming pipeline: tokens arrive over WebSocket, queue in a buffer, then render to a TextView or Text widget. The naive approach appends tokens immediately, triggering synchronous layout. At high token rates, this creates layout thrashing—multiple layouts per frame, each invalidating the previous.

A better approach batches tokens within frame boundaries. Collect all tokens that arrive during a frame interval, then render them atomically. This requires precise frame timing:

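import android.os.SystemClock
import android.view.Choreographer
import android.widget.TextView

// Call from the main thread; hop there first if tokens arrive on a socket thread.
class TokenRenderer(private val textView: TextView) {
  private val tokens = mutableListOf<String>()
  private var lastFrameNs = SystemClock.elapsedRealtimeNanos()
  private var renderScheduled = false

  fun onTokenReceived(token: String) {
    tokens.add(token)
    scheduleRenderIfNeeded()
  }

  private fun scheduleRenderIfNeeded() {
    val nowNs = SystemClock.elapsedRealtimeNanos()
    val deltaMs = (nowNs - lastFrameNs) / 1_000_000.0

    if (deltaMs >= FRAME_INTERVAL_MS) {
      // At least one frame interval has passed since the last render: flush now.
      renderBatch()
    } else if (!renderScheduled) {
      // Otherwise coalesce everything into a single callback on the next frame.
      renderScheduled = true
      Choreographer.getInstance().postFrameCallback {
        renderScheduled = false
        renderBatch()
      }
    }
  }

  private fun renderBatch() {
    if (tokens.isEmpty()) return
    textView.append(tokens.joinToString(""))
    tokens.clear()
    // Record the render time on the same monotonic clock used for deltas.
    lastFrameNs = SystemClock.elapsedRealtimeNanos()
  }

  private companion object {
    const val FRAME_INTERVAL_MS = 16.67 // 60fps; adjust for the display's refresh rate
  }
}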

This pattern aims for at most one layout pass per frame, regardless of token arrival rate. Because the timestamps are monotonic, deltaMs is never negative and stays accurate even during NTP adjustments.
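The same coalescing idea can be sketched in JavaScript for web or React Native. This is a minimal sketch under assumptions: createTokenBatcher, render, and scheduleFrame are hypothetical names, and scheduleFrame stands in for requestAnimationFrame so the example runs without a display.

```javascript
// Hypothetical batcher. `scheduleFrame` stands in for requestAnimationFrame,
// and `render` receives the concatenated text for one frame.
function createTokenBatcher(render, scheduleFrame) {
  let pending = [];
  let frameRequested = false;

  function flush() {
    frameRequested = false;
    if (pending.length === 0) return;
    const text = pending.join("");
    pending = [];
    render(text); // one render call, so one layout pass, per frame
  }

  return {
    push(token) {
      pending.push(token);
      // Request at most one frame callback, however many tokens arrive.
      if (!frameRequested) {
        frameRequested = true;
        scheduleFrame(flush);
      }
    },
  };
}

// Usage with a manual frame queue instead of a real display.
const frameQueue = [];
const rendered = [];
const batcher = createTokenBatcher(
  (text) => rendered.push(text),
  (callback) => frameQueue.push(callback)
);

batcher.push("Hello");
batcher.push(", ");
batcher.push("world");
frameQueue.shift()(); // simulate one frame tick
batcher.push("!");
frameQueue.shift()(); // simulate the next frame tick
console.log(rendered);
```

In a browser, passing requestAnimationFrame as scheduleFrame should work as-is: the callback receives a timestamp on the same timeline as performance.now(), which flush simply ignores.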

Scroll Synchronization

Auto-scrolling to the latest token introduces another timing challenge. Scrolling too frequently is visually jarring; scrolling too infrequently makes the UI feel laggy. The solution: scroll at a fixed cadence using monotonic intervals.

Maintain a scroll deadline. When tokens render, check if the current monotonic time exceeds the deadline. If so, scroll and set a new deadline 100ms in the future. This produces smooth, predictable scrolling independent of token rate:

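private var scrollDeadlineNs = 0L
private val scrollIntervalNs = 100_000_000L // 100ms

private fun renderBatch() {
  if (tokens.isEmpty()) return // nothing new to show, so no scroll either
  textView.append(tokens.joinToString(""))
  tokens.clear()

  val nowNs = SystemClock.elapsedRealtimeNanos()
  lastFrameNs = nowNs // keep the frame-delta baseline current
  if (nowNs >= scrollDeadlineNs) {
    // Scroll at most once per interval, however many batches render.
    scrollView.smoothScrollTo(0, textView.bottom)
    scrollDeadlineNs = nowNs + scrollIntervalNs
  }
}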

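The intent is scrolling that feels steady rather than twitchy. Because the deadline lives on a monotonic clock, the cadence is unaffected by system clock changes. After device sleep or app backgrounding the deadline has simply passed, so the next rendered batch scrolls immediately.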

Profiling and Debugging

Monotonic timestamps enable accurate performance profiling. Measure end-to-end latency from WebSocket receive to pixel paint:

data class TokenMetrics(
  val receiveNs: Long,
  val renderNs: Long,
  val paintNs: Long
) {
  val receiveToRenderMs get() = (renderNs - receiveNs) / 1_000_000.0
  val renderToPaintMs get() = (paintNs - renderNs) / 1_000_000.0
  val totalMs get() = (paintNs - receiveNs) / 1_000_000.0
}

Aggregate these metrics to identify bottlenecks. As a rough guide from production LLM chat apps, p50 latencies run around 8-12ms receive-to-render and 4-6ms render-to-paint. Spikes in p99 above 30ms suggest layout thrashing or garbage collection pressure.
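Aggregation can be as simple as sorting the samples and reading off percentiles. The sketch below uses the nearest-rank method; percentile and summarize are hypothetical helpers, and the sample values are invented for illustration rather than measured.

```javascript
// Nearest-rank percentile over latency samples in milliseconds.
function percentile(samplesMs, p) {
  if (samplesMs.length === 0) return NaN;
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Summarize end-to-end latency from records shaped like TokenMetrics.
// Note: JavaScript numbers lose integer precision above 2^53, so very large
// nanosecond timestamps are better carried as BigInt or as milliseconds.
function summarize(metrics) {
  const totalsMs = metrics.map((m) => (m.paintNs - m.receiveNs) / 1e6);
  return { p50: percentile(totalsMs, 50), p99: percentile(totalsMs, 99) };
}

// Invented end-to-end latencies (ms), including one slow outlier.
const sampleTotalsMs = [10, 12, 11, 14, 13, 15, 12, 11, 45, 12];
const sampleMetrics = sampleTotalsMs.map((ms) => ({
  receiveNs: 0,
  renderNs: 0, // not needed for the end-to-end total
  paintNs: ms * 1e6,
}));

const summary = summarize(sampleMetrics);
console.log(summary);
```

With only ten samples the p99 is just the slowest one, so a real dashboard would want a much larger window before drawing conclusions from the tail.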

When building HearingAid Pro's real-time DSP interface, similar monotonic timing discipline kept audio processing jitter below 2ms even during heavy UI updates. The same principles apply to LLM streaming: use monotonic clocks for all interval measurements, wall clocks only for user-facing timestamps.

Cross-Platform Abstractions

In cross-platform frameworks like Flutter or React Native, abstract monotonic time behind a platform interface. Flutter's Stopwatch already does this. React Native exposes performance.now() globally; if you want the platform clocks themselves, create a native module:

// iOS
@objc(MonotonicClock)
class MonotonicClock: NSObject {
  @objc func now(_ resolve: RCTPromiseResolveBlock, rejecter reject: RCTPromiseRejectBlock) {
    let ns = DispatchTime.now().uptimeNanoseconds
    resolve(Double(ns) / 1_000_000.0)
  }
}

// Android
class MonotonicClockModule(reactContext: ReactApplicationContext) : ReactContextBaseJavaModule(reactContext) {
  override fun getName() = "MonotonicClock"
  
  @ReactMethod
  fun now(promise: Promise) {
    val ms = SystemClock.elapsedRealtimeNanos() / 1_000_000.0
    promise.resolve(ms)
  }
}

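Expose this to JavaScript, then use it consistently for profiling and coarse interval measurement. Two caveats apply. The promise round-trip adds latency, so the value is already slightly stale when JavaScript receives it. The two implementations also differ during sleep: DispatchTime's uptime clock pauses, while elapsedRealtimeNanos() keeps counting. For per-frame scheduling in JavaScript, the synchronous performance.now() is usually the better fit.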
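On the JavaScript side, a thin facade can hide where the clock comes from. This is a sketch under assumptions: createMonotonicNow is a hypothetical name, and the Date.now() fallback is a guard that never reports a decrease rather than a true monotonic clock.

```javascript
// Hypothetical facade: returns a function that yields milliseconds.
function createMonotonicNow(globalObject = globalThis) {
  const perf = globalObject.performance;
  if (perf && typeof perf.now === "function") {
    // Preferred path: the runtime's own monotonic clock.
    return () => perf.now();
  }

  // Fallback (assumption): wall time, guarded so readings never decrease.
  // This hides backward steps but still freezes until the wall clock catches up.
  let last = 0;
  return () => {
    last = Math.max(last, Date.now());
    return last;
  };
}

// Usage: measure an interval without caring which clock backs it.
const monotonicNow = createMonotonicNow();
const startMs = monotonicNow();
let checksum = 0;
for (let i = 0; i < 1000; i++) checksum += i; // stand-in for real work
const elapsedMs = monotonicNow() - startMs;
console.log(elapsedMs >= 0);
```

Routing every interval measurement through one function like this also gives tests a single place to substitute a fake clock.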

Practical Tradeoffs

Monotonic clocks solve interval measurement but complicate timestamp storage. You can't serialize a monotonic timestamp and compare it across app launches—the epoch resets on reboot. For persistent timestamps, store wall time but never use it for delta calculations.

A hybrid approach: store {wallTimeMs, monotonicMs} pairs. Use wall time for display and persistence, monotonic time for scheduling and profiling. When the app resumes from background, capture a fresh pair so that conversions between the two clocks use an up-to-date anchor.
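The pairing can be sketched as an anchor plus a conversion function. captureAnchor and toWallTimeMs are hypothetical names, and the clock parameters are injected only so the example is reproducible.

```javascript
// Capture both clocks at (nearly) the same instant.
function captureAnchor(wallNow = () => Date.now(), monoNow = () => performance.now()) {
  return { wallTimeMs: wallNow(), monotonicMs: monoNow() };
}

// Estimate the wall-clock time of an event stamped with monotonic time,
// by adding the monotonic offset from the anchor to the anchor's wall time.
function toWallTimeMs(anchor, eventMonotonicMs) {
  return anchor.wallTimeMs + (eventMonotonicMs - anchor.monotonicMs);
}

// Usage with fixed values: an event 250ms after the anchor.
const anchor = captureAnchor(() => 1700000000000, () => 5000);
const eventWallMs = toWallTimeMs(anchor, 5250);
console.log(new Date(eventWallMs).toISOString());
```

The estimate is only as good as the anchor, which is why re-capturing it on resume matters: any wall-clock correction since the last capture is otherwise carried into every converted timestamp.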

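Another consideration: some monotonic clocks don't tick during system sleep. On Android, SystemClock.uptimeMillis() and System.nanoTime() stop in deep sleep, while SystemClock.elapsedRealtime() and elapsedRealtimeNanos() include sleep time. On iOS, mach_absolute_time(), CACurrentMediaTime(), and DispatchTime all pause during sleep, while mach_continuous_time() keeps counting. For timeouts that must account for time spent asleep, use the variants that include it. Understand your platform's semantics.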
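The practical difference can be sketched with two fake clocks, one that pauses during sleep and one that keeps counting. createDeadline, stayAwake, and sleepFor are hypothetical helpers used only to simulate the two behaviours.

```javascript
// Hypothetical deadline helper; `now` returns milliseconds from whichever
// clock the caller picks.
function createDeadline(timeoutMs, now) {
  const expiresAtMs = now() + timeoutMs;
  return {
    isExpired: () => now() >= expiresAtMs,
    remainingMs: () => Math.max(0, expiresAtMs - now()),
  };
}

// Two fake clocks driven by hand.
let uptimeMs = 0;   // pauses while the device sleeps
let realtimeMs = 0; // keeps counting through sleep
function stayAwake(ms) {
  uptimeMs += ms;
  realtimeMs += ms;
}
function sleepFor(ms) {
  realtimeMs += ms;
}

// The same 30-second timeout, measured against each clock.
const pausing = createDeadline(30000, () => uptimeMs);
const counting = createDeadline(30000, () => realtimeMs);

stayAwake(10000); // 10 seconds of use
sleepFor(60000);  // device asleep for a minute

// The pausing clock has only seen 10 seconds, so its deadline is still open;
// the counting clock has seen 70 seconds, so its deadline has passed.
console.log(pausing.isExpired(), pausing.remainingMs());
console.log(counting.isExpired(), counting.remainingMs());
```

Neither behaviour is wrong in itself: animation timing usually wants the pausing clock, while session or request timeouts usually want the one that counts through sleep.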

Conclusion

Monotonic timestamps eliminate an entire class of timing bugs in LLM streaming UIs. They provide frame-perfect scheduling, accurate profiling, and smooth animations immune to system clock adjustments. The cost is minimal—a single function call per frame. The benefit is substantial: users experience fluid, stutter-free token rendering at 60fps, even during NTP sync or daylight saving transitions.

For any real-time UI work—LLM streaming, audio visualization, game rendering, video playback—monotonic clocks are the foundation. Use wall clocks for user-facing timestamps. Use monotonic clocks for everything else.