Epoch-Based Conflict Resolution in Offline-First Apps

The Last-Write-Wins Trap

Most offline-first mobile apps ship with a naive sync strategy: timestamp the write, upload when connected, and let the server keep whichever write carries the latest timestamp. This breaks down in real-world scenarios such as two field agents editing the same patient record on separate devices and both syncing hours later. The write with the earlier timestamp is discarded silently, and its author never learns that the work was lost. In healthcare apps like KidzCare, where speech therapy session notes might be entered offline across multiple clinicians, this data loss is unacceptable.

The root issue is that wall-clock timestamps are not causally ordered. Device clocks drift, users travel across time zones, and NTP corrections can make timestamps jump backward. A write stamped 14:03:01 on Device A might represent state that logically preceded a write stamped 14:03:00 on Device B if B's clock was running slow (or A's was running fast). Last-write-wins (LWW) conflates chronological order with causal order.

Logical Clocks and Causal History

Lamport timestamps and vector clocks solve this by tracking happens-before relationships rather than wall time. A Lamport clock is a single integer counter incremented on every local event and updated to max(local, received) + 1 on message receipt. This guarantees that if event A causally preceded event B, then timestamp(A) < timestamp(B). However, the converse is not true—concurrent events may have arbitrary Lamport ordering.
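The update rules above fit in a few lines. The following is a minimal sketch, with class and method names chosen for illustration only; it also shows why a Lamport timestamp alone cannot detect concurrency.

```python
class LamportClock:
    """Single-counter logical clock (names are illustrative, not a library API)."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Local event: advance the counter by one."""
        self.time += 1
        return self.time

    def receive(self, received: int) -> int:
        """Message receipt: move past both the local and the received value."""
        self.time = max(self.time, received) + 1
        return self.time


device_a, device_b, device_c = LamportClock(), LamportClock(), LamportClock()

sent = device_a.tick()             # A writes and ships its timestamp
got = device_b.receive(sent)       # B's receipt is causally after A's write
independent = device_c.tick()      # C never exchanged messages with A or B
```

Here `sent < got` reflects the causal link between A and B. The write on C is concurrent with A's write, yet both carry the same timestamp, so the number alone says nothing about whether two writes were related.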

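Vector clocks extend this to a map of {deviceId: counter}. Each device increments its own counter on writes and merges incoming vectors element-wise with max(). V1 dominates V2 if V1[i] >= V2[i] for every device i and V1[i] > V2[i] for at least one. Two vector clocks V1 and V2 are concurrent if neither dominates the other and they are not equal, that is, there exist indices where V1[i] > V2[i] and V1[j] < V2[j]. Unlike a Lamport timestamp, a vector clock can therefore tell a causally ordered pair of writes apart from a concurrent pair.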

Implementation Sketch

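import 'dart:math' show max;

// Possible outcomes of comparing two vector clocks.
enum Ordering { before, after, equal, concurrent }

class VectorClock {
  VectorClock(this.deviceId) : _clock = {};

  final String deviceId;
  // Dart has no `private` keyword; a leading underscore makes the field
  // library-private, so other VectorClock instances can still read it.
  final Map<String, int> _clock;

  // Call on every local write.
  void increment() {
    _clock[deviceId] = (_clock[deviceId] ?? 0) + 1;
  }

  // Element-wise max with a remote clock.
  void merge(VectorClock other) {
    for (final entry in other._clock.entries) {
      _clock[entry.key] = max(_clock[entry.key] ?? 0, entry.value);
    }
  }

  Ordering compare(VectorClock other) {
    var thisAhead = false, otherAhead = false;
    final allKeys = {..._clock.keys, ...other._clock.keys};
    for (final key in allKeys) {
      final t = _clock[key] ?? 0;
      final o = other._clock[key] ?? 0;
      if (t > o) thisAhead = true;
      if (o > t) otherAhead = true;
    }
    if (thisAhead && otherAhead) return Ordering.concurrent;
    if (thisAhead) return Ordering.after;
    if (otherAhead) return Ordering.before;
    // Identical clocks describe the same version, not a conflict.
    return Ordering.equal;
  }
}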

This Dart snippet shows the core mechanics. On every write, the local device increments its entry. On sync, the app compares the local and remote vectors before merging them, because a merged vector is never behind either input and would hide any concurrency. If one vector dominates, the newer state wins deterministically. If the two are concurrent, the app must invoke a merge strategy.
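A worked example makes the sync step concrete. The sketch below models clocks as plain dictionaries; the function names and the string results are assumptions made for illustration, not part of the Dart class above.

```python
from typing import Dict, Tuple

Vector = Dict[str, int]


def compare(a: Vector, b: Vector) -> str:
    """Return 'before', 'after', 'equal', or 'concurrent' for a relative to b."""
    keys = a.keys() | b.keys()
    a_ahead = any(a.get(k, 0) > b.get(k, 0) for k in keys)
    b_ahead = any(b.get(k, 0) > a.get(k, 0) for k in keys)
    if a_ahead and b_ahead:
        return "concurrent"
    if a_ahead:
        return "after"
    if b_ahead:
        return "before"
    return "equal"


def merge(a: Vector, b: Vector) -> Vector:
    """Element-wise max over the union of device ids."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


def on_sync(local: Vector, remote: Vector) -> Tuple[str, Vector]:
    """Classify the relationship first, then fold the remote clock in."""
    relation = compare(local, remote)
    return relation, merge(local, remote)


local = {"A": 2, "B": 1}
remote = {"A": 1, "B": 2}
relation, merged = on_sync(local, remote)
```

With these inputs `relation` is `"concurrent"` and `merged` is `{"A": 2, "B": 2}`. Comparing `merged` against `remote` would report `"after"`, which is why the comparison has to run on the pre-merge clocks.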

Epoch Counters for Bounded Memory

Vector clocks grow unbounded as the set of devices expands. In a multi-tenant SaaS product like Khosomati (price aggregation with offline OCR capture), hundreds of field agents might contribute data. A full vector clock would balloon to kilobytes per document.

Epoch-based clocks trade precision for compactness. Instead of tracking every device, the system maintains a global epoch counter (a monotonic integer) and a per-document {epoch, deviceId, localSeq} tuple. The server increments the epoch on schema migrations, tenant resets, or periodic boundaries (e.g., daily). Within an epoch, devices use local sequence numbers. Cross-epoch comparisons fall back to epoch order; within-epoch comparisons use vector clock logic on the subset of active devices in that epoch window.

This caps metadata size at O(active_devices_per_epoch) rather than O(all_devices_ever). In practice, if an epoch spans 24 hours and at most 50 devices write within that window, the vector holds 50 entries instead of 5,000. The tradeoff: cross-epoch conflicts default to epoch-wins rather than precise causal resolution, so a write made offline in an older epoch loses to any write from a newer one. This is acceptable for many workflows where daily boundaries align with operational shifts.
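One way to read the size bound is that a device discards its vector when it adopts a newer epoch. The sketch below assumes exactly that; the class name, the reset-on-rollover behavior, and the choice to ignore vectors from other epochs are illustrative assumptions, since the text does not spell out the rollover mechanics.

```python
from typing import Dict


class EpochClock:
    """Vector clock scoped to one epoch (hypothetical design, for illustration)."""

    def __init__(self, device_id: str, epoch: int = 0) -> None:
        self.device_id = device_id
        self.epoch = epoch
        self.vector: Dict[str, int] = {}

    def record_write(self) -> None:
        """Local write: bump this device's sequence within the current epoch."""
        self.vector[self.device_id] = self.vector.get(self.device_id, 0) + 1

    def observe(self, remote_epoch: int, remote_vector: Dict[str, int]) -> None:
        """Fold in a remote vector, but only if it belongs to the same epoch."""
        if remote_epoch != self.epoch:
            return
        for device, count in remote_vector.items():
            self.vector[device] = max(self.vector.get(device, 0), count)

    def advance_to(self, server_epoch: int) -> None:
        """Adopt a newer server epoch and drop entries from the old window."""
        if server_epoch > self.epoch:
            self.epoch = server_epoch
            self.vector = {}


clock = EpochClock("dev-a")
clock.record_write()
clock.observe(0, {"dev-b": 4, "dev-c": 1})
entries_before = len(clock.vector)   # three devices were active in epoch 0

clock.advance_to(1)                  # rollover: old entries are discarded
clock.record_write()
```

After the rollover the vector holds a single entry for `dev-a`, so its size tracks the devices seen in the current epoch rather than every device that ever wrote.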

Epoch Transition Protocol

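// Ordering, Resolution, Map, and compare_vectors are assumed to be defined elsewhere.
struct EpochVersion {
  epoch: u64,                     // server-assigned epoch the write belongs to
  deviceId: String,               // device that produced the write
  seq: u32,                       // device-local sequence number within the epoch
  vectorClock: Map<String, u32>,  // counters for devices active in this epoch
}

fn merge_epochs(local: EpochVersion, remote: EpochVersion) -> Resolution {
  // Different epochs: the newer epoch wins outright, with no causal check.
  if local.epoch > remote.epoch {
    return Resolution::LocalWins;
  }
  if remote.epoch > local.epoch {
    return Resolution::RemoteWins;
  }
  // Same epoch: use vector clock comparison
  match compare_vectors(&local.vectorClock, &remote.vectorClock) {
    Ordering::After => Resolution::LocalWins,
    Ordering::Before => Resolution::RemoteWins,
    Ordering::Concurrent => Resolution::Conflict(local, remote),
  }
}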

This pseudocode (Rust-flavored for clarity) shows the decision tree. Epoch boundaries provide a cheap total order fallback, while within-epoch vector clocks preserve causal accuracy.

Merge Strategies for Concurrent Writes

When vector clocks detect concurrency, the app must merge. Three common patterns:

Operational Transform (OT): Transform concurrent operations against one another so that replicas converge even when they apply the operations in different orders. Used in collaborative text editors. Requires a transformation function for each pair of operation types, which is complex for arbitrary data models.
CRDTs (Conflict-Free Replicated Data Types): Data structures with built-in merge semantics. A G-Counter (grow-only counter) merges by taking the max of each device's contribution. An OR-Set (observed-remove set) tags every add with a unique ID so that a remove only cancels the adds it has observed, which makes a concurrent add win over a remove. CRDTs guarantee convergence without coordination.
Application-Specific Merge: Custom logic for domain entities. For a patient record, merge might union the set of diagnoses, take the latest vitals by measurement timestamp (not sync timestamp), and flag conflicting prescription changes for manual review.
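Of the three patterns, the G-Counter is the simplest to show in full. The following is a minimal sketch with illustrative names: each device only ever raises its own slot, and merging takes the per-device max.

```python
from typing import Dict


class GCounter:
    """Grow-only counter CRDT (sketch; names are illustrative)."""

    def __init__(self, device_id: str) -> None:
        self.device_id = device_id
        self.counts: Dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        """Raise this device's own contribution; decrements are not allowed."""
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.device_id] = self.counts.get(self.device_id, 0) + amount

    def value(self) -> int:
        """The counter's value is the sum of all device contributions."""
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        """Take the max of each device's contribution."""
        for device, count in other.counts.items():
            self.counts[device] = max(self.counts.get(device, 0), count)


a, b = GCounter("dev-a"), GCounter("dev-b")
a.increment(3)      # offline on device A
b.increment(2)      # offline on device B

a.merge(b)
b.merge(a)
a.merge(b)          # merging again changes nothing
```

Both replicas end at 5 regardless of which one merges first, and repeating a merge has no effect. Those two properties are what let the replicas converge without coordinating.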

In GlucoScan AI, where PPG glucose readings are timestamped by device sensor time (not wall clock), concurrent writes from the same user session are rare but possible if the app crashes and restarts mid-session. The merge strategy unions readings by sensor timestamp, deduplicates within a 500ms window (assuming sensor jitter), and recalculates the rolling glucose estimate from the merged set. This is deterministic: both devices converge to the same result regardless of sync order.
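The union-and-deduplicate step might look like the sketch below. The `(sensor_ms, value)` tuple layout, the function name, and the rule of keeping the earliest reading in each window are assumptions for illustration. The rolling-estimate recalculation is left out because the text does not describe it.

```python
from typing import Iterable, List, Tuple

Reading = Tuple[int, float]  # (sensor timestamp in ms, glucose value); assumed layout


def merge_readings(
    local: Iterable[Reading],
    remote: Iterable[Reading],
    window_ms: int = 500,
) -> List[Reading]:
    """Union both sets, sort by sensor time, drop readings inside the jitter window."""
    merged: List[Reading] = []
    # Sorting the union first makes the result independent of argument order.
    for sensor_ms, value in sorted(set(local) | set(remote)):
        if merged and sensor_ms - merged[-1][0] < window_ms:
            continue  # treated as a duplicate of the previously kept reading
        merged.append((sensor_ms, value))
    return merged


before_crash = [(0, 100.0), (1000, 102.0)]
after_restart = [(200, 100.5), (1000, 102.0), (2000, 105.0)]

merged = merge_readings(before_crash, after_restart)
```

The reading at 200 ms falls inside the window opened at 0 ms and is dropped, and the reading at 1000 ms that appears on both sides is kept once. Swapping the two arguments yields the same list, which is the determinism property the paragraph above relies on.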

Sync Protocol Design

A pull-based sync protocol pairs well with epoch clocks. The mobile client sends its current {epoch, deviceId, seq, vectorClock}. The server responds with:

All documents with epoch > client.epoch (full overwrite).
Within the same epoch, documents whose vector clock is not already covered by the client's, meaning the server version is either newer than what the client has seen or concurrent with it (potential conflicts).
A new epoch boundary marker if the server has advanced epochs since the last sync.

This keeps the payload small, since only deltas and conflicts are transmitted. The client applies remote-wins for updates that dominate its local state, invokes merge logic for concurrent updates, and persists the merged vector clock. The next local write then increments the client's own sequence on top of that merged vector, so it is recorded as causally after everything the client has seen, even if the write occurs offline.
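The server's selection logic can be sketched as a filter over its documents. This is a sketch under stated assumptions: the function names, the dictionary layout of a document, and the response shape are hypothetical, and the same-epoch rule is read as "send anything the client's vector does not already cover".

```python
from typing import Dict, List, Optional, Tuple

Vector = Dict[str, int]


def covers(a: Vector, b: Vector) -> bool:
    """True if a has seen everything b has (a >= b on every entry)."""
    return all(a.get(device, 0) >= count for device, count in b.items())


def select_for_client(
    client_epoch: int,
    client_vector: Vector,
    server_epoch: int,
    server_docs: List[dict],
) -> dict:
    """Pick the documents a pull response should carry for this client."""
    documents: List[Tuple[str, str]] = []
    for doc in server_docs:
        if doc["epoch"] > client_epoch:
            documents.append((doc["id"], "overwrite"))
        elif doc["epoch"] == client_epoch and not covers(client_vector, doc["vector"]):
            # Server strictly ahead is a plain update; otherwise the two diverged.
            ahead = covers(doc["vector"], client_vector)
            documents.append((doc["id"], "update" if ahead else "potential_conflict"))
    new_epoch: Optional[int] = server_epoch if server_epoch > client_epoch else None
    return {"documents": documents, "new_epoch": new_epoch}


docs = [
    {"id": "d1", "epoch": 4, "vector": {"C": 1}},
    {"id": "d2", "epoch": 3, "vector": {"A": 1, "B": 1}},
    {"id": "d3", "epoch": 3, "vector": {"A": 2, "B": 2}},
    {"id": "d4", "epoch": 3, "vector": {"A": 1, "B": 3}},
]
response = select_for_client(3, {"A": 2, "B": 1}, 4, docs)
```

In this example `d2` is skipped because the client has already seen it, `d3` is a plain update, `d4` is flagged as a potential conflict, and `d1` is sent as a full overwrite along with the new epoch marker.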

Conflict Storage

Unresolved conflicts are stored as sibling versions with their vector clocks. The UI surfaces these to the user (e.g., "Session notes from Dr. A and Dr. B were edited offline—review and merge"). In OfflineAI, where users edit prompt templates offline, conflicting templates are saved as separate drafts with a "Resolve" button that opens a diff view. This design avoids silent data loss while keeping the happy path (no conflicts) seamless.
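Sibling storage can be sketched as a small in-memory store. The class and method names are illustrative assumptions, and a real app would persist the siblings rather than hold them in a dictionary. A new version replaces any sibling it supersedes and is kept alongside any sibling it is concurrent with.

```python
from typing import Dict, List, Tuple

Vector = Dict[str, int]
Version = Tuple[Vector, str]  # (vector clock, document body)


def covers(a: Vector, b: Vector) -> bool:
    """True if a has seen everything b has (a >= b on every entry)."""
    return all(a.get(device, 0) >= count for device, count in b.items())


class SiblingStore:
    """Keeps concurrent versions side by side until one write supersedes them."""

    def __init__(self) -> None:
        self._docs: Dict[str, List[Version]] = {}

    def put(self, doc_id: str, vector: Vector, body: str) -> None:
        kept: List[Version] = []
        for existing_vector, existing_body in self._docs.get(doc_id, []):
            if covers(vector, existing_vector):
                continue  # superseded by (or identical to) the incoming version
            if covers(existing_vector, vector):
                return    # incoming version is stale; leave the store untouched
            kept.append((existing_vector, existing_body))
        kept.append((vector, body))
        self._docs[doc_id] = kept

    def siblings(self, doc_id: str) -> List[Version]:
        return list(self._docs.get(doc_id, []))

    def has_conflict(self, doc_id: str) -> bool:
        return len(self._docs.get(doc_id, [])) > 1


store = SiblingStore()
store.put("template-1", {"A": 1}, "draft from device A")
store.put("template-1", {"B": 1}, "draft from device B")
conflict_before = store.has_conflict("template-1")

# The user resolves in the diff view; the result carries a clock covering both drafts.
store.put("template-1", {"A": 2, "B": 1}, "merged draft")
```

The two offline drafts coexist as siblings until the resolved version arrives. Its clock covers both, so the store collapses back to a single version and the conflict indicator clears.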

Performance Characteristics

Vector clock comparison is O(n) in the number of devices, but with epoch bounding, n is typically 10–100, making it negligible (