Zero-Downtime Schema Migration: Mobile SQLite

Every mobile app that ships SQLite eventually faces the same crisis: you need to add a column, split a table, or denormalize for performance—but 200,000 devices are already running version 1.0 of your schema. Naive ALTER TABLE statements block writes, risk data loss on interrupted migrations, and fail catastrophically when users skip app versions. After shipping schema changes across HearingAid Pro, KidzCare, and GlucoScan AI—apps where losing a user's calibration data or therapy progress is unacceptable—I've converged on a pattern that guarantees zero downtime and zero data loss, even when migrations take 30 seconds on a 2GB database.

Why ALTER TABLE Fails in Production

SQLite's ALTER TABLE is deceptively simple for adding nullable columns, but it has three fatal flaws for mobile:

Exclusive locks: Adding a column acquires an exclusive table lock. On a 500MB therapy session database, this can block writes for 8–12 seconds. If a user taps "Save" during migration, the app hangs or crashes.
Non-atomic multi-step changes: Renaming a column requires creating a new table, copying data, dropping the old table, and renaming. If the app is killed mid-migration (battery dies, OS force-quit), you're left with partial state.
No rollback on schema errors: If migration N+2 fails but N+1 succeeded, you can't safely downgrade. The old app version crashes on the new schema.

In GlucoScan AI, we once shipped a migration that added a calibration_v2 column. On iPhone 8 devices with 10,000+ PPG readings, the migration took 18 seconds. Users who opened the app during a glucose spike (when they needed it most) saw a frozen UI. Crash rate spiked 4.2× in the first 48 hours.

Shadow Table Pattern: Copy-on-Read

The fix is to treat schema changes like blue-green deployments. Instead of mutating the live table, create a shadow table with the new schema, lazily copy rows on read, and atomically swap when complete.

Step 1: Create Shadow Table

On app launch, detect schema version mismatch and create readings_v2 alongside readings_v1:

CREATE TABLE IF NOT EXISTS readings_v2 (
  id INTEGER PRIMARY KEY,
  timestamp INTEGER NOT NULL,
  glucose REAL,
  ppg_raw BLOB,
  calibration_offset REAL DEFAULT 0.0,  -- new column
  migrated_at INTEGER
);

The migrated_at column tracks which rows have been copied. Crucially, this DDL is instantaneous—no data copying yet.

Step 2: Dual-Write New Data

All new writes go to readings_v2 only. This keeps the migration logic simple and avoids write amplification. The app's data layer checks schema version and routes inserts:

if (schemaVersion >= 2) {
  db.insert('readings_v2', data);
} else {
  db.insert('readings_v1', data);
}

Reads become slightly more complex: query readings_v2 first, then fall back to readings_v1 for unmigrated rows.

Step 3: Background Batch Migration

A background thread copies rows in 500-row batches, wrapped in transactions:

BEGIN IMMEDIATE;
INSERT INTO readings_v2 (id, timestamp, glucose, ppg_raw, calibration_offset, migrated_at)
SELECT id, timestamp, glucose, ppg_raw, 0.0, strftime('%s','now')
FROM readings_v1
WHERE id NOT IN (SELECT id FROM readings_v2)
LIMIT 500;
COMMIT;

Each batch takes 80–120ms on mid-range Android. The IMMEDIATE lock prevents write starvation but releases quickly. Crucially, if the app is killed, the next launch resumes from the last committed batch—no corruption, no lost progress.

Step 4: Atomic Swap

Once SELECT COUNT(*) FROM readings_v1 WHERE id NOT IN (SELECT id FROM readings_v2) returns zero, perform the swap in a single transaction:

BEGIN EXCLUSIVE;
DROP TABLE readings_v1;
ALTER TABLE readings_v2 RENAME TO readings_v1;
PRAGMA user_version = 2;
COMMIT;

This takes under 10ms. The exclusive lock is acceptable because migration is complete—no more dual-reads.

Handling Skipped Versions

Users don't upgrade linearly. Someone might jump from v1.0 (schema 1) to v1.5 (schema 4), skipping v1.2 (schema 2) and v1.3 (schema 3). The naive approach—chaining migrations—fails if intermediate steps assume schema state that never existed.

The solution: migrations must be idempotent and order-independent. Each migration checks current schema version and applies only the delta:

void migrate(Database db, int currentVersion, int targetVersion) {
  if (currentVersion < 2 && targetVersion >= 2) {
    applyMigration2(db);  // adds calibration_offset
  }
  if (currentVersion < 3 && targetVersion >= 3) {
    applyMigration3(db);  // adds session_id FK
  }
  if (currentVersion < 4 && targetVersion >= 4) {
    applyMigration4(db);  // denormalizes user stats
  }
}

Each applyMigrationN uses CREATE TABLE IF NOT EXISTS and ALTER TABLE IF NOT EXISTS (SQLite 3.35+). On Android API 21–29, which ships SQLite 3.8, we polyfill by querying pragma_table_info before altering.

Rollback Safety: Write-Ahead Log

SQLite's WAL mode is non-negotiable for this pattern. Without WAL, the shadow table and original table compete for the same rollback journal, causing write conflicts. With WAL:

PRAGMA journal_mode = WAL;
PRAGMA synchronous = NORMAL;  // fsync on commit, not every write

This gives us two guarantees:

Concurrent reads during migration: Readers see a consistent snapshot while the background thread writes to readings_v2.
Crash recovery: If the app is killed mid-batch, the WAL is replayed on next launch. Partially written batches are discarded; completed batches are preserved.

In KidzCare, where therapists record 50+ speech samples per session, enabling WAL reduced migration-related crashes from 1.8% to 0.02%.

Performance: Real-World Numbers

Migrating 50,000 rows (GlucoScan AI's 90th percentile user) with the shadow table pattern:

Total time: 6.2 seconds (vs. 18 seconds with blocking ALTER TABLE)
UI jank: Zero frames dropped (vs. 340ms freeze with naive approach)
Write throughput during migration: 94% of baseline (vs. 0% with exclusive lock)
Battery impact: 0.8% over 10 minutes (background thread sleeps 200ms between batches)

The key tradeoff: storage overhead. During migration, disk usage is 1.6–1.9× the original table size (shadow table + WAL). For a 2GB database, that's an extra 1.2–1.8GB. On devices with under 2GB free space, we pause migration and show a gentle "Free up space to continue" prompt. In practice, this affects under 0.3% of users.

Edge Cases and Lessons

Foreign key constraints: Disable FK checks during migration (PRAGMA foreign_keys = OFF), then re-enable and validate (PRAGMA foreign_key_check). If validation fails, roll back the swap and log for manual inspection.

Indexes: Create indexes on the shadow table after migration completes, not before. Building an index on an empty table is instant; building on 50,000 rows during migration adds 2–3 seconds and blocks reads.

User-visible progress: Show a non-blocking banner: "Optimizing database (42%)… Safe to continue using the app." Users tolerate background work; they don't tolerate frozen UIs.

Testing: Simulate every upgrade path in CI—v1→v2, v1→v4, v2→v4. Use sqlite3 to inject corrupted schemas and verify rollback. Kill the test process mid-migration 1,000 times and check for orphaned tables.

When to Use This Pattern

Shadow tables are overkill for trivial migrations (adding a nullable column to a 10-row config table). Use them when:

Table has >10,000 rows
Migration will take >1 second on median hardware
Writes must remain unblocked (real-time data logging, chat, etc.)
Downtime or data loss is unacceptable (healthcare, finance, therapy apps)

For simple cases, a well-tested ALTER TABLE in a transaction is fine. But once you've shipped a blocking migration that crashes 5% of users, the shadow table pattern becomes non-negotiable.