Engineering · Apr 25, 2026 · 13 min read

What we learned building offline-first for warehouses

Three years of building a WMS that works without a network: what we learned, what we got wrong, and what we'd do differently if we started over.

Warehouses are radio-frequency hostile environments. Metal racking everywhere, concrete walls, the occasional Faraday-cage cold storage room, forklifts emitting electrical noise. WiFi works most of the time, but "most of the time" is not a viable substrate for a production system that operators rely on every second.

Early on, we made the call that every action a Nautilus operator takes (every scan, voice command, count entry, adjustment) has to work without a network. The device should never wait for a round trip before confirming an action. The expensive part of building a WMS that feels instant is not the AI or the integrations. It's the offline-first sync layer underneath.

We've now been operating this layer for about three years, including across a few customers in metal-clad facilities where the WiFi drops every couple of minutes. This is what we got wrong, what we got right, and what we'd do differently.

Why naive queueing fails

The first thing you'd reach for is action queueing: every user action gets serialized into a queue on the device, and the queue flushes to the server whenever the network is up. This works for a single user. It falls apart the moment you have multiple operators making changes to overlapping state.

Picture two pickers in the same aisle, both offline, both with a pick list that includes SKU X from bin 12. Both walk up at roughly the same time, both scan the bin, both see "9 units available." Both pick "1 unit." Both devices update their local count to 8. Both queue the action.

When the network comes back, both queues flush. The server now receives two "pick 1 from bin 12" actions. If it processes them in order, the count goes 9 → 8 → 7. But the local devices both think the count is 8. The next time either device syncs down from the server, the count it shows will jump down by one in a way that looks like a glitch.
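The two-picker scenario can be reproduced in a few lines. This is a toy model for illustration, not production code; `NaiveDevice` and `simulateTwoPickers` are hypothetical names, and the server is reduced to a single counter.

```typescript
// Illustrative model only; class and function names are hypothetical.
type PickAction = { bin: string; qty: number };

class NaiveDevice {
  localCount: number;
  private queue: PickAction[] = [];

  constructor(initialCount: number) {
    this.localCount = initialCount;
  }

  // Apply the pick to the local count right away and queue it for upload.
  pick(bin: string, qty: number): void {
    this.localCount -= qty;
    this.queue.push({ bin, qty });
  }

  // Hand over everything queued while offline and clear the queue.
  flush(): PickAction[] {
    const sent = this.queue;
    this.queue = [];
    return sent;
  }
}

function simulateTwoPickers(): { server: number; deviceA: number; deviceB: number } {
  let serverCount = 9;
  const a = new NaiveDevice(serverCount);
  const b = new NaiveDevice(serverCount);

  // Both devices are offline and each picks one unit from bin 12.
  a.pick("bin-12", 1);
  b.pick("bin-12", 1);

  // The network returns: the server replays both queues in arrival order.
  for (const action of [...a.flush(), ...b.flush()]) {
    serverCount -= action.qty;
  }
  return { server: serverCount, deviceA: a.localCount, deviceB: b.localCount };
}
```

Calling `simulateTwoPickers()` leaves the server at 7 while both devices still show 8, which is exactly the divergence described above.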

That's the simplest case. It gets much worse when the conflicting actions are of different types: one device adjusts the count, another relocates the item, and a third is in the middle of a count audit that assumes a specific quantity. Naive queueing produces inconsistent state across devices and unhappy users.

What we ended up with

Our sync layer treats every entity (SKU, bin, lot, count) as a state machine with a known set of transitions, and every operator action as an event with a few properties: a unique ID, a timestamp from the originating device, a vector clock of what the device knew at the time, and the actual operation.
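To make the event shape concrete, here is a rough sketch of what such an event could look like. Every type and field name below is an assumption made for illustration (the real schema isn't shown in this post), and the vector clock is modeled in the usual way, as a per-device counter map.

```typescript
// All type and field names here are illustrative assumptions.
type VectorClock = Record<string, number>; // deviceId -> events seen from that device

type Operation =
  | { kind: "pick"; bin: string; sku: string; qty: number }
  | { kind: "relocate"; sku: string; fromBin: string; toBin: string }
  | { kind: "adjust"; bin: string; sku: string; newCount: number };

interface SyncEvent {
  id: string;              // unique ID, generated on the device at action time
  deviceId: string;
  deviceTimestamp: number; // ms since epoch, from the originating device
  clock: VectorClock;      // what the device knew when the action happened
  op: Operation;           // the actual operation
}

// Build an event and advance the device's own entry in its vector clock.
function recordEvent(
  id: string,
  deviceId: string,
  known: VectorClock,
  op: Operation,
  nowMs: number,
): SyncEvent {
  const clock: VectorClock = { ...known, [deviceId]: (known[deviceId] ?? 0) + 1 };
  return { id, deviceId, deviceTimestamp: nowMs, clock, op };
}
```

The ID and the current time are passed in rather than generated inside the function, which keeps the sketch deterministic and easy to test.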

When events arrive at the server, they're applied in an order that respects causality (you can't pick from a bin you don't know exists yet) rather than wall-clock time. Conflicts that causal ordering alone can't resolve are sent through a rules engine that knows, for each pair of conflicting operation types, what to do.
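Causal ordering with vector clocks can be sketched as follows. This is the textbook happened-before comparison plus a simple quadratic ordering pass, not a description of the production algorithm; `happenedBefore` and `causalOrder` are hypothetical names, and breaking ties between concurrent events by ID is an assumption made only to keep the output deterministic.

```typescript
type VectorClock = Record<string, number>;
interface ClockedEvent { id: string; clock: VectorClock }

// True if `a` is causally before `b`: a <= b in every entry, and a != b.
function happenedBefore(a: VectorClock, b: VectorClock): boolean {
  let strictlyLess = false;
  for (const device of [...Object.keys(a), ...Object.keys(b)]) {
    const av = a[device] ?? 0;
    const bv = b[device] ?? 0;
    if (av > bv) return false;
    if (av < bv) strictlyLess = true;
  }
  return strictlyLess;
}

// Order events so that none is applied before an event it causally depends on.
// Concurrent events are tie-broken by ID purely for determinism.
function causalOrder<T extends ClockedEvent>(events: T[]): T[] {
  const pending = [...events].sort((x, y) => x.id.localeCompare(y.id));
  const ordered: T[] = [];
  while (pending.length > 0) {
    // Find an event with no pending causal predecessor. One always exists,
    // because happened-before is a strict partial order.
    const idx = pending.findIndex(
      (e) => !pending.some((other) => other !== e && happenedBefore(other.clock, e.clock)),
    );
    ordered.push(...pending.splice(idx, 1));
  }
  return ordered;
}
```

If a "pick" event arrives before the "create bin" event it depends on, `causalOrder` still places the creation first, because the pick's clock already includes it.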

For the "both pickers scan the same bin" case: the count operations don't actually conflict if both are decrements within available stock. We accumulate them, and the bin count ends up correct (9 → 7). What we surface to the user, optionally, is a notification: "your colleague also picked from bin 12 while you were offline." This isn't strictly necessary for correctness, but it's necessary for not freaking people out.

For relocations that conflict with picks: pick wins. Relocation gets returned to the operator as "this item is no longer at the location you intended to move it from." This matches what would happen in the physical world if both operators were online; the picker would have grabbed the item before the mover got there.

For adjustments that conflict with anything: adjustment wins, because adjustments represent the operator's belief about ground truth, and we trust the human in front of the bin over the system's belief about what should be there.
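The three rules above could be written down roughly like this. It's a simplified sketch under stated assumptions: the operation shapes are hypothetical, the message for a losing operation in the adjustment case is invented for the example, and pairs the text doesn't cover (two adjustments, two relocations, picks exceeding stock) are routed to manual resolution as a placeholder.

```typescript
// Operation shapes and the adjustment message are illustrative assumptions.
type Op =
  | { kind: "pick"; qty: number }
  | { kind: "relocate"; toBin: string }
  | { kind: "adjust"; newCount: number };

type Resolution =
  | { outcome: "accumulate" }
  | { outcome: "winner"; winner: Op; loser: Op; messageToLoser: string }
  | { outcome: "manual" };

// Decide what to do with two concurrent operations on the same bin.
function resolveConflict(a: Op, b: Op, available: number): Resolution {
  // Pairs the rules above do not cover go to a person (an assumption).
  if (a.kind === b.kind && a.kind !== "pick") return { outcome: "manual" };

  // Adjustment wins over anything else.
  if (a.kind === "adjust" || b.kind === "adjust") {
    const aWins = a.kind === "adjust";
    return {
      outcome: "winner",
      winner: aWins ? a : b,
      loser: aWins ? b : a,
      messageToLoser: "The count for this bin was adjusted while you were offline.",
    };
  }

  // Two picks within available stock do not conflict: apply both.
  if (a.kind === "pick" && b.kind === "pick") {
    return a.qty + b.qty <= available ? { outcome: "accumulate" } : { outcome: "manual" };
  }

  // Remaining case: one pick and one relocation. Pick wins.
  const aWins = a.kind === "pick";
  return {
    outcome: "winner",
    winner: aWins ? a : b,
    loser: aWins ? b : a,
    messageToLoser: "This item is no longer at the location you intended to move it from.",
  };
}
```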

These rules took months to enumerate. We didn't get them right at first.

The data layer

We use SQLite on every device, wrapped with WatermelonDB for the reactive query layer. Every entity the operator can interact with (products, locations, lots, open work) is replicated locally. On Android and iOS, the database is around 12 to 40 MB for a typical warehouse, depending on SKU count and history depth.

We sync incrementally: the device pulls only events newer than its last successful sync, and pushes only events the server hasn't acknowledged. A typical incremental sync transfers 200 to 400 KB, which is small enough to complete even on a slow connection. On a fresh device, the initial bootstrap is bigger (a few MB), but it's a one-time cost.
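The pull-since-cursor, push-unacknowledged loop looks roughly like the sketch below. It deliberately uses in-memory stand-ins rather than WatermelonDB's own sync API, and `InMemoryServer`, `Device`, and the sequence-number cursor are assumptions made for the example.

```typescript
// In-memory stand-ins; names and the seq-based cursor are assumptions.
interface OutgoingEvent { id: string; payload: string }
interface LoggedEvent extends OutgoingEvent { seq: number }

class InMemoryServer {
  private log: LoggedEvent[] = [];

  // Append pushed events and return the IDs the server acknowledges.
  append(events: OutgoingEvent[]): string[] {
    for (const e of events) {
      this.log.push({ seq: this.log.length + 1, id: e.id, payload: e.payload });
    }
    return events.map((e) => e.id);
  }

  // Return only events newer than the caller's cursor.
  since(cursor: number): LoggedEvent[] {
    return this.log.filter((e) => e.seq > cursor);
  }
}

class Device {
  readonly applied: string[] = [];
  private cursor = 0; // highest server sequence number already applied
  private outbox: OutgoingEvent[] = [];

  record(event: OutgoingEvent): void {
    this.outbox.push(event);
  }

  sync(server: InMemoryServer): void {
    // Push: send only what the server has not acknowledged yet.
    const acked = new Set(server.append(this.outbox));
    this.outbox = this.outbox.filter((e) => !acked.has(e.id));

    // Pull: fetch only what is newer than the last successful sync.
    // For simplicity the device also receives its own events back.
    for (const e of server.since(this.cursor)) {
      this.applied.push(e.id);
      this.cursor = e.seq;
    }
  }
}
```

A second `sync()` call with nothing new transfers no events in either direction, which is the property that keeps steady-state syncs small.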

One thing we got wrong early: we tried to sync everything on a fixed interval (every 30 seconds, then every 10 seconds, then every 3 seconds). All of these felt slow when you wanted real-time, and wasteful when nothing had changed. We now use a hybrid. Events push immediately when the device is online (sub-second), and a polling sync runs every few minutes as a backstop in case something was missed. This was the obvious answer in hindsight.
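In outline, the hybrid looks something like this sketch. The `Transport` interface and `HybridSync` class are hypothetical, and only the push side is shown; a real backstop poll would also pull from the server.

```typescript
// Hypothetical interface; a real transport would wrap the network stack.
interface Transport {
  isOnline(): boolean;
  send(eventIds: string[]): void;
}

class HybridSync {
  private pending: string[] = [];
  private readonly transport: Transport;

  constructor(transport: Transport) {
    this.transport = transport;
  }

  // Fast path: try to push as soon as an event is produced.
  onLocalEvent(eventId: string): void {
    this.pending.push(eventId);
    this.flush();
  }

  // Backstop: meant to be called from a timer every few minutes, to catch
  // anything the fast path could not send (for example, events made offline).
  backstopPoll(): void {
    this.flush();
  }

  pendingCount(): number {
    return this.pending.length;
  }

  private flush(): void {
    if (!this.transport.isOnline() || this.pending.length === 0) return;
    this.transport.send(this.pending);
    this.pending = [];
  }
}
```

In an app, the backstop would be wired to something like `setInterval(() => sync.backstopPoll(), 3 * 60_000)`, where the three-minute interval is a placeholder rather than a recommended value.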

The receipts problem

One subtle failure mode: an operator performs an action while offline, the action successfully syncs to the server, but the device never receives the confirmation receipt before the device crashes or restarts. On reconnect, the device might re-send the same action, producing duplicate events.

We solved this with idempotency keys on every event (a UUID generated at action time, stored locally until acknowledged) and server-side deduplication keyed on the UUID. The server is the source of truth for "has this event been processed"; the client treats the absence of a receipt as "I should try again, knowing the server will dedupe me."
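A minimal sketch of that contract, with in-memory stand-ins for both sides: `DedupingServer` and `RetryingClient` are hypothetical names, and the `receiptLost` flag simulates a crash between the server processing the event and the device seeing the receipt.

```typescript
// Hypothetical in-memory stand-ins for the server and the device outbox.
class DedupingServer {
  count = 9;
  private processed = new Set<string>();

  // Apply the pick at most once per idempotency key, but always acknowledge it.
  handlePick(key: string, qty: number): { ack: string; applied: boolean } {
    if (this.processed.has(key)) return { ack: key, applied: false };
    this.processed.add(key);
    this.count -= qty;
    return { ack: key, applied: true };
  }
}

class RetryingClient {
  // key -> quantity; a real device would persist this across crashes.
  private unacked = new Map<string, number>();

  record(key: string, qty: number): void {
    this.unacked.set(key, qty);
  }

  unackedCount(): number {
    return this.unacked.size;
  }

  // Resend every unacknowledged event. `receiptLost` simulates a crash after
  // the server processed the event but before the receipt reached the device.
  resendAll(server: DedupingServer, receiptLost: boolean): void {
    for (const [key, qty] of Array.from(this.unacked)) {
      const receipt = server.handlePick(key, qty);
      if (!receiptLost) this.unacked.delete(receipt.ack);
    }
  }
}
```

Sending the same key twice decrements the count once: the first attempt applies the pick, and the retry is acknowledged without being applied again.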

This is standard distributed-systems stuff, but it bit us harder than expected because warehouse devices lead rougher lives than warehouse software developers usually imagine. They drop from belts, get knocked off shelves by forklifts, run out of battery in inconvenient places. Hard crashes are routine. We treated transient device failure as the common case, not the exception.

Voice commands and ordering

When we added voice commands, we hit a new sync problem we hadn't anticipated. Voice actions process locally and can complete much faster than scan actions. ("Adjust count B3-441 to 47" runs as soon as the speech model returns, around 200 ms.) Scan actions involve more pipeline. So in a single operator's stream, you can have a voice event that originated at T+0 and completed at about T+0.2 s, alongside a scan event that originated at T-0.5 s but completed at T+0.3 s.

If the server orders these by completion time, the resulting state is wrong: the scan, whose intent came first, is applied last and overrides the voice command's later intent. We had to be explicit that the device's intent timestamp (when the user spoke or pressed the button) is what matters, not when the action's processing finished.
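The difference is easy to see with two writes to the same count. The sketch below is illustrative: the `CountWrite` shape is an assumption, and it treats the scan as a scan-driven count entry so that both actions write the same field.

```typescript
// Field names are assumptions made for this example.
interface CountWrite {
  source: "scan" | "voice";
  intentAtMs: number;    // when the operator spoke or pressed the button
  completedAtMs: number; // when local processing finished
  newCount: number;
}

// Apply writes to one bin in the chosen order; the last write determines the count.
function finalCount(
  writes: CountWrite[],
  orderBy: "intentAtMs" | "completedAtMs",
): number | undefined {
  const ordered = [...writes].sort((a, b) => a[orderBy] - b[orderBy]);
  let count: number | undefined;
  for (const w of ordered) count = w.newCount;
  return count;
}

// The scenario from the text, in milliseconds relative to T.
const exampleWrites: CountWrite[] = [
  { source: "scan", intentAtMs: -500, completedAtMs: 300, newCount: 45 },
  { source: "voice", intentAtMs: 0, completedAtMs: 200, newCount: 47 },
];
```

Ordering `exampleWrites` by `completedAtMs` ends on the scan's value (45), while ordering by `intentAtMs` ends on the voice command's value (47), which is what the operator meant.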

This sounds obvious in writing. It was not obvious in code. It took three production incidents and a lot of customer support tickets before we figured out the right invariant.

Battery and bandwidth

Sync chattiness costs battery. Every radio wake-up draws charge, and the cost adds up: a device that's syncing constantly on a flaky connection burns its battery in half a shift.

We backed off our sync strategy over time. Initially we treated each event as worth pushing on its own. Now we batch: events sit in the local outbox for up to 800 ms before pushing, on the assumption that an operator will likely produce another event soon. This nearly halved our average radio-on time for picking workflows without measurably affecting perceived responsiveness.
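The batching window can be sketched like this. `BatchingOutbox` and `Scheduler` are hypothetical names; the scheduler is injected so the example can be exercised without real timers, and the 800 ms default comes from the figure above.

```typescript
// `Scheduler` is injected so the sketch can run without real timers;
// in an app it could be `(fn, ms) => setTimeout(fn, ms)`.
type Scheduler = (fn: () => void, delayMs: number) => void;

class BatchingOutbox {
  private buffer: string[] = [];
  private timerArmed = false;
  private readonly send: (batch: string[]) => void;
  private readonly schedule: Scheduler;
  private readonly windowMs: number;

  constructor(send: (batch: string[]) => void, schedule: Scheduler, windowMs: number = 800) {
    this.send = send;
    this.schedule = schedule;
    this.windowMs = windowMs;
  }

  // The first event in an empty outbox starts the window; later events join it.
  // No event waits longer than `windowMs` before being pushed.
  add(eventId: string): void {
    this.buffer.push(eventId);
    if (this.timerArmed) return;
    this.timerArmed = true;
    this.schedule(() => this.flush(), this.windowMs);
  }

  private flush(): void {
    this.timerArmed = false;
    const batch = this.buffer;
    this.buffer = [];
    if (batch.length > 0) this.send(batch);
  }
}
```

Three events added inside one window go out as a single batch, so the radio wakes once instead of three times.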

On bandwidth: warehouses are sometimes on metered connections (cellular failover when WiFi is down), so we use protobuf-encoded events instead of JSON, which cuts our wire size to roughly a quarter. We considered going further with a custom binary format and decided against it. The engineering cost wasn't worth the additional savings, and protobuf has decent debugging tooling.

What we'd do differently

Two things, if we were starting over.

We would not take on so much custom reactive-query plumbing on top of SQLite. WatermelonDB has been good to us, but maintaining the bridge between the database, the React Native UI, and the sync layer has been a constant tax. If we did it again, we'd probably reach for SQLite + Drizzle ORM + a thinner reactive layer, and accept some performance loss in exchange for less custom plumbing.

We would also start with a coarser conflict model and let it grow, rather than the reverse. Our initial rules engine had distinct cases for every pair of operation types. We later collapsed many of them into three patterns (last-writer-wins, accumulate, manual-resolution) and have not noticed the loss of expressiveness. Simpler conflict models are easier to reason about, easier to debug, and easier for our support team to explain to customers when something does go wrong.

If you're building something similar

Treat the device as untrusted. Operators install random apps, drop the device, run it through hot wash cycles by accident. Anything that can't survive arbitrary client misbehavior shouldn't be on the client.

Make idempotency keys mandatory from day one. Retroactively adding them after you discover a duplicate-event bug is much more painful than starting with them.

Pay attention to perceived latency, not actual latency. A user who scans an item and waits 600ms for the green checkmark will rate the app as slow. A user who scans, sees a green checkmark in 30ms (rendered locally before any sync), and waits 600ms for the server to process the action will rate the app as fast. The local optimistic update is what they feel.
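In code, the optimistic path amounts to updating local state synchronously and reconciling later. The sketch below is a bare-bones illustration; `OptimisticScans` and its status names are hypothetical.

```typescript
// Status names are illustrative assumptions.
type ScanStatus = "confirmed-locally" | "synced" | "rejected";

class OptimisticScans {
  private status = new Map<string, ScanStatus>();

  // Called on scan: update local state synchronously so the UI can render the
  // checkmark right away, without waiting for the server.
  scan(eventId: string): ScanStatus {
    this.status.set(eventId, "confirmed-locally");
    return "confirmed-locally";
  }

  // Called later, when the server's answer arrives.
  onServerResult(eventId: string, accepted: boolean): void {
    if (!this.status.has(eventId)) return;
    this.status.set(eventId, accepted ? "synced" : "rejected");
  }

  statusOf(eventId: string): ScanStatus | undefined {
    return this.status.get(eventId);
  }
}
```

The operator's feedback depends only on `scan()`, which never touches the network; the server round trip changes the status afterwards.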

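Don't trust device clocks. Use server-issued timestamps for ordering across devices wherever possible. Device intent timestamps are still what orders a single device's own events (as in the voice and scan case above), but across devices treat them as ordering hints that can be corrected.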
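One common way to correct device timestamps is to estimate the device's clock offset at upload time and shift every event by it. This is a generic technique, not necessarily what any particular sync layer does; the function and field names are hypothetical, and the sketch ignores network latency.

```typescript
interface UploadedEvent { id: string; deviceTimestamp: number }
interface CorrectedEvent { id: string; estimatedServerTime: number }

// Estimate the device's clock offset from a single upload and apply it to
// every event in the batch. Assumes the offset stayed constant while the
// events were recorded, and ignores network latency.
function correctTimestamps(
  events: UploadedEvent[],
  deviceClockAtUploadMs: number,
  serverClockAtReceiptMs: number,
): CorrectedEvent[] {
  const offsetMs = serverClockAtReceiptMs - deviceClockAtUploadMs;
  return events.map((e) => ({
    id: e.id,
    estimatedServerTime: e.deviceTimestamp + offsetMs,
  }));
}
```

A device whose clock runs an hour fast still produces events that land at the right point on the server's timeline, because the same hour is subtracted from each of them.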

Plan for the case where two operators legitimately produced the same event ID by accident. It happens. Operators share devices, then the shared device gets cloned to a new model, the clone duplicates some local state, and now a UUID that was supposed to be unique is in fact present on two devices. Yes, really, more than once.
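One way to plan for this is to store a content fingerprint alongside each event ID, so a true retry can be told apart from two different events that share an ID. This is a sketch of one possible approach, not a description of ours; `CollisionAwareDeduper` is a hypothetical name, and how the fingerprint is computed (for example, a hash of the payload and intent timestamp) is left as an assumption.

```typescript
type Verdict = "applied" | "duplicate" | "collision";

class CollisionAwareDeduper {
  // event ID -> fingerprint of the first event seen with that ID
  private seen = new Map<string, string>();

  // `fingerprint` is assumed to be derived from the event's content.
  receive(eventId: string, fingerprint: string): Verdict {
    const prior = this.seen.get(eventId);
    if (prior === undefined) {
      this.seen.set(eventId, fingerprint);
      return "applied";
    }
    // Same ID and same content: an ordinary retry, safe to drop.
    // Same ID but different content: two distinct events share an ID, so
    // dropping the second one silently would lose a real action.
    return prior === fingerprint ? "duplicate" : "collision";
  }
}
```

What to do with a `"collision"` verdict (re-key the event, flag it for review) is a policy decision the sketch leaves open.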

Test in dead zones. We didn't, originally. We thought "100% packet loss for 5 minutes" was an unrealistic test case. Then a customer in northern Idaho had it happen every hour during shift change because of a misbehaving 5 GHz repeater. Test it.
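A dead-zone test doesn't need real hardware to get started. The sketch below simulates a window of total packet loss against a simple retry queue; `DeadZoneTransport` and `simulateShift` are hypothetical names, and the retry loop stands in for whatever outbox the app really uses.

```typescript
class DeadZoneTransport {
  readonly delivered: string[] = [];
  private readonly deadFromMs: number;
  private readonly deadUntilMs: number;

  constructor(deadFromMs: number, deadUntilMs: number) {
    this.deadFromMs = deadFromMs;
    this.deadUntilMs = deadUntilMs;
  }

  // Simulates 100% packet loss inside [deadFromMs, deadUntilMs).
  trySend(eventId: string, nowMs: number): boolean {
    if (nowMs >= this.deadFromMs && nowMs < this.deadUntilMs) return false;
    this.delivered.push(eventId);
    return true;
  }
}

// Produce events at the given times and retry the whole queue every `stepMs`.
// Returns whatever is still undelivered when the simulation ends.
function simulateShift(
  transport: DeadZoneTransport,
  eventTimesMs: number[],
  endMs: number,
  stepMs: number,
): string[] {
  const times = [...eventTimesMs].sort((x, y) => x - y);
  let queue: string[] = [];
  let next = 0;
  for (let now = 0; now <= endMs; now += stepMs) {
    while (next < times.length && (times[next] ?? Infinity) <= now) {
      queue.push(`e${next}`);
      next++;
    }
    // Keep only the events the transport failed to deliver.
    queue = queue.filter((id) => !transport.trySend(id, now));
  }
  return queue;
}
```

The properties worth asserting are that nothing is lost and nothing is reordered: every event produced during the outage is delivered, in its original order, once the dead zone ends.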

Closing

Offline-first is not glamorous work. It's slow, fiddly, and most of the wins are invisible. When it works, no one notices, because it just feels like a normal app. When it doesn't work, every operator on the floor notices immediately, and they're not shy about telling you. We've made our peace with that distribution. It's the price of building software that warehouses can actually depend on, on radios that drop more often than the vendors will admit.