# Offline-first mobile when the network is bad

> The cluster and API only matter if the field worker can capture intent, reconnect, and trust the result.

Offline-first is easy to say in a roadmap and painful to ship in the field.

The hard part is not caching a screen. The hard part is a mobile app used on a farm connection, with photos or measurements captured at the edge, operators moving between weak signal and no signal, and a backend that still has to reconcile the truth when the device comes back.

Kubernetes and GraphQL can help here. They can also become a pile of moving parts around a problem that needed a smaller contract. The useful design starts with the shape of failure.

```mermaid
sequenceDiagram
  participant App as Offline app
  participant Api as GraphQL API
  participant Queue as Worker queue
  participant Model as AI/model job
  App->>App: store local intent + idempotency key
  App->>Api: sync when signal returns
  Api->>Api: validate against server truth
  Api->>Queue: enqueue slow processing
  Queue->>Model: run image/model work
  Model-->>Api: publish result
  Api-->>App: canonical entity + conflicts
```

## The client owns intent, not truth

Offline mutations should record intent, not pretend to be the final state.

If the app captures a count, a weight estimate, a note, or an image annotation, the local write gets a client id, a timestamp, and an idempotency key. The server accepts it as a command, validates it against the current state, and returns the canonical record. That distinction keeps the client useful without letting it invent reality.

```ts title="offline-mutation.ts"
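// A command captured on the device: it describes what the user tried to do.
type OfflineMutation = {
  id: string; // client-generated id for the local write
  idempotencyKey: string; // lets the server recognise a retried sync
  entityId: string; // the record this command targets
  operation: 'recordMeasurement' | 'attachImage' | 'updateNote';
  payload: unknown; // operation-specific data, validated by the server
  createdAt: string; // device timestamp at capture time
};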
```
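
To make the "accepts it as a command" step concrete, here is a minimal sketch of an idempotent handler. Everything beyond `OfflineMutation` is an assumption for illustration: `CanonicalRecord`, `applyCommand`, and the in-memory maps stand in for whatever durable storage the real API uses, and validation against current state is reduced to a version bump.

```ts
type OfflineMutation = {
  id: string;
  idempotencyKey: string;
  entityId: string;
  operation: 'recordMeasurement' | 'attachImage' | 'updateNote';
  payload: unknown;
  createdAt: string;
};

// Hypothetical canonical shape returned to the client.
type CanonicalRecord = {
  entityId: string;
  version: number;
  lastOperation: OfflineMutation['operation'];
  acceptedAt: string;
};

// Assumption: in a real service these would be durable tables written in one transaction.
const resultsByKey = new Map<string, CanonicalRecord>();
const entities = new Map<string, CanonicalRecord>();

function applyCommand(
  cmd: OfflineMutation,
  now: () => string = () => new Date().toISOString(),
): CanonicalRecord {
  // A retry with the same key gets the first result back instead of applying twice.
  const previous = resultsByKey.get(cmd.idempotencyKey);
  if (previous) return previous;

  const current = entities.get(cmd.entityId);
  const next: CanonicalRecord = {
    entityId: cmd.entityId,
    version: (current?.version ?? 0) + 1,
    lastOperation: cmd.operation,
    acceptedAt: now(), // server time, not the device's createdAt
  };
  entities.set(cmd.entityId, next);
  resultsByKey.set(cmd.idempotencyKey, next);
  return next;
}
```

The property that matters is that a device which resends after a dropped response sees the same canonical record, not a duplicate write.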

The queue on the phone is boring on purpose. It has three states: pending, syncing, settled. Anything more complicated belongs on the server.
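
A three-state queue is small enough to sketch in full. The version below is illustrative only: `QueuedItem`, `transition`, `flush`, and the `send` callback are hypothetical names, and the retry rule (a failed send goes back to pending) is an assumption about how the queue behaves.

```ts
type QueueState = 'pending' | 'syncing' | 'settled';

type QueuedItem = { idempotencyKey: string; state: QueueState };

// Allowed moves. A failed send returns the item to pending so it can be retried.
const allowed: Record<QueueState, QueueState[]> = {
  pending: ['syncing'],
  syncing: ['settled', 'pending'],
  settled: [],
};

function transition(item: QueuedItem, to: QueueState): QueuedItem {
  if (!allowed[item.state].includes(to)) {
    throw new Error(`Illegal transition ${item.state} -> ${to}`);
  }
  return { ...item, state: to };
}

// Sends each pending item in order; `send` resolves true once the server has answered.
async function flush(
  queue: QueuedItem[],
  send: (item: QueuedItem) => Promise<boolean>,
): Promise<QueuedItem[]> {
  const out: QueuedItem[] = [];
  for (const item of queue) {
    if (item.state !== 'pending') {
      out.push(item);
      continue;
    }
    const inFlight = transition(item, 'syncing');
    let ok = false;
    try {
      ok = await send(inFlight);
    } catch {
      ok = false; // a network error is treated the same as a rejected send
    }
    out.push(transition(inFlight, ok ? 'settled' : 'pending'));
  }
  return out;
}
```

Because every send carries the same idempotency key on retry, returning to pending is safe even when the first attempt actually reached the server.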

<Principle title="Intent can live offline; truth has to reconcile">
  The device can preserve what the user tried to do. The server still owns the final state, conflict
  rules, timestamps, and any AI output that becomes customer-visible.
</Principle>

## GraphQL is the sync contract, not the architecture

GraphQL was useful because the client needed precise reads and clear mutations, not because "GraphQL" sounds modern.

The schema did two jobs:

- expose the smallest read model the screen needed;
- make mutation responses rich enough that the client could repair local state after sync.

That second part matters. A mutation response that returns only `ok: true` forces the client to guess. A response that returns the canonical entity, conflicts, and server timestamps lets the client settle itself.

```graphql title="schema.graphql"
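# DateTime is not a built-in GraphQL scalar, so it has to be declared.
scalar DateTime

# FarmObservation and SyncConflict are defined elsewhere in the schema.
type SyncResult {
  entity: FarmObservation # canonical server record (nullable)
  acceptedAt: DateTime! # server timestamp
  conflicts: [SyncConflict!]! # always a list, possibly empty
}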
```
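
On the client, a response of that shape is enough to settle a local record without guessing. The sketch below assumes a client-side mirror of `SyncResult`; the fields inside `FarmObservation` and `SyncConflict`, plus `LocalRecord` and `settle`, are hypothetical and exist only to show the repair step.

```ts
// Assumed field shapes; the real schema may differ.
type FarmObservation = { id: string; updatedAt: string; note?: string };
type SyncConflict = { field: string; clientValue: unknown; serverValue: unknown };

type SyncResult = {
  entity: FarmObservation | null; // nullable in the schema
  acceptedAt: string;
  conflicts: SyncConflict[];
};

type LocalRecord = {
  observation: FarmObservation;
  syncStatus: 'pending' | 'synced' | 'conflicted';
  conflicts: SyncConflict[];
};

function settle(local: LocalRecord, result: SyncResult): LocalRecord {
  return {
    // The server copy replaces the optimistic local copy whenever one is returned.
    observation: result.entity ?? local.observation,
    syncStatus: result.conflicts.length > 0 ? 'conflicted' : 'synced',
    conflicts: result.conflicts,
  };
}
```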

## Kubernetes should make the platform dull

The cluster's job was not to make the app feel cloud-native. The cluster's job was to make deploys repeatable, isolate workloads, and give AI, API, and worker services the same operational surface.

For this kind of product, the useful Kubernetes work is ordinary:

- separate API, worker, and model-processing workloads;
- keep resource requests honest, especially around image and model jobs;
- make logs and traces readable by product flow, not pod trivia;
- define deployment checks and ownership before the mobile team depends on an endpoint;
- keep secrets and config boring enough that a new engineer can reason about them.

If the cluster is the most interesting part of the system, something is wrong.

<Tradeoff title="Offline-first adds product states">
  The app has to show pending, synced, conflicted, and failed work without making the field user
  feel punished for a bad network. That UI complexity is worth paying when the alternative is lost
  trust.
</Tradeoff>
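
One way to keep those product states honest is to make the mapping from status to screen explicit. This is a sketch under assumptions: `DisplayStatus` follows the four states named above, while `StatusView`, `describeStatus`, and the label copy are invented for illustration.

```ts
type DisplayStatus = 'pending' | 'synced' | 'conflicted' | 'failed';

type StatusView = {
  label: string;
  tone: 'neutral' | 'success' | 'warning';
  action?: 'review' | 'retry';
};

function describeStatus(status: DisplayStatus): StatusView {
  switch (status) {
    case 'pending':
      // Neutral tone: being offline is not the user's fault.
      return { label: 'Saved on this device. Will sync when signal returns.', tone: 'neutral' };
    case 'synced':
      return { label: 'Synced', tone: 'success' };
    case 'conflicted':
      return { label: 'Needs a quick review', tone: 'warning', action: 'review' };
    case 'failed':
      return { label: 'Could not sync. Your entry is still saved.', tone: 'warning', action: 'retry' };
  }
}
```

Keeping the switch exhaustive means a new status cannot ship without someone deciding what the field user sees.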

## The real feature is trust after reconnect

Users do not care that your app has an offline queue. They care that work they did in bad signal does not disappear, duplicate, or come back slightly wrong.

That is the bar. Local intent, server truth, idempotent sync, rich mutation responses, boring operations. The stack can be Kubernetes and GraphQL. The product still succeeds or fails on whether the next reconnect feels uneventful.
