# MongoDB modeling mistakes that show up late

> Document shape, indexes, and the day the working set stops fitting in memory.

MongoDB does not fail at scale because documents are bad. It fails because the document shape stopped matching the way the product reads and writes data, and nobody noticed until the working set no longer fit in memory.

Node.js gets blamed for some of this too. Usually the real bug is simpler: too much data pulled into the app, too many per-row calls, a missing compound index, or a document that grew from "convenient" to "every screen in the business."

The database is not offended by your model. It is just executing it.

## Model around the hot path

The first question is not "embed or reference?" The first question is "what does the hot path need in one read?"

Embedding is great when the child data is small, bounded, and read with the parent. It is painful when the child data grows without limit, changes at a different cadence, or needs its own query path.

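```ts title="order-shape.ts"
// Hot order document: bounded fields that the order screens read together.
type OrderDocument = {
  id: string;
  buyerId: string;
  status: 'draft' | 'paid' | 'fulfilled' | 'cancelled';
  totals: { subtotal: number; tax: number; grandTotal: number };
  // Bounded by cart size, read with the order: a good candidate for embedding.
  lines: { sku: string; qty: number; unitPrice: number }[];
  // Sort key for "recent orders" queries and the compound index below.
  createdAt: Date;
};
```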

That shape is fine until someone adds every shipment event, every support note, every payment attempt, and every audit trail to the same document because "it belongs to the order." "Belongs" is not a modeling rule. Access pattern is.
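A minimal sketch of moving that history out, assuming a hypothetical `orderEvents` collection with one document per event, keyed by `orderId`. `LegacyOrder`, `OrderEventDocument`, and `splitOrder` are illustrative names, not a driver API. The hot type restates the order shape with a `createdAt` timestamp, the field the index later in this post sorts on.

```ts
// Hot document: bounded, read on nearly every order screen.
type OrderDocument = {
  id: string;
  buyerId: string;
  status: 'draft' | 'paid' | 'fulfilled' | 'cancelled';
  totals: { subtotal: number; tax: number; grandTotal: number };
  lines: { sku: string; qty: number; unitPrice: number }[];
  createdAt: Date;
};

// History: unbounded, appended over time, queried on its own path.
// Lives in a hypothetical `orderEvents` collection, one document per event.
type OrderEventDocument = {
  orderId: string;
  kind: 'shipment' | 'support_note' | 'payment_attempt' | 'audit';
  at: Date;
  payload: Record<string, unknown>;
};

// Hypothetical order that grew every history array inline.
type LegacyOrder = OrderDocument & {
  shipments: { at: Date; carrier: string }[];
  supportNotes: { at: Date; text: string }[];
  paymentAttempts: { at: Date; ok: boolean }[];
};

// Split a bloated order into one bounded hot document plus event documents.
function splitOrder(legacy: LegacyOrder): { order: OrderDocument; events: OrderEventDocument[] } {
  // Everything except the history arrays stays on the hot document.
  const { shipments, supportNotes, paymentAttempts, ...order } = legacy;
  const events: OrderEventDocument[] = [
    ...shipments.map((s) => ({
      orderId: order.id, kind: 'shipment' as const, at: s.at, payload: { carrier: s.carrier },
    })),
    ...supportNotes.map((n) => ({
      orderId: order.id, kind: 'support_note' as const, at: n.at, payload: { text: n.text },
    })),
    ...paymentAttempts.map((p) => ({
      orderId: order.id, kind: 'payment_attempt' as const, at: p.at, payload: { ok: p.ok },
    })),
  ];
  // Oldest first, the way an order timeline usually reads.
  events.sort((a, b) => a.at.getTime() - b.at.getTime());
  return { order, events };
}
```

The order document stays the same size no matter how many shipments or notes pile up; the timeline gets its own collection and its own index (for example on `orderId` and `at`).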

```mermaid
flowchart TD
  A[product read path] --> B{bounded child data?}
  B -->|yes| C[embed with projection]
  B -->|no| D[separate collection]
  D --> E[index for query shape]
  C --> F[watch document growth]
  E --> F
  F --> G[review p95 + working set]
```
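One way to act on "watch document growth" is a cheap check before writes or in a nightly job. This is a rough sketch under stated assumptions: `checkGrowth` and its thresholds are hypothetical, and JSON byte length is only a proxy for BSON size. The one hard number is MongoDB's 16 MiB maximum BSON document size.

```ts
// MongoDB rejects documents larger than 16 MiB of BSON.
const BSON_MAX_BYTES = 16 * 1024 * 1024;

type GrowthReport = { approxBytes: number; oversizedArrays: string[]; nearLimit: boolean };

// Rough growth check for a document about to be written.
// JSON byte length only approximates BSON size (BSON adds type tags and length
// prefixes and encodes dates and numbers differently), so treat it as a trend signal.
function checkGrowth(
  doc: Record<string, unknown>,
  maxArrayLength: number, // hypothetical per-field bound for embedded arrays
  warnFraction = 0.1, // hypothetical: warn at 10% of the hard limit, long before it
): GrowthReport {
  const approxBytes = new TextEncoder().encode(JSON.stringify(doc)).length;
  // Top-level arrays that have outgrown the bound are the usual culprits.
  const oversizedArrays = Object.entries(doc)
    .filter(([, value]) => Array.isArray(value) && value.length > maxArrayLength)
    .map(([key]) => key);
  return { approxBytes, oversizedArrays, nearLimit: approxBytes > BSON_MAX_BYTES * warnFraction };
}
```

The array check usually fires first and is the more useful signal: a field that keeps crossing its bound is telling you it wants its own collection, long before the document gets anywhere near 16 MiB.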

<Decision title="Model for the path that gets hot">
  The right document shape is the one that keeps the common read narrow without hiding unbounded
  history inside a record everyone has to load.
</Decision>

## Indexes are part of the feature

An endpoint is not done when it returns the right JSON on your laptop. It is done when the query shape has an index that matches production cardinality.

For MongoDB, that means looking at the filter, sort, and projection together:

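```js title="orders.indexes.js"
// Equality fields first, then the sort key, so one index scan serves filter and sort.
db.orders.createIndex({
  buyerId: 1, // equality: queries are scoped to one buyer
  status: 1, // equality: e.g. status: 'paid'
  createdAt: -1, // sort: newest first, read in index order
});
```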

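If the query matches `buyerId` and `status` by equality and sorts by `createdAt`, that compound index is not an optimization. It is the feature's support structure: with the equality fields first and the sort field after them, MongoDB can read matching documents already in order instead of sorting them in memory.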
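The rule the index relies on can be sketched as a small checker. This is a simplified model, not the query planner: it covers equality filters plus a sort and ignores ranges, `$in`, multikey indexes, and collation. `analyzeIndex`, `IndexKeys`, and `QueryShape` are hypothetical names. In practice, the authoritative answer is `explain("executionStats")` and whether the plan contains a blocking `SORT` stage.

```ts
type Direction = 1 | -1;
type IndexKeys = [field: string, dir: Direction][];

// Hypothetical description of a query shape: equality filters plus a sort.
type QueryShape = { equality: string[]; sort: [field: string, dir: Direction][] };

function analyzeIndex(index: IndexKeys, query: QueryShape) {
  const eq = new Set(query.equality);
  // Walk the leading index keys that the query pins with equality conditions.
  let i = 0;
  while (i < index.length && eq.has(index[i][0])) i++;
  const prefix = index.slice(0, i).map(([field]) => field);
  // Every equality field inside that prefix means the scan is narrowed by all filters.
  const filterInPrefix = query.equality.every((field) => prefix.includes(field));
  // The sort keys must be the very next index keys, all in the index direction
  // or all reversed (an index can be walked backwards).
  const next = index.slice(i, i + query.sort.length);
  const sameFields =
    next.length === query.sort.length && next.every(([field], k) => field === query.sort[k][0]);
  const forward = sameFields && next.every(([, dir], k) => dir === query.sort[k][1]);
  const reverse = sameFields && next.every(([, dir], k) => dir === -query.sort[k][1]);
  return { filterInPrefix, sortFromIndex: query.sort.length === 0 || forward || reverse };
}
```

Against `{ buyerId: 1, status: 1, createdAt: -1 }`, a query pinning `buyerId` and `status` gets its sort from the index in either direction. A query that pins only `buyerId` and sorts by `createdAt` does not, because the unpinned `status` key sits between them.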

The index review should happen in code review. If the PR adds a query and no one can point to the matching index, the PR is not done.

## Watch memory before CPU

The nasty cliff is the working set. Everything feels fine while the hot indexes and documents fit in memory. Then the product grows, a document gets wider, a dashboard scans too much, and the database starts spending its life fetching pages from disk.

The symptoms look like application problems:

- Node workers waiting on I/O;
- p95s drifting before p50s move;
- connection pools filling during dashboard traffic;
- one "simple" admin screen making every customer request slower.
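A back-of-the-envelope check for the cliff, sketched under assumptions: the default WiredTiger cache formula (the larger of 50% of RAM minus 1 GB, or 256 MB) is MongoDB's documented default, while the working-set inputs, the `headroom` fraction, and both function names are hypothetical. Documents sit uncompressed in the cache, so estimate from uncompressed sizes (such as `avgObjSize` from collection stats), not from compressed on-disk size.

```ts
const GiB = 1024 ** 3;
const MiB = 1024 ** 2;

// WiredTiger's default internal cache: the larger of 50% of (RAM - 1 GB) or 256 MB.
function defaultWiredTigerCacheBytes(ramBytes: number): number {
  return Math.max(0.5 * (ramBytes - GiB), 256 * MiB);
}

// Hypothetical inputs you would estimate from collection and index stats.
type WorkingSetEstimate = {
  hotDocs: number; // documents the hot path touches regularly
  avgDocBytes: number; // uncompressed average document size
  hotIndexBytes: number; // indexes the hot path uses
};

// headroom is an assumed safety margin: the cache should not run full.
function workingSetFits(est: WorkingSetEstimate, ramBytes: number, headroom = 0.8) {
  const needed = est.hotDocs * est.avgDocBytes + est.hotIndexBytes;
  const cache = defaultWiredTigerCacheBytes(ramBytes);
  return { needed, cache, fits: needed <= cache * headroom };
}
```

With made-up numbers on a 16 GiB host, two million hot 2 KiB documents plus 1 GiB of indexes need about 4.8 GiB and fit in the 7.5 GiB default cache. Let those documents grow to 4 KiB and the estimate jumps to about 8.6 GiB: same row count, no new traffic, and the working set no longer fits.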

The fix is rarely "rewrite it." The fix is usually smaller reads, better projections, a correct compound index, and moving unbounded history out of the hot document.

<Tradeoff title="Embedding buys speed until it buys coupling">
  Embedding is great when the data is small and read together. Once it grows at its own pace, every
  convenient read becomes a wider write, a harder migration, and a bigger working set.
</Tradeoff>

## The app should not repair the database shape

When the Node layer starts doing joins, filters, and sorts in memory, the model is already leaking.

You can get away with it early. A dozen records become a hundred. A hundred becomes ten thousand. Then a harmless endpoint is allocating large arrays so it can throw most of them away. That is not business logic. That is a query plan hiding in TypeScript.
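A minimal sketch of the difference, with hypothetical names throughout. The first function is the query plan hiding in TypeScript: it needs every order for the buyer in memory before it can keep a handful. The second only describes the work; its fields line up with the `sort`, `limit`, and `projection` options that the official Node.js driver's `find(filter, options)` accepts, so the compound index on `buyerId`, `status`, `createdAt` can serve the filter and the sort and only `limit` documents cross the wire.

```ts
type Order = {
  id: string;
  buyerId: string;
  status: string;
  createdAt: Date;
  totals: { subtotal: number; tax: number; grandTotal: number };
  lines: unknown[];
};

// Anti-pattern: load every order for the buyer, then filter, sort, and trim in Node.
function recentPaidInApp(allBuyerOrders: Order[], limit: number) {
  return allBuyerOrders
    .filter((o) => o.status === 'paid')
    .sort((a, b) => b.createdAt.getTime() - a.createdAt.getTime()) // newest first
    .slice(0, limit)
    .map((o) => ({ id: o.id, createdAt: o.createdAt, totals: { grandTotal: o.totals.grandTotal } }));
}

// Pushdown: describe the same rows as a query the database runs against the index.
// Hypothetical helper; the result would be passed as find(filter, { sort, limit, projection }).
function recentPaidQuery(buyerId: string, limit: number) {
  return {
    filter: { buyerId, status: 'paid' },
    sort: { createdAt: -1 },
    limit,
    projection: { _id: 0, id: 1, createdAt: 1, 'totals.grandTotal': 1 },
  } as const;
}
```

Both return the same rows for a buyer. Only the first gets slower and hungrier for memory as that buyer's history grows.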

I like Node for product work because it makes the path from API to UI short. That does not mean the app should compensate for a lazy database model. Keep the hot path narrow, name the indexes in the PR, and treat document growth as a production risk, not a storage detail.
