How we made editing a 20 GB JSON file in the browser possible

You cannot load 20 GB into a JavaScript string. You cannot even load it into RAM on most machines. So how do you let someone double-click a value and change it? With a data structure from 1974 and one coordinate trick.


The problem

Fast JSON Viewer opens files up to 20 GB by refusing to ever hold the file. The File you drop stays on disk; the viewer streams it through workers once to validate it and build a sparse index, and after that it reads only the few kilobytes your viewport needs, with file.slice(start, end). Nothing in the app scales with file size. That is the entire reason it works.

Editing looks like the natural next feature, and it looks impossible under that rule. Every intuition about editing text assumes the document is a string in memory: change a character, splice the string. But a 20 GB document cannot be a string. V8 caps a single string at roughly half a billion characters; an ArrayBuffer that size will fail to allocate on most machines; and even a machine with 64 GB of RAM would spend minutes copying bytes around for every keystroke. The naive plan is dead before the first edit.

Worse, the viewer's machinery is aggressively read-only. The sparse line index maps display lines to byte offsets in the original file. The collapse index (up to 25 million entries in typed arrays) stores byte offsets in the original file. Search results are byte offsets in the original file. Change one value from 3.14 to 3.141 and every offset after it is wrong by one byte. Rebuilding those indexes per keystroke means re-scanning 20 GB per keystroke.

So the constraints are: never materialize the document, never rewrite the indexes, and still let the user change any value and download the result. All three at once.

The technique: stop editing the file

The answer is old. It powered Bravo, the first WYSIWYG editor, in the 1970s, and it powers VS Code today: the piece table. The idea is that you never edit the original bytes at all. The original file stays frozen exactly as it was opened. The document becomes a recipe — an ordered list of pieces, where each piece says "copy this range of the original file" or "copy this range of a small buffer holding text the user typed."

Before any edit, the recipe is one line long:

// The document = the whole base file, verbatim.
pieces = [
  { src: BASE, off: 0, len: 21_474_836_480 }   // all 20 GB, by reference
]
addBuffer = ""                                  // nothing typed yet

That piece does not contain 20 GB. It contains three numbers. It points at the file on disk, the same way the viewer's paint path always has.

A simple example

Say the file contains this, and the user double-clicks 3.14 and types 99.9:

{"name":"Ada Lovelace","pi":3.14,"flag":false}
                            ^^^^
                            bytes 28–31, i.e. the half-open range [28, 32)

Editing = splitting one piece into three. The typed text goes into the append-only addBuffer; the old bytes are simply no longer referenced:

// After replaceRange(28, 32, "99.9"):
addBuffer = "99.9"

pieces = [
  { src: BASE, off: 0,  len: 28 },   // {"name":"Ada Lovelace","pi":
  { src: ADD,  off: 0,  len: 4  },   // 99.9   ← the user's text
  { src: BASE, off: 32, len: 14 },   // ,"flag":false}
]

Reading the document back means walking the recipe: 28 bytes from the file, 4 bytes from the buffer, 14 bytes from the file. Concatenated, that is the edited document — and at no point did it exist as one contiguous thing. On the 20 GB file the picture is identical, just with bigger numbers:

  BASE file (20 GB, immutable, on disk)
  ┌──────────────────┬──────┬────────────────────┐
  │   bytes 0…12G    │ XXXX │   bytes 12G+4…20G  │
  └──────────────────┴──────┴────────────────────┘
            │            ✕             │
            │       (replaced — no     │
            │      longer referenced)  │
            ▼                          ▼
  pieces:
  ┌──────────────────┐  ┌────────────┐  ┌──────────────────┐
  │ BASE 0 … 12G     │→ │ ADD "99.9" │→ │ BASE 12G+4 … 20G │
  └──────────────────┘  └──────▲─────┘  └──────────────────┘
                               │
                    addBuffer (the user's bytes, in RAM)
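Walking the recipe for the small Ada Lovelace example can be sketched in a few lines. This is an illustration only: the function name `materialize` and the in-memory `baseBytes` are assumptions made for the sketch, and building one contiguous output is acceptable here only because the example is 46 bytes long. The real viewer never does this for a whole file.

```javascript
// Hypothetical names throughout; BASE/ADD mirror the labels used above.
const BASE = "base", ADD = "add";
const enc = new TextEncoder(), dec = new TextDecoder();

const baseBytes = enc.encode('{"name":"Ada Lovelace","pi":3.14,"flag":false}');
const addBytes = enc.encode("99.9");

const pieces = [
  { src: BASE, off: 0, len: 28 },  // {"name":"Ada Lovelace","pi":
  { src: ADD, off: 0, len: 4 },    // 99.9
  { src: BASE, off: 32, len: 14 }, // ,"flag":false}
];

// Walk the recipe in order, copying each referenced range into the output.
function materialize(pieces, baseBytes, addBytes) {
  const total = pieces.reduce((n, p) => n + p.len, 0);
  const out = new Uint8Array(total);
  let pos = 0;
  for (const p of pieces) {
    const src = p.src === BASE ? baseBytes : addBytes;
    out.set(src.subarray(p.off, p.off + p.len), pos);
    pos += p.len;
  }
  return out;
}

const edited = dec.decode(materialize(pieces, baseBytes, addBytes));
console.log(edited); // {"name":"Ada Lovelace","pi":99.9,"flag":false}
```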

Ten edits mean roughly twenty-one pieces. A thousand edits mean roughly two thousand pieces — about 100 KB of bookkeeping against a 20 GB file. Memory is O(edits), never O(file). That is the whole trick, and it is fifty years old.
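The piece arithmetic is easy to check with a toy splitter. The sketch below is not the production `replaceRange`; the helper names and the linear scan are assumptions chosen for brevity. It replaces a document range with one ADD piece by keeping whatever lies before the range, adding the new piece, and keeping whatever lies after it.

```javascript
const BASE = "base", ADD = "add";

// Pieces (as new objects) covering document range [from, to).
function slicePieces(pieces, from, to) {
  const out = [];
  let docPos = 0;
  for (const p of pieces) {
    const lo = Math.max(from, docPos);
    const hi = Math.min(to, docPos + p.len);
    if (lo < hi) out.push({ src: p.src, off: p.off + (lo - docPos), len: hi - lo });
    docPos += p.len;
  }
  return out;
}

// Return a NEW array in which document range [start, end) is replaced by
// addLen bytes that were appended to the add buffer at addOff.
function replaceRange(pieces, start, end, addOff, addLen) {
  const total = pieces.reduce((n, p) => n + p.len, 0);
  const middle = addLen > 0 ? [{ src: ADD, off: addOff, len: addLen }] : [];
  return [
    ...slicePieces(pieces, 0, start),
    ...middle,
    ...slicePieces(pieces, end, total),
  ];
}

// Ten non-adjacent 4-byte replacements in a 1000-byte base.
let pieces = [{ src: BASE, off: 0, len: 1000 }];
let addUsed = 0;
for (let i = 0; i < 10; i++) {
  const start = 50 + i * 90; // 50, 140, ..., 860
  pieces = replaceRange(pieces, start, start + 4, addUsed, 4);
  addUsed += 4;
}
console.log(pieces.length); // 21
```

Each edit that lands inside a single BASE piece turns that piece into three, so the count grows by two per edit: 1 + 2 × 10 = 21.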

The actual data structure

The production version (in packages/core/src/edit/piece-table.js) adds two things to the textbook picture: prefix sums for fast lookups, and snapshots for undo. Trimmed to its skeleton:

function createPieceTable(baseSize) {
  // Pieces in document order. Immutable once created — splits make new
  // objects — so an undo snapshot is just a shallow copy of this array.
  let pieces = [{ src: BASE, off: 0, len: baseSize }];

  // Append-only. Bytes are never overwritten or freed, so snapshots
  // taken before an edit stay valid forever.
  let addBuffer = new Uint8Array(4096); let addLen = 0;

  // Rebuilt after every edit (O(pieces), and pieces stay few):
  let docStarts;         // document offset where pieces[i] begins
  let baseOff, baseDoc;  // BASE pieces only: base offset → doc offset

  const undoStack = [], redoStack = [];

  function replaceRange(docStart, docEnd, bytes) {
    undoStack.push(pieces);          // snapshot = one array copy
    pieces = splice(pieces, docStart, docEnd, appendAdd(bytes));
    rebuildPrefixSums();
  }

  function undo() { redoStack.push(pieces); pieces = undoStack.pop(); ... }

  // Binary search over the BASE pieces: where does original-file
  // offset b live in the edited document?
  function docOffsetOf(b) { ... }

  // Stitch any document range from base slices + addBuffer ranges.
  async function read(docStart, docEnd, readBase) { ... }
}

Undo is worth pausing on, because the piece table makes it embarrassingly easy. Pieces are immutable and the add buffer is append-only, so "the document before this edit" is fully described by the old pieces array. Undo does not reverse anything; it just puts the old array back. A bounded stack of array snapshots gives unlimited-feeling undo/redo for kilobytes of memory.
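A minimal sketch of snapshot-based undo follows. The `createHistory` helper, its `limit` parameter, and the choice to clear the redo stack on a new edit are all assumptions for illustration; the skeleton above elides those details.

```javascript
// Hypothetical helper: keeps whole piece arrays as snapshots.
function createHistory(initialPieces, limit = 100) {
  let pieces = initialPieces;
  const undoStack = [], redoStack = [];
  return {
    get pieces() { return pieces; },
    // Record an edit: the old array becomes an undo snapshot.
    commit(nextPieces) {
      undoStack.push(pieces);
      if (undoStack.length > limit) undoStack.shift(); // keep the stack bounded
      redoStack.length = 0; // assumption: a new edit discards redo history
      pieces = nextPieces;
    },
    undo() {
      if (undoStack.length === 0) return false;
      redoStack.push(pieces);
      pieces = undoStack.pop();
      return true;
    },
    redo() {
      if (redoStack.length === 0) return false;
      undoStack.push(pieces);
      pieces = redoStack.pop();
      return true;
    },
  };
}

const v0 = [{ src: "base", off: 0, len: 46 }];
const v1 = [
  { src: "base", off: 0, len: 28 },
  { src: "add", off: 0, len: 4 },
  { src: "base", off: 32, len: 14 },
];

const history = createHistory(v0);
history.commit(v1);
history.undo();
console.log(history.pieces === v0); // true
history.redo();
console.log(history.pieces === v1); // true
```

Nothing is copied or reversed on undo; the arrays are swapped by reference, which is safe only because no piece object is ever mutated after creation.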

The coordinate trick

The piece table solves memory. It does not by itself solve the second constraint: all those indexes full of original-file byte offsets. The obvious move — translate every index entry to new offsets after each edit — is exactly the O(file) rebuild we swore off.

So we flipped the direction of translation. The indexes are never touched. They keep original-file coordinates forever. Instead, the edited document is exposed through a view that speaks original-file coordinates:

// Same contract as the File itself — but slice() answers with the
// EDITED bytes standing where that original range used to be.
const view = createEditedFileView(file, pieceTable);

view.slice(baseStart, baseEnd)
  // = document bytes from docOffsetOf(baseStart) to docOffsetOf(baseEnd)

docOffsetOf is a binary search over the surviving BASE pieces: O(log edits), well under a microsecond. An offset before any edit maps to itself. An offset after an edit shifts by the accumulated size delta. An offset inside a replaced range maps to the replacement's position. That last rule gives the property everything else leans on:

Slicing the view by consecutive original-file blocks yields the edited document, exactly, each byte exactly once. Our test suite asserts this for block sizes 1, 3, 7, 16 and 1024 — if it holds at block size 1, no block boundary can ever land wrong.
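The mapping and the block property can be sketched together. Everything below is a hypothetical stand-in rather than the code in the repository: `createEditedView` takes an in-memory `baseBytes` so the example runs synchronously, whereas the real view reads from a File. The two boundary rules (offset 0 maps to 0, end of file maps to the end of the document) are assumptions added so that the first and last blocks cover everything.

```javascript
const BASE = "base", ADD = "add";

function createEditedView(baseBytes, addBytes, pieces) {
  // Prefix sums: docStarts[i] is the document offset where pieces[i] begins.
  const docStarts = [];
  const baseOff = [], baseLen = [], baseDoc = []; // BASE pieces only
  let docSize = 0;
  for (const p of pieces) {
    docStarts.push(docSize);
    if (p.src === BASE) {
      baseOff.push(p.off); baseLen.push(p.len); baseDoc.push(docSize);
    }
    docSize += p.len;
  }

  // Original-file offset -> edited-document offset.
  function docOffsetOf(b) {
    if (b <= 0) return 0;
    if (b >= baseBytes.length) return docSize;
    // Binary search: last BASE piece whose base offset is <= b.
    let lo = 0, hi = baseOff.length - 1, found = -1;
    while (lo <= hi) {
      const mid = (lo + hi) >> 1;
      if (baseOff[mid] <= b) { found = mid; lo = mid + 1; } else { hi = mid - 1; }
    }
    if (found === -1) return 0; // b lies in a replaced range at the file start
    // Inside the piece: shift with it. Past its end: b fell in a replaced
    // range, so clamp to where the replacement begins.
    return baseDoc[found] + Math.min(b - baseOff[found], baseLen[found]);
  }

  // Stitch document range [docStart, docEnd) from the pieces overlapping it.
  function read(docStart, docEnd) {
    const out = new Uint8Array(Math.max(0, docEnd - docStart));
    for (let i = 0; i < pieces.length; i++) {
      const p = pieces[i];
      const lo = Math.max(docStart, docStarts[i]);
      const hi = Math.min(docEnd, docStarts[i] + p.len);
      if (lo >= hi) continue;
      const src = p.src === BASE ? baseBytes : addBytes;
      const from = p.off + (lo - docStarts[i]);
      out.set(src.subarray(from, from + (hi - lo)), lo - docStart);
    }
    return out;
  }

  const slice = (baseStart, baseEnd) =>
    read(docOffsetOf(baseStart), docOffsetOf(baseEnd));
  return { docSize, docOffsetOf, read, slice };
}

// Concatenate view.slice() over consecutive original-file blocks.
function readByBlocks(view, baseSize, blockSize) {
  const bytes = [];
  for (let b = 0; b < baseSize; b += blockSize) {
    bytes.push(...view.slice(b, Math.min(b + blockSize, baseSize)));
  }
  return Uint8Array.from(bytes);
}

const enc = new TextEncoder(), dec = new TextDecoder();
const baseBytes = enc.encode('{"name":"Ada Lovelace","pi":3.14,"flag":false}');
const addBytes = enc.encode("3.14159"); // a replacement that changes the length
const view = createEditedView(baseBytes, addBytes, [
  { src: BASE, off: 0, len: 28 },
  { src: ADD, off: 0, len: 7 },
  { src: BASE, off: 32, len: 14 },
]);

const expected = dec.decode(view.read(0, view.docSize));
const results = [1, 3, 7, 16, 1024].map(
  (size) => dec.decode(readByBlocks(view, baseBytes.length, size)) === expected
);
console.log(expected);
console.log(results); // every entry is true
```

The property holds for any mapping that is monotone and pins both ends of the file, because each block ends at the same document offset where the next one begins.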

The payoff is that the rest of the app did not change. The viewer's paint path already worked by "find the index sample for this line, slice() from its byte offset, format what comes back" — hand it the view instead of the file and it paints edited content, off the untouched index. The streaming download already worked by "read 1 MB blocks from offset 0 to EOF, format, write to disk" — hand it the view and it saves the edited 20 GB with O(1) memory. Both changed by roughly one argument.

The quirk we had to inherit

"Same contract as the File" has to be taken literally, corners included — and one corner bit us. Blob.slice() silently truncates fractional offsets: slice(0, 30.5) reads 30 bytes, no complaint. It turns out the viewer leans on that. Its line-density projection converts a scroll position into a byte offset by multiplying a fraction of the file size, so it routinely hands slice() a non-integer like 12884901.5 and trusts the File to floor it.

The first cut of the edited view forwarded those fractions straight through docOffsetOf into the read path — where the stitched output was sized with new Uint8Array(docEnd - docStart). A fractional length there is not truncated; it throws RangeError. The paint path caught the throw and drew nothing, so scrolling an edited document flashed blank rows. It shipped, the regression was spotted, and the whole feature was reverted for a day while we found it.

The fix is one word — Math.trunc — applied wherever an offset enters the view (slice, read, replaceRange), so the overlay floors its inputs exactly like Blob.slice does. A node regression test now pins it: fractional arguments must return byte-for-byte what their truncated integers return. The lesson is worth more than the diff — when you impersonate a built-in, you inherit its quirks, not just its happy path.
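The shape of that fix and its regression check can be sketched as follows. The helper name `toOffset` and the stand-in `createView` are hypothetical, and the sketch assumes non-negative offsets only; it clamps negatives to 0 and does not model the from-the-end meaning that Blob.slice gives them.

```javascript
// Hypothetical helper: drop any fractional part, then clamp into [0, size].
function toOffset(value, size) {
  const n = Math.trunc(Number(value));
  if (Number.isNaN(n)) return 0;
  return Math.min(Math.max(n, 0), size);
}

// Stand-in view over an in-memory buffer: offsets are normalised on entry,
// before they reach any arithmetic that assumes integers.
function createView(bytes) {
  return {
    slice(start, end) {
      const s = toOffset(start, bytes.length);
      const e = Math.max(s, toOffset(end, bytes.length));
      return bytes.subarray(s, e);
    },
  };
}

const dec = new TextDecoder();
const view = createView(new TextEncoder().encode("0123456789"));

// Regression check: fractional arguments behave like their truncated integers.
const fractional = dec.decode(view.slice(2.9, 6.5));
const integral = dec.decode(view.slice(2, 6));
console.log(fractional, integral); // 2345 2345
```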

Why only scalars (for now)

There is one honest restriction in the first version: you can change values and rename keys, but a replacement must be a single JSON scalar — a string, number, true, false or null. Not because the piece table cares (it splices arbitrary bytes), but because of a structural invariant worth protecting:

A scalar-for-scalar swap never changes the document's shape. Every { and [ stays where it was, so the nesting depth and container type at every byte outside the edited range are untouched — which means every line-index sample and all 25 million collapse entries remain semantically valid, not just positionally translatable. The line count doesn't change either, so the scrollbar, the gutter numbers and the visible↔raw mapping all hold. Deleting or inserting whole entries breaks that (line counts shift, commas need surgery on possibly-minified bytes) — solvable, designed, and deliberately a phase two.

Validation rides the same restriction: a replacement is checked locally, in microseconds, because a well-formed scalar dropped between the same neighbors cannot invalidate anything elsewhere in 20 GB.
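One way such a local check could look is sketched below. The function `isJsonScalar` is an assumption for illustration, not the viewer's validator. It leans on JSON.parse for well-formedness, then rejects containers and surrounding whitespace, since either would alter the document's shape or line count.

```javascript
// Hypothetical check: accept only text that is exactly one JSON scalar.
function isJsonScalar(text) {
  // JSON.parse tolerates surrounding whitespace (including newlines),
  // which would change byte layout and possibly line counts.
  if (text.trim() !== text) return false;
  let value;
  try {
    value = JSON.parse(text);
  } catch {
    return false; // not well-formed JSON
  }
  // Objects and arrays parse fine but are not scalars.
  return value === null || typeof value !== "object";
}

console.log(isJsonScalar("99.9"));    // true
console.log(isJsonScalar('"Ada"'));   // true
console.log(isJsonScalar("null"));    // true
console.log(isJsonScalar("[1, 2]"));  // false
console.log(isJsonScalar("3.14,"));   // false
console.log(isJsonScalar(" 1\n"));    // false
```

The parsed value is used only to classify the input; the bytes that go into the add buffer would be the text the user typed, so number precision is not affected by the round trip.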

What it adds up to

Open: 20 GB file, streamed once, never resident. Unchanged.
Edit: double-click a value, type, Enter. Splits a piece: ~100 bytes of bookkeeping.
Repaint: viewport slice through the view, same index, one binary search extra.
Undo: put the previous piece array back.
Save: stream base ranges and typed ranges to disk in order. O(1) memory, one pass.
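The save step can be sketched as a generator that walks the pieces and yields bounded chunks. The names `editedChunks` and `readBase` are hypothetical, and the sketch is synchronous over an in-memory base for the sake of a runnable example; in a browser, reading the base would go through file.slice() and be asynchronous.

```javascript
const BASE = "base", ADD = "add";

// Yield the edited document in order, at most blockSize bytes at a time.
// readBase(off, len) is a hypothetical callback returning bytes from the
// untouched base file.
function* editedChunks(pieces, readBase, addBytes, blockSize) {
  for (const p of pieces) {
    for (let done = 0; done < p.len; done += blockSize) {
      const n = Math.min(blockSize, p.len - done);
      const off = p.off + done;
      yield p.src === BASE ? readBase(off, n) : addBytes.subarray(off, off + n);
    }
  }
}

const enc = new TextEncoder(), dec = new TextDecoder();
const baseBytes = enc.encode('{"name":"Ada Lovelace","pi":3.14,"flag":false}');
const addBytes = enc.encode("99.9");
const pieces = [
  { src: BASE, off: 0, len: 28 },
  { src: ADD, off: 0, len: 4 },
  { src: BASE, off: 32, len: 14 },
];
const readBase = (off, len) => baseBytes.subarray(off, off + len);

// "Write to disk" is simulated by appending to a string.
let saved = "";
let largest = 0;
for (const chunk of editedChunks(pieces, readBase, addBytes, 8)) {
  largest = Math.max(largest, chunk.length);
  saved += dec.decode(chunk, { stream: true });
}
console.log(saved);   // {"name":"Ada Lovelace","pi":99.9,"flag":false}
console.log(largest); // 8
```

Peak memory is bounded by the block size rather than by the file or the number of edits, which is what makes a single-pass save of a very large document feasible.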

The lesson generalizes past JSON: when a document is too big to hold, stop thinking of editing as changing bytes and start thinking of it as describing a new arrangement of old bytes. The description is small even when the document is enormous. Everything else — painting, saving, undoing — becomes a way of reading the description.
