FAST.JSON.VIEWER guide

The fun bugs of building the fastest JSON viewer

Case files from a viewer that opens 40 GB JSON files in a browser tab. When line numbers pass 5 billion, byte offsets pass 4 billion, and scrollbars can't address a single line, JavaScript's sharp corners come out to play: 32-bit trapdoors, fractional bytes, and canvases lost above the viewport.


The case of the missing 4.3 billion lines

A 39.6 GB file parsed and validated perfectly, but pressing G in the formatted viewer stopped at line 1,060,831,138, deep inside the document, while the raw and hex tabs reached the end just fine. The true formatted line count was 5,355,798,434, and the collapse model stored it with an innocent-looking | 0, which coerces through a signed 32-bit integer and silently threw away exactly 2³² lines. Every scroll target was then clamped to the wrapped total, so no amount of scrolling could ever get past it. Raw and hex never consult the collapse model, which is why they were immune. One character of source, 4,294,967,296 lines gone.

// collapse-model.js — before
setTotalLines(lines) { totalLines = lines | 0; }

5_355_798_434 | 0  // === 1_060_831_138  (5,355,798,434 − 2³²)

// after — Float64 holds integers exactly up to 2⁵³
setTotalLines(lines) { totalLines = Math.max(0, Math.floor(lines) || 0); }

The case of the time-travelling tail

With the missing lines restored, G reached the correct last line number — but the bytes painted there were from the middle of the file, not the closing-bracket cascade that actually ends the document. The WASM parser's ABI carries chunk-relative byte offsets as u32, on the documented assumption that "chunks are well under 4 GB" — and 39.6 GB split across 8 workers makes 4.95 GB chunks. So every line-index sample past the 4 GiB mark of each chunk wrapped 4 GiB low, and the viewer faithfully painted bytes from four gigabytes earlier at the bottom line numbers. Validation still passed, because stitching chunk states never needs an absolute offset — only the display path travelled back in time. The fix: cap chunks below 4 GiB at planning time (a 40 GB file now simply gets 10 workers instead of 8), and make the WASM loader throw instead of ever wrapping silently again.

one worker chunk (4.95 GB), offsets stored as u32:

0 ─────────────────────────── 4.29 GB ──────── 4.95 GB
        offsets correct       │   offsets wrap to 0…
                              │   sample.byteOffset = (real − 2³²)
                              ▼
  viewer paints the file's tail with bytes from ~4 GiB earlier
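
The planning-time cap and the throwing loader can be sketched in a few lines. This is a minimal illustration, not the viewer's actual code: `planChunks`, `toU32Checked` and `MAX_CHUNK_BYTES` are hypothetical names, and the even split is an assumption.

```javascript
// Largest chunk whose every chunk-relative offset still fits in a u32 field.
const MAX_CHUNK_BYTES = 2 ** 32 - 1;

// Hypothetical planner: start from the preferred worker count and raise it
// until no chunk can exceed the u32-addressable limit.
function planChunks(fileSize, preferredWorkers) {
  const minWorkers = Math.ceil(fileSize / MAX_CHUNK_BYTES);
  const workers = Math.max(1, preferredWorkers, minWorkers);
  const chunkSize = Math.ceil(fileSize / workers);
  const chunks = [];
  for (let start = 0; start < fileSize; start += chunkSize) {
    chunks.push({ start, end: Math.min(fileSize, start + chunkSize) });
  }
  return chunks;
}

// Hypothetical guard for the ABI boundary: throw instead of wrapping.
function toU32Checked(offset) {
  if (!Number.isInteger(offset) || offset < 0 || offset > 0xffffffff) {
    throw new RangeError(`offset ${offset} does not fit in u32`);
  }
  return offset;
}

const plan = planChunks(40e9, 8);
console.log(plan.length, plan[0].end - plan[0].start);
```

With a 40 GB file and a preferred count of 8, this sketch produces 10 chunks of 4,000,000,000 bytes each, which stays under the 4,294,967,295-byte limit.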

Two bugs, one moral: JavaScript numbers are fine to 2⁵³, but every | 0, >>> 0, Uint32Array and u32 ABI field is a 4-billion trapdoor. Below 4 GB none of them fire, which is exactly why a viewer built for huge files has to test past the wrap.
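
To watch each trapdoor fire, push the first case file's own line count through them. The snippet below is only an illustration; `toSafeCount` is a hypothetical helper name, not part of the viewer.

```javascript
// The true formatted line count from the first case file.
const lines = 5_355_798_434;

// Each of these coerces through 32 bits and drops 2^32 on the way.
const viaOr = lines | 0;                       // signed 32-bit
const viaShift = lines >>> 0;                  // unsigned 32-bit
const viaArray = new Uint32Array([lines])[0];  // stored modulo 2^32

// Hypothetical helper: keep counts as non-negative Float64 integers and
// refuse anything past Number.MAX_SAFE_INTEGER (2^53 - 1).
function toSafeCount(value) {
  const n = Math.floor(Number(value));
  if (!Number.isFinite(n) || n < 0) return 0;
  if (n > Number.MAX_SAFE_INTEGER) {
    throw new RangeError(`count ${value} exceeds 2^53 - 1`);
  }
  return n;
}

console.log(viaOr, viaShift, viaArray, toSafeCount(lines));
```

All three 32-bit paths land on 1,060,831,138, the same line where G got stuck, while the Float64 path keeps the full 5,355,798,434.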

The case of the half byte

To place the scrollbar on a multi-gigabyte file, the viewer projects lines to bytes through a density estimate — so the byte offsets it asks for are routinely fractional, and File.slice() quietly truncates them, so for months nobody noticed. Then edit mode wrapped the file in a piece-table overlay that impersonates a Blob, and its read() passed the fraction straight into new Uint8Array(docEnd - docStart) — which throws a RangeError on a non-integer length. The paint path caught the throw and painted nothing: scroll an edited file and the viewer went blank. The whole edit-mode feature shipped, got reverted, and was only resurrected once the overlay learned to truncate exactly like the Blob it pretends to be. If you fake a platform API, you inherit its quirks — including the undocumented forgiving ones.

// piece-table.js — the overlay must truncate like Blob.slice does
// before: fractional docEnd - docStart → new Uint8Array(102.4) → RangeError
docStart = Math.max(0, Math.min(docSize, docStart));

// after: mirror Blob semantics — truncate first
docStart = Math.trunc(Math.max(0, Math.min(docSize, docStart)));
const out = new Uint8Array(docEnd - docStart); // an integer, provided docEnd is truncated the same way
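
One way to keep both ends of the range honest is to normalise them together before allocating. The sketch below takes the description above at face value (fractions dropped, bounds clamped to the document size); `normalizeRange` is a hypothetical helper, and the negative, end-relative indices that Blob.slice also accepts are deliberately left out.

```javascript
// Hypothetical helper: turn a possibly fractional, possibly out-of-range
// [start, end) request into integer bounds inside [0, size].
function normalizeRange(start, end, size) {
  const clamp = (v) => Math.trunc(Math.max(0, Math.min(size, Number(v) || 0)));
  const s = clamp(start);
  const e = Math.max(s, clamp(end)); // never a negative length
  return { start: s, end: e, length: e - s };
}

// A density-projected request with fractional offsets on both ends.
const range = normalizeRange(1024.7, 1127.1, 4096);
const out = new Uint8Array(range.length);
console.log(range.start, range.end, out.length);
```

Computing the length from the already-truncated bounds means the allocation size is an integer by construction, whatever the scroll projection asked for.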

The case of the blank screen at the end

In the raw tab, pressing G (or End) jumped to a perfectly blank viewport — no error, no content, just void. The byte-window engine positions its rendered canvas from a bytes-per-line estimate (RAW_BYTES_PER_LINE = 96), but real text word-wraps tighter than the estimate predicts, so at maximum scroll the entire canvas landed above the visible viewport. The end-of-file correction only handled the opposite miss — clamping a canvas that overflowed past the scroll area — never one that fell short. Best part: the existing regression test compared scroll deltas with sign-blind Math.abs, so a canvas parked off-screen passed it for months. The fix pulls the canvas down until its bottom covers the viewport bottom, capped so the file's last byte still ends exactly at the scroll area's end.

estimate says…                     reality (raw wraps tighter):

┌ spacer ───────────┐              ┌ spacer ───────────┐
│ ▓▓▓ canvas ▓▓▓    │              │ ▓▓▓ canvas ▓▓▓    │ ← canvas ends
│ ▓▓▓▓▓▓▓▓▓▓▓▓▓    │              ├───────────────────┤   up here
├───────────────────┤ ← viewport   │                   │
│ ▓▓ last lines ▓▓  │   (scrolled  │      blank        │
│ ▓▓▓▓▓▓▓▓▓▓▓▓▓    │    to end)   │      screen       │
└───────────────────┘              └───────────────────┘
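
The two-sided correction can be sketched as a small geometry function. This is an illustration under assumptions, not the viewer's code: `correctCanvasTop` and its parameter names are hypothetical, all values are pixels in scroll-area coordinates, and the canvas is assumed to hold the file's final bytes.

```javascript
// Hypothetical end-of-file correction for a canvas placed from an estimate.
function correctCanvasTop({ canvasTop, canvasHeight, scrollTop, viewportHeight, scrollHeight }) {
  const viewportBottom = Math.min(scrollTop + viewportHeight, scrollHeight);
  let top = canvasTop;
  // Case already handled: the canvas overflows past the end of the scroll area.
  if (top + canvasHeight > scrollHeight) top = scrollHeight - canvasHeight;
  // Case that was missing: the canvas ends above the viewport bottom, so pull
  // it down. viewportBottom <= scrollHeight, so it never passes the end.
  if (top + canvasHeight < viewportBottom) top = viewportBottom - canvasHeight;
  return top;
}

// Scrolled to the very end; the estimate parked the canvas above the viewport.
const fixedTop = correctCanvasTop({
  canvasTop: 7000, canvasHeight: 1200,
  scrollTop: 9200, viewportHeight: 800, scrollHeight: 10000,
});
console.log(fixedTop);
```

In this example the canvas moves from 7000 to 8800, so its bottom sits exactly at the scroll area's end. A regression check for this is better written against the signed relation (canvas bottom at or below the viewport bottom) than against an absolute delta, since an absolute value cannot tell which side the canvas missed on.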
