THESIS

How Albex works inside.

Albex is a full-text search engine that lives entirely in the user's browser. That sentence is the thesis. Every other choice — the algorithms, the memory model, the absence of an allocator — is a corollary of it. This page walks through how the pieces fit together.

Four axioms follow:

Zero server. No back-end ever. Documents never transit the network.
Human latency. Searches must answer within ~100 ms — the perception threshold for "instant".
Reduced footprint. Bytes shipped to the browser are a tax on every page load.
Adapts to the machine. The same package runs on a Chromebook and a Mac Pro, picking what each can support.

SIX LAYERS

The layer map

The codebase is split into six layers. Dependencies flow strictly upward: a layer never calls down into the one below it. This is what lets the algorithm layer be testable without WASM, and the adaptive layer be evolved without touching Rust.

Pure algorithms

Rust no_std · core/

Bloom filter, Bitap matcher, light Spanish stemmer, Unicode fold tables. No heap, no allocator, no I/O. Testable on host without WASM.

Streaming parsers

Rust no_std · ingest/

XML byte state machines for DOCX and XLSX. Same constraints as layer 0. New Rust formats live here.

WASM shell

Rust · wasm/

BSS arrays, C ABI exports, EngineState bookkeeping, resumable search, tier feature flags. Pegs layers 0 and 1 to a JS-facing interface.

PDF module (separate)

Rust + std · pdf-wasm/

pdf-extract wrapper. Its own WASM binary (~1 MB) loaded lazily on first PDF — never paid by users who do not touch PDFs.

TypeScript orchestrator

TS · src/

AlbexEngine API, format indexers in TS, query parser, typed errors, persistence layer, content hashing.

Adaptive capabilities

TS · src/{profile,resource-manager,pool/,gpu/,tiered-store}.ts

Profile detection, resource awareness, worker pool, WebGPU runtime, tiered storage. Strictly opt-in; the base engine works without any of it.

INTERACTIVE MAP

The whole machine, end to end

Every box below is a real piece of Albex — from the browser drop zone, through the JS↔WASM scratchpad bridge, into the Rust core and each algorithm, out to the PDF/OCR path, persistence, workers and the GPU. The animated edges trace the hot search cascade: Bloom → trigram → Bitap → score → top-K. Click any node to read how that piece works in detail.

Loading interactive diagram…

Browser / front · PDF · OCR TypeScript · WASM core · storage JS↔WASM bridge · algorithms · GPU animated = the hot search cascade

PIPELINE

From query to ranked results in four stages

Parse query

Detect simple / phrase / OR. Up to 4 tokens. Optional stemming.

< 5 µs

Bloom filter

AND + compare per chunk against the pattern Bloom.

2 instr / chunk

Bitap fuzzy

Bit-parallel match on the chunks that passed Bloom.

~10 ns / chunk

Score + top-K

Rich scoring (accuracy + WB + TF + position + proximity + IDF) into a min-heap.

O(log K)

Why this order matters

The cost ladder is deliberate: the cheapest filter runs first on every chunk; the expensive ones only on what survives. A typical search tests 100 000 chunks but runs the Bitap matcher on under 1 % of them. The Bloom filter is the gatekeeper that keeps total search latency under 10 ms even on big corpora.

BLOOM FILTER

64-bit probabilistic gate

A Bloom filter answers "could the pattern be here?" in two machine instructions: an AND and a compare. We use a single u64 per chunk, with each character mapped to one of 64 buckets by c & 0x3F.

The function hash is intentionally trivial. A cryptographic hash would give better distribution but cost 10–100× more CPU. The Bloom runs on every chunk in the corpus — sometimes 100 000 times per query — so raw speed wins over perfect distribution.

For typical European text the filter rejects 80–95 % of chunks before the more expensive Bitap stage. False positives are unavoidable but harmless: Bitap discards them at no extra cost.

BITAP

Bit-parallel fuzzy match

The Shift-Or / Wu-Manber algorithm keeps the matching state inside a single u64 register. Each text byte advances the state with a shift + OR + AND — no backtracking, no per-character allocations, no branches dependent on the data.

We extend it to k errors (substitution, insertion, deletion) by maintaining k+1 parallel registers. Albex caps k at 3: more produces noise on short queries and multiplies the linear cost. Pattern length is capped at 64 bytes — the width of the register. Longer queries are truncated with a truncated flag the host can surface in its UI.

MEMORY MODEL

BSS-only, no allocator

The default wasm32-unknown-unknown target has no OS to call for memory. Pulling in std would drag in dlmalloc or wee_alloc, inflate the binary, add allocation overhead, and introduce a failure mode (OOM) that is hard to surface cleanly to JavaScript.

Instead, every storage region is a zero-initialised static array. Because the BSS segment is described in the binary but not stored, a .wasm with 16 MB of static arrays still weighs ~33 KB on disk. At runtime the WebAssembly linear memory grows to accommodate the layout (~20 MB for std tier), but this grow happens once at instantiation — no per-call allocation, no fragmentation, no OOM during a search.

The trade-off: capacity is a fixed ceiling. Exceeding it is a silent stop, not an error. That is acceptable because the user already chose a tier; a "give me more space" feature would mean an allocator, which would mean fragmentation and unpredictable latency.

ADAPTIVE RUNTIME

Same package, different machines

Albex ships six variants of the main WASM binary (three capacity tiers, each with and without SIMD) plus the lazy PDF module. The browser fetches one variant per init() call, picked from the device profile.

mini

32 docs · 25 000 chunks · 4 MB text

when deviceMemory ≤ 1 GB

working set ~5 MB · binary 33 KB

std

128 docs · 100 000 chunks · 16 MB text

when deviceMemory 2–7 GB

working set ~20 MB · binary 33 KB

pro

1 024 docs · 800 000 chunks · 128 MB text

when deviceMemory ≥ 8 GB

working set ~160 MB · binary 33 KB

SIMD on top of every tier

The runtime probes for WebAssembly v128 by attempting to validate a tiny module that uses it. If it validates, the engine fetches albex_wasm_<tier>_simd.wasm — a binary built with -C target-feature=+simd128. The inner loop of the Bloom batch is branchless precisely so LLVM can auto-vectorise it.

Why "8 GB" appears as a ceiling everywhere

The W3C navigator.deviceMemory spec caps reported memory at 8 GB for privacy. A Mac Pro with 192 GB reports 8 just like an iPad with 16 GB. We can't distinguish them. The pro tier (128 MB pool) is the largest default because the working set must still fit comfortably in a browser tab — Chrome starts flagging pages as "high memory" around 500 MB.

For genuine archive-scale corpora the right answer is the TieredStore (LRU eviction with OPFS-persisted blobs) or the AlbexPool sharded across N workers, not a tier with 1 GB pool.

WEBGPU PRE-FILTER

Bloom on the GPU when it pays off

The Bloom check on 100 000 chunks is bashfully parallel: each test is independent. A WGSL compute shader can run all 100 000 of them in a single dispatch. The result is a packed bitset of candidates; the CPU then runs Bitap only on those.

Engaging the GPU has a fixed cost (upload + dispatch + readback, ~3–5 ms). Below ~20 000 chunks the CPU wins. Above it, the GPU gives a 5–10× speedup. The default threshold is exactly that break-even point, but it is exposed as gpuThreshold for tuning.

The candidate mask is single-shot: it applies to the next searchBegin and is automatically cleared at the end of the last slice. That prevents stale masks from silently corrupting an unrelated next search if anything ever goes wrong in the JS layer.

RUST + TYPESCRIPT

When each language wins

Some parsers live in Rust; others live in TypeScript. The rule is not "Rust for important things". It is: which primitive of the host environment wins for this operation?

DOCX parsing

Rust

Streaming XML state machine. Files reach hundreds of MB; loading the DOM would be infeasible.

XLSX parsing

Rust

Same as DOCX. Shared strings tables get huge in real-world workbooks.

PDF extraction

Rust

pdf-extract dependency is mature and large. Worth its own module loaded lazily.

Bloom + Bitap hot loop

Rust

Bit-parallel arithmetic. WASM v128 wins clearly over JS bitwise.

Markdown / HTML / RTF

Files fit in memory; V8 regex is JIT-compiled C++ — faster than a hand-written WASM parser.

JSON parsing

JSON.parse is C++ inside the JS engine — no Rust port can beat it.

Content hash FNV-1a

One-shot per file, not a hot path. JS bitwise is fine at 100 MB/s.

WebGPU shader

WGSL

GPU language. No alternative.

OPFS / IndexedDB

Browser APIs; unreachable from WASM.

Worker orchestration

new Worker() is a JS API; postMessage protocol lives where the workers live.

DOCX, XLSX and PDF go to Rust because the source files can be hundreds of megabytes and streaming a state machine is the only way to handle them. Markdown, HTML, JSON, CSV, EML and RTF go to TypeScript because they fit comfortably in memory and V8 gives us String.prototype.replace with JIT-compiled regex, which beats any hand-written WASM parser for documents of typical size.

DEEPER DIVES

Where to read more

This page summarises the parts that come up most often in conversation. For the full pseudocode, layout tables and side-by-side cost comparisons, check the in-repo Maintainer's guide (MAINTAINER.md) and Technical deep-dive (TECHNICAL.md).

API Reference →Every method, type and option Guides →Recipes by use case Source →Repo with MAINTAINER + TECHNICAL