AlbexEngine API

Albex exposes one primary class — AlbexEngine — that wraps the WebAssembly module and orchestrates indexing, search, persistence, and the adaptive runtime. Additional opt-in helpers (AlbexEngineWorker, AlbexPool, TieredStore, BloomGpu) are exported from subpath entry points so consumers only pay for what they use.

Install, import, use. The WASM binary travels with the npm package and your bundler (Vite, Webpack 5+, Next, esbuild, Rollup, Parcel 2, Bun, Deno) resolves it automatically through import.meta.url. No assets to copy. No paths to remember.

npm install albex

import { AlbexEngine } from "albex";

const engine = new AlbexEngine();
await engine.init();

await engine.indexFile(myFile);
const results = engine.search('contrato', { windowed: true });

The library ships six WASM variants of the main engine (3 capacity tiers × baseline/SIMD) plus a lazily loaded 1 MB PDF module. The zero-config path loads the std-baseline binary, which works on every device. If you want runtime tier auto-selection (mini/std/pro picked from navigator.deviceMemory), serve the variants yourself and pass wasmBaseUrl. See the Architecture page for the rationale.

33 KB    main wasm (baseline)
~50 KB   total bundle gzipped (no PDF)
11       supported formats
0        runtime dependencies

new AlbexEngine(opts)

new AlbexEngine(opts?: AlbexOptions)
Returns: AlbexEngine

Constructs the engine. The WASM module is NOT loaded yet — call init() before any other method. The constructor only validates options and stores them.

import { AlbexEngine } from "albex";

// Zero config — the WASM binary ships with the package and your bundler
// (Vite, Webpack 5+, Next, esbuild, Rollup, Parcel 2, Bun) resolves it
// automatically through `import.meta.url`. No assets to copy, no URL to
// configure, no path to remember.
const engine = new AlbexEngine();
await engine.init();

// Optional overrides — only if you want tier auto-selection or a CDN.
// new AlbexEngine({ wasmBaseUrl: "/assets" })  // serve the 6 variants yourself
// new AlbexEngine({ wasmUrl: "https://cdn.example.com/albex_wasm.wasm" })

AlbexOptions

Field          Type                      Description
wasmUrl?       string                    Explicit URL to the .wasm binary. Overrides every other option. Useful when serving from a custom CDN.
wasmBaseUrl?   string                    Base directory containing tier variants (`albex_wasm_<tier>[_simd].wasm`). Required only if you want runtime tier auto-selection. Leave undefined to fall back to the bundled std-baseline binary.
pdfWasmUrl?    string                    Override for `albex_pdf.wasm`. By default the bundled module is resolved via `import.meta.url` and loaded lazily on first PDF.
tier?          'auto'|'mini'|'std'|'pro'  Capacity tier. `auto` requires `wasmBaseUrl` because the bundler cannot know which of the 6 binaries to copy. Without `wasmBaseUrl` the engine loads std by default.
simd?          'auto'|'on'|'off'         SIMD variant policy. Only effective when `wasmBaseUrl` is set.
gpu?           'auto'|'on'|'off'         WebGPU pre-filter policy. Default: `auto` (enabled when corpus > `gpuThreshold`).
gpuThreshold?  number                    Minimum chunk count to engage WebGPU. Default: 20 000.

engine.init()

engine.init(): Promise<void>
Returns: Promise<void>

Resolves the WASM URL (wasmUrl if set, otherwise wasmBaseUrl plus tier auto-detection, otherwise the bundled std-baseline binary), fetches the binary, instantiates it, runs initial setup, and subscribes the engine to the global ResourceManager. Throws AlbexInitError on fetch or instantiation failure.

await engine.init();
// engine.tier         → 'mini' | 'std' | 'pro'
// engine.simdEnabled  → boolean
// engine.gpuEngaged   → true after the first search that uses WebGPU
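
The resolution order described above can be sketched as a small pure function. This is an illustrative sketch, not the library's implementation: `resolveWasmUrlSketch`, `UrlOptions`, and the `null` return value (standing in for "use the bundled binary") are hypothetical names, and the explicit `tier` and `simd` overrides are left out for brevity. Only the file-name pattern `albex_wasm_<tier>[_simd].wasm` is taken from the options table.

```typescript
// Hypothetical sketch of the URL resolution order; not the library's code.
type Tier = "mini" | "std" | "pro";

interface UrlOptions {
  wasmUrl?: string;
  wasmBaseUrl?: string;
}

// Returns the URL to fetch, or null when the bundled std-baseline binary
// should be used (the zero-config path).
function resolveWasmUrlSketch(
  opts: UrlOptions,
  detectedTier: Tier,
  simdSupported: boolean,
): string | null {
  // 1. An explicit wasmUrl overrides every other option.
  if (opts.wasmUrl) return opts.wasmUrl;

  // 2. With wasmBaseUrl, pick a variant named albex_wasm_<tier>[_simd].wasm.
  if (opts.wasmBaseUrl) {
    const base = opts.wasmBaseUrl.replace(/\/+$/, ""); // drop trailing slashes
    const suffix = simdSupported ? "_simd" : "";
    return `${base}/albex_wasm_${detectedTier}${suffix}.wasm`;
  }

  // 3. Otherwise fall back to the bundled binary.
  return null;
}
```

Keeping the decision in a pure function like this makes the precedence easy to unit-test without fetching anything.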

reset() & [Symbol.dispose]()

engine.reset(): void

Clears every indexed document and search result. The engine is immediately ready to index a fresh corpus. The WASM module instance is preserved (no re-fetch).

engine[Symbol.dispose](): void

TC39 explicit-resource-management hook. Resets state, unsubscribes from the resource manager, destroys the GPU device (if any), and nulls out internal references so the WASM instance becomes unreachable for GC. Where `using` declarations are supported, declare the engine as `using engine = new AlbexEngine(...)` so that disposal runs automatically when the scope exits.

engine.indexFile()

engine.indexFile(file: File): Promise<IndexedDocument>
Returns: Promise<IndexedDocument>

Detects the format from the extension, parses the file, and streams text into the WASM index. Content is hashed (FNV-1a 64-bit) before indexing — if a document with the same hash already exists, the previous entry is returned and no work is done. Throws AlbexUnsupportedFormatError or AlbexParseError on failures.

Supported: .docx, .xlsx, .pdf, .md, .html/.htm, .json, .csv, .eml, .rtf, .txt, .xml.

const input = document.querySelector('input[type=file]');

input.addEventListener('change', async () => {
  for (const file of input.files) {
    const doc = await engine.indexFile(file);
    console.log(`${file.name}: ${doc.chunks} chunks, hash=${doc.contentHash}`);
  }
});

// Idempotent: re-indexing the same file is a no-op and returns the existing
// IndexedDocument (matched by FNV-1a content hash).

IndexedDocument

Field        Type    Description
name         string  Original file name from the File object.
ext          string  Lowercase extension without leading dot.
chunks       number  Number of chunks produced from this document.
indexTimeMs  number  Wall-clock time spent indexing.
textBytes    number  Bytes of indexed text contributed by this document.
docId        number  Stable identifier within the engine. Persists across compact().
contentHash  string  64-bit FNV-1a hex hash of the source bytes. Used for dedup.
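
To make the content-hash dedup concrete, here is a minimal sketch of FNV-1a 64-bit hashing plus a hash-keyed registry. The hash constants are the standard FNV-1a parameters; `fnv1a64Hex`, `seenByHash`, and `registerOnce` are hypothetical names for illustration, and the engine's actual bookkeeping lives inside the WASM module and may differ.

```typescript
// Standard FNV-1a 64-bit parameters.
const FNV_OFFSET_BASIS = 0xcbf29ce484222325n;
const FNV_PRIME = 0x100000001b3n;
const MASK_64 = 0xffffffffffffffffn;

// Hash raw bytes and return a 16-character lowercase hex string.
function fnv1a64Hex(bytes: Uint8Array): string {
  let hash = FNV_OFFSET_BASIS;
  for (let i = 0; i < bytes.length; i++) {
    hash ^= BigInt(bytes[i]);              // XOR the byte in first (the "1a" order)
    hash = (hash * FNV_PRIME) & MASK_64;   // multiply, then wrap to 64 bits
  }
  return hash.toString(16).padStart(16, "0");
}

// Hypothetical registry: content hash -> name of the first document seen.
const seenByHash = new Map<string, string>();

// Returns the existing entry when the same bytes were registered before.
function registerOnce(
  docName: string,
  bytes: Uint8Array,
): { docName: string; reused: boolean } {
  const hash = fnv1a64Hex(bytes);
  const existing = seenByHash.get(hash);
  if (existing !== undefined) return { docName: existing, reused: true };
  seenByHash.set(hash, docName);
  return { docName, reused: false };
}
```

Because the key is derived from the bytes rather than the file name, a renamed copy of the same file maps to the same entry.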

engine.searchCooperative()

engine.searchCooperative(query: string, opts?: SearchOptions): AsyncIterable<SearchResult>
Returns: AsyncIterable<SearchResult>

Cooperative variant of search(). The corpus is processed in slices; between slices the engine yields to the browser scheduler via scheduler.yield() (or a requestAnimationFrame fallback). The UI stays responsive even when a scan takes 50 ms or more. Use this in any interactive search box.

// Cooperative streaming search — main thread stays at 60 fps.
for await (const r of engine.searchCooperative('contrato', { frameBudgetMs: 8 })) {
  renderResult(r);   // render incrementally as results arrive
}
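
The slicing idea behind a cooperative scan can be sketched with an async generator. This is a simplified illustration under stated assumptions, not the engine's scan loop: `yieldToHost` and `scanCooperatively` are hypothetical names, the scan is a plain array filter, and the fallback is a zero-delay timer (rather than requestAnimationFrame) so that the sketch also runs outside a browser.

```typescript
// Yield control to the host. Prefers scheduler.yield() where it exists and
// falls back to a zero-delay timer so the sketch also runs outside browsers.
function yieldToHost(): Promise<void> {
  const sched = (globalThis as any).scheduler;
  if (sched && typeof sched.yield === "function") return sched.yield();
  return new Promise<void>((resolve) => setTimeout(resolve, 0));
}

// Scan items in time-boxed slices, yielding matches as they are found.
async function* scanCooperatively<T>(
  items: readonly T[],
  isMatch: (item: T) => boolean,
  frameBudgetMs = 8,
): AsyncGenerator<T> {
  let sliceStart = Date.now();
  for (let i = 0; i < items.length; i++) {
    if (isMatch(items[i])) yield items[i];
    // Once the slice budget is spent, let the host run before continuing.
    if (Date.now() - sliceStart >= frameBudgetMs) {
      await yieldToHost();
      sliceStart = Date.now();
    }
  }
}
```

A smaller budget gives the host more chances to paint and handle input at the cost of a longer total scan; the 8 ms default leaves roughly half of a 60 fps frame for everything else.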

SearchOptions

Field           Type     Description
windowed?       boolean  Return cropped snippets with ASCII ellipsis markers instead of full chunks.
before?         number   Bytes of context before the match (default 60).
after?          number   Bytes of context after the match (default 120).
frameBudgetMs?  number   Slice duration for searchCooperative() before yielding to the scheduler. Default 8 ms.

SearchResult

Field         Type         Description
documentName  string       File name as registered by indexFile().
location      number       Paragraph index (DOCX/TXT/MD/…) or page number (PDF, 1-based).
score         number       Composite relevance score 0–1000 (higher is better).
snippet       string       Chunk text, optionally windowed with `"... "` / `" ..."` sentinels.
matchStart    number       Byte offset of the primary token start in `snippet`.
matchEnd      number       Byte offset of the primary token end (exclusive).
matches       MatchSpan[]  All matched token spans in query order. Length 1–4.
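
Because matchStart and matchEnd are byte offsets, slicing the JavaScript string directly can cut in the wrong place once the snippet contains non-ASCII characters. The sketch below splits a snippet for highlighting; it assumes the offsets count UTF-8 bytes (the table does not state the encoding), and `splitAtByteOffsets` is a hypothetical helper, not part of the package.

```typescript
// Split a snippet into [before, match, after] using byte offsets.
// Assumption: offsets count UTF-8 bytes, not UTF-16 code units.
function splitAtByteOffsets(
  snippet: string,
  matchStart: number,
  matchEnd: number,
): [string, string, string] {
  const bytes = new TextEncoder().encode(snippet);
  const decoder = new TextDecoder();
  return [
    decoder.decode(bytes.subarray(0, matchStart)),
    decoder.decode(bytes.subarray(matchStart, matchEnd)),
    decoder.decode(bytes.subarray(matchEnd)),
  ];
}

// Usage: "ñ" occupies two bytes in UTF-8, so "contrato" starts at byte 5.
const [beforeText, matchText, afterText] =
  splitAtByteOffsets("a\u00f1o contrato firmado", 5, 13);
```

The three parts can then be rendered with the middle one wrapped in a highlight element.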

Tuning knobs

engine.setMaxErrors(n: 0 | 1 | 2 | 3): void
Returns: void

Maximum edit distance for fuzzy matching. 0 = exact matches only. The engine automatically lowers the limit for short queries.

engine.setMaxErrors(1);

engine.setThreshold(n: number): void
Returns: void

Minimum score (0-1000) below which results are dropped. Default 250.

engine.setThreshold(400);

engine.setMaxResults(n: number): void
Returns: void

Cap on the number of returned results. Range 1-200; default 50.

engine.setMaxResults(100);

engine.setLanguage(lang: 'off' | 'es'): void
Returns: void

Enables lightweight Spanish stemming on query tokens. Indexed text is never stemmed, so snippets stay faithful to the source.

engine.setLanguage('es');  // "contratos" now matches "contrato"
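
To give a feel for what an edit-distance limit of 0 to 3 means, the sketch below computes the classic Levenshtein distance (insertions, deletions, substitutions). It is only an illustration of the metric: the engine's own matcher is not shown here, and whether it counts a transposition as one edit or two is not stated in this reference. `editDistance` and `withinMaxErrors` are hypothetical names.

```typescript
// Classic Levenshtein distance using two rolling rows of the DP table.
function editDistance(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost, // substitution (or match)
      ));
    }
    prev = curr;
  }
  return prev[b.length];
}

// A candidate is accepted when it is within maxErrors edits of the query.
function withinMaxErrors(
  query: string,
  candidate: string,
  maxErrors: 0 | 1 | 2 | 3,
): boolean {
  return editDistance(query, candidate) <= maxErrors;
}
```

Under this metric "contrado" is one edit away from "contrato", while the swapped-letter typo "contarto" is two, so it would need a limit of at least 2.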

Remove · replace · compact

Indexing is idempotent (content-hash dedup is automatic). Documents can be removed individually without rebuilding the entire index. Reclaim storage on demand with compact().

engine.removeDocument(idOrName: string): boolean
Returns: boolean

Tombstones a document. Subsequent searches skip its chunks; storage is reclaimed by compact(). `idOrName` accepts the file name or the contentHash returned by indexFile().

engine.removeDocument('contract-2024-03.pdf');
engine.removeDocument(doc.contentHash);  // by hash also works

engine.replaceDocument(name: string, newFile: File): Promise<IndexedDocument>
Returns: Promise<IndexedDocument>

Atomic remove + re-index. Bypasses dedup so that re-indexing the same bytes after a remove works.

await engine.replaceDocument('contract.pdf', newVersionFile);

engine.compact(): void
Returns: void

Reclaims storage from tombstoned documents by rewriting internal arrays in place. Doc IDs of surviving documents are preserved.

engine.removeDocument('old.pdf');
engine.compact();  // bytes freed; subsequent indexFile sees real headroom
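
The tombstone-then-compact pattern can be sketched with a toy in-memory index. This is an illustration of the general technique only; `TombstoneIndexSketch` and its fields are hypothetical and say nothing about how the WASM module lays out its arrays.

```typescript
interface ChunkSketch {
  docId: number;
  text: string;
}

class TombstoneIndexSketch {
  private chunks: ChunkSketch[] = [];
  private tombstoned = new Set<number>();

  add(docId: number, text: string): void {
    this.chunks.push({ docId, text });
  }

  // Mark a document as removed; its chunks stay in storage for now.
  remove(docId: number): boolean {
    const exists = this.chunks.some((c) => c.docId === docId);
    if (!exists || this.tombstoned.has(docId)) return false;
    this.tombstoned.add(docId);
    return true;
  }

  // Searches skip chunks that belong to tombstoned documents.
  search(term: string): ChunkSketch[] {
    return this.chunks.filter(
      (c) => !this.tombstoned.has(c.docId) && c.text.indexOf(term) !== -1,
    );
  }

  // Drop tombstoned chunks in place; surviving docIds are unchanged.
  compact(): void {
    let write = 0;
    for (let read = 0; read < this.chunks.length; read++) {
      const chunk = this.chunks[read];
      if (!this.tombstoned.has(chunk.docId)) this.chunks[write++] = chunk;
    }
    this.chunks.length = write;
    this.tombstoned.clear();
  }

  get storedChunks(): number {
    return this.chunks.length;
  }
}
```

Removal is cheap because it only flips a marker; the cost of moving data is paid once, when compact() is called.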

Snapshots: OPFS & IndexedDB

The full engine state can be serialised to a binary blob and restored later. OPFS is preferred (zero-copy writes); IndexedDB is the universal fallback. A 16 MB snapshot typically restores in tens of milliseconds — far faster than re-parsing the documents.

engine.save(name: string): Promise<void>
Returns: Promise<void>

Serialise the index to a binary snapshot in OPFS (preferred) or IndexedDB.

await engine.save('my-corpus');

engine.load(name: string): Promise<boolean>
Returns: Promise<boolean>

Restore a previously saved snapshot. Returns true on success, false if the snapshot is missing or its header does not match.

if (!await engine.load('my-corpus')) {
  console.log('No snapshot yet; starting fresh');
}

engine.loadOrInit(name: string): Promise<boolean>
Returns: Promise<boolean>

Convenience wrapper: loads the snapshot if it exists, otherwise calls reset() and starts clean.

await engine.loadOrInit('my-corpus');

engine.deleteSnapshot(name: string): Promise<void>
Returns: Promise<void>

Remove a snapshot from storage.

await engine.deleteSnapshot('old-corpus');

engine.listSnapshots(): Promise<string[]>
Returns: Promise<string[]>

List the names of all snapshots saved in the current origin.

const names = await engine.listSnapshots();
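
A common way to combine these calls is "restore if possible, otherwise build and save". The sketch below is written against a minimal structural interface that covers only load(), save(), and indexFile() as documented above; `SnapshotCapableEngine` and `restoreOrBuild` are hypothetical names, and the file type is left generic so the helper can be exercised without a browser.

```typescript
// Minimal structural type covering only the documented methods used below.
interface SnapshotCapableEngine<F> {
  load(snapshotName: string): Promise<boolean>;
  save(snapshotName: string): Promise<void>;
  indexFile(file: F): Promise<unknown>;
}

// Restore a snapshot if one exists; otherwise index the sources and save.
// Returns true when the corpus came from the snapshot.
async function restoreOrBuild<F>(
  engine: SnapshotCapableEngine<F>,
  snapshotName: string,
  sources: readonly F[],
): Promise<boolean> {
  if (await engine.load(snapshotName)) return true;
  for (let i = 0; i < sources.length; i++) {
    await engine.indexFile(sources[i]);
  }
  await engine.save(snapshotName);
  return false;
}
```

On the first visit this pays the parsing cost once; later visits take the fast restore path.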

getStats() & getLastSearchStats()

engine.getStats(): EngineStats

Snapshot of current engine state: document count, chunk count, memory usage, loaded tier, capacity caps.

engine.getLastSearchStats(): SearchStats | null

Bloom/Bitap pipeline counters from the most recent search call. Useful for debugging relevance and detecting performance regressions.

EngineStats

Field            Type                     Description
documents        number                   Active (non-tombstoned) document count.
chunks           number                   Total indexed chunks.
textUsed         number                   Bytes of indexed text currently stored.
textCapacity     number                   Maximum text bytes the loaded tier can hold.
wasmMemoryBytes  number                   Total WASM linear memory size (BSS + grown pages).
tier             'mini'|'std'|'pro'|null  Tier of the loaded binary. null before init().
maxChunks        number                   Compile-time chunk capacity for the loaded tier.
maxDocs          number                   Compile-time document capacity for the loaded tier.

SearchStats

Field         Type    Description
query         string  Verbatim query string.
timeMs        number  End-to-end search time.
results       number  Number of results above the threshold.
bloomTested   number  Chunks tested against the Bloom filter.
bloomPassed   number  Chunks that passed Bloom (subset of tested).
bitapMatched  number  Chunks confirmed by Bitap (subset of passed).
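
One way to read the pipeline counters is as two ratios: how much of the corpus the Bloom stage lets through, and how many of those candidates Bitap then confirms. The helper below is a hypothetical sketch (`SearchStatsLike` and `pipelineRatios` are not part of the package) that uses only the three counters listed in the SearchStats table.

```typescript
// Only the counters needed for the ratios; mirrors the SearchStats fields.
interface SearchStatsLike {
  bloomTested: number;
  bloomPassed: number;
  bitapMatched: number;
}

function pipelineRatios(
  stats: SearchStatsLike,
): { bloomPassRate: number; bitapConfirmRate: number } {
  // Share of tested chunks that survived the Bloom pre-filter.
  const bloomPassRate =
    stats.bloomTested > 0 ? stats.bloomPassed / stats.bloomTested : 0;
  // Share of Bloom survivors that Bitap confirmed as real matches.
  const bitapConfirmRate =
    stats.bloomPassed > 0 ? stats.bitapMatched / stats.bloomPassed : 0;
  return { bloomPassRate, bitapConfirmRate };
}
```

A pass rate that creeps upward between releases, or a confirm rate that drops, would suggest the pre-filter is doing less pruning than before and is worth a closer look.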

Profile detection & helpers

The library exposes pure helpers from the same package so consumers can read the device profile, override tier selection, or build their own UI around resource state. None of them require an initialised engine.

detectProfile(opts?: { fresh?: boolean }): Promise<DeviceProfile>
Returns: Promise<DeviceProfile>

Probes the host capabilities (cores, memory, WASM features, WebGPU, storage budget, network, battery). The result is cached in sessionStorage. Pass `fresh: true` to bypass the cache.

import { detectProfile } from 'albex';
const profile = await detectProfile();
console.log(profile.memoryGB, profile.wasm.simd);

pickTier(profile: DeviceProfile): 'mini' | 'std' | 'pro'
Returns: 'mini' | 'std' | 'pro'

Pure heuristic: <=1 GB → mini, 2-4 GB → std, >=8 GB → pro, null (Safari) → std.

const tier = pickTier(profile);

pickWorkerCount(profile: DeviceProfile): number
Returns: number

Half the core count, clamped to [1, 8]. Drops to 1 when the battery is reported below 20 % and discharging.

const workers = pickWorkerCount(profile);

shouldUseGpu(profile: DeviceProfile, chunkCount: number, threshold?: number): boolean
Returns: boolean

Returns true when WebGPU is available AND the chunk count crosses the threshold (default 20 000).

if (shouldUseGpu(profile, engine.getStats().chunks)) { /* … */ }

DeviceProfile

Field            Type                         Description
cores            number                       navigator.hardwareConcurrency.
memoryGB         number | null                navigator.deviceMemory (capped at 8 GB by spec; null where unsupported, e.g. Safari).
wasm.simd        boolean                      WebAssembly v128 supported (validated via probe module).
wasm.bulkMemory  boolean                      Bulk memory ops supported.
wasm.threads     boolean                      Threads supported AND page cross-origin isolated.
webgpu           boolean                      navigator.gpu present.
coopCoep         boolean                      crossOriginIsolated === true.
storage          { quotaBytes, usageBytes }   navigator.storage.estimate() result.
net              { effectiveType, saveData }  Connection info if reported (Chrome only).
battery          { level, charging } | null   Battery state if available.
visible          boolean                      document.visibilityState === 'visible' at probe time.

Tier capacities

Tier  Max docs  Max chunks  Max text  Working set
mini  32        25 000      4 MB      ~5 MB
std   128       100 000     16 MB     ~20 MB
pro   1 024     800 000     128 MB    ~160 MB
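
The tier and worker-count rules documented for pickTier() and pickWorkerCount() are simple enough to restate as code. The sketch below is a re-implementation for illustration, not the library source: the `Sketch` suffixes mark the functions as hypothetical, and two details are assumptions because the reference does not spell them out (battery level as a 0..1 fraction, and cores/2 rounded down).

```typescript
type TierName = "mini" | "std" | "pro";

// memoryGB mirrors DeviceProfile.memoryGB: a number, or null when unreported.
function pickTierSketch(memoryGB: number | null): TierName {
  if (memoryGB === null) return "std"; // unreported memory falls back to std
  if (memoryGB <= 1) return "mini";
  if (memoryGB >= 8) return "pro";
  return "std";                        // the 2-4 GB range
}

interface BatteryLike {
  level: number;     // assumption: fraction in the range 0..1
  charging: boolean;
}

// Assumption: cores/2 is rounded down before clamping to [1, 8].
function pickWorkerCountSketch(
  cores: number,
  battery: BatteryLike | null,
): number {
  // Low battery while discharging: fall back to a single worker.
  if (battery !== null && !battery.charging && battery.level < 0.2) return 1;
  return Math.min(8, Math.max(1, Math.floor(cores / 2)));
}
```

For example, a device reporting 8 cores and 4 GB would get the std tier and 4 workers under these rules, dropping to 1 worker if it were at 10 % battery and unplugged.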

AlbexEngineWorker

new AlbexEngineWorker(opts: AlbexWorkerOptions)
Returns: AlbexEngineWorker

Drop-in replacement for AlbexEngine that runs the entire engine inside a Web Worker. Surface is identical except every method returns a Promise. Files are transferred via postMessage with transferable ArrayBuffers — no copy.

Import from albex/worker. The runtime script is exported separately as albex/worker-runtime and must be referenced via new URL(..., import.meta.url).

import { AlbexEngineWorker } from 'albex/worker';

// Same zero-config pattern as the main engine — the worker runtime URL is
// the only thing you must point at (so the bundler can spawn it).
const engine = new AlbexEngineWorker({
  workerUrl: new URL('albex/worker-runtime', import.meta.url),
});

await engine.init();
const results = await engine.search('contrato', { windowed: true });

AlbexPool

new AlbexPool(opts: AlbexPoolOptions)
Returns: AlbexPool

Orchestrates N worker shards. Documents are sharded round-robin across workers. Searches broadcast to every shard and the coordinator merges top-K results preserving the global descending score order. Default worker count is half of hardwareConcurrency clamped to [1, 8]; battery-aware (drops to 1 on low battery).

Import from albex/pool.

import { AlbexPool } from 'albex/pool';

const pool = new AlbexPool({
  workerUrl: new URL('albex/worker-runtime', import.meta.url),
  workers: 'auto',   // cores/2 clamped [1, 8]
});
await pool.init();

await pool.indexFile(fileA);                   // sharded round-robin
await pool.indexFile(fileB);
const results = await pool.search('contrato'); // map-reduce across shards
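
The merge step of a sharded search can be sketched as a k-way merge over per-shard result lists. This is an illustration of the general technique under one assumption, namely that each shard returns its results already sorted by descending score; `ScoredResult` and `mergeTopK` are hypothetical names and the coordinator's real merge may differ.

```typescript
interface ScoredResult {
  documentName: string;
  score: number;
}

// Merge per-shard lists (each sorted by descending score) into the global
// top-K, also in descending score order.
function mergeTopK(
  shards: readonly ScoredResult[][],
  k: number,
): ScoredResult[] {
  const cursors = shards.map(() => 0); // next unread index per shard
  const merged: ScoredResult[] = [];
  while (merged.length < k) {
    let best = -1;
    for (let s = 0; s < shards.length; s++) {
      const candidate = shards[s][cursors[s]];
      if (candidate === undefined) continue; // this shard is exhausted
      if (best === -1 || candidate.score > shards[best][cursors[best]].score) {
        best = s;
      }
    }
    if (best === -1) break; // every shard is exhausted
    merged.push(shards[best][cursors[best]++]);
  }
  return merged;
}
```

With at most 8 shards a linear scan over the shard heads is enough; a heap would only pay off with many more lists.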

TieredStore

new TieredStore(engine: AlbexEngine, opts?: TieredStoreOptions)
Returns: TieredStore

Adds hot/warm tiers behind the engine. When the engine reaches evictThreshold of capacity, the LRU document is removed and its original blob remains in OPFS. promote(name) brings it back by re-indexing from the persisted blob.

Import from albex/tiered.

import { AlbexEngine } from 'albex';
import { TieredStore } from 'albex/tiered';

const engine = new AlbexEngine();
await engine.init();

const store = new TieredStore(engine, { evictThreshold: 0.85, hotFloor: 4 });
await store.init();

await store.indexFile(file);          // persists original blob in OPFS
await store.promote('older-doc.pdf'); // brings warm doc back to engine
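
The eviction decision can be sketched as a pure function over the hot set. Several details here are assumptions rather than documented behaviour: `hotFloor` is taken to mean the minimum number of documents that must stay hot, usage is measured in indexed text bytes, and the loop keeps evicting until usage falls below the threshold. `HotDoc` and `pickEvictions` are hypothetical names.

```typescript
interface HotDoc {
  docName: string;
  textBytes: number;
  lastUsed: number; // e.g. a timestamp; smaller means less recently used
}

// Choose documents to evict, least recently used first, until usage drops
// below evictThreshold * textCapacity or only hotFloor documents remain.
function pickEvictions(
  hot: readonly HotDoc[],
  textCapacity: number,
  evictThreshold: number,
  hotFloor: number,
): string[] {
  let used = 0;
  for (let i = 0; i < hot.length; i++) used += hot[i].textBytes;

  const limit = evictThreshold * textCapacity;
  const byAge = hot.slice().sort((a, b) => a.lastUsed - b.lastUsed);

  const evicted: string[] = [];
  let remaining = hot.length;
  for (let i = 0; i < byAge.length; i++) {
    if (used < limit || remaining <= hotFloor) break;
    evicted.push(byAge[i].docName);
    used -= byAge[i].textBytes;
    remaining--;
  }
  return evicted;
}
```

Evicted documents would then be removed from the engine while their original blobs stay in OPFS, ready for promote().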

BloomGpu (advanced)

new BloomGpu()
Returns: BloomGpu

Standalone WebGPU Bloom-scan accelerator. AlbexEngine instantiates one automatically when opts.gpu permits and the corpus exceeds gpuThreshold (default 20 000 chunks). You only need to touch this class directly to integrate the WGSL shader into a non-Albex pipeline.

Import from albex/gpu.

Typed error hierarchy

Every error thrown by Albex extends AlbexError. Switch on the kind discriminator (which survives structuredClone across worker boundaries) or use instanceof against the subclasses.

Error subclasses

Class                        kind                Thrown when
AlbexError                   (base)              Base class for every Albex error. Carries `kind` discriminator.
AlbexInitError               init                WASM fetch failed, init() not called, or PDF module not initialised.
AlbexUnsupportedFormatError  unsupported_format  File extension is not in the supported list. Carries `ext` field.
AlbexParseError              parse               A parser (DOCX/XLSX/PDF/JSON/…) failed. Carries `format` field.
AlbexCapacityError           capacity            Scratchpad write would exceed buffer size, or hard cap reached.

import {
  AlbexError, AlbexInitError, AlbexParseError,
  AlbexUnsupportedFormatError, AlbexCapacityError,
} from 'albex';

try {
  await engine.indexFile(file);
} catch (e) {
  if (e instanceof AlbexUnsupportedFormatError) {
    console.warn(`Unsupported: .${e.ext}`);
  } else if (e instanceof AlbexParseError) {
    console.warn(`Failed to parse ${e.format}:`, e.message);
  } else if (e instanceof AlbexCapacityError) {
    console.warn('Engine full — upgrade tier or use TieredStore');
  } else throw e;
}
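
When an error has crossed a worker boundary, instanceof no longer applies, so switching on `kind` is the practical route. The sketch below models the cloned error as a plain-object union built from the kinds and extra fields in the table above; `ClonedAlbexError` and `describeAlbexError` are hypothetical names, and the assumption that `message` survives the clone is ours.

```typescript
// Assumed shape of an Albex error after structuredClone: a plain object that
// keeps `kind` and the documented extra fields but not its class identity.
type ClonedAlbexError =
  | { kind: "init"; message: string }
  | { kind: "unsupported_format"; message: string; ext: string }
  | { kind: "parse"; message: string; format: string }
  | { kind: "capacity"; message: string };

// Map each kind to a user-facing message; the union makes the switch
// exhaustive, so adding a new kind is flagged by the compiler.
function describeAlbexError(err: ClonedAlbexError): string {
  switch (err.kind) {
    case "init":
      return `Initialisation failed: ${err.message}`;
    case "unsupported_format":
      return `Unsupported: .${err.ext}`;
    case "parse":
      return `Failed to parse ${err.format}: ${err.message}`;
    case "capacity":
      return "Engine full: upgrade tier or use TieredStore";
  }
}
```

The same function works on the main thread too, since real AlbexError instances carry the same `kind` field.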

npm entry points

Albex ships multiple subpath exports so each feature can be tree-shaken independently. Import only what you use.

Subpath exports

Import path           Surface                                      Pulls in
albex                 AlbexEngine, types, errors, profile helpers  main engine code, profile detector, persistence layer, resource manager
albex/worker          AlbexEngineWorker                            main-thread wrapper that proxies to a Worker
albex/worker-runtime  (runs inside a Worker)                       Worker-side handler; reference via `new URL(..., import.meta.url)`
albex/pool            AlbexPool, AlbexPoolOptions                  pool coordinator that orchestrates N worker shards
albex/tiered          TieredStore, TieredStoreOptions              hot/warm tier manager with OPFS persistence of original blobs
albex/gpu             BloomGpu, packBloomsFromChunks               standalone WebGPU runtime + WGSL shader for Bloom scan