Don’t Parse Your Index Twice: BM25 Search on Cloudflare Workers
I run a static digital garden built with Quartz. Quartz generates a contentIndex.json at build time - a 6MB file containing every page’s title, content, and tags. I wanted a server-side BM25 search endpoint in front of it, deployed as a Cloudflare Pages Function.
The first version worked. Then it stopped working. Then it worked again, but 3x faster than before. Here’s what happened.
The Setup
Quartz builds contentIndex.json as a static asset. The file has one entry per page:
{
"quran/surahs/surah-075": {
"title": "Surah 75 - Al-Qiyamah",
"content": "In the name of God... Day of Resurrection..."
}
}

6,696 entries. 3.7 million characters of text. A Cloudflare Pages Function at /api/search fetches this file, runs BM25, and returns ranked results.
The First Bug: CPU Limit Exceeded
Error 1102: Worker exceeded CPU time limit.
Cloudflare Workers on the free tier cap CPU time at 10ms per request. My handler was doing this on every request:
export async function onRequestGet(context) {
const data = await fetchContentIndex(env); // fetch + JSON.parse
const results = bm25Search(query, data); // build index + score
return Response.json(results);
}

The problem is bm25Search. BM25 requires an inverted index: for each term, which documents contain it and how many times. Building that index means iterating every token in every document - 3.7M characters - to compute:
- docLengths: token count per document
- termDf: how many documents contain each term
- termPostings: { term -> { slug -> tf } }
- avgDl: average document length across the corpus
That’s a lot of work. On 6,696 documents it took ~80ms. Well over the 10ms limit.
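The tokenize() helper used throughout the snippets in this post is never shown. A minimal sketch of what it plausibly looks like - this is an assumption about the real implementation, not the deployed code:

```javascript
// Hypothetical tokenizer (the post never shows the real one).
// Lowercases, splits on whitespace, and strips punctuation while
// keeping internal hyphens - which is why "Paleo-Hebrew" ends up
// as a single opaque token (see the accuracy gotchas later).
function tokenize(text) {
  return text
    .toLowerCase()
    .split(/\s+/)
    .map((t) => t.replace(/[^a-z0-9-]/g, "")) // strip non-alphanumerics except "-"
    .filter(Boolean); // drop empty strings left by leading/trailing whitespace
}
```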
The Fix: Split Build from Score
The insight is that the inverted index doesn’t change between requests. The corpus is static - it was built at deploy time and doesn’t change until the next deploy. There’s no reason to rebuild it on every query.
Cloudflare Workers run in “isolates” - lightweight V8 contexts that stay alive for a few minutes to handle multiple requests before being evicted. Module-level variables persist across requests within the same isolate.
So: build the index once, cache it at module scope, reuse it on every subsequent request.
// Module-level cache - survives across requests within an isolate
let _cachedIndex = null; // raw contentIndex.json
let _cacheEtag = null;
let _builtIndex = null; // pre-computed posting lists
function buildIndex(index) {
const slugs = Object.keys(index);
const N = slugs.length;
const docLengths = {};
const termDf = {};
const termPostings = {};
for (const slug of slugs) {
const entry = index[slug];
const text = ((entry.title || "") + " " + (entry.content || "")).trim();
const tokens = tokenize(text);
docLengths[slug] = tokens.length;
const seen = new Set();
for (const tok of tokens) {
if (!termPostings[tok]) termPostings[tok] = {};
termPostings[tok][slug] = (termPostings[tok][slug] || 0) + 1;
if (!seen.has(tok)) {
termDf[tok] = (termDf[tok] || 0) + 1;
seen.add(tok);
}
}
}
let totalLen = 0;
for (const slug of slugs) totalLen += docLengths[slug];
const avgDl = totalLen / N || 1;
return { N, avgDl, docLengths, termDf, termPostings };
}
async function loadIndex(env) {
// ETag check - avoid re-fetching if unchanged
const headers = {};
if (_cacheEtag) headers["If-None-Match"] = _cacheEtag;
const res = await env.ASSETS.fetch(
new Request("https://placeholder/static/contentIndex.json", { headers })
);
if (res.status === 304 && _builtIndex) {
return { raw: _cachedIndex, built: _builtIndex }; // cache hit
}
const data = await res.json();
_cachedIndex = data;
_cacheEtag = res.headers.get("ETag") || null;
_builtIndex = buildIndex(data); // expensive - runs once per isolate lifetime
return { raw: data, built: _builtIndex };
}

The scoring step - the part that actually runs per request - only touches the pre-built structures:
function bm25Search(queryStr, builtIdx, rawIndex, n = 10, k1 = 1.5, b = 0.75) {
const { N, avgDl, docLengths, termDf, termPostings } = builtIdx;
const qTerms = tokenize(queryStr);
const scores = {};
for (const term of qTerms) {
const df = termDf[term] || 0;
if (df === 0) continue;
const idf = Math.log((N - df + 0.5) / (df + 0.5) + 1);
const postings = termPostings[term] || {};
for (const [slug, tf] of Object.entries(postings)) {
const dl = docLengths[slug];
const tfNorm = (tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * dl) / avgDl));
scores[slug] = (scores[slug] || 0) + idf * tfNorm;
}
}
return Object.entries(scores)
.sort((a, b) => b[1] - a[1])
.slice(0, n)
.map(([slug, score]) => ({ slug, score, title: rawIndex[slug]?.title }));
}

Per-request CPU cost is now just: tokenize the query (cheap) + look up a handful of terms in pre-built hash maps (very cheap) + sort a small result set (cheap). Well under 10ms.
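To sanity-check the scoring formulas with concrete numbers - these values are illustrative, not taken from the corpus:

```javascript
// Worked example of the two per-term quantities in the scoring loop,
// using the same formulas as bm25Search above.
const N = 1000, df = 10; // corpus size, docs containing the term
const idf = Math.log((N - df + 0.5) / (df + 0.5) + 1);
// A rare term (df=10 of 1000) gets a high IDF: ~4.56

const k1 = 1.5, b = 0.75;
const tf = 3, dl = 200, avgDl = 400; // term freq, doc length, average length
const tfNorm = (tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * dl) / avgDl));
// A shorter-than-average doc (dl < avgDl) gets a boost: tfNorm ~1.90 here

const score = idf * tfNorm; // this term's contribution to the doc's score
```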
The Second Bug: Bot Fight Mode
With the Worker running, I added a Python eval script to benchmark it. Every call returned 403.
req = urllib.request.Request(url)
# urllib default user-agent: "Python-urllib/3.12"

Cloudflare’s Bot Fight Mode blocks requests from known automation user agents. The fix is one line:
req = urllib.request.Request(
url,
headers={
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
"AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/122.0.0.0 Safari/537.36",
},
)

The Third Bug: Deploying from the Wrong Directory
The Worker kept vanishing after deploys. Cloudflare Pages discovers functions/ automatically - but only if you run wrangler pages deploy from the directory that contains functions/ as a sibling, not from a parent directory.
# Wrong - runs from repo root, no functions/ here
wrangler pages deploy .dev/public/quran --project-name qurangraphe
# Right - runs from .dev/quartz/ where functions/ lives
cd .dev/quartz
wrangler pages deploy /path/to/.dev/public/quran --project-name qurangraphe

Running from the wrong directory deploys static files only. The Worker is silently dropped with no error.
Latency Results
After all three fixes, I benchmarked the Worker against a Python implementation of the same BM25 algorithm running locally against the same contentIndex.json:
| Query | Python (local) | CF Worker (warm) | speedup |
|---|---|---|---|
| Fatihah opening chapter | 809ms | 310ms | 2.6x |
| Day of Resurrection | 942ms | 207ms | 4.5x |
| Alafasy recitation | 833ms | 189ms | 4.4x |
| Juz 30 short surahs | 754ms | 255ms | 3.0x |
| Moses staff Pharaoh | 592ms | 211ms | 2.8x |
| avg | 786ms | 234ms | ~3x |
The Python numbers include loading the 6MB file, parsing JSON, building the index, and scoring - every time. The Worker numbers include the full network round trip to Cloudflare, but only scoring on the hot path.
The cold start (first request per isolate) still pays the full build cost - around 1 second. Subsequent requests hit the cache.
Validation: Torah Site (19MB Index)
The quran site index is 6MB. Does the same pattern hold for a 3x larger index?
I deployed the same search.js to a second site - torahgraphe.pages.dev - backed by the Torah contentIndex.json at 19MB (~17,000 documents including chapter files, tag pages, and research notes). Same Worker code, same caching strategy, different corpus size.
Results (flex-offline = Python local, flex-api = CF Worker, warm unless noted):
| Query | flex-offline ms | flex-api ms | notes |
|---|---|---|---|
| Genesis creation account | 2242 | 1866 | cold start |
| YHWH divine name covenant | 2451 | 178 | warm |
| Passover Exodus plagues | 2837 | 1813 | cold (parallel spawn) |
| Levitical priesthood atonement | 2179 | 279 | warm |
| Paleo Hebrew consonants | 1935 | 1774 | cold (parallel spawn) |
| avg | 2329 | ~229 warm / ~1818 cold | |
The parallel eval (4 threads, 5 queries) caused multiple Worker isolates to spin up simultaneously - which is why some requests hit cold and some warm. The warm numbers are what matters: ~178-279ms, nearly identical to the quran site despite a 3x larger index.
Three-site comparison:
| Site | Index size | Docs | flex-offline avg | flex-api warm | speedup (warm) |
|---|---|---|---|---|---|
| mormongraphe | 1.4MB | 262 | ~167ms | ~237ms | ~0.7x |
| qurangraphe | 6MB | 6,696 | 786ms | ~210ms | ~3.7x |
| torahgraphe | 19MB | ~17,000 | 2329ms | ~229ms | ~10x |
Three findings:
1. flex-offline scales linearly with index size. Python re-parses and re-builds the index every call, so 19MB takes ~3x longer than 6MB (~2329ms vs ~786ms). At 1.4MB (Mormon), it’s faster than the network round trip - flex-offline actually beats flex-api warm (~167ms vs ~237ms). The crossover point where CF Workers beats local Python is somewhere around 2-3MB.

2. flex-api warm latency is index-size-independent above ~2MB. The 19MB Torah index scores queries in ~229ms; the 6MB Quran index scores in ~210ms. Nearly identical, despite a 3x size difference. Once posting lists are in memory, scoring is O(query_terms * postings_per_term) - fast regardless of corpus size.

3. Cold start scales with index size. Mormon: ~200ms cold. Quran: ~1s cold. Torah: ~1.8s cold. The Worker must fetch the JSON blob over the network and iterate all documents during buildIndex(). But cold starts only happen on the first request per isolate (or after eviction, typically minutes). For any site with regular traffic, warm requests dominate.
BM25 Accuracy Gotchas
Speed is the easy part. Once the index is cached, the harder problem is whether BM25 returns useful results. Five things that surprised me:
Document length normalization penalizes long chapter files. BM25 normalizes term frequency by document length. A 400-line chapter scores far lower than an 80-line tag page for thematic queries - even if the chapter is the “obvious” source. On a scripture site, the tag page for “covenant” is the correct informational answer to a query about covenant theology. It should be in your expected results, not the chapter that mentions the word twice in passing.
Hyphenated terms become opaque tokens. “Paleo-Hebrew” is indexed as one token that never appears standalone in a query. Strip hyphens from queries before scoring, or your tokenizer will silently return zero results for multi-word compound terms.
H2 heading text is the document title in qmd FTS (and receives higher field weight than body text). Use descriptive headings - “## Genesis 1 - The Creation Account” - not structural ones like “## Chapter 1”. The heading is what floats your document to the top.
Fixture over-specificity is the most common source of false zeros in evals. When a valid but unexpected document ranks first, the fix is to expand the expected list, not change the content. The eval’s expected list should include the best informational answer, not only the most obvious one.
Query-translation mismatches cause zero results. If your query uses “Judgment” but your indexed content uses “Resurrection”, BM25 has no token overlap and returns nothing. Match query vocabulary to the actual translation in use when writing evals.
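The hyphen gotcha above suggests a simple query-side workaround. A sketch - not the deployed code - that expands each hyphenated query term into its joined and split forms, so it can match documents indexed either way:

```javascript
// For each hyphenated query token, also emit the hyphen-stripped form
// and the individual parts. Unhyphenated tokens pass through unchanged.
function expandQueryTerms(tokens) {
  const out = [];
  for (const tok of tokens) {
    out.push(tok);
    if (tok.includes("-")) {
      out.push(tok.replace(/-/g, ""));             // "paleo-hebrew" -> "paleohebrew"
      out.push(...tok.split("-").filter(Boolean)); // -> "paleo", "hebrew"
    }
  }
  return out;
}
```

Running the expanded term list through the normal scoring loop means a query for “Paleo-Hebrew” can hit postings for the compound token, the fused form, or either part - whichever the tokenizer actually produced at index time.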
Why This Pattern Works
Static site search indexes don’t change between requests. If you’re building search over any generated JSON blob - Quartz’s contentIndex.json, Pagefind’s index, a custom build artifact - the same pattern applies:
- Fetch the blob once, cache with ETag
- Build your data structures once, cache at module scope
- Per-request: just query the pre-built structures
The split between “build” and “score” is the key. BM25 is naturally decomposed this way - IDF and document lengths are corpus-level statistics that don’t depend on the query. Only the final scoring loop touches query terms, and that loop is fast once the posting lists are in memory.
On Cloudflare Workers, module-level state is your friend for exactly this pattern. The isolate stays warm for minutes, serving many requests against the same cached index. The cost is paid once; the benefit is reaped on every subsequent query.
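Putting the pieces together, the corrected handler looks roughly like this. The query parameter names ("q", "n") and the parseSearch helper are assumptions for illustration; loadIndex and bm25Search are the functions shown earlier:

```javascript
// Parse query params - kept pure so it's easy to test in isolation.
// Parameter names "q" and "n" are assumed, not from the deployed code.
function parseSearch(urlString) {
  const url = new URL(urlString);
  return {
    query: (url.searchParams.get("q") || "").trim(),
    n: Number(url.searchParams.get("n")) || 10,
  };
}

// Sketch of the fixed handler; Cloudflare Pages Functions would
// export this as `onRequestGet`.
async function onRequestGet(context) {
  const { query, n } = parseSearch(context.request.url);
  if (!query) {
    return Response.json({ error: "missing q parameter" }, { status: 400 });
  }
  // Cold path: fetch + parse + buildIndex (~1s). Warm path: cache hit.
  const { raw, built } = await loadIndex(context.env);
  // Hot path: tokenize the query, walk posting lists, sort a small set.
  return Response.json(bm25Search(query, built, raw, n));
}
```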
Caveats
Isolate persistence is not guaranteed. The official CF Workers docs warn explicitly against relying on module-level variables for correctness: “A variable set during one request is still present during the next. This causes cross-request data leaks, stale state…” The docs also note isolates “may be evicted after their events are properly resolved.”
For a read-only static search index this warning doesn’t apply - there’s no user state, no cross-request leakage risk, and the worst case on eviction is a cold rebuild (identical to the first request’s behavior). But the framing “isolates stay warm for minutes” describes observed behavior, not a contract. Don’t build anything correctness-critical on top of module-level state.
The pattern is: use module scope for static, immutable, corpus-level data. Never for user state, auth tokens, or anything that should be request-scoped.
Alternatives at $0 (Open-Source)
If you want free search without a Worker, two open-source options cover most cases:
Pagefind - compiles a WASM search index at build time alongside your static site. No backend, no runtime cost, works with any static site generator. The index is served as static files from your existing host. Its custom ranking algorithm differs from BM25 in accuracy, but for most sites it’s the simplest zero-infra option.
Orama - in-browser BM25 + vector search in a <2kb bundle. No build step, no backend. Drops in as a JS import. Best for sites under ~500 pages where the in-memory index fits comfortably.
The CF Workers approach only wins when you want server-side ranking (no client JS required) and your index is large enough that warm latency (~210-229ms) beats local Python (~786ms+). That crossover happens at roughly 2-3MB of index size. Below that, client-side options like Pagefind are simpler and faster.
NameResolver: Fixing Exact Entity Lookups
BM25 is a relevance ranking algorithm. It rewards term frequency relative to document length and corpus rarity. What it cannot do is deterministically return a specific page when the query is literally the page’s title.
A few queries in my eval exposed this failure structurally:
- "Musa" - should return the Atlas/People/Musa page at rank 1; BM25 returned it at rank 8 (shorter pages with higher term density ranked above)
- "Al-Baqarah" - should return Surah 002; BM25 returned zero results (the surah title includes “Al-Baqarah” but the query tokenizes differently against the stored slug)
- "Genesis 1" - should return the Genesis 1 chapter; BM25 returned research notes that cited Genesis 1 multiple times
These aren’t tuning problems. No BM25 parameter adjustment fixes them - the algorithm doesn’t have the concept of “this query IS a document title.” It’s a structural mismatch between the query type (entity lookup) and the algorithm (relevance ranking).
The fix is a pre-search layer that I call a NameResolver: a normalized title-to-slug hash table built once at index-load time. Resolution happens before BM25. If the query matches a title exactly, return that page immediately; only fall through to BM25 if there’s no match.
function normalizeTitle(text) {
// NFKD -> strip non-ASCII -> lowercase -> hyphens/underscores to spaces -> strip punctuation
const ascii = text.normalize("NFKD").replace(/[\u0080-\uffff]/g, "");
return ascii.toLowerCase()
.replace(/[-_]+/g, " ")
.replace(/[^a-z0-9 ]+/g, "")
.replace(/\s+/g, " ")
.trim();
}
function buildResolver(index) {
const table = {};
for (const [slug, entry] of Object.entries(index)) {
const norm = normalizeTitle(entry.title || "");
if (norm && !(norm in table)) table[norm] = slug;
// Surah titles: strip "surah N" prefix so "Al-Baqarah" resolves
// without needing to know the surah number
const m = /^surah\s+\d+\s+/.exec(norm);
if (m) {
const suffix = norm.slice(m[0].length);
if (suffix && !(suffix in table)) table[suffix] = slug;
}
}
return table;
}

The normalization chain is: Unicode NFKD decompose → strip all non-ASCII (removes diacritics) → lowercase → collapse hyphens/underscores to spaces → strip remaining punctuation. This makes “Al-Baqarah”, “al baqarah”, “Al Baqarah”, and “al-baqarah” all hash to the same key “al baqarah”. The surah prefix strip handles Arabic surah names: “surah 2 al baqarah” → strip “surah 2 ” → “al baqarah” - which then resolves to the same entry.
At query time:
function resolveQuery(query, table) {
const norm = normalizeTitle(query);
return norm in table ? [table[norm]] : [];
}

If resolveQuery returns a slug, skip BM25 and return that page at rank 1. If it returns empty, run BM25 normally.
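The resolution-first flow can be sketched as a small wrapper. The resolver and fallback are injected here so the shape is testable in isolation - in the Worker they would be resolveQuery (bound to the table) and bm25Search:

```javascript
// Exact title match wins; relevance ranking is the fallback.
function search(query, resolve, fallback) {
  const exact = resolve(query); // O(1) hash lookup on the normalized title
  if (exact.length > 0) {
    // Entity lookup hit: return the page(s) at rank 1, no scoring needed
    return exact.map((slug) => ({ slug, exact: true }));
  }
  return fallback(query); // relevance query: fall through to BM25
}
```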
Impact: Agent-query MRR went from 0.442 to 1.000. Human-search MRR (59-query suite) went from 0.772 to 0.906. The 0.906 is the BM25 ceiling - 6 remaining failures (4 structural zeros: positional, vocabulary mismatch; 2 partials: paraphrase rank 3, framing rank 9) require semantic search to fix.
The NameResolver pattern is directly deployable in a CF Worker alongside the BM25 index, adds negligible memory (the table is O(documents)), and runs in O(1) per query.
Benchmark comparison
MRR (Mean Reciprocal Rank) measures whether the most relevant result appears at rank 1. MRR 1.000 means the best answer is always first; MRR 0.333 means it’s on average rank 3. Accuracy disqualifies Pagefind and Orama as alternatives when ranking quality matters - retrieval speed is irrelevant if the right answer is buried.
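The metric itself is a few lines to compute; a sketch (one ranked result list and one expected-slug list per query):

```javascript
// Mean Reciprocal Rank: for each query, take 1/rank of the first
// expected result (0 if absent), then average. Ranks are 1-based.
function mrr(rankedSlugsPerQuery, expectedPerQuery) {
  let sum = 0;
  for (let i = 0; i < rankedSlugsPerQuery.length; i++) {
    const expected = new Set(expectedPerQuery[i]);
    const rank = rankedSlugsPerQuery[i].findIndex((s) => expected.has(s));
    sum += rank === -1 ? 0 : 1 / (rank + 1);
  }
  return sum / rankedSlugsPerQuery.length;
}
```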
The test corpora: Quran (6MB contentIndex.json / 6,881 HTML pages), Torah (19MB), and Mormon (1.4MB). MRR was measured against 59 queries across all corpora, including Torah, Quran, cross-scripture, adversarial, and semantic-gap queries. The Quran-only set (6 queries) scores 1.000 for both BM25 engines; the all-query gap (0.906 vs 0.770) reflects 7 cross-corpus queries targeting the full combined site (graphelogos), which doesn’t have a /api/search endpoint configured.
| Engine | Index build | Retrieval (local) | Retrieval (CF hosted, warm) | MRR (Quran) | MRR (all 59q) |
|---|---|---|---|---|---|
| Quartz FlexSearch | static (Quartz build-time) | client-side JS | client-side JS | 0.167 | 0.061 |
| Orama | 452ms (in-process) | 5ms | n/a - client-side | 0.667 | not measured |
| Pagefind | 26s CLI (one-time) | 35ms (WASM, warm) | n/a - client-side | 0.333 | not measured |
| Python BM25 + NameResolver | ~786ms per-request | ~786ms | n/a | 1.000 | 0.906 |
| CF Worker BM25 + NameResolver | cold ~1s / warm 0 | n/a | ~210ms | 1.000 | 0.770 |
Notes on the table:
- Orama and Pagefind score in the browser - “CF hosted” means files served from CF Pages, but ranking runs client-side
- Python BM25 re-builds the full index on every call; build and retrieval cost are the same number
- CF Worker BM25 pays build cost once per isolate lifetime - warm requests only pay the scoring cost
- FlexSearch retrieval is browser-dependent; not measured in ms here
- Pagefind first-query latency is ~90ms (cold WASM shard load); warm queries ~12-35ms
Estimated per-corpus scaling:
| Corpus | Orama build | Pagefind CLI build | CF Worker (warm) |
|---|---|---|---|
| Mormon (1.4MB, 262 docs) | ~18ms | ~1s | ~237ms |
| Quran (6MB, 6,696 docs) | 452ms (measured) | ~26s (measured) | ~210ms (measured) |
| Torah (19MB, ~17k docs) | ~1,100ms | ~60s | ~229ms |
Orama build scales linearly with document count. Pagefind CLI scales with page count and word count. CF Worker warm latency is index-size-independent above ~2MB - it stays flat as corpus grows.
Summary
The pattern has three parts:
1. Cache the index at module scope. Separate buildIndex() from bm25Search(). Call buildIndex once per isolate lifetime. The posting lists are static - there’s no reason to rebuild them per request.

2. Add a NameResolver as a pre-search layer. Build a normalized title-to-slug hash table alongside the inverted index. Entity queries ("Musa", "Al-Baqarah", "Genesis 1") are entity lookups, not relevance queries. BM25 returns them at rank 8 at best; a hash table returns them in O(1). Resolution first; BM25 as fallback.

3. Use --branch=main when deploying. Without it, wrangler pages deploy creates a preview deployment. The production URL keeps serving the old Worker until a branch-pinned deploy overwrites it.
The CF Worker approach pays off when your index is above ~2-3MB. Below that, a client-side option like Pagefind is simpler and faster. Above that threshold, warm CF Worker latency (~210-229ms across corpora) beats local Python (~786ms-2329ms) and stays flat as the corpus grows.
The current implementation is live on three sites: qurangraphe.pages.dev, torahgraphe.pages.dev, mormongraphe.pages.dev. Source is in .dev/quartz/functions/api/search.js.