Benchmarks

Every number on this page is position-verified against every competitor; a mismatch aborts the run. 57,072 positions were cross-checked on the 10 MB Wikipedia headline alone — every strict competitor position-identical, plus one documented soft-parity exception (ripgrep's non-overlapping multi-pattern semantics).

Every number comes from the markdown tables under benchmark/output/m2-pro/md/*.md and the raw JSON under benchmark/output/m2-pro/*.json. Reproduction commands are at the bottom of this page — all public benchmarks live in the repo under benchmark/.

The headline: parot flatlines while everyone else scales linearly

Query the same 10 MB of Wikipedia with N phrases. Everyone else scales linearly. Parot flatlines.

| Phrases → | 10 | 100 | 1,000 | 10,000 | 100,000 |
| --- | --- | --- | --- | --- | --- |
| parot | 0.14 ms | 0.18 ms | 0.99 ms | 6.4 ms | 60 ms |
| pyahocorasick | 62.7 ms | 126 ms | 161 ms | 213 ms | 685 ms |
| str.count (Python) | 36.6 ms | 422 ms | 4.2 s | 41 s | DNF |
| pandas str.count | 69 ms | 510 ms | 5.1 s | 57 s | DNF |
| polars count_matches | 281 ms | 639 ms | 7.6 s | 70 s | DNF |
| pyarrow count_substring | 182 ms | 1.8 s | 17 s | 170 s | DNF |
| duckdb LENGTH/REPLACE | 227 ms | 2.1 s | 22 s | 222 s | DNF |

Wikipedia 10 MB corpus (9,999,999 bytes), Apple M2 Pro. Index build is O(N) and happens once (~700 ms). Once built, query cost is independent of corpus size — 100,000 distinct phrases finish in 60 ms while every tokenizer-free substring engine times out. DNF = exceeded the 30 s per-call ceiling.


Performance highlights

| Benchmark | Result |
| --- | --- |
| 1 GB search | 0.97 ms to batch_count 100 patterns across 1 GB of Wikipedia. Build once (~148 s), then every query is free. |
| Genomics | 21,000x faster than BioPython on CRISPR guide search against GRCh38 chr22 (50 MB). |
| DataFrames | 36,000x faster than polars str.count on 500K rows, 50 patterns. Same results, one-line change. |
| Scale-invariant query | 0.34 ms whether your text is 10 KB or 10 GB. |
| 100K phrases, 10 MB corpus | 60 ms total. Every other substring engine times out past 10K. |

Every number below is sourced from a script in benchmark/, checked into the repo, with raw JSON output committed alongside.


Methodology

Hardware & software

  • Python benchmarks: Apple M2 Pro, macOS 15.5, Python 3.12.8, Rust 1.94.1
  • JS/WASM benchmarks: Apple M4, macOS, Node.js 25.8.0, Rust 1.94.1

Two machines — JS/WASM only

The Python, Rust, and CLI benchmarks all ran on the same M2 Pro. The JS/WASM numbers (the four JS competitors in the headline table, the WASM vs-mnemonist table, the parot-in-the-browser table) were produced on an M4. Do not compare absolute times across the two machines. All speedups in the headline table are computed against parot native (M2 Pro, 0.34 ms) — a single baseline — so the Speedup column is coherent across languages.

Package versions (canonical run, 2026-04-15)

| Package | Version |
| --- | --- |
| parot | 0.2.4 |
| pyahocorasick | 2.3.0 |
| ahocorasick-rs | 1.0.3 |
| stringzilla | 4.6.0 |
| duckdb | 1.5.2 |
| google-re2 | latest via pip |
| pandas / polars / pyarrow | pinned in benchmark/pyproject.toml |
| comparator A (Rust-backed text index on PyPI) | 2.3.4 |
| comparator B (Rust-backed SA on PyPI) | 0.0.20 |
| pylcs | 0.1.1 |

Every benchmark script writes the exact competitor versions it imported into benchmark/output/m2-pro/*.json under environment.packages.

Real corpora — no synthetic data

Every table on this page uses a real public corpus:

| Corpus | Size | Source | Used by |
| --- | --- | --- | --- |
| Wikipedia-EN 10 MB slice | 9,999,999 B | dumps.wikimedia.org | headline scaling + 16-competitor + multipattern |
| Wikipedia-EN 1 GB slice | 1,000,000,000 B | dumps.wikimedia.org | 1 GB hero |
| GRCh38 chr22 | 50,818,468 B | Ensembl release 110 | genomics / CRISPR |
| Pride and Prejudice | 752 KB | Project Gutenberg | duplicates |
| Complete Shakespeare | 5.6 MB | Project Gutenberg | duplicates |
| King James Bible | 4.6 MB | Project Gutenberg | duplicates |
| Dickens Collected (16 novels) | ~22 MB / 3.9M words | Project Gutenberg | duplicates |
| Proust (ISOLT) | ~3.4 MB / 691K words | Project Gutenberg | duplicates |

Patterns are not hardcoded — they are sampled from the corpus itself with a fixed seed via benchmark/common.py::sample_count_queries(text, n=1000, seed=99). These are mid-frequency phrases a user would realistically search for.
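For orientation, the sampling call as the benchmark scripts use it — sample_count_queries is the repo helper named above; the corpus path and import spelling are illustrative:

```python
from pathlib import Path

from benchmark.common import sample_count_queries  # the helper named above

text = Path("corpora/enwiki-10mb.txt").read_text()      # illustrative path
patterns = sample_count_queries(text, n=1000, seed=99)  # fixed seed: deterministic across runs
```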

Measurement protocol

  • Python: 2 warmup iterations (discarded), 10 measurement iterations (typical), median reported with IQR.
  • JavaScript: 2 warmup, 5 measurement, median + IQR.
  • Indexes are built once before the timed region; only the query is timed. Build time is reported separately in the caption, never folded into a per-cell number.
  • GC is not pinned. Median + IQR mitigate the occasional spike.
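A minimal sketch of that protocol — not the repo's actual harness, just its shape:

```python
import statistics
import time

def measure(fn, warmup=2, iters=10):
    """Discard warmups; report median and IQR in milliseconds."""
    for _ in range(warmup):
        fn()                                # warmup runs are thrown away
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()                                # only the query sits in the timed region
        samples.append((time.perf_counter() - t0) * 1000.0)
    q1, median, q3 = statistics.quantiles(samples, n=4)
    return median, q3 - q1                  # median + IQR absorb the occasional GC spike
```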

Correctness verification

Every competitor's position list is compared bit-for-bit against parot's batch_find_all before timing starts. The 10 MB headline run cross-verifies 57,072 positions across 11 strict competitors + 1 soft; the 1 GB run cross-verifies 100 patterns across 14 competitors. Any mismatch aborts the run with a non-zero exit. Reaching the end of a run is the correctness receipt.

The "soft" bucket contains competitors with documented semantic differences (currently: ripgrep's non-overlapping multi-pattern mode) — their mismatches are logged but not fatal.

Raw data

  • Python scripts: benchmark/{headlines,memory,docs,internal}/bench_*.py
  • JS scripts: benchmark/*.mjs
  • Output (JSON + MD + CSV): benchmark/output/m2-pro/ and benchmark/output/m4/
  • Manifest of every script: benchmark/BENCHMARKS.md

16 competitors on 10 MB Wikipedia (find_all)

Source: bench_hero_locate.py · raw JSON

Median wall-clock time for find_all() on 100 mid-frequency phrases against a single 10 MB Wikipedia corpus. One parot index built once; every competitor compared on the same corpus with the same 100 patterns. Every row's position list is verified against parot's batch_find_all before the competitor's median is allowed to count. Returns every position for every pattern — not just the first, not just a count.

| Competitor | Platform | Regime | Median (ms) | Speedup vs parot |
| --- | --- | --- | --- | --- |
| parot (Us!) | Python + Rust | index | 0.34 | 1.0x (baseline) |
| parot WASM (Us!) | JavaScript | index | 1.09 | 0.3x (~3x slower than native) |
| stringzilla | Python | substring | 42.6 | 125x |
| ahocorasick-rs | Python (Rust bind) | multi-pattern | 53.7 | 158x |
| pyahocorasick | Python | multi-pattern | 115.5 | 339x |
| ripgrep (subprocess) | subprocess | substring | 123.0 | 361x ⚠ soft |
| modern-ahocorasick | JavaScript | multi-pattern | 340.5 | 999x |
| String.matchAll | JavaScript | regex | 393.9 | 1,156x |
| RegExp.exec /g | JavaScript | regex | 395.8 | 1,161x |
| String.indexOf | JavaScript | substring | 402.8 | 1,182x |
| bytes.find loop | Python | substring | 383.2 | 1,124x |
| str.find loop | Python | substring | 427.3 | 1,253x |
| regex (mrab) | Python | regex | 449.0 | 1,317x |
| re.finditer | Python | regex | 676.7 | 1,985x |
| pyarrow.compute (chunked) | Python | substring | 1,336 | 3,919x |
| polars.str.contains (chunked) | Python | substring | 1,457 | 4,274x |
| pandas.str.contains (chunked) | Python | substring | 1,709 | 5,013x |
| google-re2 | Python | regex | 3,027 | 8,880x |

parot build: 187 ms (one-time, 38.6 MB heap). Speedups are computed against parot native (0.34 ms), so a user can read "125x over stringzilla" and "1,182x over String.indexOf" on the same baseline even though the JS rows were measured on an M4.

Soft parity: ripgrep uses non-overlapping multi-pattern semantics (19 positions differ from parot's overlapping ground truth — 4,737 vs 4,756 hits). Every other row is position-identical: 4,756 hits, exact.

Parity receipt (from bench_hero_locate.json):

  • 11 strict competitors, all position-identical — str.find loop, bytes.find loop, re.finditer, regex (mrab), google-re2, stringzilla, pyahocorasick, ahocorasick-rs, pandas (chunked), polars (chunked), pyarrow (chunked).
  • 1 soft competitor — ripgrep (subprocess).
  • 100 patterns × 11 strict competitors — 57,072 positions validated in total before any timing ran. The gate is sketched below.
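In outline, the parity gate works like the following sketch — the adapter callables and the soft-parity set are hypothetical names; the real logic lives in bench_hero_locate.py:

```python
def verify_parity(index, patterns, competitors,
                  soft_parity=frozenset({"ripgrep (subprocess)"})):
    """Abort unless every strict competitor's positions match parot's exactly.

    `competitors` maps a name to a find_all-style callable — hypothetical
    adapters standing in for each library's query path.
    """
    ground_truth = index.batch_find_all(patterns)   # parot positions, computed once
    for name, find_all in competitors.items():
        for pattern in patterns:
            if sorted(find_all(pattern)) != sorted(ground_truth[pattern]):
                if name in soft_parity:
                    print(f"soft parity: {name} differs on {pattern!r}")  # logged, not fatal
                else:
                    raise SystemExit(f"{name} mismatch on {pattern!r} — run aborted")
```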

parot vs 16 competitors — find_all() on 10 MB Wikipedia


1 GB Wikipedia — batch_count

Source: bench_hero_plot.py · raw JSON

Batch-counting 100 patterns over 1 GB of English Wikipedia. Build cost is amortized over the query stream — the question is: once built, how fast can you run queries?

| Competitor | Regime | Median (ms) | Build (ms) |
| --- | --- | --- | --- |
| parot (Us!) batch_count | index | 0.97 | 147,975 |
| comparator A (Rust-backed text index) — loop | index | 0.58 | 4,447,514 |
| comparator B (Rust-backed SA) | index | 0.89 | 64,548 |
| stringzilla | substring | 4,623 | 0 |
| ahocorasick-rs | multi-pattern | 10,689 | 0.20 |
| ripgrep (subprocess) | substring | 11,275 | 0 |
| pyahocorasick | multi-pattern | 12,643 | 0.04 |
| bytes.count | substring | 63,527 | 0 |
| re.findall | regex | 68,085 | 0.34 |
| pandas.str.count | substring | 68,582 | 0 |
| str.count | substring | 69,258 | 0 |
| polars.count_matches | substring | 108,425 | 0 |
| pyarrow.count_substring | substring | 181,895 | 0 |
| duckdb contains | substring | 273,283 | 801 |
| google-re2 | regex | 385,105 | 0.43 |

parot build: 148 s once, then 0.97 ms for 100 patterns across 1 GB. Comparator A is 1.7x faster per query but takes 30x longer to build (4,448 s vs 148 s) — only worth it if you run millions of queries per index. parot heap: 5.05 GB for 1 GB corpus (5.1x linearity, consistent with the memory table below).

parot vs 14 competitors on a 1 GB Wikipedia corpus
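The access pattern this table measures, as a hedged sketch — batch_count and to_bytes are APIs named on this page, while the Index constructor shape is an assumption:

```python
import parot  # assumed import path

def amortized_counts(corpus: str, pattern_batches):
    index = parot.Index(corpus)          # 1 GB corpus: ~148 s, paid once
    artifact = index.to_bytes()          # persistable, so the build never repeats
    for batch in pattern_batches:        # thereafter ~0.97 ms per 100-pattern batch
        yield index.batch_count(batch)
```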


Genomics — CRISPR off-target search (GRCh38 chr22, 50 MB)

Source: bench_genomics.py · raw JSON

Real data: GRCh38 chromosome 22 (50,818,468 bases). Real workload: 1,000 sgRNA guides sampled from the Brunello lentiviral knockout library, batch_find_all'd against the chromosome.

| Competitor | Regime | Median (ms) | Speedup vs parot |
| --- | --- | --- | --- |
| parot (Us!) batch_find_all | index | 2.31 | 1.0x |
| python bytes.find loop | substring | 49,333 | 21,313x |
| BioPython Seq.count_overlap | substring (count-only) | 49,576 | 21,418x |
| ripgrep (subprocess) | substring | 189.3 | 82x |

parot build: 1,162 ms for the full chromosome (cached as a .parot index thereafter). BioPython has no native find-all; we compared count-only for parity. 1,000-guide CRISPR run cross-verifies 108,402 positions.

parot batch_find_all vs BioPython / bytes.find / ripgrep on GRCh38 chr22

DNA motif sweep — 50 MB

Source: bench_dna.py · raw JSON

Motif sweep across genome sizes — parot holds flat at ~0.03 ms while everyone else scales linearly.

| Genome size | bytes.count | re.findall | BioPython | parot | vs BioPython |
| --- | --- | --- | --- | --- | --- |
| 1 MB | 44.5 ms | 37.2 ms | 37.2 ms | 0.024 ms | 1,541x |
| 10 MB | 474.3 ms | 379.5 ms | 379.5 ms | 0.031 ms | 12,128x |
| 50 MB | 2,275 ms | 1,856 ms | 1,856 ms | 0.036 ms | 51,964x |

At 50 MB the gap opens to 51,964x vs BioPython on a pure motif-count workload — consistent with the 21,000x CRISPR result above (the CRISPR run does the full find_all with positions, which costs more).


Parot vs pandas / polars (DataFrames)

Source: bench_dataframes.py (lives in the private parot-core benchmark suite) · raw JSON

Build the parot index once from a column of text, then run many str.contains / str.count-style queries. pandas and polars rescan the full column every time.

Different data models

pandas returns per-row results (a Series); parot operates on the concatenated text and returns a flat total or a contains-mask. The speedups reflect both the algorithmic win and the per-row Python object overhead pandas carries. See Index.from_series and Index.contains_mask for the symmetric API.
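A sketch of the one-line change — Index.from_series and Index.contains_mask are the APIs named above; the import path and exact call shapes are assumptions:

```python
import pandas as pd
import parot  # assumed import path

df = pd.DataFrame({"text": ["the quick brown fox", "lazy dog", "quick study"]})

# pandas: rescans every row on every query
mask_pd = df["text"].str.contains("quick", regex=False)

# parot: build once from the column, then answer from the index
index = parot.Index.from_series(df["text"])
mask_parot = index.contains_mask("quick")   # per-row boolean mask, same length
```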

contains — vary patterns (100K rows, query time only)

| Patterns | pandas (ms) | polars (ms) | parot (ms) | vs pandas | vs polars |
| --- | --- | --- | --- | --- | --- |
| 5 | 157.9 | 11.7 | 0.57 | 276x | 20x |
| 10 | 346.0 | 22.3 | 1.52 | 228x | 15x |
| 50 | 1,664 | 108.1 | 8.41 | 198x | 13x |
| 100 | 3,509 | 222.1 | 14.4 | 244x | 16x |

count — vary rows (50 patterns, query time only)

| Rows | pandas (ms) | polars (ms) | parot (ms) | vs pandas | vs polars |
| --- | --- | --- | --- | --- | --- |
| 10,000 | 131.4 | 147.4 | 0.11 | 1,148x | 1,288x |
| 100,000 | 642.0 | 920.0 | 0.15 | 4,214x | 6,038x |
| 500,000 | 4,736 | 6,190 | 0.17 | 27,694x | 36,201x |

This is the 36,000x headline — 500K rows, 50 patterns, parot at 0.17 ms vs polars at 6.19 s.

Break-even — contains (100K rows)

| Patterns | pandas | polars | parot | Winner |
| --- | --- | --- | --- | --- |
| 1 | 34 ms | 3 ms | 1.56 s | polars |
| 50 | 1,649 ms | 110 ms | 1.53 s | polars |
| 500 | 18 s | 1.1 s | 1.59 s | polars |
| 1,000 | 37 s | 2.3 s | 1.72 s | parot |
| 2,500 | 90 s | 5.5 s | 1.90 s | parot |

Parot eats a ~1.5 s index-build cost. It pays off around the thousand-pattern mark on 100K rows of text. For one-off contains lookups, use polars.

Break-even — count (100K rows)

| Patterns | pandas | polars | parot | Winner |
| --- | --- | --- | --- | --- |
| 1 | 16 ms | 23 ms | 1.64 s | pandas |
| 100 | 1,298 ms | 1,860 ms | 1.56 s | pandas |
| 250 | 3,272 ms | 4,704 ms | 1.59 s | parot |
| 500 | 6,456 ms | 9,379 ms | 1.53 s | parot |

Parot vs Aho-Corasick vs Regex (multi-pattern)

Source: bench_multipattern.py · raw JSON

Vary pattern count on 1 MB corpus (query only, all pre-built)

| Patterns | str.count | re.findall | pyahocorasick | ahocorasick-rs | parot | parot batch |
| --- | --- | --- | --- | --- | --- | --- |
| 10 | 4.78 ms | 6.33 ms | 7.12 ms | 5.40 ms | 0.020 ms | 0.051 ms |
| 50 | 22.8 ms | 36.7 ms | 12.0 ms | 6.05 ms | 0.14 ms | 0.082 ms |
| 200 | 94.8 ms | 148.6 ms | 14.6 ms | 12.1 ms | 0.61 ms | 0.18 ms |
| 1,000 | 453.9 ms | 719.2 ms | 18.5 ms | 16.1 ms | 3.06 ms | 0.69 ms |

Vary corpus size with 200 patterns

| Corpus | str.count | re.findall | pyahocorasick | ahocorasick-rs | parot | parot batch |
| --- | --- | --- | --- | --- | --- | --- |
| 100 KB | 9.20 ms | 12.8 ms | 1.41 ms | 1.00 ms | 0.52 ms | 0.16 ms |
| 1 MB | 93.0 ms | 146.1 ms | 14.2 ms | 11.8 ms | 0.62 ms | 0.20 ms |
| 10 MB | 834.3 ms | 1,289 ms | 149.5 ms | 112.7 ms | 0.69 ms | 0.75 ms |

At 10 MB × 200 patterns: parot batch_count is 1,112x faster than str.count, 1,719x faster than re.findall, and 150x faster than ahocorasick-rs.

Build time (1 MB corpus)

| Method | Build time |
| --- | --- |
| re.findall | 0 (no build) |
| pyahocorasick | 0.10 ms |
| ahocorasick-rs | 0.16 ms |
| parot | 61.3 ms |

Aho-Corasick indexes the patterns in microseconds; parot indexes the text in tens of milliseconds. Parot wins any time you re-query the same text with new patterns.
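The asymmetry in code — pyahocorasick's real API on one side, an assumed parot constructor on the other:

```python
import ahocorasick  # pyahocorasick's real API
import parot        # assumed import path

def both_ways(text: str, patterns: list[str]):
    # Aho-Corasick indexes the *patterns* (microseconds)...
    A = ahocorasick.Automaton()
    for i, p in enumerate(patterns):
        A.add_word(p, i)
    A.make_automaton()
    hits = list(A.iter(text))             # ...but every query rescans the full text

    # parot indexes the *text* (tens of milliseconds)...
    index = parot.Index(text)             # constructor shape assumed
    counts = index.batch_count(patterns)  # ...so re-querying with new patterns is cheap
    return hits, counts
```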


Duplicate phrase detection

Source: bench_duplicates.py · raw JSON · CLI corpus data

No other library provides word-boundary-aware duplicate phrase detection. These numbers are from the find_duplicates Python API; the parot scan CLI hits the same figures.
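A hedged sketch of the call — parameter names come from the sensitivity table below; the import path is an assumption:

```python
import parot  # assumed import path

text = open("pride_and_prejudice.txt").read()
phrases = parot.find_duplicates(
    text,
    min_words=4,    # the 4/9 default — the precision/recall sweet spot below
    min_chars=9,
    max_words=50,
)
```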

Absolute performance (real Gutenberg text)

| Text | Size | Time | Phrases found |
| --- | --- | --- | --- |
| Pride and Prejudice | 752 KB | 75 ms | 2,025 |
| Proust: In Search of Lost Time | ~3.4 MB | 564 ms | 20,192 |
| King James Bible | 4.6 MB | 2,385 ms | 57,009 |
| Complete Shakespeare | 5.6 MB | 796 ms | 10,308 |
| Dickens Collected (16 novels) | ~22 MB / 3.9M words | 6,770 ms | 171,060 |

The Bible (4.6 MB) is slower than Shakespeare (5.6 MB) because it contains 5.5x more repeated phrases (57K vs 10K). Output size, not input size, dominates. Dickens Collected is an end-to-end stress test: 16 novels, 3.9M words, 22 MB, 171K duplicate phrases in under 7 seconds.

Parameter sensitivity (Pride & Prejudice)

| min_words | min_chars | max_words | Time | Phrases |
| --- | --- | --- | --- | --- |
| 3 | 6 | 50 | 127.5 ms | 8,703 |
| 4 | 9 | 50 | 76.9 ms | 2,025 |
| 5 | 12 | 50 | 64.4 ms | 470 |
| 8 | 20 | 50 | 58.4 ms | 10 |
| 4 | 9 | 20 | 76.4 ms | 2,025 |

Looser params (min_words=3) find 4x more phrases at 1.7x the cost. The default (4/9) is the precision/recall sweet spot.

Normalization modes (Pride & Prejudice)

| Mode | Time | Phrases | Delta vs default |
| --- | --- | --- | --- |
| default | 76.3 ms | 2,025 | — |
| case_insensitive | 87.3 ms | 1,614 | +14% time, −20% phrases |
| collapse_whitespace | 78.6 ms | 2,025 | +3% time, 0 phrases |
| both | 243.1 ms | 2,263 | +219% time, +12% phrases |

case_insensitive collapses phrases that differ only in case (e.g. "he said"/"He said" become one entry), which is why it reports fewer distinct phrases despite covering more of the text. Enable it when you care about deduplication, not when you care about maximum phrase count.

Block detection

| Block detection | Time | Phrases |
| --- | --- | --- |
| False | 74.2 ms | 2,025 |
| True | 76.6 ms | 2,025 |

Zero-cost toggle. Block detection matters for texts with repeated paragraphs/sections (documentation, legal boilerplate).


Parot vs comparator A (Rust-backed text index on PyPI)

Source: bench_vs_index.py · raw JSON — the closest Rust-backed competitor on PyPI.

Build time

| Text size | parot (ms) | comparator A (ms) | Winner |
| --- | --- | --- | --- |
| 100 KB | 3.4 | 12.8 | parot (3.7x) |
| 1 MB | 35.4 | 76.9 | parot (2.2x) |
| 10 MB | 370.1 | 1,425 | parot (3.8x) |

count — 100 patterns, query only

| Text size | parot (ms) | comparator A (ms) | Winner |
| --- | --- | --- | --- |
| 100 KB | 0.28 | 0.29 | tied |
| 1 MB | 0.94 | 0.72 | comparator A (1.3x) |
| 10 MB | 0.35 | 0.41 | parot (1.2x) |

find_all — 100 patterns, query only

parot uses batch_find_all; comparator A has no batch API, so we loop its per-pattern position lookup (as a user would).

| Text size | Patterns | Total hits | parot batch_find_all (ms) | comparator A loop (ms) | Winner |
| --- | --- | --- | --- | --- | --- |
| 100 KB | 100 | 201 | 0.11 | 1.87 | parot (17x) |
| 1 MB | 100 | 3,523 | 0.12 | 3.18 | parot (26x) |
| 10 MB | 100 | 17,876 | 0.35 | 43.3 | parot (125x) |

Memory — apples-to-apples

Both libraries measured via serialized-artifact size (Index.to_bytes() vs pickle.dumps(...)). The parot heap column is the live Python/Rust runtime footprint — comparator A has no equivalent accessor, so it's parot-only.
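How the columns are derived, in sketch form — Index.to_bytes is the accessor named above; comp_index and raw_size are placeholders:

```python
import pickle

def artifact_ratios(index, comp_index, raw_size: int):
    """Serialized-artifact comparison; comp_index stands in for comparator A's object."""
    parot_bytes = len(index.to_bytes())         # parot's on-disk artifact
    comp_bytes = len(pickle.dumps(comp_index))  # comparator A's pickled artifact
    return parot_bytes / raw_size, comp_bytes / raw_size   # the "ratio" columns
```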

| Text size | Raw (B) | parot serialized (B) | parot ser. ratio | comparator A pickle (B) | comparator A ratio | parot heap (B) | parot heap ratio |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 100 KB | 99,999 | 420,452 | 4.2x | 126,949 | 1.3x | 507,392 | 5.1x |
| 1 MB | 999,988 | 4,553,401 | 4.6x | 1,174,839 | 1.2x | 5,052,836 | 5.1x |
| 10 MB | 9,999,988 | 50,506,913 | 5.1x | 13,152,252 | 1.3x | 50,506,356 | 5.1x |

Comparator A is ~4x smaller on disk because it keeps a more compact compressed form and drops the positional detail parot preserves. Parot keeps that detail — that's what powers batch_find_all's 125x lead over comparator A's per-pattern position lookup.

Summary

Comparator A is ~4x smaller on disk and ~1.3x faster on single-pattern count. Parot is 2–3x faster on build, 17–125x faster on batch find_all, and brings LCS + LPF + duplicate detection + serialization + sentence splitting + WASM + CLI with it.


Break-even — when does parot pay off?

Source: bench_breakeven.py · raw JSON

Per-query cost

| Text size | str.count per query | parot build | parot per query | Break-even |
| --- | --- | --- | --- | --- |
| 100 KB | 0.045 ms | 5.22 ms | 0.003 ms | 125 queries |
| 1 MB | 0.46 ms | 59.8 ms | 0.003 ms | 131 queries |
| 10 MB | 4.29 ms | 750.5 ms | 0.003 ms | 176 queries |

Total time (build + N queries) on 10 MB text

| Queries | str.count | parot | Winner |
| --- | --- | --- | --- |
| 1 | 4.3 ms | 751 ms | str.count |
| 50 | 214 ms | 751 ms | str.count |
| 100 | 429 ms | 751 ms | str.count |
| 250 | 1,072 ms | 751 ms | parot |
| 500 | 2,144 ms | 752 ms | parot |
| 1,000 | 4,288 ms | 754 ms | parot (5.7x) |

Break-even clusters around 125–180 queries regardless of text size. After that, every additional query costs parot 3 μs and str.count milliseconds-to-seconds.
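The arithmetic behind the break-even column, using the 10 MB row:

```python
build_ms = 750.5   # parot one-time build, 10 MB text
naive_ms = 4.29    # str.count, per query
parot_ms = 0.003   # parot, per query

break_even = build_ms / (naive_ms - parot_ms)
print(round(break_even))  # ≈ 175 queries, in line with the 176 reported above
```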


Parot memory — serialized vs heap

Source: bench_memory.py · raw JSON

Parot keeps full positional detail alongside its compressed index — unlike comparator B, which drops positional detail in favor of a smaller artifact. The trade: parot is ~4x larger on disk but supports batch_find_all, segment APIs, and shared-prefix analysis without a rebuild.

| Corpus | Raw | parot serialized | parot ser. ratio | parot heap | parot heap ratio | comparator A (index+text) | comparator A ratio | comparator B pickle | comparator B ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 MB | 977 KB | 4.3 MB | 4.6x | 4.8 MB | 5.1x | 4.8 MB | 5.0x | 1.1 MB | 1.2x |
| 4 MB | 3.8 MB | 18.3 MB | 4.8x | 19.3 MB | 5.1x | 19.1 MB | 5.0x | 4.8 MB | 1.2x |
| 16 MB | 15.3 MB | 77.1 MB | 5.0x | 77.1 MB | 5.1x | 76.3 MB | 5.0x | 20.2 MB | 1.3x |
| 64 MB | 61.0 MB | 323.5 MB | 5.3x | 308.3 MB | 5.1x | 305.2 MB | 5.0x | 81.4 MB | 1.3x |
| 256 MB | 244.1 MB | 1.32 GB | 5.6x | 1.20 GB | 5.1x | 1.19 GB | 5.0x | 314 MB | 1.3x |

Parot serialized ≈ comparator A (index+text) ≈ 5x raw. Parot heap is essentially identical (5.1x) — the Rust Vecs pack tightly. Comparator B's pickle sits at 1.3x because it drops positional detail.

parot memory scales linearly — serialized (on-disk) vs heap (runtime)


Memory/speed tradeoff

Source: bench_locate_levels.py · raw JSON

The memory_compactness parameter (0–4) controls the memory/speed tradeoff for find_all(). count() is unaffected — its speed stays identical across levels.
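A sketch of the sweep behind the table below — memory_compactness is the documented parameter name; the constructor shape is assumed:

```python
import parot  # assumed import path

def sweep_levels(text: str):
    for level in range(5):                                   # 0 (default) through 4
        index = parot.Index(text, memory_compactness=level)  # constructor shape assumed
        index.count("pattern")      # flat ~0.28 ms at every level
        index.find_all("pattern")   # slows as the level rises (see table)
```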

| memory_compactness | Build (ms) | Memory (bytes) | count (ms) | find_all (ms) |
| --- | --- | --- | --- | --- |
| 0 (default) | 60.3 | 9,052,792 | 0.28 | 0.31 |
| 1 | 61.8 | 2,302,872 | 0.28 | 1.04 |
| 2 | 60.1 | 1,677,880 | 0.28 | 2.60 |
| 3 | 60.1 | 1,365,384 | 0.28 | 6.03 |
| 4 | 60.5 | 1,209,136 | 0.28 | 13.2 |

count() is flat at ~0.28 ms regardless of level. find_all() spans 42x (0.31 ms → 13.2 ms). Memory spans 7.5x (1.2 MB → 9.1 MB on a 1 MB corpus).

memory_compactness memory/speed tradeoff curve

Normalization overhead (case-insensitive, collapse-whitespace)

| Mode | Build (ms) | Memory (bytes) | count (ms) |
| --- | --- | --- | --- |
| default | 60.0 | 1,677,880 | 0.28 |
| case_insensitive | 64.5 | 1,677,912 | 0.28 |
| normalize_whitespace | 63.1 | 1,676,056 | 0.30 |
| both | 65.2 | 1,676,072 | 0.28 |

Normalization is effectively free at build and query time — the cost is paid up front when the text is copied and folded.


Parot build time

Source: bench_build_time.py

Parot's index construction runs in linear time. Slope verified against two other Rust-backed text indexes on PyPI:

parot build time vs corpus size — parot vs two other Rust-backed text indexes on PyPI

Build time at reference sizes (from bench_scaling.py):

| Text size | parot build |
| --- | --- |
| 1 KB | 0.10 ms |
| 10 KB | 0.50 ms |
| 100 KB | 5.00 ms |
| 1 MB | 58.8 ms |
| 10 MB | 655.2 ms |

Memory grid — peak RSS + on-disk artifact

Source: bench_memory_grid.py

The two dimensions that matter in production: peak build RAM (how much headroom you need to construct the index) and on-disk artifact (how much space it takes to persist the index once built). Both swept across corpus size × memory_compactness.

Peak build RAM

Peak build RSS across corpus size × memory_compactness

On-disk artifact (serialized)

On-disk artifact size across corpus size × memory_compactness

Peak build RAM is always ~5x corpus regardless of memory_compactness — the full index is materialized during construction and discarded after. The on-disk artifact shrinks by up to ~4x as memory_compactness rises.


JavaScript / WASM

Index construction: WASM vs mnemonist (M4)

Source: bench_vs_mnemonist.mjs (lives in the parot-core JS benchmark suite). mnemonist is the standard JS package in this space (8.7M downloads/week, pure JS).

| Input size | parot (ours) | mnemonist | Speedup |
| --- | --- | --- | --- |
| 10 K chars | 0.44 ms | 4.7 ms | 11x |
| 100 K chars | 3.9 ms | 58 ms | 15x |
| 500 K chars | 20 ms | 330 ms | 17x |
| 1 M chars | 42 ms | 787 ms | 19x |
| 10 M chars | 730 ms | DNF | — |
| 100 M chars | 9.8 s | DNF | — |

mnemonist DNFs at 10M+. Parot's linear-time construction scales to 100M chars. Parot is the only WASM-based positional index on npm.

Parot in the browser — vs indexOf + Aho-Corasick (M4)

Source: bench_parot_index.mjs (lives in the parot-core JS benchmark suite). Parot is the only positional search index of its kind on npm.

100 patterns, varying text size:

| Text size | indexOf loop | modern-ahocorasick | parot | vs indexOf |
| --- | --- | --- | --- | --- |
| 10 KB | 0.32 ms | 0.34 ms | 0.22 ms | 1.5x |
| 100 KB | 3.8 ms | 2.4 ms | 0.22 ms | 17x |
| 1 MB | 39 ms | 24 ms | 0.26 ms | 143x |
| 10 MB | 383 ms | 224 ms | 0.29 ms | 1,287x |

Parot query time stays flat at ~0.3 ms regardless of text size. indexOf and Aho-Corasick scale linearly. Build cost: 54 ms on 1 MB, 1.7 MB WASM heap.


Longest common substring vs pylcs (M4)

Benchmark pending rewrite

These numbers are from a legacy bench_sa_construction.py retired when the suite was tiered. The LCS story will be restored as a dedicated script under benchmark/docs/. The table below reflects the last captured run.

vs pylcs (C++ DP backend):

| Input size | parot (ours) | pylcs | Speedup |
| --- | --- | --- | --- |
| 1 KB | 0.24 ms | 3.4 ms | 14x |
| 5 KB | 0.41 ms | 85 ms | 206x |
| 10 KB | 0.62 ms | 331 ms | 532x |

Linear-time index vs quadratic DP — the gap grows quadratically.


Index construction (Rust native, M4)

Benchmark pending rewrite

Same retirement situation as the LCS table. Run the Rust-native benches in parot-core (cargo bench) to reproduce today.

Parot automatically selects among internal construction backends based on input size. Typical construction times on an M2 Pro (single-thread):

| Input size | Build |
| --- | --- |
| 1–5 KB | 0.02–0.10 ms |
| 50 KB | 0.72 ms |
| 500 KB | 6.9 ms |
| 1 MB | 13.9 ms |
| 5 MB | 66 ms |

The substring gap (token-based search libraries miss substrings)

Token-based search libraries (fuse.js, lunr, minisearch) split text into words. They can't find arbitrary substrings:

| Query | fuse.js | lunr | minisearch | Parot |
| --- | --- | --- | --- | --- |
| "rown fox" | -- | Y | Y | Y |
| "ATTA" | -- | -- | -- | Y |
| "Script dev" | -- | -- | -- | Y |
| "ows-Whe" | -- | -- | -- | Y |
| "ix arr" | -- | -- | -- | Y |
| "uick brown f" | -- | Y | Y | Y |
| Score | 0/6 | 2/6 | 2/6 | 6/6 |

Reproduce with benchmark/utils/demo_substring_gap.mjs. Parot's WASM build runs in any browser — no server round-trip, no backend needed.


Running the benchmarks yourself

All commands run from the project root.

```bash
# Aggregate: runs the full benchmark suite
just bench

# Smoke test: every script, BENCH_QUICK=1, ~30s total
just bench-quick

# The scaling headline: parot vs pyahocorasick / str.count / pandas / polars /
# pyarrow / duckdb on a 10 MB Wikipedia corpus, sweeping N in {10, 100, 1K,
# 10K, 100K} and writing a log-log SVG to benchmark/charts/.
just bench-query-scaling          # Python + JS, plus the SVG
just bench-query-scaling-run      # Python + JS only (no plot)
just bench-query-scaling-py       # Python sweep only (slow)
just bench-query-scaling-js       # JS sweep only (fast)
just bench-query-scaling-plot     # re-render SVG from saved JSON
```

Running individual benchmark scripts directly

Because each benchmark script is a standalone Python or Node program that writes its own JSON to benchmark/output/{chip}/, you can invoke them one at a time:

```bash
# Headlines (Tier 1 — the pitch deck)
uv run benchmark/headlines/bench_hero_locate.py       # 16 competitors vs parot, 10 MB Wikipedia
uv run benchmark/headlines/bench_hero_plot.py         # 1 GB Wikipedia, 100 patterns
uv run benchmark/headlines/bench_query_scaling.py     # Python side of the scaling sweep
uv run benchmark/headlines/bench_scaling.py           # parot query time vs corpus size
uv run benchmark/headlines/bench_breakeven.py         # build-once-query-many crossover
uv run benchmark/headlines/bench_vs_index.py          # vs a Rust-backed text index on PyPI
node  benchmark/headlines/bench_query_scaling_js.mjs  # JS side of the scaling sweep

# Memory (Tier 2 — footprint + tradeoffs)
uv run benchmark/memory/bench_memory.py               # 5x heap linearity + disk artifact
uv run benchmark/memory/bench_locate_levels.py        # memory_compactness memory/speed tradeoff
uv run benchmark/memory/bench_build_time.py           # O(n) build-time slope
uv run benchmark/memory/bench_memory_grid.py          # peak RSS x corpus size x memory_compactness

# Docs (Tier 3 — application evidence)
uv run benchmark/docs/bench_multipattern.py           # parot vs Aho-Corasick + regex
uv run benchmark/docs/bench_duplicates.py             # duplicate phrase detection
uv run benchmark/docs/bench_genomics.py               # CRISPR guide search vs BioPython
uv run benchmark/docs/bench_dna.py                    # DNA motif sweep
# bench_dataframes.py ships inside the parot-core benchmark suite rather than
# this public repo; its JSON output is committed under benchmark/output/.

# JS / WASM
node benchmark/bench_hero_js.mjs                      # WASM parot vs indexOf vs modern-ahocorasick
node benchmark/bench_scale.mjs                        # WASM parot scale sweep (1K -> 5M words)
```

Every script writes to benchmark/output/{chip}/bench_*.json and (for most) benchmark/output/{chip}/md/bench_*.md. The chip slug is auto-detected (m2-pro, m4, ...). Delete the matching JSON to force a re-run.

Regenerating charts

Most benchmark scripts now emit their own {name}_{dark,light}.svg pair under benchmark/output/{chip}/{hero,memory,docs}/ as part of the run. To refresh the images on this page, re-run the corresponding benchmark and copy the SVG pair into docs/assets/ with the naming convention <topic>-{dark,light}.svg.

Large-corpus runs (workstation only)

```bash
FSA_BENCH_XL=1 uv run benchmark/headlines/bench_scaling.py    # 10 GB tier
FSA_BENCH_XXL=1 uv run benchmark/headlines/bench_scaling.py   # 100 GB tier
```

The default tier (no flag) is laptop-runnable. Each gated tier needs the corresponding corpus downloaded via benchmark/utils/download.sh.