Demos¶
Runnable examples showing Parot in real-world scenarios. Each Python demo is a standalone script — install Parot (`pip install "git+https://github.com/sophiaconsulting/parot#subdirectory=bindings/python"`) and run the file.
Content Moderation: Spam & Copypasta Detection¶
4 of 4 planted campaigns detected in 46 ms. Bot networks amplify messages by posting near-identical content. Exact-match hashing misses campaigns with minor variation. Word-boundary-aware duplicate detection catches them all.
Demo: 5,160 synthetic posts — 5,000 organic and 160 spam across 4 coordinated campaigns (crypto scam, astroturfing, disinformation, phishing). Run find_duplicates() on the concatenated feed.
```python
import parot

# Concatenate all posts in a feed batch
feed_text = "\n".join(post["text"] for post in posts)

# Surface repeated phrases (46 ms on 336K characters)
results = parot.find_duplicates(
    feed_text,
    min_words=8,
    min_chars=30,
    max_words=60,
)
```
Results (Apple Silicon):
| Suspected Campaign | Occurrences | Words | Type |
|---|---|---|---|
| Invest now in TurboMoon coin and earn guaranteed returns... | 45 | 33 | Crypto scam |
| Breaking news: according to multiple unnamed sources... | 60 | 20 | Disinformation |
| Urgent security alert your account has been compromised... | 25 | 29 | Phishing |
| I switched to BrandX last month and honestly it changed... | 30 | 22 | Astroturfing |
Insight: bot copypasta is distinguishable from organic repetition by phrase length. Campaigns are 20–33 words; organic template overlap is 8–14 words. A simple heuristic (>15 words + >10 occurrences) separates signal from noise.
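A minimal sketch of that heuristic, assuming each `find_duplicates` result is a dict carrying the repeated phrase and its occurrence count — the `"count"` key appears in the plagiarism demo below, while `"text"` is a hypothetical field name for the phrase itself:

```python
def flag_campaigns(duplicates, min_phrase_words=15, min_occurrences=10):
    # Long phrases repeated many times are campaign-like; short template
    # overlap or rare long phrases fall below one of the thresholds.
    return [
        d for d in duplicates
        if len(d["text"].split()) > min_phrase_words
        and d["count"] > min_occurrences
    ]

sample = [
    {"text": " ".join(["buy"] * 33), "count": 45},   # long + frequent: campaign
    {"text": "thanks for the update", "count": 40},  # short template: organic
    {"text": " ".join(["word"] * 20), "count": 5},   # long but rare: organic
]
flag_campaigns(sample)  # keeps only the first entry
```

Both thresholds are tunable; the values above are the ones the demo's data happens to separate cleanly.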
Bioinformatics: Genome-Scale Motif Search¶
21,000× faster than BioPython on 50 MB of GRCh38 chr22. Index a genome once; look up thousands of short motifs in time proportional only to pattern length.
Demo: 19 real biological motifs (EcoRI, BamHI, HindIII, TATA box, E-box, etc.) on synthetic genomes up to 50 MB, compared against bytes.count(), re.findall(), and BioPython.
```python
from parot import Index

genome = open("genome.fasta").read()
index = Index(genome)  # Build once

# Each query: ~0.03 ms regardless of genome size
enzymes = ["GAATTC", "GGATCC", "AAGCTT", "GCGGCCGC"]
for motif in enzymes:
    print(f"{motif}: {index.count(motif):,} occurrences")
```
Results (Apple M2 Pro, 1,000 sgRNA guides × GRCh38 chr22, 50 MB):
| Competitor | Regime | Median (ms) | Parot speedup |
|---|---|---|---|
| parot batch_find_all | index | 2.31 | 1.0x |
| bytes.find loop | substring | 49,333 | 21,313x |
| BioPython Seq.count_overlap | count-only | 49,576 | 21,418x |
| ripgrep (subprocess) | substring | 189.3 | 82x |
Query time stays flat as the genome grows — pattern length, not corpus size, is what matters. Same class of data structure that powers BWA and Bowtie2.
Plagiarism Detection: Finding Copied Passages¶
37% of the essay plagiarized, detected in 1 ms. Two complementary primitives: longest_common_substring finds the single longest shared run; find_duplicates surfaces every shared phrase.
Demo: student essay vs. source document. The student copied 5 paragraphs verbatim and wrote their own transitions.
```python
import parot

# Find the single longest shared passage (0.57 ms)
lcs = parot.longest_common_substring(source, essay)

# Find ALL shared phrases between documents
combined = source + "\n\n" + essay
duplicates = parot.find_duplicates(combined, min_words=5)
shared = [d for d in duplicates if d["count"] >= 2]
```
Results (Apple Silicon):
- Longest shared passage: 277 characters, 34 words — found in 0.57ms
- All shared phrases: 15 distinct passages totaling 128 words — found in 1ms
- Verdict: 37% of the essay was plagiarized
The 15 detected phrases range from 4 to 28 words, capturing every copied passage including fragments split across paragraph boundaries.
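The percentage verdict itself is simple arithmetic. A hypothetical scorer (the demo doesn't show its scoring code) divides the essay words covered by shared phrases by the essay's total word count:

```python
def plagiarism_score(essay, shared_phrases):
    # Hypothetical scoring: fraction of essay words that fall inside a
    # detected shared phrase. Ignores overlaps between phrases.
    copied = sum(len(p.split()) for p in shared_phrases if p in essay)
    return copied / max(len(essay.split()), 1)

essay = "the quick brown fox jumps over the lazy dog near the river bank"
round(plagiarism_score(essay, ["quick brown fox", "lazy dog"]), 2)  # 0.38
```

With the demo's numbers — 128 shared words against the essay's total — the same division yields the 37% figure.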
Client-Side Browser Search¶
1,287× faster than indexOf on 10 MB. Ship Parot in WebAssembly for instant full-text search with no server round-trip.
Open the live demo → — it searches 10 MB of Dickens in your browser.
Or use it in any project:
```js
import init, { Index } from 'parot';

await init();
const text = document.body.innerText;  // ~100 KB of docs
const index = new Index(text, 0);      // Build: ~10 ms
const count = index.count("search term");
const positions = index.findAll("search term");
console.log(`${count} matches, index is ${index.heapSize()} bytes`);
```
Query time depends only on pattern length — searching 10 MB takes the same time as 10 KB. memory_compactness=0 (default) puts the WASM footprint at ~5× text; memory_compactness=4 shrinks it to ~1.4× at the cost of slower find_all.
Results (Apple M4):
| Text size | indexOf loop | parot | Speedup |
|---|---|---|---|
| 10 KB | 0.32ms | 0.22ms | 1.5x |
| 100 KB | 3.8ms | 0.22ms | 17x |
| 1 MB | 39ms | 0.26ms | 143x |
| 10 MB | 383ms | 0.29ms | 1,287x |
The demo includes a live text editor — paste your own content and search it instantly.
Index Visualizer¶
An educational tool for exploring Parot's index structures. Type a short string; every data structure is computed and visualized live.
Or run locally with `just demo` and open http://localhost:8765/examples/demo_visualizer.html.
What it shows:
- Index table — all suffixes sorted lexicographically, with rank and position
- Shared-prefix bars — shared-prefix lengths shown as colored bars
- Transform string — each character in its own cell, with sentinel highlighted
- Longest repeated substring — highlighted in the original text
- Distinct substring count — total unique substrings
- Round-trip — demonstrates perfect recovery from the reversible transform
Try typing "mississippi" or "abracadabra" — classic textbook examples that reveal beautiful patterns.
Text Complexity Analyzer¶
Paste text; get instant metrics — distinct substring count, longest repeated substring, and every duplicate phrase color-coded inline. A writing tool for finding accidental repetition.
Or run locally with `just demo` and open http://localhost:8765/examples/demo_text_complexity.html.
Features:
- Information density — distinct substring count measures how much unique content exists
- Duplicate phrase detection — every repeated phrase highlighted in a different color
- Interactive table — click any phrase to scroll to it in the text
- Compare mode — paste two texts side by side to compare complexity profiles
- Tunable parameters — adjust min words, toggle case sensitivity
Includes presets: Lorem ipsum (highly repetitive), Dickens (literary), random characters (high entropy). The contrast between these is immediately striking.
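For intuition on what the distinct-substring count measures, here is a naive set-based version — O(n²) substrings, fine for short strings only; Parot derives the same number from its index without enumerating anything:

```python
def distinct_substrings(s):
    # Enumerate every non-empty substring and deduplicate via a set.
    return len({s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)})

distinct_substrings("aaaa")  # 4  -> highly repetitive, low density
distinct_substrings("abcd")  # 10 -> all n*(n+1)/2 substrings are distinct
```

Repetitive text collapses many substrings into one, which is exactly why Lorem ipsum scores so much lower than random characters of the same length.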
Longest Common Substring Finder¶
Two side-by-side text panes. Paste text in each; the longest shared passage is highlighted in both. Immediately intuitive: "what do these two texts have in common?"
Or run locally with `just demo` and open http://localhost:8765/examples/demo_lcs.html.
Features:
- Side-by-side highlighting — the LCS highlighted in green in both panes
- Stats — LCS length (chars and words), similarity percentage, time taken
- Multiple occurrences — if the LCS appears more than once, all instances are highlighted
- Preset examples — original vs plagiarized, two Wikipedia articles, similar code snippets
Useful for comparing documents, detecting copied content, or finding shared code patterns.