
Demos

Runnable examples showing Parot in real-world scenarios. Each Python demo is a standalone script — install Parot (pip install "git+https://github.com/sophiaconsulting/parot#subdirectory=bindings/python") and run the file.


Content Moderation: Spam & Copypasta Detection

4 of 4 planted campaigns detected in 46 ms. Bot networks amplify messages by posting near-identical content. Exact-match hashing misses campaigns with minor variation. Word-boundary-aware duplicate detection catches them all.

Demo: 5,160 synthetic posts — 5,000 organic and 160 spam across 4 coordinated campaigns (crypto scam, astroturfing, disinformation, phishing). Run find_duplicates() on the concatenated feed.

import parot

# posts: a list of {"text": ...} dicts in the current feed batch
# Concatenate all posts so repeated phrases can be found across posts
feed_text = "\n".join(post["text"] for post in posts)

# Surface repeated phrases (46ms on 336K characters)
results = parot.find_duplicates(
    feed_text,
    min_words=8,
    min_chars=30,
    max_words=60,
)

Results (Apple Silicon):

| Suspected campaign | Occurrences | Words | Type |
|---|---|---|---|
| Invest now in TurboMoon coin and earn guaranteed returns... | 45 | 33 | Crypto scam |
| Breaking news: according to multiple unnamed sources... | 60 | 20 | Disinformation |
| Urgent security alert your account has been compromised... | 25 | 29 | Phishing |
| I switched to BrandX last month and honestly it changed... | 30 | 22 | Astroturfing |

Insight: bot copypasta is distinguishable from organic repetition by phrase length. Campaigns are 20–33 words; organic template overlap is 8–14 words. A simple heuristic (>15 words + >10 occurrences) separates signal from noise.
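That heuristic is a one-liner on top of the duplicate results. A minimal sketch, assuming find_duplicates returns dicts with "text" and "count" fields (the field names here are illustrative, not necessarily parot's exact output schema):

```python
# Flag phrases long and frequent enough to look like a coordinated campaign.
def flag_campaigns(duplicates, min_words=15, min_count=10):
    return [
        d for d in duplicates
        if len(d["text"].split()) > min_words and d["count"] > min_count
    ]

duplicates = [
    {"text": "Invest now in TurboMoon coin " * 6, "count": 45},  # 30 words, bot-like
    {"text": "thanks for sharing this post", "count": 12},       # 5 words, organic
]
print(len(flag_campaigns(duplicates)))  # 1
```

Tuning min_words between the organic ceiling (14) and the campaign floor (20) gives a comfortable margin on this data.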

uv run examples/demo_content_moderation.py

Genome Search: Indexed Motif Lookup

21,000× faster than BioPython on 50 MB of GRCh38 chr22. Index a genome once; look up thousands of short motifs in time proportional only to pattern length.

Demo: 19 real biological motifs (EcoRI, BamHI, HindIII, TATA box, E-box, etc.) on synthetic genomes up to 50 MB, compared against bytes.count(), re.findall(), and BioPython.

from parot import Index

with open("genome.fasta") as f:  # demo reads raw text; real FASTA needs header lines stripped
    genome = f.read()
index = Index(genome)  # Build once

# Each query: ~0.03ms regardless of genome size
enzymes = ["GAATTC", "GGATCC", "AAGCTT", "GCGGCCGC"]
for motif in enzymes:
    print(f"{motif}: {index.count(motif):,} occurrences")

Results (Apple M2 Pro, 1,000 sgRNA guides × GRCh38 chr22, 50 MB):

| Competitor | Regime | Median (ms) | Speedup vs parot |
|---|---|---|---|
| parot batch_find_all | index | 2.31 | 1.0x |
| bytes.find loop | substring | 49,333 | 21,313x |
| BioPython Seq.count_overlap | count-only | 49,576 | 21,418x |
| ripgrep (subprocess) | substring | 189.3 | 82x |

Query time stays flat as the genome grows — pattern length, not corpus size, is what matters. Same class of data structure that powers BWA and Bowtie2.

uv run examples/demo_genome_search.py

Plagiarism Detection: Finding Copied Passages

37% of the essay plagiarized, detected in 1 ms. Two complementary primitives: longest_common_substring finds the single longest shared run; find_duplicates surfaces every shared phrase.

Demo: student essay vs. source document. The student copied 5 paragraphs verbatim and wrote their own transitions.

import parot

# Find the single longest shared passage (0.57ms)
lcs = parot.longest_common_substring(source, essay)

# Find ALL shared phrases between documents
combined = source + "\n\n" + essay
duplicates = parot.find_duplicates(combined, min_words=5)
shared = [d for d in duplicates if d["count"] >= 2]

Results (Apple Silicon):

  • Longest shared passage: 277 characters, 34 words — found in 0.57ms
  • All shared phrases: 15 distinct passages totaling 128 words — found in 1ms
  • Verdict: 37% of the essay was plagiarized

The 15 detected phrases range from 4 to 28 words, capturing every copied passage including fragments split across paragraph boundaries.
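The percentage verdict is then simple arithmetic over the duplicate output. A hedged sketch: the "words" field and the essay word count of 346 are illustrative assumptions (the demo script derives both from its own data), though the 128-word total matches the result above:

```python
# Percent of the essay covered by phrases it shares with the source.
def plagiarized_fraction(shared_phrases, essay_word_count):
    copied_words = sum(p["words"] for p in shared_phrases)
    return copied_words / essay_word_count

shared = [{"words": 34}, {"words": 28}, {"words": 21}, {"words": 45}]  # 128 total
print(f"{plagiarized_fraction(shared, 346):.0%}")  # 37%
```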

uv run examples/demo_plagiarism_detection.py

In-Browser Search (WebAssembly)

1,287× faster than indexOf on 10 MB. Ship Parot in WebAssembly for instant full-text search with no server round-trip.

Open the live demo → — searches 10 MB of Dickens in your browser.

Or use it in any project:

import init, { Index } from 'parot';
await init();

const text = document.body.innerText;   // ~100 KB of docs
const index = new Index(text, 0);       // Build: ~10 ms

const count = index.count("search term");
const positions = index.findAll("search term");
console.log(`${count} matches, index is ${index.heapSize()} bytes`);

Query time depends only on pattern length — searching 10 MB takes the same time as 10 KB. memory_compactness=0 (default) puts the WASM footprint at ~5× text; memory_compactness=4 shrinks it to ~1.4× at the cost of slower find_all.

Results (Apple M4):

| Text size | indexOf loop | parot | Speedup |
|---|---|---|---|
| 10 KB | 0.32ms | 0.22ms | 1.5x |
| 100 KB | 3.8ms | 0.22ms | 17x |
| 1 MB | 39ms | 0.26ms | 143x |
| 10 MB | 383ms | 0.29ms | 1,287x |

The demo includes a live text editor — paste your own content and search it instantly.


Index Visualizer

An educational tool for exploring Parot's index structures. Type a short string; every data structure is computed and visualized live.

Open the live demo →

Or run locally with `just demo` and open http://localhost:8765/examples/demo_visualizer.html.

What it shows:

  • Index table — all suffixes sorted lexicographically, with rank and position
  • Shared-prefix bars — shared-prefix lengths shown as colored bars
  • Transform string — each character in its own cell, with sentinel highlighted
  • Longest repeated substring — highlighted in the original text
  • Distinct substring count — total unique substrings
  • Round-trip — demonstrates perfect recovery from the reversible transform

Try typing "mississippi" or "abracadabra" — classic textbook examples that reveal beautiful patterns.


Text Complexity Analyzer

Paste text; get instant metrics — distinct substring count, longest repeated substring, and every duplicate phrase color-coded inline. A writing tool for finding accidental repetition.

Open the live demo →

Or run locally with `just demo` and open http://localhost:8765/examples/demo_text_complexity.html.

Features:

  • Information density — distinct substring count measures how much unique content exists
  • Duplicate phrase detection — every repeated phrase highlighted in a different color
  • Interactive table — click any phrase to scroll to it in the text
  • Compare mode — paste two texts side by side to compare complexity profiles
  • Tunable parameters — adjust min words, toggle case sensitivity

Includes presets: Lorem ipsum (highly repetitive), Dickens (literary), random characters (high entropy). The contrast between these is immediately striking.
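For intuition about the density metric, here is a brute-force version of the distinct substring count, fine for short strings (the analyzer gets the same number from its index without enumerating anything):

```python
# Count distinct substrings by materializing them all in a set.
def distinct_substrings(s):
    return len({s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)})

print(distinct_substrings("abcdef"))  # 21 -> every substring unique (high entropy)
print(distinct_substrings("ababab"))  # 11 -> repetition collapses the count
```

A string of length n has at most n(n+1)/2 substrings; how far below that bound a text falls is its repetitiveness.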


Longest Common Substring Finder

Two side-by-side text panes. Paste text in each; the longest shared passage is highlighted in both. Immediately intuitive: "what do these two texts have in common?"

Open the live demo →

Or run locally with `just demo` and open http://localhost:8765/examples/demo_lcs.html.

Features:

  • Side-by-side highlighting — the LCS highlighted in green in both panes
  • Stats — LCS length (chars and words), similarity percentage, time taken
  • Multiple occurrences — if the LCS appears more than once, all instances are highlighted
  • Preset examples — original vs plagiarized, two Wikipedia articles, similar code snippets

Useful for comparing documents, detecting copied content, or finding shared code patterns.
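For reference, the classic quadratic dynamic-programming version of longest common substring, workable for small inputs (the indexed approach the demos use is what makes it fast on whole documents):

```python
def longest_common_substring(a, b):
    # dp over b: curr[j+1] = length of the common suffix of a[:i+1] and b[:j+1].
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i, ca in enumerate(a):
        curr = [0] * (len(b) + 1)
        for j, cb in enumerate(b):
            if ca == cb:
                curr[j + 1] = prev[j] + 1
                if curr[j + 1] > best_len:
                    best_len, best_end = curr[j + 1], i + 1  # end index in a
        prev = curr
    return a[best_end - best_len:best_end]

print(longest_common_substring("banana", "atana"))  # ana
```

This is O(len(a) x len(b)) time; the index-based method drops that to near-linear, which is the gap the demo's timing stats make visible.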