# Demo Recording Script

- Last updated: 2026-05-19 for v0.6.0
- Purpose: 60-second conference-ready demo showing PikoClaw's complete extraction workflow.
- Target: Panathenea 2026, Athens, May 27–29
- No dependencies: WiFi-free, runs locally, reproducible
## What's new in v0.6.0 (since the last script revision)

These v0.6.0 features are layered into the demo at natural points. No scene was inflated to fit them, but they're the headline capabilities to surface on stage:

- `pikoclaw doctor` (#270): environment pre-flight. Open the demo with this so the audience sees a green-light report before we touch any data.
- `pikoclaw attachment extract` (#242): single-file deep extraction with OCR/vision. Lets us tell a "one attachment, recovered intelligence" story.
- `pikoclaw graph-compute` (#285, ICD §8.3): offline graph metrics. Pairs with the network graph scene so we're quantifying the network, not just rendering it.
- `--progress` (#275, ICD §9): Rich progress bar on stderr. Replaces the old fake text overlay.
- Presidio PII redaction (#241): `--redaction-policy-file` swaps the legacy regex redactor for Microsoft Presidio.
- New flags (#260): `--output-format`, `--options-file`, `--provenance`, `--interface-version`.
- Structured exit codes (ICD §7.3): 0 success, 2 input/validation, 3 partial, 4 fatal, 5 internal. Useful talking point when an audience member asks "how do you script this in CI?"
## Preparation (Before Recording)

### 1. Install PikoClaw

```bash
# Create a fresh virtual environment
python3 -m venv demo-env
source demo-env/bin/activate

# Install PikoClaw
pip install pikoclaw

# Verify installation
pikoclaw info
pikoclaw --interface-version  # Prints the ICD version (e.g. "1.0.0")
```
### 2. Pre-flight with `doctor` (NEW in v0.6.0)

`pikoclaw doctor` validates the environment before extraction: Python version, optional dependencies (scikit-learn, networkx, pypff, Presidio, MarkItDown), disk space, and write permissions. Use the human-readable form on screen and the JSON form in your backup notes.

```bash
# Human-readable report (good for the demo)
pikoclaw doctor

# Structured report (for the pre-recording checklist / CI gate)
pikoclaw doctor --json
```

`doctor` exits 0 if everything is green and non-zero if any required dependency is missing, so a CI job can gate the conference machine before it is plugged in at the venue.
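A minimal sketch of such a gate for a POSIX shell, using the paths from this script. `run_check` and `preflight` are hypothetical helper names, not PikoClaw commands; the only PikoClaw behavior relied on is that `pikoclaw doctor` exits 0 when everything is green.

```bash
# Hypothetical helpers (not part of PikoClaw). A check is any command:
# exit status 0 counts as PASS, anything else as FAIL.
run_check() {
  label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS  $label"
  else
    echo "FAIL  $label" >&2
    failures=$((failures + 1))
  fi
}

# Run the pre-stage checks; returns non-zero if any of them failed.
preflight() {
  failures=0
  run_check "pikoclaw doctor" pikoclaw doctor
  run_check "Enron test data" test -d maildir/allen-p
  run_check "redaction policy" test -f policies/demo-redaction.json
  [ "$failures" -eq 0 ]
}
```

Call `preflight || exit 1` at the top of the CI job or pre-talk script; each failing check is reported on stderr.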
### 3. Download Test Data

Option A: Enron Dataset (Recommended for authenticity)

```bash
# Download the full Enron corpus (~400 MB compressed, ~1.3 GB extracted)
wget https://www.cs.cmu.edu/~enron/enron_mail_20150507.tar.gz

# Extract a single user's mailbox for the demo (smaller, faster)
tar -xzf enron_mail_20150507.tar.gz maildir/allen-p/
# This gives you ~1700 messages from Phillip Allen's mailbox
```
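To sanity-check the extraction before recording, you can count the message files and compare the result with the figure above. In the CMU Enron maildir layout each message is stored as its own file. `count_messages` is a hypothetical helper name:

```bash
# Count regular files under a maildir-style directory.
# In the CMU Enron corpus each message is a separate file.
count_messages() {
  find "$1" -type f | wc -l | tr -d ' '
}
```

Usage: `count_messages maildir/allen-p/`.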
Option B: Synthetic Test Data (Faster download)

```bash
# Create a minimal maildir for testing
mkdir -p demo-maildir/{cur,new,tmp}
# Generate a few sample emails (you'll need to create these)
# Or use an existing personal MBOX export
```

Option C: Your Own Data

```bash
# Gmail Takeout: Download from https://takeout.google.com
# Select "Mail" only, MBOX format
# You'll get a file like "All mail Including Spam and Trash.mbox"

# Outlook PST: Export from Outlook via File → Open & Export → Import/Export
# Choose "Export to a file" → "Outlook Data File (.pst)"
```

### 4. Set Up Obsidian (Optional but Impressive)

```bash
# Download Obsidian: https://obsidian.md
# Create a new vault pointing to where you'll extract output
# This lets you show the wiki with live wikilinks in the demo
```
## Demo Script (60 seconds)

### Scene 1: The Problem (0:00–0:08)
Screen: Terminal with empty directory
Voiceover:
"When someone leaves your organization, their email doesn't have to leave with them. Let's extract institutional knowledge from this archive."
Action:
```bash
ls -lh maildir/allen-p/   # Show the raw maildir
pikoclaw doctor           # NEW: pre-flight green light, builds audience trust
```

`pikoclaw doctor` takes about a second and prints a one-screen report. If the recording is tight, cut between `ls` and `doctor` with a quick fade, but keep `doctor` on screen long enough to read the "Overall: PASS" line. That's the trust signal.
### Scene 2: The Command (0:08–0:18)
Screen: Terminal, typing the command
Voiceover:
"One command. Provenance, redaction, progress reporting, all built in."
Action:
```bash
pikoclaw extract maildir/allen-p/ \
  --output allen-kb \
  --json --csv \
  --provenance \
  --progress \
  --redaction-policy-file policies/demo-redaction.json
```

Each flag earns its place in the demo:

- `--provenance` (#260) writes `provenance.json` with the source SHA-256, tool version, and warnings; that's Scene 6.
- `--progress` (#275) renders a Rich progress bar on stderr instead of relying on a fake text overlay. This is real now.
- `--redaction-policy-file` (#241) routes through Microsoft Presidio for PII redaction. The policy file is a JSON document that lists which entity types to scrub (EMAIL_ADDRESS, PHONE_NUMBER, US_SSN, etc.) and the redaction strategy per type.
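The policy schema isn't spelled out in this script, so what follows is a hypothetical shape only: the key names (`entities`, `type`, `strategy`) are placeholders to replace with whatever PikoClaw's documentation specifies. The entity names are Presidio built-in recognizer names, and `replace`, `mask`, and `redact` are built-in Presidio anonymizer operators.

```bash
# HYPOTHETICAL policy layout: key names are placeholders, not PikoClaw's
# documented schema. Entity and strategy names follow Presidio conventions.
mkdir -p policies
cat > policies/demo-redaction.json <<'EOF'
{
  "entities": [
    {"type": "EMAIL_ADDRESS", "strategy": "replace"},
    {"type": "PHONE_NUMBER",  "strategy": "mask"},
    {"type": "US_SSN",        "strategy": "redact"}
  ]
}
EOF
```

Confirm the file parses with `jq . policies/demo-redaction.json` before testing it against the real redactor.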
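A more compact form uses `--output-format` (#260) and is useful for the slide deck if the multi-flag form looks busy. It is not an exact equivalent: as written it also lists the `obsidian` and `graph` formats and drops `--redaction-policy-file`, so add that flag back if redaction should be part of the run.

```bash
pikoclaw extract maildir/allen-p/ \
  --output allen-kb \
  --output-format json,csv,obsidian,graph \
  --provenance --progress
```

The `graph` format also runs `graph-compute` automatically on completion, which sets up Scene 4 for free.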
### Scene 3: The Wiki Output (0:18–0:32)
Screen: Obsidian with the generated wiki open
Voiceover:
"Navigable wiki. Threads, contacts, full-text search — all Obsidian-native."
Action:
- Open `allen-kb/wiki/index.md` in Obsidian
- Click a wikilink to a contact: [[Phillip Allen]]
- Show the contact page with sent/received counts
- Click a thread: [[Thread: Q3 Budget Discussion]]
- Show the threaded conversation with participants and timeline
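Before recording, it's worth confirming that the contact and thread pages you plan to click are actually linked from the index. A small shell helper (the name `list_wikilinks` is hypothetical) lists the distinct `[[...]]` targets in a Markdown file:

```bash
# Print each distinct [[wikilink]] in a Markdown file, one per line.
# Uses grep -o, which both GNU and BSD grep support.
list_wikilinks() {
  grep -o '\[\[[^]]*]]' "$1" | sort -u
}
```

Usage: `list_wikilinks allen-kb/wiki/index.md | grep 'Phillip Allen'`.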
On-screen text overlay:
- Obsidian-native [[wikilinks]]
- Conversation threading
- Contact intelligence
- PII redacted via Presidio
### Scene 4: The Network Graph + Metrics (0:32–0:44)

Screen: Browser with `graph.html` open, then terminal showing `graph_metrics.json`
Voiceover:
"Force-directed graph shows who talked to whom. Then we quantify it — offline, no LLM, deterministic."
Action:
```bash
# Visualize: interactive D3 graph
pikoclaw viz allen-kb -o allen-kb/graph.html

# Quantify: NEW in v0.6.0, offline graph metrics (ICD §8.3)
pikoclaw graph-compute --input allen-kb --top-n 10
```

`graph-compute` (#285) reads `contacts.json`, builds the contact network in NetworkX, computes density, clustering coefficient, connected components, and the top-N connectors by degree, then writes `allen-kb/graph_metrics.json`. The summary prints to stdout:
```text
Graph metrics written to allen-kb/graph_metrics.json
Nodes: 142
Edges: 387
Density: 0.038657
Components: 4
Clustering: 0.412908
Top connector: Phillip Allen <phillip.allen@enron.com> (degree=98)
```
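As a quick cross-check of the summary, assume the contact network is an undirected simple graph (an assumption; for that case NetworkX's `density` uses the formula below). Density is then the share of possible contact pairs that are actually connected, and NetworkX's `degree_centrality` normalizes a node's degree by the $n-1$ other nodes. With the node, edge, and degree figures shown above:

$$
D = \frac{2m}{n(n-1)} = \frac{2 \times 387}{142 \times 141} = \frac{774}{20022} \approx 0.0387,
\qquad
C_{\text{deg}}(\text{Allen}) = \frac{98}{n-1} = \frac{98}{141} \approx 0.695
$$

In other words, under 4% of possible contact pairs are connected, yet Allen alone is directly linked to about 70% of the other nodes.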
On-screen text overlay:
- HITS scores
- Louvain communities (colored clusters)
- Knowledge risk metrics (top connectors = succession risk)
The "top connector" line is the money shot for the "Post-Departure Knowledge Recovery" pitch: if Phillip Allen leaves, you can see exactly which 98 contacts were directly connected to him.
### Scene 5: The Attachment Story (0:44–0:52)
Screen: Terminal, extracting a single attachment
Voiceover:
"A budget spreadsheet buried in a 2002 email. Pulled out, OCR'd, indexed — in one call."
Action:
```bash
# NEW in v0.6.0: single-file deep extraction (#242, ICD §7.1)
pikoclaw attachment extract \
  maildir/allen-p/sent/Q3_budget.xlsx \
  --depth deep \
  --provenance \
  --output ./attachments-out
```

`pikoclaw attachment extract` (#242) handles a single high-value attachment outside the bulk pipeline:

- `--depth shallow` runs MarkItDown (fast, text-only).
- `--depth deep` adds OCR (for scanned PDFs and images) and vision-language extraction. It is slower, but it's the one to demo because it's the differentiator.
- `--provenance` emits a per-attachment `provenance.json` with the file's SHA-256 and the extraction tool version, the same audit-trail contract as the bulk extractor.

The command prints the SHA-256 hash and the path to the extracted Markdown.
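To show the audience that the printed hash is independently checkable, you can recompute it with standard tools. The sketch assumes either GNU `sha256sum` (Linux) or `shasum -a 256` (macOS) is installed; `file_sha256` is a hypothetical helper name:

```bash
# Print the SHA-256 hex digest of a file, using whichever tool is available.
# Hashing stdin keeps the output free of filename escaping.
file_sha256() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum < "$1" | cut -d' ' -f1
  else
    shasum -a 256 < "$1" | cut -d' ' -f1
  fi
}
```

Run `file_sha256 maildir/allen-p/sent/Q3_budget.xlsx` and compare the result with the hash `attachment extract` printed.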
### Scene 6: Provenance + The Pitch (0:52–1:00)
Screen: Terminal displaying provenance.json, then title card
Voiceover:
"Every extraction is auditable. Source hash, tool version, warnings, every time."
Action: `jq . allen-kb/provenance.json`
Output:

```json
{
  "source_hash": "a3f5b8c...",
  "tool": "pikoclaw",
  "version": "0.6.0",
  "extracted_at": "2026-05-19T10:30:00Z",
  "statistics": {
    "total_emails": 1729,
    "total_contacts": 142,
    "total_threads": 387
  },
  "warnings": []
}
```
Closing card:
## Technical Details (Not in Video, But Good to Know)

### Timing Breakdown
| Stage | Time | Notes |
|---|---|---|
| Problem + `doctor` | 8s | Show raw data, `doctor` confirms green |
| Command execution | 10s | Real `--progress` bar, no fake overlay |
| Wiki navigation | 14s | Click through 2–3 pages in Obsidian |
| Graph + metrics | 12s | D3 graph plus `graph-compute` summary |
| Attachment extract | 8s | `attachment extract --depth deep` on a single file |
| Provenance + CTA | 8s | `provenance.json`, then closing card |
Total: ~60 seconds
### Exit-code talking point (ICD §7.3)
If asked "how do you wire this into CI?":
| Code | Meaning | Example |
|---|---|---|
| 0 | SUCCESS | All files extracted cleanly |
| 2 | INPUT_VALIDATION | Bad path, unsupported format, missing dependency |
| 3 | PARTIAL_EXTRACTION | Some files skipped (e.g. password-protected) |
| 4 | FATAL_EXTRACTION | All sources failed |
| 5 | INTERNAL_ERROR | Unhandled exception — please file a bug |
A Karkinos job runner (or any shell script) can branch on exit code without parsing stderr.
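A minimal sketch of that branching for a POSIX shell. The code-to-name mapping follows the table above; treating `PARTIAL_EXTRACTION` as a warning rather than a failure is an example policy, not something the ICD prescribes. `exit_code_name` and `ci_gate` are hypothetical helper names:

```bash
# Map a PikoClaw exit code (ICD §7.3) to its symbolic name.
exit_code_name() {
  case "$1" in
    0) echo SUCCESS ;;
    2) echo INPUT_VALIDATION ;;
    3) echo PARTIAL_EXTRACTION ;;
    4) echo FATAL_EXTRACTION ;;
    5) echo INTERNAL_ERROR ;;
    *) echo "UNKNOWN($1)" ;;
  esac
}

# Run a command and branch on its exit code. Example policy (an assumption):
# a partial extraction only warns; any other non-zero code fails the job.
ci_gate() {
  "$@"
  status=$?
  name=$(exit_code_name "$status")
  case "$status" in
    0) echo "ok: $name" ;;
    3) echo "warning: $name (some files skipped)" >&2 ;;
    *) echo "error: $name" >&2
       return "$status" ;;
  esac
  return 0
}
```

Usage: `ci_gate pikoclaw extract maildir/allen-p/ --output allen-kb --provenance`.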
## Post-Production Tips

- Real progress bar: With `--progress` you get a Rich progress bar on stderr, so record at native speed. No fake "Processing..." overlay needed.
- Smooth transitions: Fade between terminal, Obsidian, and browser windows. Avoid jarring cuts.
- Annotations: Use text overlays to highlight key features (wikilinks, graph metrics, SHA-256 hash, exit codes).
- No audio needed: This script works as a silent demo with captions, or with a live voiceover during the presentation.
- Backup plan: Pre-render the output. Have `allen-kb/` already generated in a backup directory, plus a recorded run of the full pipeline in case anything goes sideways on stage.
## Fallback: Slide Deck (If Demo Fails)

If the live demo or video playback fails at the venue:

- Slide 1: Problem statement ("Email archives are institutional memory waiting to be unlocked.")
- Slide 2: Command screenshot of `pikoclaw extract mailbox.pst --provenance --progress`
- Slide 3: Wiki screenshot in Obsidian with wikilinks visible
- Slide 4: Graph visualization screenshot and `graph_metrics.json` summary (density, top connector)
- Slide 5: Attachment extraction with `pikoclaw attachment extract … --depth deep`
- Slide 6: Provenance JSON screenshot with the SHA-256 hash circled
- Slide 7: GitHub repo QR code + `pip install` command

Total slides: 7 (~8 s/slide)
## Checklist Before Recording

- [ ] Fresh Python virtual environment
- [ ] `pikoclaw info` shows version 0.6.0 or later
- [ ] `pikoclaw doctor` exits 0 (all green)
- [ ] `pikoclaw --interface-version` prints the expected ICD version
- [ ] Test data downloaded and extracted (Enron or personal archive)
- [ ] `policies/demo-redaction.json` written and tested
- [ ] Obsidian installed and vault configured
- [ ] D3 graph loads in browser (`pikoclaw viz allen-kb -o allen-kb/graph.html`)
- [ ] `graph-compute` runs and `graph_metrics.json` looks right
- [ ] `attachment extract --depth deep` runs end-to-end on the chosen file
- [ ] `provenance.json` displays correctly with `jq`
- [ ] Screen recording software ready (OBS, QuickTime, etc.)
- [ ] Clear the terminal screen (`clear`)
- [ ] Close unnecessary windows/notifications
- [ ] Test the full workflow once before recording
## Recording Settings

- Resolution: 1920x1080 (Full HD)
- Frame rate: 30 fps (60 fps if showing smooth graph interactions)
- Audio: Optional voiceover, or silent with captions
- Length: 60 seconds ± 5 seconds
- Format: MP4 (H.264 codec) for maximum compatibility
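If your recorder saves a different container or frame rate, one way to normalize the file to these settings is ffmpeg. The helper name `to_conference_mp4` is hypothetical, and the scale filter assumes a 16:9 source (other aspect ratios will be stretched):

```bash
# Re-encode a screen recording to 1920x1080, 30 fps, H.264 video in an MP4
# container. Requires ffmpeg on PATH. Arguments: input file, output file.
to_conference_mp4() {
  ffmpeg -i "$1" \
    -vf scale=1920:1080 \
    -r 30 \
    -c:v libx264 -pix_fmt yuv420p -crf 18 \
    -c:a aac \
    -movflags +faststart \
    "$2"
}
```

Usage: `to_conference_mp4 raw-recording.mov pikoclaw-demo.mp4`. The `yuv420p` pixel format keeps the H.264 stream playable in QuickTime and most browsers, and `+faststart` moves the MP4 index to the front of the file so playback starts immediately.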
## Alternative: Live Demo at Conference

If presenting live instead of a pre-recorded video:

- Have backup data: Pre-extracted output in case extraction is slow or something fails on the conference machine.
- Practice timing: Rehearse the full workflow to stay under 60 seconds.
- Use presenter notes: Keep this script as a reference during the talk.
- Show the command first: Type it out for the audience, then run it.
- Fall back to slides: If anything breaks, switch to the slide deck (see above).
## Post-Demo: Distribute Files

After the conference, share:

- Demo video: Upload to YouTube with captions
- Sample output: `allen-kb.zip` (sanitized Enron extraction, post-Presidio) on GitHub Releases
- Sample policy: `policies/demo-redaction.json` next to the release
- Instructions: Link to this script from the main README

Goal: Anyone should be able to reproduce the demo on their own machine in under 5 minutes.
## Questions to Answer in Q&A
Q: How long does extraction take for large archives?
A: Enron corpus (~1.3 GB, ~500K messages) takes ~5-10 minutes on a modern laptop. Extraction is I/O-bound, not CPU-bound. With --progress you can watch it work in real time.
Q: Does it work on Windows?

A: Yes, but libpff installation for PST support can be tricky. We recommend Docker on Windows. Maildir/MBOX/EML/Slack work natively.
Q: Can I use this for GDPR compliance?
A: Yes. --redaction-policy-file runs Microsoft Presidio (PERSON, EMAIL_ADDRESS, PHONE_NUMBER, US_SSN, CREDIT_CARD, IP_ADDRESS, and custom entity recognizers). Provenance metadata provides chain of custody for audits.
Q: Does it send data anywhere?
A: No. Zero network dependencies, air-gapped by default. Telemetry is off by default (`--telemetry-off`), and PikoClaw never phones home.

Q: How do I script this in CI?

A: Structured exit codes per ICD §7.3: 0 success, 2 input error, 3 partial (some files skipped), 4 fatal, 5 internal bug. Branch on the exit code; no stderr parsing required.

Q: Can I extract a single attachment without running the full pipeline?

A: Yes, with `pikoclaw attachment extract <file> --depth deep`. Useful for forensic workflows where one specific document matters.

Q: What's the difference between PikoClaw and PicoClaw?

A: PicoClaw (Sipeed, 12K+ stars) is a lightweight edge AI agent. PikoClaw is its long-term memory layer: the institutional knowledge store that agents query.
## Success Metrics
After the demo:
- Immediate: GitHub stars spike, docs site traffic increases
- Short-term: 3+ people run the demo and open issues/PRs
- Mid-term: Someone builds a PikoClaw + PicoClaw integration (§25 in the roadmap)
- Long-term: Academic citation in a threading/knowledge extraction paper
Good luck.