Demo Recording Script

Last updated: 2026-05-19 for v0.6.0
Purpose: 60-second conference-ready demo showing PikoClaw's complete extraction workflow.
Target: Panathenea 2026, Athens, May 27–29
No dependencies: WiFi-free, runs locally, reproducible


What's new in v0.6.0 (since last script revision)

These v0.6.0 features are layered into the demo at natural points — no scene was inflated to fit them, but they're the headline capabilities to surface on stage:

  • pikoclaw doctor (#270) — environment pre-flight. Open the demo with this so the audience sees a green-light report before we touch any data.
  • pikoclaw attachment extract (#242) — single-file deep extraction with OCR/vision. Lets us tell a "one attachment, recovered intelligence" story.
  • pikoclaw graph-compute (#285, ICD §8.3) — offline graph metrics. Pairs with the network graph scene so we're quantifying the network, not just rendering it.
  • --progress (#275, ICD §9) — Rich progress bar on stderr. Replaces the old text-overlay fake.
  • Presidio PII redaction (#241) — --redaction-policy-file swaps the legacy regex redactor for Microsoft Presidio.
  • New flags (#260): --output-format, --options-file, --provenance, --interface-version.
  • Structured exit codes (ICD §7.3): 0 success, 2 input/validation, 3 partial, 4 fatal, 5 internal. Useful talking point when an audience member asks "how do you script this in CI?"

Preparation (Before Recording)

1. Install PikoClaw

# Create a fresh virtual environment
python3 -m venv demo-env
source demo-env/bin/activate

# Install PikoClaw
pip install pikoclaw

# Verify installation
pikoclaw info
pikoclaw --interface-version   # Prints ICD version (e.g. "1.0.0")

2. Pre-flight with doctor (NEW in v0.6.0)

pikoclaw doctor validates the environment before extraction: Python version, optional deps (scikit-learn, networkx, pypff, Presidio, MarkItDown), disk space, write permissions. Use the human-readable form on screen and the JSON form in your backup notes.

# Human-readable report (good for the demo)
pikoclaw doctor

# Structured report (for the pre-recording checklist / CI gate)
pikoclaw doctor --json

Doctor exits 0 if everything's green, non-zero if any required dependency is missing — so a CI job can gate the conference machine before plugging it in.
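
A gate along these lines could wrap the pre-flight check in a script. This is a minimal sketch, not part of PikoClaw: the `preflight_gate` helper is hypothetical, and it relies only on the behavior described above (exit 0 when green, non-zero otherwise).

```shell
#!/bin/sh
# Hypothetical helper: run a pre-flight command and report PASS/FAIL
# based purely on its exit status (no output parsing).
preflight_gate() {
  if "$@" >/dev/null 2>&1; then
    echo "preflight: PASS"
  else
    status=$?
    echo "preflight: FAIL (exit ${status})"
    return "${status}"
  fi
}

# On the demo machine (assumes pikoclaw is on PATH):
#   preflight_gate pikoclaw doctor || exit 1
```

The helper discards the report itself, which suits a CI gate. For the on-screen run, call `pikoclaw doctor` directly so the audience can read it.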

3. Download Test Data

Option A: Enron Dataset (Recommended for authenticity)

# Download the full Enron corpus (~400 MB compressed, ~1.3 GB extracted)
wget https://www.cs.cmu.edu/~enron/enron_mail_20150507.tar.gz

# Extract a single user's mailbox for the demo (smaller, faster)
tar -xzf enron_mail_20150507.tar.gz maildir/allen-p/

# This gives you ~1700 messages from Phillip Allen's mailbox

Option B: Synthetic Test Data (Faster download)

# Create a minimal maildir for testing
mkdir -p demo-maildir/{cur,new,tmp}

# Generate a few sample emails (you'll need to create these)
# Or use an existing personal MBOX export
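
If you go the synthetic route, a few messages can be generated with a short script. The sketch below is only an illustration: every name, address, and subject is made up, the `DEMO_MAILDIR` variable is a hypothetical override, and three messages are enough for a dry run but not for an interesting graph.

```shell
#!/bin/sh
# Build a tiny Maildir with three plain-text messages for a dry run.
# All names, addresses, and subjects below are made up for illustration.
maildir="${DEMO_MAILDIR:-demo-maildir}"
mkdir -p "${maildir}/cur" "${maildir}/new" "${maildir}/tmp"

i=1
for subject in "Q3 budget draft" "Re: Q3 budget draft" "Team offsite"; do
  # Maildir only requires unique file names; this scheme is one simple choice.
  cat > "${maildir}/new/demo-${i}.eml" <<EOF
From: Alice Example <alice@example.com>
To: Bob Example <bob@example.com>
Subject: ${subject}
Date: 0${i} Jun 2026 09:00:00 +0000
Message-ID: <demo-${i}@example.com>

Sample body for message ${i}.
EOF
  i=$((i + 1))
done
```

Running it leaves three files under `demo-maildir/new/`, each a minimal RFC 5322 message with a unique Message-ID.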

Option C: Your Own Data

# Gmail Takeout: Download from https://takeout.google.com
# Select "Mail" only, MBOX format
# You'll get a file like "All mail Including Spam and Trash.mbox"

# Outlook PST: Export from Outlook via File → Open & Export → Import/Export
# Choose "Export to a file" → "Outlook Data File (.pst)"

4. Set Up Obsidian (Optional but Impressive)

# Download Obsidian: https://obsidian.md
# Create a new vault pointing to where you'll extract output
# This lets you show the wiki with live wikilinks in the demo

Demo Script (60 seconds)

Scene 1: The Problem (0:00–0:08)

Screen: Terminal with empty directory

Voiceover:

"When someone leaves your organization, their email doesn't have to leave with them. Let's extract institutional knowledge from this archive."

Action:

ls -lh maildir/allen-p/        # Show the raw maildir
pikoclaw doctor                # NEW: pre-flight green-light, builds audience trust

pikoclaw doctor takes about a second and prints a one-screen report. If the recording is tight, cut between ls and doctor with a quick fade — but keep doctor on screen long enough to read the "Overall: PASS" line. That's the trust signal.


Scene 2: The Command (0:08–0:18)

Screen: Terminal, typing the command

Voiceover:

"One command. Provenance, redaction, progress reporting, all built in."

Action:

pikoclaw extract maildir/allen-p/ \
  --output allen-kb \
  --json --csv \
  --provenance \
  --progress \
  --redaction-policy-file policies/demo-redaction.json

Each flag earns its place in the demo:

  • --provenance (#260) writes provenance.json with the source SHA-256, tool version, and warnings — that's Scene 5.
  • --progress (#275) renders a Rich progress bar on stderr instead of relying on a fake text overlay. This is real now.
  • --redaction-policy-file (#241) routes through Microsoft Presidio for PII redaction. The policy file is a JSON document that lists which entity types to scrub (EMAIL_ADDRESS, PHONE_NUMBER, US_SSN, etc.) and the redaction strategy per type.

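A more compact form uses --output-format (#260), which is useful for the slide deck if the multi-flag form looks busy. Note that this version also requests the obsidian and graph formats and leaves out --redaction-policy-file, so it is not a drop-in replacement for the command above: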

pikoclaw extract maildir/allen-p/ \
  --output allen-kb \
  --output-format json,csv,obsidian,graph \
  --provenance --progress

The graph format additionally auto-runs graph-compute on completion (Scene 4 setup, free of charge).


Scene 3: The Output — Wiki (0:18–0:32)

Screen: Obsidian with the generated wiki open

Voiceover:

"Navigable wiki. Threads, contacts, full-text search — all Obsidian-native."

Action:

  • Open allen-kb/wiki/index.md in Obsidian
  • Click a wikilink to a contact: [[Phillip Allen]]
  • Show the contact page with sent/received counts
  • Click a thread: [[Thread: Q3 Budget Discussion]]
  • Show the threaded conversation with participants and timeline

On-screen text overlay:

• Obsidian-native [[wikilinks]]
• Conversation threading
• Contact intelligence
• PII redacted via Presidio


Scene 4: The Network — Graph + Metrics (0:32–0:44)

Screen: Browser with graph.html open, then terminal showing graph_metrics.json

Voiceover:

"Force-directed graph shows who talked to whom. Then we quantify it — offline, no LLM, deterministic."

Action:

# Visualize: interactive D3 graph
pikoclaw viz allen-kb -o allen-kb/graph.html

# Quantify: NEW in v0.6.0 — offline graph metrics (ICD §8.3)
pikoclaw graph-compute --input allen-kb --top-n 10

graph-compute (#285) reads contacts.json, builds the contact network in NetworkX, computes density, clustering coefficient, connected components, and the top-N connectors by degree, then writes allen-kb/graph_metrics.json. The summary prints to stdout:

Graph metrics written to allen-kb/graph_metrics.json
  Nodes: 142
  Edges: 387
  Density: 0.038657
  Components: 4
  Clustering: 0.412908
  Top connector: Phillip Allen <phillip.allen@enron.com> (degree=98)
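
To make these figures less of a black box when presenting: for an undirected graph with N nodes and E edges, density is 2E / (N(N-1)), and a node's degree is the number of edges touching it. The sketch below reproduces those calculations with awk on a toy edge list. The file name, the data, and the two-column format are hypothetical (this is not the contacts.json format), and it assumes no duplicate edges or self-loops.

```shell
#!/bin/sh
# Toy edge list: one undirected edge per line (hypothetical data and format).
cat > edges.txt <<'EOF'
allen bob
allen carol
allen dave
bob carol
EOF

# Count nodes and edges, then derive density and the highest-degree node.
graph_summary() {
  awk '
    NF == 2 { deg[$1]++; deg[$2]++; edges++ }
    END {
      for (n in deg) { nodes++; if (deg[n] > max) { max = deg[n]; top = n } }
      printf "Nodes: %d\nEdges: %d\n", nodes, edges
      printf "Density: %.6f\n", (2 * edges) / (nodes * (nodes - 1))
      printf "Top connector: %s (degree=%d)\n", top, max
    }
  ' "$1"
}

graph_summary edges.txt
```

With four nodes and four edges the density is 8 / 12, so the script prints `Density: 0.666667` and names `allen` (degree 3) as the top connector.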

On-screen text overlay:

• HITS scores
• Louvain communities (colored clusters)
• Knowledge risk metrics (top connectors = succession risk)

The "top connector" line is the money shot for the "Post-Departure Knowledge Recovery" pitch: if Phillip Allen leaves, you can see precisely which 98 people lose a direct line to him.


Scene 5: The Attachment Story (0:44–0:52)

Screen: Terminal, extracting a single attachment

Voiceover:

"A budget spreadsheet buried in a 2002 email. Pulled out, OCR'd, indexed — in one call."

Action:

# NEW in v0.6.0 — single-file deep extraction (#242, ICD §7.1)
pikoclaw attachment extract \
  maildir/allen-p/sent/Q3_budget.xlsx \
  --depth deep \
  --provenance \
  --output ./attachments-out

pikoclaw attachment extract (#242) handles a single high-value attachment outside the bulk pipeline:

  • --depth shallow runs MarkItDown (fast, text-only).
  • --depth deep adds OCR (for scanned PDFs / images) and vision-language extraction. Slower, but it's the one to demo because it's the differentiator.
  • --provenance emits a per-attachment provenance.json with the file's SHA-256 and the extraction tool version — same audit-trail contract as the bulk extractor.

Output prints the SHA-256 hash and the path to the extracted Markdown.
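
If someone asks how to check the printed hash independently, standard tools are enough. The following is a minimal sketch: `file_sha256` and `verify_sha256` are hypothetical helper names, and the expected hash is passed in by hand because the field layout of the per-attachment provenance.json is not covered here.

```shell
#!/bin/sh
# Print the SHA-256 of a file, using whichever standard tool is installed.
file_sha256() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{ print $1 }'
  else
    shasum -a 256 "$1" | awk '{ print $1 }'   # macOS fallback
  fi
}

# Compare a file against an expected hash (e.g. one copied from provenance.json).
verify_sha256() {
  actual="$(file_sha256 "$1")"
  if [ "${actual}" = "$2" ]; then
    echo "OK: $1"
  else
    echo "MISMATCH: $1 (got ${actual})"
    return 1
  fi
}

# Self-check against a well-known value: the SHA-256 of the three bytes "abc".
printf 'abc' > sample.txt
verify_sha256 sample.txt ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```

The self-check prints `OK: sample.txt`. On stage, swap in the attachment path and the hash shown by the extractor.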


Scene 6: Provenance + The Pitch (0:52–1:00)

Screen: Terminal displaying provenance.json, then title card

Voiceover:

"Every extraction is auditable. Source hash, tool version, warnings, every time."

Action:

jq . allen-kb/provenance.json

Output:

{
  "source_hash": "a3f5b8c...",
  "tool": "pikoclaw",
  "version": "0.6.0",
  "extracted_at": "2026-05-19T10:30:00Z",
  "statistics": {
    "total_emails": 1729,
    "total_contacts": 142,
    "total_threads": 387
  },
  "warnings": []
}

Closing card:

PikoClaw v0.6.0

github.com/nft2-me/PikoClaw
nft2-me.github.io/PikoClaw

pip install pikoclaw


Technical Details (Not in Video, But Good to Know)

Timing Breakdown

| Stage | Time | Notes |
|---|---|---|
| Problem + doctor | 8s | Show raw data, doctor confirms green |
| Command execution | 10s | Real --progress bar, no fake overlay |
| Wiki navigation | 14s | Click through 2-3 pages in Obsidian |
| Graph + metrics | 12s | D3 graph plus graph-compute summary |
| Attachment extract | 8s | attachment extract --depth deep on a single file |
| Provenance + CTA | 8s | provenance.json, then closing card |

Total: ~60 seconds

Exit-code talking point (ICD §7.3)

If asked "how do you wire this into CI?":

| Code | Meaning | Example |
|---|---|---|
| 0 | SUCCESS | All files extracted cleanly |
| 2 | INPUT_VALIDATION | Bad path, unsupported format, missing dependency |
| 3 | PARTIAL_EXTRACTION | Some files skipped (e.g. password-protected) |
| 4 | FATAL_EXTRACTION | All sources failed |
| 5 | INTERNAL_ERROR | Unhandled exception; please file a bug |

A Karkinos job runner (or any shell script) can branch on exit code without parsing stderr.
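
As a concrete version of that answer, a shell wrapper might branch like this. It is a sketch under stated assumptions: the codes come from the table above, while the `ci_action_for` helper and the action strings are hypothetical and should be adapted to your job runner.

```shell
#!/bin/sh
# Map a PikoClaw exit code (ICD §7.3, as listed above) to a CI action.
# The action names are hypothetical; adapt them to your job runner.
ci_action_for() {
  case "$1" in
    0) echo "continue" ;;
    2) echo "fail: fix input or install missing dependency" ;;
    3) echo "continue-with-warning: some files skipped" ;;
    4) echo "fail: all sources failed" ;;
    5) echo "fail: internal error, file a bug" ;;
    *) echo "fail: unexpected exit code $1" ;;
  esac
}

# In a real job (assumes pikoclaw is on PATH):
#   pikoclaw extract maildir/allen-p/ --output allen-kb
#   ci_action_for "$?"
ci_action_for 3
```

The example call prints `continue-with-warning: some files skipped`. Treating code 3 as a soft failure is a design choice; a stricter pipeline could fail on anything other than 0.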

Post-Production Tips

  1. The progress bar is real. With --progress you get a Rich progress bar on stderr, so record at native speed. No "Processing..." fake overlay needed.

  2. Smooth transitions: Fade between terminal, Obsidian, and browser windows. Avoid jarring cuts.

  3. Annotations: Use text overlays to highlight key features (wikilinks, graph metrics, SHA-256 hash, exit codes).

  4. No audio needed: This script works as a silent demo with captions, or with a live voiceover during presentation.

  5. Backup plan: Pre-render the output. Have allen-kb/ already generated in a backup directory, plus a recorded run of the full pipeline in case anything goes sideways on stage.


Fallback: Slide Deck (If Demo Fails)

If live demo or video playback fails at the venue:

  1. Slide 1: Problem statement — "Email archives are institutional memory waiting to be unlocked."

  2. Slide 2: Command screenshot — pikoclaw extract mailbox.pst --provenance --progress

  3. Slide 3: Wiki screenshot in Obsidian with wikilinks visible

  4. Slide 4: Graph visualization screenshot and graph_metrics.json summary (density, top connector)

  5. Slide 5: Attachment extraction — pikoclaw attachment extract … --depth deep

  6. Slide 6: Provenance JSON screenshot with SHA-256 hash circled

  7. Slide 7: GitHub repo QR code + pip install command

Total slides: 7 (~8s/slide)


Checklist Before Recording

  • [ ] Fresh Python virtual environment
  • [ ] pikoclaw info shows version 0.6.0 or later
  • [ ] pikoclaw doctor exits 0 (all green)
  • [ ] pikoclaw --interface-version prints expected ICD version
  • [ ] Test data downloaded and extracted (Enron or personal archive)
  • [ ] policies/demo-redaction.json written and tested
  • [ ] Obsidian installed and vault configured
  • [ ] D3 graph loads in browser (pikoclaw viz allen-kb -o allen-kb/graph.html)
  • [ ] graph-compute runs and graph_metrics.json looks right
  • [ ] attachment extract --depth deep runs end-to-end on the chosen file
  • [ ] provenance.json displays correctly with jq
  • [ ] Screen recording software ready (OBS, QuickTime, etc.)
  • [ ] Clear the terminal screen (clear)
  • [ ] Close unnecessary windows/notifications
  • [ ] Test full workflow once before recording

Recording Settings

Resolution: 1920x1080 (Full HD)
Frame rate: 30 fps (60 fps if showing smooth graph interactions)
Audio: Optional voiceover or silent with captions
Length: 60 seconds ± 5 seconds
Format: MP4 (H.264 codec) for maximum compatibility


Alternative: Live Demo at Conference

If presenting live instead of pre-recorded video:

  1. Have backup data: Pre-extracted output in case extraction is slow or fails on stage. The demo itself needs no network.

  2. Practice timing: Rehearse the full workflow to stay under 60 seconds.

  3. Use presenter notes: Keep this script as a reference during the talk.

  4. Show the command first: Type it out for the audience, then run it.

  5. Fallback to slides: If anything breaks, switch to the slide deck (see above).


Post-Demo: Distribute Files

After the conference, share:

  • Demo video: Upload to YouTube with captions
  • Sample output: allen-kb.zip (sanitized Enron extraction, post-Presidio) on GitHub Releases
  • Sample policy: policies/demo-redaction.json next to the release
  • Instructions: Link to this script from the main README

Goal: Anyone should be able to reproduce the demo on their own machine in under 5 minutes.


Questions to Answer in Q&A

Q: How long does extraction take for large archives?
A: The full Enron corpus (~1.3 GB, ~500K messages) takes ~5-10 minutes on a modern laptop. Extraction is I/O-bound, not CPU-bound. With --progress you can watch it work in real time.

Q: Does it work on Windows?
A: Yes, but libpff installation for PST support can be tricky. We recommend Docker on Windows. Maildir/MBOX/EML/Slack work natively.

Q: Can I use this for GDPR compliance?
A: It can support a GDPR workflow. --redaction-policy-file runs Microsoft Presidio (PERSON, EMAIL_ADDRESS, PHONE_NUMBER, US_SSN, CREDIT_CARD, IP_ADDRESS, and custom entity recognizers), and provenance metadata provides chain of custody for audits.

Q: Does it send data anywhere?
A: No. Zero network dependencies. Air-gapped by default. --telemetry-off is the default, so PikoClaw never phones home.

Q: How do I script this in CI?
A: Structured exit codes per ICD §7.3: 0 success, 2 input error, 3 partial (some files skipped), 4 fatal, 5 internal bug. Branch on exit code, no stderr parsing required.

Q: Can I extract a single attachment without running the full pipeline?
A: Yes: pikoclaw attachment extract <file> --depth deep. Useful for forensic workflows where one specific document matters.

Q: What's the difference between PikoClaw and PicoClaw?
A: PicoClaw (Sipeed, 12K+ stars) is a lightweight edge AI agent. PikoClaw is its long-term memory layer: the institutional knowledge store that agents query.


Success Metrics

After the demo:

  • Immediate: GitHub stars spike, docs site traffic increases
  • Short-term: 3+ people run the demo and open issues/PRs
  • Mid-term: Someone builds a PikoClaw + PicoClaw integration (§25 in roadmap)
  • Long-term: Academic citation in a threading/knowledge extraction paper

Good luck.