Output Formats
PikoClaw produces two output formats: a Markdown wiki for human reading and a JSON export for programmatic access. Both consume the same ExtractionResult data contract.
Wiki Output
Generated by default. Contains navigable Markdown files, plus two JSON files (contacts.json and provenance.json).
File Structure
wiki/
├── index.md # Overview, stats, folder structure, top contacts
├── contacts.md # Contact directory sorted by message count
├── contacts.json # Machine-readable contact graph
├── threads.md # Conversation threads grouped by topic
├── calendar.md # Calendar events timeline
├── network-analysis.md # Graph intelligence (requires networkx)
├── provenance.json # Extraction provenance metadata
└── emails/
    ├── all.md # All emails, newest first (full detail)
    ├── inbox.md # Received messages (summary view)
    └── sent.md # Sent messages (summary view)
index.md
The landing page with:
- Extraction timestamp and PikoClaw version
- Summary statistics table (emails, calendar events, contacts, threads)
- Navigation links to all other pages
- Folder structure from the source archive
- Top 10 contacts by message count
contacts.md
Full contact directory sorted by total message count:
| Name | Email | Sent | Received | Total | First Seen | Last Seen |
|---|---|---|---|---|---|---|
| Alice Smith | alice@example.com | 87 | 55 | 142 | 2019-03-12 | 2024-01-15 |
contacts.json
Machine-readable contact graph with nodes and edges:
{
  "nodes": [
    {
      "id": "alice@example.com",
      "name": "Alice Smith",
      "email": "alice@example.com",
      "message_count": 142,
      "sent_count": 87,
      "received_count": 55,
      "first_seen": "2019-03-12T08:30:00",
      "last_seen": "2024-01-15T14:30:00",
      "domains": ["example.com"],
      "hub_score": 0.234,
      "authority_score": 0.189,
      "pagerank": 0.0312,
      "community": 0,
      "degree": 28,
      "in_degree": 12,
      "out_degree": 16
    }
  ],
  "edges": [
    {
      "from": "alice@example.com",
      "to": "bob@example.com",
      "count": 23
    }
  ]
}
Graph metrics
The hub_score, authority_score, pagerank, community, degree, in_degree, and out_degree fields are only present when networkx is installed. Without it, nodes contain only the communication statistics.
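Because the graph metric fields are optional, a consumer of contacts.json should check for them before relying on them. The following is a minimal sketch, not part of PikoClaw: the helper names (`has_graph_metrics`, `rank_contacts`, `load_contact_graph`) are hypothetical, and the field names are taken from the example above. It ranks contacts by `pagerank` when every node carries the graph metrics and falls back to `message_count` otherwise.

```python
import json
from pathlib import Path

# Fields that, per the note above, appear only when networkx is installed.
GRAPH_FIELDS = (
    "hub_score", "authority_score", "pagerank", "community",
    "degree", "in_degree", "out_degree",
)


def has_graph_metrics(node):
    """Return True when the node carries every graph metric field."""
    return all(field in node for field in GRAPH_FIELDS)


def rank_contacts(graph, limit=10):
    """Rank nodes by pagerank if available, else by message_count."""
    nodes = graph.get("nodes", [])
    if nodes and all(has_graph_metrics(n) for n in nodes):
        sort_key = lambda n: n["pagerank"]
    else:
        sort_key = lambda n: n["message_count"]
    return sorted(nodes, key=sort_key, reverse=True)[:limit]


def load_contact_graph(path):
    """Read a contacts.json file (hypothetical helper) into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

A typical call would be `rank_contacts(load_contact_graph("wiki/contacts.json"))`, assuming the wiki was written to a directory named `wiki`.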
threads.md
Conversation threads sorted by message count (busiest threads first):
- Multi-message threads show the first 5 messages with date, sender, and subject
- Single-message threads are listed in a compact summary section
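The ordering described above can be approximated from thread records such as those in the JSON export's `threads` array. This is only a sketch under that assumption: `split_threads` is a hypothetical helper, and PikoClaw's own rendering logic may differ in tie-breaking and other details.

```python
def split_threads(threads):
    """Split thread records into (multi_message, single_message) lists.

    Both lists are ordered busiest first. The sort is stable, so threads
    with equal message_count keep their input order.
    """
    ordered = sorted(threads, key=lambda t: t["message_count"], reverse=True)
    multi = [t for t in ordered if t["message_count"] > 1]
    single = [t for t in ordered if t["message_count"] <= 1]
    return multi, single
```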
network-analysis.md
Communication graph intelligence (requires networkx). See Network Analysis for details.
provenance.json
Extraction metadata for audit trails. See Provenance & Attestation for details.
JSON Output
Generated with --json. Produces extraction.json containing the full extraction result; note that email bodies are truncated (see Email Body Truncation).
Schema
{
  "provenance": {
    "tool": "PikoClaw",
    "version": "0.5.0",
    "extracted_at": "2026-02-23T15:30:00+00:00",
    "source_files": ["mailbox.pst"],
    "source_hash": "a1b2c3d4...",
    "source_format": "pst"
  },
  "statistics": {
    "total_messages": 12847,
    "total_emails": 12847,
    "total_calendar_events": 234,
    "total_contacts": 342,
    "total_threads": 4291,
    "multi_message_threads": 1847
  },
  "emails": [
    {
      "message_id": "<abc123@example.com>",
      "subject": "Q4 Budget Review",
      "from": "Alice Smith <alice@example.com>",
      "from_email": "alice@example.com",
      "from_name": "Alice Smith",
      "to": ["Bob Jones <bob@example.com>"],
      "cc": [],
      "date": "2024-01-15T10:30:00",
      "folder": "Inbox",
      "kind": "email",
      "body": "Hi Bob, here are the Q4 numbers...",
      "attachments": [
        {
          "filename": "Q4-budget.xlsx",
          "size_bytes": 45230,
          "content_type": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
        }
      ],
      "thread_subject": "Q4 Budget Review",
      "source_format": "pst"
    }
  ],
  "contacts": [
    {
      "email": "alice@example.com",
      "name": "Alice Smith",
      "message_count": 142,
      "sent_count": 87,
      "received_count": 55,
      "first_seen": "2019-03-12T08:30:00",
      "last_seen": "2024-01-15T14:30:00",
      "domains": ["example.com"]
    }
  ],
  "calendar_events": [
    {
      "uid": "",
      "summary": "Team Standup",
      "description": "Daily sync",
      "location": "Conference Room B",
      "start": "2024-01-15T09:00:00",
      "end": "2024-01-15T09:30:00",
      "status": ""
    }
  ],
  "threads": [
    {
      "thread_id": "<abc123@example.com>",
      "subject": "Q4 Budget Review",
      "message_count": 7,
      "participants": ["Alice Smith <alice@example.com>", "Bob Jones <bob@example.com>"],
      "first_date": "2024-01-10T08:00:00",
      "last_date": "2024-01-15T10:30:00",
      "message_ids": ["<abc123@example.com>", "<def456@example.com>"]
    }
  ],
  "warnings": []
}
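In the schema above, each thread lists its members in `message_ids`, and those values match `message_id` on the email records. The sketch below shows one way to join the two. The helper names are hypothetical, and sorting by the `date` string assumes every date uses the same ISO 8601 layout as the example, so that lexicographic order equals chronological order.

```python
def index_emails(extraction):
    """Map message_id -> email record, skipping emails without an id."""
    return {
        email["message_id"]: email
        for email in extraction.get("emails", [])
        if email.get("message_id")
    }


def thread_messages(extraction, thread_id):
    """Return the emails of one thread, oldest first.

    Ids listed in the thread but absent from "emails" are ignored.
    An unknown thread_id yields an empty list.
    """
    by_id = index_emails(extraction)
    for thread in extraction.get("threads", []):
        if thread["thread_id"] == thread_id:
            found = [by_id[mid] for mid in thread["message_ids"] if mid in by_id]
            return sorted(found, key=lambda e: e["date"])
    return []
```

To use it, load the export with `json.load` and pass the resulting dict together with a `thread_id` taken from the `threads` array.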
Email Body Truncation
Email bodies in the JSON export are truncated to 1,000 characters. For full body text, use the wiki output.
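Consumers may want to flag records whose body was probably cut off. The documentation does not say whether a truncation marker is appended, so the sketch below relies on length alone. That is a heuristic: a body that was exactly 1,000 characters long to begin with would be flagged too. `possibly_truncated` is a hypothetical helper, not a PikoClaw function.

```python
BODY_LIMIT = 1000  # truncation length stated in the documentation


def possibly_truncated(email, limit=BODY_LIMIT):
    """Heuristic: a body at or above the limit was probably truncated."""
    return len(email.get("body", "")) >= limit
```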
Using with LLMs
The JSON structure is designed for RAG pipelines and LLM context injection.
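As an illustration of context injection, the sketch below turns email records into plain-text chunks and packs the newest ones into a character budget. Everything here is an assumption for demonstration purposes: the function names, the separator, and the 8,000-character default are hypothetical. Only the field names come from the schema above.

```python
SEPARATOR = "\n\n---\n\n"


def email_to_chunk(email):
    """Render one email record as a header block followed by its body."""
    header = [
        f"Subject: {email.get('subject', '')}",
        f"From: {email.get('from', '')}",
        f"To: {', '.join(email.get('to', []))}",
        f"Date: {email.get('date', '')}",
        f"Folder: {email.get('folder', '')}",
    ]
    return "\n".join(header) + "\n\n" + email.get("body", "")


def build_context(extraction, max_chars=8000):
    """Concatenate the newest emails until the character budget is spent."""
    emails = sorted(
        extraction.get("emails", []),
        key=lambda e: e.get("date", ""),
        reverse=True,
    )
    chunks = []
    used = 0
    for email in emails:
        chunk = email_to_chunk(email)
        # The separator only costs characters between two chunks.
        cost = len(chunk) + (len(SEPARATOR) if chunks else 0)
        if used + cost > max_chars:
            break
        chunks.append(chunk)
        used += cost
    return SEPARATOR.join(chunks)
```

A character budget is a crude stand-in for a token budget; a real pipeline would more likely count tokens with the target model's tokenizer and retrieve by relevance rather than recency.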