
Output Formats

PikoClaw produces two output formats: a Markdown wiki for human reading and a JSON export for programmatic access. Both are generated from the same ExtractionResult data contract.

Wiki Output

Generated by default. Contains navigable Markdown files.

File Structure

wiki/
├── index.md              # Overview, stats, folder structure, top contacts
├── contacts.md           # Contact directory sorted by message count
├── contacts.json         # Machine-readable contact graph
├── threads.md            # Conversation threads grouped by topic
├── calendar.md           # Calendar events timeline
├── network-analysis.md   # Graph intelligence (requires networkx)
├── provenance.json       # Extraction provenance metadata
└── emails/
    ├── all.md            # All emails, newest first (full detail)
    ├── inbox.md          # Received messages (summary view)
    └── sent.md           # Sent messages (summary view)

index.md

The landing page with:

  • Extraction timestamp and PikoClaw version
  • Summary statistics table (emails, calendar events, contacts, threads)
  • Navigation links to all other pages
  • Folder structure from the source archive
  • Top 10 contacts by message count

contacts.md

Full contact directory sorted by total message count:

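| Name | Email | Sent | Received | Total | First Seen | Last Seen |
|------|-------|------|----------|-------|------------|-----------|
| Alice Smith | alice@example.com | 87 | 55 | 142 | 2019-03-12 | 2024-01-15 |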

contacts.json

Machine-readable contact graph with nodes and edges:

{
  "nodes": [
    {
      "id": "alice@example.com",
      "name": "Alice Smith",
      "email": "alice@example.com",
      "message_count": 142,
      "sent_count": 87,
      "received_count": 55,
      "first_seen": "2019-03-12T08:30:00",
      "last_seen": "2024-01-15T14:30:00",
      "domains": ["example.com"],
      "hub_score": 0.234,
      "authority_score": 0.189,
      "pagerank": 0.0312,
      "community": 0,
      "degree": 28,
      "in_degree": 12,
      "out_degree": 16
    }
  ],
  "edges": [
    {
      "from": "alice@example.com",
      "to": "bob@example.com",
      "count": 23
    }
  ]
}

Graph metrics

The hub_score, authority_score, pagerank, community, degree, in_degree, and out_degree fields are present only when networkx is installed. Without it, each node contains only the communication statistics: id, name, email, the three message counts, first_seen, last_seen, and domains.
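Because the graph metrics are optional, code that reads contacts.json should not assume they exist. The following is a minimal sketch, relying only on the field names shown above: it ranks contacts by pagerank when every node carries it and falls back to message_count otherwise. The function names rank_contacts and load_graph, and the fallback rule itself, are illustrative assumptions rather than part of PikoClaw.

```python
import json


def rank_contacts(graph: dict, limit: int = 10) -> list[str]:
    """Return contact ids ordered by pagerank, or by message_count if absent."""
    nodes = graph.get("nodes", [])
    # Graph metrics exist only when networkx was installed at extraction time.
    has_metrics = bool(nodes) and all("pagerank" in node for node in nodes)
    key = "pagerank" if has_metrics else "message_count"
    ranked = sorted(nodes, key=lambda node: node.get(key, 0), reverse=True)
    return [node["id"] for node in ranked[:limit]]


def load_graph(path: str) -> dict:
    """Read a contacts.json file from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Call rank_contacts(load_graph(path)) with the location of your own contacts.json file.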

threads.md

Conversation threads sorted by message count (busiest threads first):

  • Multi-message threads show the first 5 messages with date, sender, and subject
  • Single-message threads are listed in a compact summary section

network-analysis.md

Communication graph intelligence (requires networkx). See Network Analysis for details.

provenance.json

Extraction metadata for audit trails. See Provenance & Attestation for details.


JSON Output

Generated with --json. Produces extraction.json, which holds the complete extraction apart from truncated email bodies (see Email Body Truncation).

Schema

{
  "provenance": {
    "tool": "PikoClaw",
    "version": "0.5.0",
    "extracted_at": "2026-02-23T15:30:00+00:00",
    "source_files": ["mailbox.pst"],
    "source_hash": "a1b2c3d4...",
    "source_format": "pst"
  },
  "statistics": {
    "total_messages": 12847,
    "total_emails": 12847,
    "total_calendar_events": 234,
    "total_contacts": 342,
    "total_threads": 4291,
    "multi_message_threads": 1847
  },
  "emails": [
    {
      "message_id": "<abc123@example.com>",
      "subject": "Q4 Budget Review",
      "from": "Alice Smith <alice@example.com>",
      "from_email": "alice@example.com",
      "from_name": "Alice Smith",
      "to": ["Bob Jones <bob@example.com>"],
      "cc": [],
      "date": "2024-01-15T10:30:00",
      "folder": "Inbox",
      "kind": "email",
      "body": "Hi Bob, here are the Q4 numbers...",
      "attachments": [
        {
          "filename": "Q4-budget.xlsx",
          "size_bytes": 45230,
          "content_type": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
        }
      ],
      "thread_subject": "Q4 Budget Review",
      "source_format": "pst"
    }
  ],
  "contacts": [
    {
      "email": "alice@example.com",
      "name": "Alice Smith",
      "message_count": 142,
      "sent_count": 87,
      "received_count": 55,
      "first_seen": "2019-03-12T08:30:00",
      "last_seen": "2024-01-15T14:30:00",
      "domains": ["example.com"]
    }
  ],
  "calendar_events": [
    {
      "uid": "",
      "summary": "Team Standup",
      "description": "Daily sync",
      "location": "Conference Room B",
      "start": "2024-01-15T09:00:00",
      "end": "2024-01-15T09:30:00",
      "status": ""
    }
  ],
  "threads": [
    {
      "thread_id": "<abc123@example.com>",
      "subject": "Q4 Budget Review",
      "message_count": 7,
      "participants": ["Alice Smith <alice@example.com>", "Bob Jones <bob@example.com>"],
      "first_date": "2024-01-10T08:00:00",
      "last_date": "2024-01-15T10:30:00",
      "message_ids": ["<abc123@example.com>", "<def456@example.com>"]
    }
  ],
  "warnings": []
}
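In the schema above, each thread lists its members by message_id, so a consumer can join threads back to the email records. The sketch below shows one way to do that with the fields from the example. The function name emails_by_thread is hypothetical, and the code assumes every date uses the same ISO 8601 layout as the sample, so that string ordering matches chronological ordering.

```python
def emails_by_thread(extraction: dict) -> dict[str, list[dict]]:
    """Map each thread_id to its email records, oldest first."""
    by_id = {email["message_id"]: email for email in extraction.get("emails", [])}
    result: dict[str, list[dict]] = {}
    for thread in extraction.get("threads", []):
        # Skip ids with no matching email record rather than failing.
        members = [by_id[mid] for mid in thread["message_ids"] if mid in by_id]
        # Assumption: dates share one ISO 8601 layout, so they sort as strings.
        members.sort(key=lambda email: email["date"])
        result[thread["thread_id"]] = members
    return result
```

Load extraction.json with json.load and pass the resulting dictionary to emails_by_thread.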

Email Body Truncation

Email bodies in the JSON export are truncated to 1,000 characters. For full body text, use the wiki output.
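A downstream script may want to know which bodies were probably cut off. The sketch below flags every email whose body is at least 1,000 characters long. This is a heuristic under an assumption: the export is not documented here as adding any truncation marker, so a message whose original body was exactly at the limit is flagged too. The name possibly_truncated is hypothetical.

```python
BODY_LIMIT = 1000  # truncation length stated in this section


def possibly_truncated(emails: list[dict], limit: int = BODY_LIMIT) -> list[str]:
    """Return message_ids whose body is at least `limit` characters long."""
    return [
        email["message_id"]
        for email in emails
        # A missing or null body counts as empty, so it is never flagged.
        if len(email.get("body") or "") >= limit
    ]
```

Messages flagged this way are the ones worth looking up in the wiki output for their full text.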

Using with LLMs

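# Write only extraction.json and skip the wiki
pikoclaw extract mailbox.pst --json --no-wiki
# Pipe the export into an LLM tool of your choice (your-llm-tool is a placeholder)
cat pikoclaw-output/extraction.json | your-llm-tool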

The JSON structure is designed for RAG pipelines and LLM context injection.
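As an illustration of context injection, the sketch below turns each email record into a text chunk with a small metadata dictionary, which is the shape many retrieval pipelines expect. The chunk layout and the function names email_to_chunk and build_chunks are assumptions for this example. PikoClaw does not prescribe them, and only field names from the schema above are used.

```python
import json


def email_to_chunk(email: dict) -> dict:
    """Render one email record as a text chunk plus metadata."""
    text = (
        f"Subject: {email.get('subject', '')}\n"
        f"From: {email.get('from', '')}\n"
        f"To: {', '.join(email.get('to', []))}\n"
        f"Date: {email.get('date', '')}\n\n"
        f"{email.get('body', '')}"
    )
    metadata = {
        "folder": email.get("folder", ""),
        "thread_subject": email.get("thread_subject", ""),
    }
    return {"id": email["message_id"], "text": text, "metadata": metadata}


def build_chunks(path: str) -> list[dict]:
    """Load an extraction.json file and convert every email to a chunk."""
    with open(path, encoding="utf-8") as handle:
        extraction = json.load(handle)
    return [email_to_chunk(email) for email in extraction.get("emails", [])]
```

Each chunk can then be embedded or placed directly into a prompt. Because exported bodies are truncated, the chunks stay short by construction.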