Data Model

PikoClaw's data model lives in models.py. Each type is aligned to an Internet standard (see the table below) so that extracted data stays interoperable.

Standards Alignment

PikoClaw Type   Standard                  Reference
Message         Internet Message Format   RFC 5322
EmailAddress    addr-spec / name-addr     RFC 5322 §3.4
Thread          JMAP Thread object        RFC 8621 §3
Contact         vCard 4.0                 RFC 6350
CalendarEvent   iCalendar VEVENT          RFC 5545
Attachment      MIME body part            RFC 2045/2046

Core Types

Message

The universal adapter output type. Every adapter produces list[Message].

@dataclass
class Message:
    # Identity (RFC 5322 §3.6.4)
    message_id: str          # Message-ID header
    in_reply_to: str         # In-Reply-To header (parent message)
    references: list[str]    # References header chain

    # Envelope (RFC 5322 §3.6)
    subject: str
    from_address: EmailAddress
    to: list[EmailAddress]
    cc: list[EmailAddress]
    bcc: list[EmailAddress]

    # Timestamps (ISO 8601)
    date: str                # RFC 5322 Date header
    received_at: str         # Delivery time
    created_at: str          # Creation time (MAPI)
    modified_at: str         # Last modification time

    # Content
    body_plain: str          # Plain text body
    body_html: str           # HTML body
    attachments: list[Attachment]

    # Classification
    kind: MessageKind        # EMAIL, CALENDAR, CONTACT, TASK, NOTE, UNKNOWN
    folder: str              # Source folder path

    # Threading
    thread_subject: str      # Normalized subject (Re:/Fwd: stripped)

    # Source metadata
    source_format: str       # "pst", "mbox", "maildir"
    headers: dict[str, str]  # Raw RFC 5322 headers
    extra: dict[str, str]    # Adapter-specific fields (e.g., Enron X-headers)

Key properties:

  • all_recipients -- Combined To + Cc + Bcc
  • all_participants -- From + all recipients (filtered to non-empty)
  • best_date -- Most reliable timestamp available, chosen in order of preference: date, then received_at, then created_at, then modified_at
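The three properties above could be sketched as follows. This is a minimal illustration, not the code in models.py: the Message class is trimmed to the fields the properties read, the defaults are added only so the example runs on its own, and the assumption that an "empty" address has neither email nor name is mine.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EmailAddress:
    email: str = ""
    name: str = ""


@dataclass
class Message:
    # Trimmed to the fields the three properties read (defaults are illustrative).
    from_address: EmailAddress = field(default_factory=EmailAddress)
    to: list[EmailAddress] = field(default_factory=list)
    cc: list[EmailAddress] = field(default_factory=list)
    bcc: list[EmailAddress] = field(default_factory=list)
    date: str = ""
    received_at: str = ""
    created_at: str = ""
    modified_at: str = ""

    @property
    def all_recipients(self) -> list[EmailAddress]:
        # Combined To + Cc + Bcc, in that order.
        return [*self.to, *self.cc, *self.bcc]

    @property
    def all_participants(self) -> list[EmailAddress]:
        # From + all recipients; assumption: "empty" means no email and no name.
        everyone = [self.from_address, *self.all_recipients]
        return [addr for addr in everyone if addr.email or addr.name]

    @property
    def best_date(self) -> str:
        # First non-empty timestamp in the documented order of preference.
        for candidate in (self.date, self.received_at,
                          self.created_at, self.modified_at):
            if candidate:
                return candidate
        return ""
```

A message that lacks a Date header but has a delivery time would report received_at as its best_date under this sketch.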

EmailAddress

Frozen (immutable) value object for email addresses.

@dataclass(frozen=True)
class EmailAddress:
    email: str   # addr-spec (RFC 5322 §3.4)
    name: str    # display name

  • str(addr) returns "Name <email>" when both parts are set, otherwise whichever one is present ("email" or "Name")
  • addr.display returns the best available human-readable identifier
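One way the string behaviour described above might look is sketched below. The fallback order in display (name first, then email) and the parse_address helper are assumptions for illustration; only email.utils.parseaddr is a real standard-library call.

```python
from dataclasses import dataclass
from email.utils import parseaddr


@dataclass(frozen=True)
class EmailAddress:
    email: str = ""   # addr-spec
    name: str = ""    # display name

    def __str__(self) -> str:
        # "Name <email>" when both are set, otherwise whichever is present.
        if self.name and self.email:
            return f"{self.name} <{self.email}>"
        return self.email or self.name

    @property
    def display(self) -> str:
        # Assumption: a display name reads better than a bare address.
        return self.name or self.email


def parse_address(header_value: str) -> EmailAddress:
    """Hypothetical helper: build an EmailAddress from a name-addr string."""
    name, email = parseaddr(header_value)
    return EmailAddress(email=email, name=name)
```

Because the dataclass is frozen, instances are hashable and can be used as dictionary keys or set members when deduplicating participants.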

Contact

Aggregated from message analysis (not extracted directly from source).

@dataclass
class Contact:
    email: str
    name: str
    message_count: int       # Total messages involving this contact
    sent_count: int          # Messages FROM this contact
    received_count: int      # Messages TO this contact
    first_seen: str          # Earliest message date
    last_seen: str           # Latest message date
    domains: list[str]       # All email domains observed
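The aggregation step is not shown in this page, so the following is only a sketch of how contacts might be derived from messages. The aggregate_contacts function, the choice to key contacts by lowercased email, and the trimmed Message class are all assumptions. Date comparison is done on the raw strings, which is only safe if every timestamp uses the same ISO 8601 form and time zone.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EmailAddress:
    email: str = ""
    name: str = ""


@dataclass
class Message:
    # Trimmed to the fields the aggregation reads.
    from_address: EmailAddress
    to: list[EmailAddress] = field(default_factory=list)
    cc: list[EmailAddress] = field(default_factory=list)
    bcc: list[EmailAddress] = field(default_factory=list)
    date: str = ""


@dataclass
class Contact:
    email: str
    name: str = ""
    message_count: int = 0
    sent_count: int = 0
    received_count: int = 0
    first_seen: str = ""
    last_seen: str = ""
    domains: list[str] = field(default_factory=list)


def aggregate_contacts(messages: list[Message]) -> list[Contact]:
    """Hypothetical aggregation: one Contact per lowercased address."""
    contacts: dict[str, Contact] = {}
    for msg in messages:
        sender = msg.from_address.email.lower()
        recipients = {a.email.lower(): a
                      for a in (*msg.to, *msg.cc, *msg.bcc) if a.email}
        involved = dict(recipients)
        if sender:
            involved[sender] = msg.from_address
        for key, addr in involved.items():
            contact = contacts.setdefault(key, Contact(email=key))
            contact.name = contact.name or addr.name
            contact.message_count += 1  # once per message, whatever the role
            if key == sender:
                contact.sent_count += 1
            if key in recipients:
                contact.received_count += 1
            if msg.date:
                if not contact.first_seen or msg.date < contact.first_seen:
                    contact.first_seen = msg.date
                if msg.date > contact.last_seen:
                    contact.last_seen = msg.date
            domain = key.rpartition("@")[2]
            if domain and domain not in contact.domains:
                contact.domains.append(domain)
    return list(contacts.values())
```

With contacts keyed by address, domains holds a single entry per contact. A pipeline that merges several addresses into one person would be the case where the list grows.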

Thread

Conversation thread grouping related messages (JMAP Thread model).

@dataclass
class Thread:
    thread_id: str           # Root Message-ID or generated ID
    subject: str             # Normalized thread subject
    messages: list[Message]  # Ordered by date
    participants: list[EmailAddress]
    first_date: str
    last_date: str
    message_count: int       # Property (len of messages)
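How the pipeline groups messages is not specified here. A plausible sketch, assuming the thread root is the first entry of References (falling back to In-Reply-To, then the message's own Message-ID), is shown below together with the Re:/Fwd: stripping used for the normalized subject. The function names and the prefix pattern are illustrative assumptions, and the classes are trimmed to the fields involved.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field

# Assumption: only "Re:", "Fw:" and "Fwd:" prefixes are stripped, case-insensitively.
_REPLY_PREFIX = re.compile(r"^\s*(?:(?:re|fwd?)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject: str) -> str:
    """Strip leading reply/forward prefixes, e.g. 'Re: Fwd: Hi' -> 'Hi'."""
    return _REPLY_PREFIX.sub("", subject).strip()


@dataclass
class Message:
    message_id: str
    subject: str
    date: str
    in_reply_to: str = ""
    references: list[str] = field(default_factory=list)


@dataclass
class Thread:
    thread_id: str
    subject: str
    messages: list[Message]
    first_date: str
    last_date: str

    @property
    def message_count(self) -> int:
        return len(self.messages)


def group_threads(messages: list[Message]) -> list[Thread]:
    """Hypothetical grouping keyed by the root Message-ID of each chain."""
    buckets: dict[str, list[Message]] = defaultdict(list)
    for msg in messages:
        root = (msg.references[0] if msg.references
                else msg.in_reply_to or msg.message_id)
        buckets[root].append(msg)

    threads = []
    for root, group in buckets.items():
        ordered = sorted(group, key=lambda m: m.date)  # ISO 8601 strings
        threads.append(Thread(
            thread_id=root,
            subject=normalize_subject(ordered[0].subject),
            messages=ordered,
            first_date=ordered[0].date,
            last_date=ordered[-1].date,
        ))
    return threads
```

Sorting by the raw date string relies on all timestamps sharing one ISO 8601 form and time zone; mixed offsets would need parsing first.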

CalendarEvent

Calendar events from iCalendar data or MAPI appointment items.

@dataclass
class CalendarEvent:
    uid: str                 # RFC 5545 UID
    summary: str             # Event title
    description: str
    location: str
    start: str               # ISO 8601
    end: str                 # ISO 8601
    organizer: EmailAddress
    attendees: list[EmailAddress]
    status: str              # CONFIRMED, TENTATIVE, CANCELLED

ExtractionResult

The complete output container consumed by all output generators.

@dataclass
class ExtractionResult:
    messages: list[Message]
    calendar_events: list[CalendarEvent]
    contacts: list[Contact]              # Aggregated by pipeline
    threads: list[Thread]                # Grouped by pipeline

    source_files: list[str]
    source_format: str
    extracted_at: str                    # ISO 8601

    # Provenance
    warnings: list[str]
    source_hash: str                     # SHA-256 of source files
    tool_version: str                    # PikoClaw version

    stats: dict[str, int]                # Property (computed)
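The computed stats property and the source_hash field might be produced along these lines. This is a sketch under assumptions: the stats keys, the sha256_of_files helper, and the decision to hash files in sorted path order into a single digest are not taken from PikoClaw itself.

```python
import hashlib
from dataclasses import dataclass, field
from pathlib import Path


def sha256_of_files(paths: list[str]) -> str:
    """Hypothetical helper: one SHA-256 digest over all files, in sorted order."""
    digest = hashlib.sha256()
    for path in sorted(paths):
        with Path(path).open("rb") as handle:
            # Read in 1 MiB chunks so large archives are not loaded whole.
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
    return digest.hexdigest()


@dataclass
class ExtractionResult:
    # Element types are left loose so the sketch stands alone.
    messages: list = field(default_factory=list)
    calendar_events: list = field(default_factory=list)
    contacts: list = field(default_factory=list)
    threads: list = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)

    @property
    def stats(self) -> dict[str, int]:
        # Assumed keys: one count per collection held by the result.
        return {
            "messages": len(self.messages),
            "calendar_events": len(self.calendar_events),
            "contacts": len(self.contacts),
            "threads": len(self.threads),
            "warnings": len(self.warnings),
        }
```

Computing stats on access rather than storing it keeps the counts consistent with the lists even if the pipeline appends to them after construction.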

CRM Portability

The data model is designed for CRM export:

PikoClaw Field        Salesforce                      HubSpot               MS Dynamics
EmailAddress.email    Contact.Email                   email                 emailaddress1
EmailAddress.name     Contact.Name                    firstname + lastname  fullname
Contact.organization  Account.Name                    company               parentcustomerid
Message.message_id    EmailMessage.MessageIdentifier  (custom)              (custom)
Message.subject       EmailMessage.Subject            hs_email_subject      subject
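As an illustration of the mapping, the sketch below turns an address into a field dictionary per CRM. The contact_payload function is hypothetical, the keys are copied from the table rather than checked against each vendor's API, and splitting the display name at the first space for HubSpot's firstname + lastname is a simplifying assumption.

```python
def contact_payload(email: str, name: str, crm: str) -> dict[str, str]:
    """Hypothetical mapper from EmailAddress fields to CRM field names."""
    if crm == "salesforce":
        return {"Contact.Email": email, "Contact.Name": name}
    if crm == "hubspot":
        # Assumption: first token is the first name, the rest is the last name.
        first, _, last = name.strip().partition(" ")
        return {"email": email, "firstname": first, "lastname": last}
    if crm == "dynamics":
        return {"emailaddress1": email, "fullname": name}
    raise ValueError(f"unsupported CRM: {crm!r}")


if __name__ == "__main__":
    print(contact_payload("jane@example.org", "Jane Doe", "hubspot"))
```

Names that do not follow a given-name-first pattern would need a smarter split than the one used here.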