Data Model
PikoClaw's data model lives in models.py. Each core type is aligned with an Internet standard, so extracted data maps directly onto formats that other mail, contact, and calendar tools already understand.
Standards Alignment
| PikoClaw Type | Standard | Reference |
|---|---|---|
| Message | Internet Message Format | RFC 5322 |
| EmailAddress | addr-spec / name-addr | RFC 5322 §3.4 |
| Thread | JMAP Thread object | RFC 8621 §3 |
| Contact | vCard 4.0 | RFC 6350 |
| CalendarEvent | iCalendar VEVENT | RFC 5545 |
| Attachment | MIME body part | RFC 2045/2046 |
Core Types
Message
The universal adapter output type. Every adapter produces list[Message].
    @dataclass
    class Message:
        # Identity (RFC 5322 §3.6.4)
        message_id: str               # Message-ID header
        in_reply_to: str              # In-Reply-To header (parent message)
        references: list[str]         # References header chain

        # Envelope (RFC 5322 §3.6)
        subject: str
        from_address: EmailAddress
        to: list[EmailAddress]
        cc: list[EmailAddress]
        bcc: list[EmailAddress]

        # Timestamps (ISO 8601)
        date: str                     # RFC 5322 Date header
        received_at: str              # Delivery time
        created_at: str               # Creation time (MAPI)
        modified_at: str              # Last modification time

        # Content
        body_plain: str               # Plain text body
        body_html: str                # HTML body
        attachments: list[Attachment]

        # Classification
        kind: MessageKind             # EMAIL, CALENDAR, CONTACT, TASK, NOTE, UNKNOWN
        folder: str                   # Source folder path

        # Threading
        thread_subject: str           # Normalized subject (Re:/Fwd: stripped)

        # Source metadata
        source_format: str            # "pst", "mbox", "maildir"
        headers: dict[str, str]       # Raw RFC 5322 headers
        extra: dict[str, str]         # Adapter-specific fields (e.g., Enron X-headers)
Key properties:

- all_recipients -- Combined To + Cc + Bcc
- all_participants -- From + all recipients (filtered to non-empty)
- best_date -- Most reliable timestamp available, in order of preference: date > received_at > created_at > modified_at
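The three properties could be implemented roughly as follows. This is a minimal sketch, not the code in models.py: both classes are trimmed to the fields the properties read, the default values are added for brevity, and treating an empty string as "not set" is an assumption.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EmailAddress:
    email: str = ""
    name: str = ""


@dataclass
class Message:
    # Trimmed stand-in: only the fields the properties below depend on.
    from_address: EmailAddress = field(default_factory=EmailAddress)
    to: list[EmailAddress] = field(default_factory=list)
    cc: list[EmailAddress] = field(default_factory=list)
    bcc: list[EmailAddress] = field(default_factory=list)
    date: str = ""
    received_at: str = ""
    created_at: str = ""
    modified_at: str = ""

    @property
    def all_recipients(self) -> list[EmailAddress]:
        # To, then Cc, then Bcc, in header order.
        return [*self.to, *self.cc, *self.bcc]

    @property
    def all_participants(self) -> list[EmailAddress]:
        people = [self.from_address, *self.all_recipients]
        # Assumption: "non-empty" means an address or a display name is present.
        return [p for p in people if p.email or p.name]

    @property
    def best_date(self) -> str:
        # Walk the timestamps in the documented order of preference.
        for candidate in (self.date, self.received_at, self.created_at, self.modified_at):
            if candidate:
                return candidate
        return ""


msg = Message(
    from_address=EmailAddress("ann@example.com", "Ann"),
    to=[EmailAddress("bob@example.com")],
    cc=[EmailAddress()],  # empty placeholder, dropped by all_participants
    received_at="2001-05-14T09:30:00Z",
)
```

Here `msg.best_date` falls back to `received_at` because `date` is empty, and `msg.all_participants` drops the empty Cc entry.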
EmailAddress
Frozen (immutable) value object for email addresses.
    @dataclass(frozen=True)
    class EmailAddress:
        email: str    # addr-spec (RFC 5322 §3.4)
        name: str     # display name
- str(addr) returns "Name <email>" or just "email" or "Name"
- addr.display returns the best available human-readable identifier
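A minimal sketch of that behaviour is shown below. The empty-string defaults and the choice to prefer the display name over the address in `display` are assumptions; the actual rules in models.py may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmailAddress:
    email: str = ""
    name: str = ""

    def __str__(self) -> str:
        # Full name-addr form when both parts are known, otherwise whichever exists.
        if self.name and self.email:
            return f"{self.name} <{self.email}>"
        return self.email or self.name

    @property
    def display(self) -> str:
        # Assumption: a display name reads better than a bare address.
        return self.name or self.email


both = EmailAddress("ann@example.com", "Ann Lee")
bare = EmailAddress("bob@example.com")
name_only = EmailAddress(name="Carol")
```

Because the dataclass is frozen, instances are hashable and compare by value, so they can be used as dictionary keys or set members when deduplicating participants.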
Contact
Aggregated from message analysis (not extracted directly from source).
    @dataclass
    class Contact:
        email: str
        name: str
        message_count: int     # Total messages involving this contact
        sent_count: int        # Messages FROM this contact
        received_count: int    # Messages TO this contact
        first_seen: str        # Earliest message date
        last_seen: str         # Latest message date
        domains: list[str]     # All email domains observed
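Since contacts are derived rather than extracted, the aggregation step can be pictured as a single pass over the messages. The sketch below is illustrative only: `aggregate_contacts` and the simplified dict-shaped message records are hypothetical, contacts are keyed by lower-cased address (so each ends up with one domain), and timestamps are assumed to be same-format UTC ISO 8601 strings, which sort chronologically as text.

```python
from dataclasses import dataclass, field


@dataclass
class Contact:
    email: str
    name: str = ""
    message_count: int = 0
    sent_count: int = 0
    received_count: int = 0
    first_seen: str = ""
    last_seen: str = ""
    domains: list[str] = field(default_factory=list)


def aggregate_contacts(messages: list[dict]) -> list[Contact]:
    """Build one Contact per address from simplified message records.

    Each record holds "from" as an (email, name) tuple, "to" as a list of
    such tuples, and "date" as an ISO 8601 string (hypothetical shape).
    """
    contacts: dict[str, Contact] = {}

    def touch(email: str, name: str, date: str, sent: bool) -> None:
        key = email.lower()
        contact = contacts.setdefault(key, Contact(email=key))
        contact.name = contact.name or name  # keep the first name seen
        contact.message_count += 1
        if sent:
            contact.sent_count += 1
        else:
            contact.received_count += 1
        if date:
            contact.first_seen = min(contact.first_seen or date, date)
            contact.last_seen = max(contact.last_seen, date)
        domain = key.partition("@")[2]
        if domain and domain not in contact.domains:
            contact.domains.append(domain)

    for record in messages:
        touch(*record["from"], record["date"], sent=True)
        for email, name in record["to"]:
            touch(email, name, record["date"], sent=False)
    return list(contacts.values())


contacts = aggregate_contacts([
    {"from": ("ann@example.com", "Ann"), "to": [("bob@example.org", "Bob")],
     "date": "2001-05-14T09:30:00Z"},
    {"from": ("bob@example.org", ""), "to": [("Ann@example.com", "")],
     "date": "2001-05-15T08:00:00Z"},
])
```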
Thread
Conversation thread grouping related messages (JMAP Thread model).
    @dataclass
    class Thread:
        thread_id: str                  # Root Message-ID or generated ID
        subject: str                    # Normalized thread subject
        messages: list[Message]         # Ordered by date
        participants: list[EmailAddress]
        first_date: str
        last_date: str
        message_count: int              # Property (len of messages)
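One simple way to arrive at such threads is to key each message by the oldest Message-ID it refers to and to strip reply/forward prefixes from the subject. The sketch below is a simplified illustration under that assumption, not PikoClaw's grouping logic or the full JMAP algorithm; `normalize_subject`, `thread_root`, `group_threads`, and the dict-shaped records are hypothetical.

```python
import re
from collections import defaultdict

# Leading reply/forward markers such as "Re:", "RE:", "Fwd:", "FW:", possibly repeated.
_PREFIX = re.compile(r"^\s*(?:(?:re|fwd?)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject: str) -> str:
    """Strip leading Re:/Fwd: markers and surrounding whitespace."""
    return _PREFIX.sub("", subject).strip()


def thread_root(message_id: str, in_reply_to: str, references: list[str]) -> str:
    """Pick a thread key: oldest reference, else the parent, else the message itself."""
    if references:
        return references[0]
    return in_reply_to or message_id


def group_threads(messages: list[dict]) -> dict[str, list[dict]]:
    """Group records by thread key and order each group by date."""
    threads: dict[str, list[dict]] = defaultdict(list)
    for record in messages:
        root = thread_root(record["message_id"], record["in_reply_to"], record["references"])
        threads[root].append(record)
    for members in threads.values():
        members.sort(key=lambda record: record["date"])
    return dict(threads)


threads = group_threads([
    {"message_id": "<3@x>", "in_reply_to": "<2@x>", "references": ["<1@x>", "<2@x>"],
     "date": "2001-05-16T10:00:00Z"},
    {"message_id": "<1@x>", "in_reply_to": "", "references": [],
     "date": "2001-05-14T09:30:00Z"},
    {"message_id": "<2@x>", "in_reply_to": "<1@x>", "references": ["<1@x>"],
     "date": "2001-05-15T08:00:00Z"},
    {"message_id": "<9@x>", "in_reply_to": "", "references": [],
     "date": "2001-05-15T12:00:00Z"},
])
```

A message whose References header is missing but whose ancestors are present would land in its own thread under this scheme, which is why a generated ID or a subject-based fallback may still be needed.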
CalendarEvent
Calendar events from iCalendar data or MAPI appointment items.
    @dataclass
    class CalendarEvent:
        uid: str                        # RFC 5545 UID
        summary: str                    # Event title
        description: str
        location: str
        start: str                      # ISO 8601
        end: str                        # ISO 8601
        organizer: EmailAddress
        attendees: list[EmailAddress]
        status: str                     # CONFIRMED, TENTATIVE, CANCELLED
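Because the fields follow RFC 5545, an event can be written back out as a VEVENT with little translation. The following sketch shows the idea for a subset of the fields; `to_vevent`, `ical_datetime`, and `ical_text` are hypothetical helpers, timestamps are assumed to carry a UTC offset or a trailing "Z", and line folding at 75 octets is left out.

```python
from datetime import datetime, timezone


def ical_datetime(iso: str) -> str:
    """Convert an ISO 8601 timestamp with a UTC offset to RFC 5545 UTC form."""
    parsed = datetime.fromisoformat(iso.replace("Z", "+00:00"))
    return parsed.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def ical_text(value: str) -> str:
    """Escape a TEXT value: backslash, semicolon, comma, and newline."""
    return (
        value.replace("\\", "\\\\")
        .replace(";", "\\;")
        .replace(",", "\\,")
        .replace("\n", "\\n")
    )


def to_vevent(uid: str, summary: str, start: str, end: str, status: str, dtstamp: str) -> str:
    """Render a minimal VEVENT; content lines end with CRLF."""
    lines = [
        "BEGIN:VEVENT",
        f"UID:{uid}",
        f"DTSTAMP:{ical_datetime(dtstamp)}",
        f"DTSTART:{ical_datetime(start)}",
        f"DTEND:{ical_datetime(end)}",
        f"SUMMARY:{ical_text(summary)}",
        f"STATUS:{status}",
        "END:VEVENT",
    ]
    return "\r\n".join(lines) + "\r\n"


vevent = to_vevent(
    uid="event-1@example.com",
    summary="Budget review; Q3, final",
    start="2001-05-14T09:30:00-05:00",
    end="2001-05-14T10:30:00-05:00",
    status="CONFIRMED",
    dtstamp="2001-05-01T12:00:00Z",
)
```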
ExtractionResult
The complete output container consumed by all output generators.
    @dataclass
    class ExtractionResult:
        messages: list[Message]
        calendar_events: list[CalendarEvent]
        contacts: list[Contact]         # Aggregated by pipeline
        threads: list[Thread]           # Grouped by pipeline
        source_files: list[str]
        source_format: str
        extracted_at: str               # ISO 8601

        # Provenance
        warnings: list[str]
        source_hash: str                # SHA-256 of source files
        tool_version: str               # PikoClaw version
        stats: dict[str, int]           # Property (computed)
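The provenance fields can be filled with the standard library alone. The sketch below shows one plausible way to compute `source_hash` and `extracted_at`; how PikoClaw combines several source files into one digest is not stated here, so hashing file contents in sorted path order is an assumption, and `hash_sources` and `utc_now_iso` are hypothetical names.

```python
import hashlib
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def hash_sources(paths: list[str]) -> str:
    """SHA-256 over the contents of all source files, in sorted path order."""
    digest = hashlib.sha256()
    for path in sorted(paths):
        with open(path, "rb") as fh:
            # Read in 1 MiB chunks so large PST or mbox files are not loaded whole.
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                digest.update(chunk)
    return digest.hexdigest()


def utc_now_iso() -> str:
    """Current time as an ISO 8601 string with an explicit UTC offset."""
    return datetime.now(timezone.utc).isoformat(timespec="seconds")


# Demonstration with two throwaway files standing in for real sources.
with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp, "a.mbox")
    second = Path(tmp, "b.mbox")
    first.write_bytes(b"first")
    second.write_bytes(b"second")
    source_hash = hash_sources([str(second), str(first)])
    extracted_at = utc_now_iso()
```

Sorting the paths first makes the digest independent of the order in which the files were passed in.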
CRM Portability
The data model is designed for CRM export:
| PikoClaw Field | Salesforce | HubSpot | MS Dynamics |
|---|---|---|---|
| EmailAddress.email | Contact.Email | email | emailaddress1 |
| EmailAddress.name | Contact.Name | firstname + lastname | fullname |
| Contact.organization | Account.Name | company | parentcustomerid |
| Message.message_id | EmailMessage.MessageIdentifier | (custom) | (custom) |
| Message.subject | EmailMessage.Subject | hs_email_subject | subject |
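Most rows are one-to-one renames, but the HubSpot name mapping needs a split of the display name. A rough sketch of that step follows; `hubspot_name_properties` is a hypothetical helper, and splitting on the first space is a heuristic that will misplace some multi-part given names.

```python
def hubspot_name_properties(display_name: str) -> dict[str, str]:
    """Split a display name into HubSpot's firstname and lastname properties."""
    # Everything before the first space is the first name; the rest is the last name.
    first, _, last = display_name.strip().partition(" ")
    return {"firstname": first, "lastname": last.strip()}


def hubspot_email_properties(subject: str) -> dict[str, str]:
    """Map Message.subject onto the HubSpot email subject property."""
    return {"hs_email_subject": subject}


props = hubspot_name_properties("Ann Lee")
```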