Overview

IPED uses a comprehensive parser framework built on top of Apache Tika to extract metadata and content from digital evidence. The parser architecture enables processing of diverse file formats, application artifacts, and embedded data structures commonly encountered in forensic investigations.

Apache Tika Integration

IPED extends Apache Tika’s parser framework to provide specialized forensic capabilities:

Standard Tika Parsers

Office documents, PDFs, images, videos, and common file formats

Custom IPED Parsers

Chat applications, P2P clients, browser artifacts, and mobile data

SQLite Processing

Automated extraction from SQLite databases with schema detection

Embedded Document Extraction

Recursive parsing of containers and embedded artifacts

Parser Categories

IPED implements specialized parsers across multiple forensic domains:

Communication Artifacts

  • Chat Applications: WhatsApp, Telegram, Skype, Discord, Threema
  • Email Clients: PST, MBOX, DBX, MSG formats
  • Social Media: Facebook, Instagram, TikTok artifacts

Browser Forensics

  • Chrome/Chromium: History, downloads, searches, cache
  • Firefox: Places database, bookmarks, session data
  • Safari: SQLite and plist-based artifacts
  • Edge/IE: WebCacheV01.dat, index.dat files

P2P Applications

  • BitTorrent: Torrent files, resume.dat structures
  • eMule: known.met, part.met sharing records
  • Shareaza: Library.dat file catalogs
  • Ares: ShareH.dat and ShareL.dat databases

Mobile Artifacts

  • UFDR Format: Cellebrite extraction support
  • AD1 Format: AccessData image processing
  • iOS: Plist files, SQLite databases
  • Android: XML preferences, SQLite databases

Parser Architecture

Base Classes

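// Simplified view of Tika's base class (org.apache.tika.parser.AbstractParser)
public abstract class AbstractParser implements Parser {
    // Media types this parser can handle
    public abstract Set<MediaType> getSupportedTypes(ParseContext context);

    // Extracts content (as SAX events) and metadata from the stream
    public abstract void parse(InputStream stream, ContentHandler handler,
                               Metadata metadata, ParseContext context)
            throws IOException, SAXException, TikaException;
}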

Key Components

  • MediaType (string): Identifies file types for parser selection (e.g., application/x-whatsapp-db)
  • ParseContext (object): Provides access to searcher, item metadata, and configuration
  • ContentHandler (SAX handler): Receives parsed content as HTML or structured events
  • Metadata (key-value store): Stores extracted properties using standardized property names
  • EmbeddedDocumentExtractor (interface): Handles recursive extraction of embedded artifacts
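
To make the role of the media type in parser selection concrete, the following is a minimal sketch using only the Java standard library. It is not IPED or Tika code: the class ParserSelectionSketch, the parser names, and the "first registration wins" rule are all assumptions made for illustration. The real framework resolves parsers from each parser's getSupportedTypes result.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Simplified stand-in for media-type based parser selection (hypothetical, not IPED code). */
final class ParserSelectionSketch {
    // Maps a normalized media type string to the name of the parser that claims it.
    private final Map<String, String> parsersByType = new LinkedHashMap<>();

    /** Registers a parser for each media type it supports; in this sketch the first registration wins. */
    void register(String parserName, String... supportedTypes) {
        for (String type : supportedTypes) {
            parsersByType.putIfAbsent(normalize(type), parserName);
        }
    }

    /** Looks up the parser for a detected media type, ignoring case and parameters. */
    Optional<String> select(String mediaType) {
        return Optional.ofNullable(parsersByType.get(normalize(mediaType)));
    }

    // Drops parameters such as "; charset=UTF-8" and lower-cases the base type.
    private static String normalize(String mediaType) {
        int semicolon = mediaType.indexOf(';');
        String base = semicolon >= 0 ? mediaType.substring(0, semicolon) : mediaType;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        ParserSelectionSketch registry = new ParserSelectionSketch();
        // Parser names below are placeholders chosen for this example.
        registry.register("ExampleChatParser", "application/x-whatsapp-db");
        registry.register("ExampleSqliteParser", "application/x-sqlite3");
        System.out.println(registry.select("application/x-whatsapp-db"));
        System.out.println(registry.select("application/unknown"));
    }
}
```

A specialized type such as application/x-whatsapp-db lets a dedicated parser take precedence over a generic SQLite parser for the same underlying file format.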

Metadata Extraction

IPED parsers extract standardized metadata properties:

Communication Properties

ExtraProperties.USER_ACCOUNT
ExtraProperties.USER_NAME
ExtraProperties.USER_PHONE
ExtraProperties.MESSAGE_DATE
ExtraProperties.MESSAGE_BODY
ExtraProperties.PARTICIPANTS
ExtraProperties.LINKED_ITEMS

P2P Properties

ExtraProperties.P2P_META_PREFIX + "torrentInfoHash"
ExtraProperties.SHARED_HASHES
ExtraProperties.P2P_REGISTRY_COUNT
ExtraProperties.CSAM_HASH_HITS

Browser Properties

ExtraProperties.URL
ExtraProperties.VISIT_DATE
ExtraProperties.DOWNLOAD_DATE
ExtraProperties.LOCAL_PATH
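
As a rough illustration of how a key-value metadata store can hold both single-valued properties (such as a URL) and multi-valued ones (such as participants or linked items), here is a small stand-in written with the standard library only. MetadataSketch and the property name strings are hypothetical; the actual string values behind the ExtraProperties constants are not shown in this page and are not assumed here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Minimal multi-valued metadata store (illustrative stand-in, not Tika's Metadata class). */
final class MetadataSketch {
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    /** Replaces any existing values of the property with a single value. */
    void set(String name, String value) {
        List<String> list = new ArrayList<>();
        list.add(value);
        values.put(name, list);
    }

    /** Appends a value, keeping earlier ones; suits properties like participants. */
    void add(String name, String value) {
        values.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    /** Returns the first value of the property, or null if it is absent. */
    String get(String name) {
        List<String> list = values.get(name);
        return (list == null || list.isEmpty()) ? null : list.get(0);
    }

    /** Returns all values of the property as a read-only list (empty if absent). */
    List<String> getValues(String name) {
        return Collections.unmodifiableList(values.getOrDefault(name, Collections.emptyList()));
    }

    public static void main(String[] args) {
        MetadataSketch metadata = new MetadataSketch();
        // Placeholder property names, not the real ExtraProperties values.
        metadata.add("example:participants", "alice");
        metadata.add("example:participants", "bob");
        metadata.set("example:url", "https://example.org/");
        System.out.println(metadata.getValues("example:participants"));
        System.out.println(metadata.get("example:url"));
    }
}
```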

Parser Configuration

Parsers expose configurable options through Tika's @Field annotation on setter methods:
@Field
public void setExtractMessages(boolean extractMessages) {
    this.extractMessages = extractMessages;
}

@Field
public void setMergeBackups(boolean mergeBackups) {
    this.mergeBackups = mergeBackups;
}

@Field
public void setMinChatSplitSize(int minChatSplitSize) {
    this.minChatSplitSize = minChatSplitSize;
}
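
The sketch below shows one way annotated setters like those above could be populated from name/value parameters using reflection. It is an approximation for illustration only: the nested Field annotation stands in for Tika's org.apache.tika.config.Field, and ExampleChatParser, configure, and the default values are hypothetical. The real parameter loading is done by Tika's configuration machinery.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;

final class FieldConfigSketch {

    /** Stand-in for Tika's @Field annotation (assumption: only used on setters here). */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Field {}

    /** Hypothetical parser exposing the three setters; defaults are invented. */
    static final class ExampleChatParser {
        boolean extractMessages = true;
        boolean mergeBackups = false;
        int minChatSplitSize = 0;

        @Field public void setExtractMessages(boolean v) { this.extractMessages = v; }
        @Field public void setMergeBackups(boolean v) { this.mergeBackups = v; }
        @Field public void setMinChatSplitSize(int v) { this.minChatSplitSize = v; }
    }

    /** Applies each "name -> value" entry to the matching @Field setter, converting by parameter type. */
    static void configure(Object parser, Map<String, String> params) {
        for (Method m : parser.getClass().getMethods()) {
            String n = m.getName();
            if (!m.isAnnotationPresent(Field.class) || m.getParameterCount() != 1
                    || !n.startsWith("set") || n.length() <= 3) {
                continue;
            }
            // setMergeBackups -> mergeBackups
            String key = Character.toLowerCase(n.charAt(3)) + n.substring(4);
            String raw = params.get(key);
            if (raw == null) {
                continue; // parameter not supplied, keep the default
            }
            Class<?> type = m.getParameterTypes()[0];
            Object value = type == boolean.class ? (Object) Boolean.parseBoolean(raw)
                         : type == int.class ? (Object) Integer.parseInt(raw)
                         : raw;
            try {
                m.invoke(parser, value);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot set " + key, e);
            }
        }
    }

    public static void main(String[] args) {
        ExampleChatParser parser = new ExampleChatParser();
        configure(parser, Map.of("mergeBackups", "true", "minChatSplitSize", "6000000"));
        System.out.println(parser.mergeBackups + " " + parser.minChatSplitSize);
    }
}
```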

HTML Report Generation

Most parsers generate HTML reports for visualization:
XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
xhtml.startDocument();
xhtml.startElement("table");
// Generate structured HTML output
xhtml.endElement("table");
xhtml.endDocument();

HTML reports include CSS styling, hyperlinks to related items, and structured data laid out for display in the analysis interface.
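
For readers unfamiliar with SAX-style output, the following plain-string sketch shows the kind of table a report might contain and why cell values need escaping. It deliberately avoids Tika classes so that it runs with the standard library alone; HtmlTableSketch and its methods are hypothetical helpers, not part of IPED.

```java
import java.util.List;

/** Builds a small HTML table from rows of text (illustrative helper, not IPED code). */
final class HtmlTableSketch {

    /** Escapes the characters that would otherwise be read as markup. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Renders one header row followed by one row per record. */
    static String table(List<String> headers, List<List<String>> rows) {
        StringBuilder html = new StringBuilder("<table>");
        html.append("<tr>");
        for (String header : headers) {
            html.append("<th>").append(escape(header)).append("</th>");
        }
        html.append("</tr>");
        for (List<String> row : rows) {
            html.append("<tr>");
            for (String cell : row) {
                html.append("<td>").append(escape(cell)).append("</td>");
            }
            html.append("</tr>");
        }
        return html.append("</table>").toString();
    }

    public static void main(String[] args) {
        // Message text is untrusted evidence content, so it is escaped before output.
        System.out.println(table(
                List.of("From", "Message"),
                List.of(List.of("alice", "1 < 2 & 3"))));
    }
}
```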

Item Linking

Parsers can link artifacts to case items using various strategies:

Hash-Based Linking

if (Util.isValidHash(m.getMediaHash())) {
    metadata.add(ExtraProperties.LINKED_ITEMS, 
                BasicProps.HASH + ":" + m.getMediaHash());
    if (m.isFromMe())
        metadata.add(ExtraProperties.SHARED_HASHES, m.getMediaHash());
}

Query-Based Linking

String query = BasicProps.LENGTH + ":" + fileSize + 
               " && " + BasicProps.NAME + ":\"" + fileName + "\"";
metadata.add(ExtraProperties.LINKED_ITEMS, query);
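
One practical concern with a query built by string concatenation is a file name that itself contains a double quote or backslash, which would break the quoted phrase. The helper below is a minimal sketch of escaping those two characters before building the same kind of query. LinkQuerySketch is hypothetical, and the field names "size" and "name" passed in the example are assumptions standing in for the values of BasicProps.LENGTH and BasicProps.NAME.

```java
/** Builds a size-and-name link query with the name safely quoted (illustrative only). */
final class LinkQuerySketch {

    /** Escapes backslashes and double quotes so the text can sit inside a quoted phrase. */
    static String escapePhrase(String text) {
        return text.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Mirrors the concatenation shown above, with the file name escaped. */
    static String sizeAndName(String sizeField, String nameField, long fileSize, String fileName) {
        return sizeField + ":" + fileSize
                + " && " + nameField + ":\"" + escapePhrase(fileName) + "\"";
    }

    public static void main(String[] args) {
        // "size" and "name" are assumed field names for this example.
        System.out.println(sizeAndName("size", "name", 1024L, "holiday \"final\".jpg"));
    }
}
```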

Parser Examples

See the following pages for detailed parser documentation:

Chat Applications

WhatsApp, Telegram, Skype parsers

Web Browsers

Chrome, Firefox, Safari, Edge parsers

P2P Applications

BitTorrent, eMule, Shareaza, Ares parsers

Mobile Artifacts

UFDR and mobile data parsers

Performance Considerations

Parsing large chat databases with backup merging enabled can be memory-intensive. Adjust minChatSplitSize to control how chat reports are split into fragments.

Optimization Features

  • Deleted Record Recovery: Optional scanning of SQLite free pages
  • Backup Merging: Combines multiple backup databases
  • Batch Processing: Searches performed in batches to reduce overhead
  • Parallel Processing: Multi-threaded parsing where applicable
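
The batching idea in the list above can be sketched as follows: instead of issuing one search per hash, values are grouped and each group becomes a single OR query. This is an assumed illustration of the general technique, not IPED's implementation; BatchQuerySketch, the "hash" field name, and the batch size are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Groups hash lookups into a few OR queries instead of one query per hash (illustrative). */
final class BatchQuerySketch {

    /** Splits the hashes into batches of at most batchSize and builds one query per batch. */
    static List<String> batchedHashQueries(String hashField, List<String> hashes, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> queries = new ArrayList<>();
        for (int start = 0; start < hashes.size(); start += batchSize) {
            int end = Math.min(start + batchSize, hashes.size());
            List<String> batch = hashes.subList(start, end);
            queries.add(hashField + ":(" + String.join(" OR ", batch) + ")");
        }
        return queries;
    }

    public static void main(String[] args) {
        // Short placeholder values instead of real hashes, to keep the output readable.
        List<String> hashes = List.of("a1", "b2", "c3");
        for (String query : batchedHashQueries("hash", hashes, 2)) {
            System.out.println(query);
        }
    }
}
```

Larger batches mean fewer round trips to the index, at the cost of longer query strings; the right size depends on the search backend's clause limits.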

Next Steps

1. Explore Chat Parsers
2. Browser Artifacts
