Extractor Pattern

Extractors are the core data extraction components in NewPipe Extractor. Each extractor is responsible for fetching and parsing data from a specific type of content (streams, channels, playlists, etc.).

Base Extractor Class

All extractors inherit from the abstract Extractor class:

public abstract class Extractor {
    private final StreamingService service;
    private final LinkHandler linkHandler;
    private final Downloader downloader;
    private boolean pageFetched = false;
    
    protected Extractor(StreamingService service, 
                       LinkHandler linkHandler) {
        this.service = Objects.requireNonNull(service);
        this.linkHandler = Objects.requireNonNull(linkHandler);
        this.downloader = Objects.requireNonNull(NewPipe.getDownloader());
    }
    
    public abstract void onFetchPage(Downloader downloader) 
            throws IOException, ExtractionException;
    
    public abstract String getName() throws ParsingException;
}

Two-Phase Extraction

Extractors follow a two-phase pattern to separate object creation from data fetching:

Phase 1: Initialization

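Create the extractor from a URL. The service resolves the URL to a LinkHandler internally:

// Assumes NewPipe.init(downloader) has already been called
StreamingService service = NewPipe.getService("YouTube");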
StreamExtractor extractor = service.getStreamExtractor(
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
);
// Extractor created, but no data fetched yet

Phase 2: Fetching

Call fetchPage() to download and parse data:

extractor.fetchPage();  // Downloads and parses the page
String title = extractor.getName();
long views = extractor.getViewCount();

Always call fetchPage() before accessing extractor data. Getters that guard themselves with assertPageFetched() throw IllegalStateException when called before fetchPage().

Fetch Page Implementation

The fetchPage() method ensures data is fetched only once:

public void fetchPage() throws IOException, ExtractionException {
    if (pageFetched) {
        return;  // Already fetched, skip
    }
    onFetchPage(downloader);
    pageFetched = true;
}

protected void assertPageFetched() {
    if (!pageFetched) {
        throw new IllegalStateException(
            "Page is not fetched. Make sure you call fetchPage()"
        );
    }
}

protected boolean isPageFetched() {
    return pageFetched;
}
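
The fetch-once guard can be tried in isolation. The following is a minimal sketch, not the library's code: SketchExtractor and CountingExtractor are hypothetical stand-ins, and checked exceptions and the Downloader parameter are left out so the example compiles on its own.

```java
// Simplified stand-in for the base class; not the real NewPipe Extractor.
abstract class SketchExtractor {
    private boolean pageFetched = false;

    // Runs onFetchPage() at most once, however often fetchPage() is called.
    public final void fetchPage() {
        if (pageFetched) {
            return;
        }
        onFetchPage();
        pageFetched = true;  // only reached if onFetchPage() did not throw
    }

    protected void assertPageFetched() {
        if (!pageFetched) {
            throw new IllegalStateException(
                "Page is not fetched. Make sure you call fetchPage()");
        }
    }

    protected abstract void onFetchPage();
}

// Hypothetical extractor that counts how often the "download" runs.
class CountingExtractor extends SketchExtractor {
    int fetchCount = 0;
    private String name;

    @Override
    protected void onFetchPage() {
        fetchCount++;
        name = "Example title";
    }

    public String getName() {
        assertPageFetched();
        return name;
    }
}
```

Because the flag is set only after onFetchPage() returns, a fetch that fails with an exception leaves the extractor unfetched, so a later fetchPage() call tries again.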

Extractor Types

StreamExtractor
ChannelExtractor
PlaylistExtractor
SearchExtractor
CommentsExtractor

StreamExtractor extracts data from individual video/audio streams.

public abstract class StreamExtractor extends Extractor {
    // Metadata
    public abstract List<Image> getThumbnails() throws ParsingException;
    public Description getDescription() throws ParsingException;
    public long getLength() throws ParsingException;
    public long getViewCount() throws ParsingException;
    public long getLikeCount() throws ParsingException;
    
    // Uploader info
    public abstract String getUploaderName() throws ParsingException;
    public abstract String getUploaderUrl() throws ParsingException;
    public List<Image> getUploaderAvatars() throws ParsingException;
    public boolean isUploaderVerified() throws ParsingException;
    
    // Media streams
    public abstract List<AudioStream> getAudioStreams() 
            throws IOException, ExtractionException;
    public abstract List<VideoStream> getVideoStreams() 
            throws IOException, ExtractionException;
    public abstract List<VideoStream> getVideoOnlyStreams() 
            throws IOException, ExtractionException;
    
    // Additional features
    public abstract StreamType getStreamType() throws ParsingException;
    public List<SubtitlesStream> getSubtitlesDefault() 
            throws IOException, ExtractionException;
    public InfoItemsCollector getRelatedItems() 
            throws IOException, ExtractionException;
}

ChannelExtractor extracts data from content creator channels.

public abstract class ChannelExtractor extends Extractor {
    public abstract List<Image> getAvatars() throws ParsingException;
    public abstract List<Image> getBanners() throws ParsingException;
    public String getFeedUrl() throws ParsingException;
    public long getSubscriberCount() throws ParsingException;
    public String getDescription() throws ParsingException;
    public List<ListLinkHandler> getTabs() throws ParsingException;
}

PlaylistExtractor extracts playlist content and metadata.

public abstract class PlaylistExtractor extends ListExtractor {
    public List<Image> getThumbnails() throws ParsingException;
    public String getUploaderUrl() throws ParsingException;
    public String getUploaderName() throws ParsingException;
    public List<Image> getUploaderAvatars() throws ParsingException;
    public long getStreamCount() throws ParsingException;
}

SearchExtractor extracts search results with filtering support.

public abstract class SearchExtractor extends ListExtractor {
    protected final String searchString;
    
    public String getSearchString() {
        return searchString;
    }
    
    public boolean isCorrectedSearch() throws ParsingException;
    public List<MetaInfo> getMetaInfo() throws ParsingException;
}

CommentsExtractor extracts comment threads and replies.

public abstract class CommentsExtractor extends ListExtractor {
    public boolean isCommentsDisabled() throws ParsingException;
}

Common Extractor Methods

All extractors inherit these methods from the base Extractor class:

// Identity
public String getId() throws ParsingException;
public abstract String getName() throws ParsingException;

// URLs
public String getOriginalUrl() throws ParsingException;
public String getUrl() throws ParsingException;
public String getBaseUrl() throws ParsingException;

// Service reference
public StreamingService getService();
public int getServiceId();

// Access to downloader
public Downloader getDownloader();

// Link handler
public LinkHandler getLinkHandler();

StreamExtractor Deep Dive

StreamExtractor is the most complex extractor. It handles video and audio streams, and its methods fall into the categories below.

Basic Metadata

// Required methods
public abstract List<Image> getThumbnails() throws ParsingException;
public abstract String getUploaderUrl() throws ParsingException;
public abstract String getUploaderName() throws ParsingException;
public abstract StreamType getStreamType() throws ParsingException;

// Optional methods with default implementations
public Description getDescription() throws ParsingException {
    return Description.EMPTY_DESCRIPTION;
}

public long getLength() throws ParsingException {
    return 0;  // 0 for livestreams
}

public long getViewCount() throws ParsingException {
    return -1;  // -1 if not available
}

public long getLikeCount() throws ParsingException {
    return -1;
}

Upload Date

// Textual date from service
public String getTextualUploadDate() throws ParsingException {
    return null;
}

// Parsed date object
public DateWrapper getUploadDate() throws ParsingException {
    return null;
}

For live streams, both upload date methods should return null.
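
As a rough illustration of how the two methods relate, the sketch below turns a textual date into a parsed one and returns null for live streams. It assumes the service delivers ISO-8601 timestamps with an offset, which is not true of every service; UploadDateSketch and parseUploadDate are hypothetical names. A real extractor would presumably wrap the result in a DateWrapper and report failures with ParsingException rather than IllegalArgumentException.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeParseException;

class UploadDateSketch {
    // Returns null for live streams or when the service gives no textual date.
    static OffsetDateTime parseUploadDate(String textualDate, boolean isLive) {
        if (isLive || textualDate == null || textualDate.isEmpty()) {
            return null;
        }
        try {
            // Assumption: the textual date is ISO-8601 with an offset,
            // for example "2009-10-25T06:57:33Z".
            return OffsetDateTime.parse(textualDate);
        } catch (DateTimeParseException e) {
            throw new IllegalArgumentException(
                "Could not parse upload date: " + textualDate, e);
        }
    }
}
```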

Media Stream URLs

// Audio streams (no video)
public abstract List<AudioStream> getAudioStreams() 
        throws IOException, ExtractionException;

// Video streams with audio
public abstract List<VideoStream> getVideoStreams() 
        throws IOException, ExtractionException;

// Video streams without audio (requires separate audio)
public abstract List<VideoStream> getVideoOnlyStreams() 
        throws IOException, ExtractionException;

// DASH manifest URL
public String getDashMpdUrl() throws ParsingException {
    return "";
}

// HLS playlist URL
public String getHlsUrl() throws ParsingException {
    return "";
}

You must return at least one of: audio streams, video streams, video-only streams, DASH URL, or HLS URL. Otherwise, extraction is considered failed.
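
The rule above amounts to a simple check over the five sources. The sketch below is one way to express it; PlayableSourceCheck and hasPlayableSource are hypothetical names, and where the library itself performs this check is not shown here.

```java
import java.util.List;

class PlayableSourceCheck {
    // True if at least one stream list is non-empty or one manifest URL is set.
    static boolean hasPlayableSource(List<?> audioStreams,
                                     List<?> videoStreams,
                                     List<?> videoOnlyStreams,
                                     String dashMpdUrl,
                                     String hlsUrl) {
        return hasItems(audioStreams)
            || hasItems(videoStreams)
            || hasItems(videoOnlyStreams)
            || hasText(dashMpdUrl)
            || hasText(hlsUrl);
    }

    private static boolean hasItems(List<?> list) {
        return list != null && !list.isEmpty();
    }

    private static boolean hasText(String value) {
        return value != null && !value.isEmpty();
    }
}
```

An extractor that serves only an HLS playlist, for example, passes the check even though all three stream lists are empty.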

Subtitles

// Default subtitles
public List<SubtitlesStream> getSubtitlesDefault() 
        throws IOException, ExtractionException {
    return Collections.emptyList();
}

// Subtitles filtered by format
public List<SubtitlesStream> getSubtitles(MediaFormat format) 
        throws IOException, ExtractionException {
    return Collections.emptyList();
}

Additional Metadata

// Age restriction
public int getAgeLimit() throws ParsingException {
    return NO_AGE_LIMIT;  // 0 = no restriction
}

// Privacy setting
public Privacy getPrivacy() throws ParsingException {
    return Privacy.PUBLIC;
}

// Category
public String getCategory() throws ParsingException {
    return "";
}

// Tags
public List<String> getTags() throws ParsingException {
    return Collections.emptyList();
}

// License
public String getLicence() throws ParsingException {
    return "";
}

// Language
public Locale getLanguageInfo() throws ParsingException {
    return null;
}

Related Content

public InfoItemsCollector<? extends InfoItem, ? extends InfoItemExtractor>
getRelatedItems() throws IOException, ExtractionException {
    return null;  // null if not available
}

Timeline Features

// Timestamp in URL (e.g., ?t=120 for 2 minutes)
public long getTimeStamp() throws ParsingException {
    return 0;
}

// Chapter segments
public List<StreamSegment> getStreamSegments() throws ParsingException {
    return Collections.emptyList();
}

// Preview frames/thumbnails
public List<Frameset> getFrames() throws ExtractionException {
    return Collections.emptyList();
}
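
For the URL timestamp, a minimal sketch of the parsing step might look like the following. It handles only a plain number of seconds in a t parameter, optionally followed by s (as in ?t=120 or &t=90s), and returns 0 otherwise. TimestampSketch and parseTimestampSeconds are hypothetical names, and the formats a given service actually uses are an assumption here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TimestampSketch {
    // Matches "t=<digits>" with an optional trailing "s", directly after
    // '?', '&' or '#', and followed by '&' or the end of the URL.
    private static final Pattern T_PARAM =
        Pattern.compile("[?&#]t=(\\d+)s?(?:&|$)");

    // Returns the timestamp in seconds, or 0 if the URL carries none.
    static long parseTimestampSeconds(String url) {
        Matcher matcher = T_PARAM.matcher(url);
        return matcher.find() ? Long.parseLong(matcher.group(1)) : 0;
    }
}
```

Compound forms such as t=1m30s are deliberately not matched by this sketch and yield 0.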

Meta Information

// Additional context (e.g., COVID-19 info, fact-checks)
public List<MetaInfo> getMetaInfo() throws ParsingException {
    return Collections.emptyList();
}

// Short-form content detection (YouTube Shorts, TikTok)
public boolean isShortFormContent() throws ParsingException {
    return false;
}

Localization in Extractors

Extractors support per-instance localization:

// Force specific localization
public void forceLocalization(Localization localization);
public void forceContentCountry(ContentCountry contentCountry);

// Get active localization
public Localization getExtractorLocalization() {
    return forcedLocalization == null 
        ? getService().getLocalization() 
        : forcedLocalization;
}

public ContentCountry getExtractorContentCountry() {
    return forcedContentCountry == null 
        ? getService().getContentCountry() 
        : forcedContentCountry;
}

// Get time ago parser for parsing relative dates
public TimeAgoParser getTimeAgoParser() {
    return getService().getTimeAgoParser(getExtractorLocalization());
}

Usage Example

StreamExtractor extractor = service.getStreamExtractor(url);

// Force German localization
extractor.forceLocalization(new Localization("de", "DE"));
extractor.forceContentCountry(new ContentCountry("DE"));

extractor.fetchPage();
String title = extractor.getName();  // Title in German if available

Implementation Example

Here’s a simplified extractor implementation:

public class YoutubeStreamExtractor extends StreamExtractor {
    private JsonObject playerResponse;
    private JsonObject videoDetails;
    
    public YoutubeStreamExtractor(StreamingService service, 
                                 LinkHandler linkHandler) {
        super(service, linkHandler);
    }
    
    @Override
    public void onFetchPage(Downloader downloader) 
            throws IOException, ExtractionException {
        // Download page
        String pageContent = downloader.get(getUrl()).responseBody();
        
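        // Parse JSON data; JsonParserException is checked, so wrap it
        try {
            playerResponse = JsonParser.object()
                .from(extractPlayerResponse(pageContent));
        } catch (JsonParserException e) {
            throw new ParsingException("Could not parse player response", e);
        }
        videoDetails = playerResponse.getObject("videoDetails");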
    }
    
    @Override
    public String getName() throws ParsingException {
        assertPageFetched();
        return videoDetails.getString("title");
    }
    
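    @Override
    public long getViewCount() throws ParsingException {
        assertPageFetched();
        // View count is optional: return -1 when the field is missing
        String views = videoDetails.getString("viewCount");
        if (views == null || views.isEmpty()) {
            return -1;
        }
        try {
            return Long.parseLong(views);
        } catch (NumberFormatException e) {
            throw new ParsingException("Could not parse view count", e);
        }
    }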
    
    @Override
    public List<AudioStream> getAudioStreams() 
            throws IOException, ExtractionException {
        assertPageFetched();
        List<AudioStream> audioStreams = new ArrayList<>();
        
        JsonArray formats = playerResponse
            .getObject("streamingData")
            .getArray("adaptiveFormats");
            
        for (Object format : formats) {
            JsonObject f = (JsonObject) format;
            if (f.has("mimeType") && 
                f.getString("mimeType").startsWith("audio")) {
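                // Constructor shown for brevity; newer library versions
                // build streams with AudioStream.Builder instead
                audioStreams.add(new AudioStream(
                    f.getString("url"),
                    MediaFormat.getFromMimeType(f.getString("mimeType")),
                    f.getInt("bitrate")
                ));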
            }
        }
        
        return audioStreams;
    }
    
    // ... implement other required methods
}

Error Handling

Return Defaults for Optional Data

Many methods have default implementations that return an empty value, null, or a sentinel such as -1:

public String getCategory() throws ParsingException {
    return "";  // Empty string if not available
}

public long getLikeCount() throws ParsingException {
    return -1;  // -1 if not available
}
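
On the calling side, sentinel defaults such as -1 need to be checked before use. The sketch below shows one way to do that with OptionalLong; SentinelSketch, toOptionalCount, and formatLikes are hypothetical helpers, not part of the library.

```java
import java.util.OptionalLong;

class SentinelSketch {
    // Treats any negative count as "not available".
    static OptionalLong toOptionalCount(long rawCount) {
        return rawCount < 0 ? OptionalLong.empty() : OptionalLong.of(rawCount);
    }

    // Formats a like count for display without leaking the -1 sentinel.
    static String formatLikes(long rawLikeCount) {
        OptionalLong likes = toOptionalCount(rawLikeCount);
        return likes.isPresent()
            ? likes.getAsLong() + " likes"
            : "Likes unavailable";
    }
}
```

Converting at the boundary keeps the sentinel out of arithmetic and display code, where a stray -1 is easy to miss.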

Throw ParsingException for Required Data

If required data cannot be extracted, throw ParsingException:

public String getName() throws ParsingException {
    assertPageFetched();
    String title = videoDetails.getString("title");
    if (title == null || title.isEmpty()) {
        throw new ParsingException("Could not extract title");
    }
    return title;
}
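
When several required fields need the same check, the validation can be factored into a small helper. The sketch below is one possible shape; SketchParsingException stands in for the library's ParsingException so the example compiles on its own, and requireNonEmpty is a hypothetical helper.

```java
// Stand-in for the library's ParsingException; not the real class.
class SketchParsingException extends Exception {
    SketchParsingException(String message) {
        super(message);
    }
}

class RequiredFieldSketch {
    // Returns the value unchanged, or fails with a message naming the field.
    static String requireNonEmpty(String value, String fieldName)
            throws SketchParsingException {
        if (value == null || value.isEmpty()) {
            throw new SketchParsingException("Could not extract " + fieldName);
        }
        return value;
    }
}
```

Naming the field in the message makes it easier to see which part of the page layout changed when extraction breaks.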

Handle Content Availability

Check if content is available:

public String getErrorMessage() {
    // Parse error from page if video is unavailable
    return null;
}

public ContentAvailability getContentAvailability() 
        throws ParsingException {
    // isPrivate and isDeleted are illustrative flags, assumed to be set in onFetchPage()
    if (isPrivate) return ContentAvailability.PRIVATE;
    if (isDeleted) return ContentAvailability.REMOVED;
    return ContentAvailability.AVAILABLE;
}

Best Practices

Always call assertPageFetched() in getter methods to ensure data is available
Return default values instead of throwing exceptions for optional data
Cache parsed data to avoid re-parsing on multiple getter calls
Use meaningful exception messages to help with debugging
Handle platform-specific edge cases (e.g., YouTube Shorts, live streams)
Test with various content types (public, private, deleted, geo-blocked)
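
The caching advice can be sketched as a lazily initialized field. In the example below, CachingSketch is a hypothetical class, rawTags stands in for data stored during onFetchPage(), and parseCount exists only to make the single parse observable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CachingSketch {
    private final String rawTags;     // stands in for data stored by onFetchPage()
    private List<String> cachedTags;  // null until the first getTags() call
    int parseCount = 0;               // demo only: counts how often parsing runs

    CachingSketch(String rawTags) {
        this.rawTags = rawTags;
    }

    // Parses the comma-separated tags once and reuses the result afterwards.
    List<String> getTags() {
        if (cachedTags == null) {
            parseCount++;
            List<String> parsed = new ArrayList<>();
            for (String tag : rawTags.split(",")) {
                String trimmed = tag.trim();
                if (!trimmed.isEmpty()) {
                    parsed.add(trimmed);
                }
            }
            cachedTags = Collections.unmodifiableList(parsed);
        }
        return cachedTags;
    }
}
```

Returning an unmodifiable list keeps callers from altering the cached value. The sketch is not thread-safe, which is an assumption that holds only if each extractor instance is used from a single thread.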

Related Documentation

Services: Understand StreamingService architecture
Link Handlers: Learn about URL parsing
Localization: Configure language preferences
