Understanding the Extractor pattern for data extraction from streaming services
Extractors are the core data extraction components in NewPipe Extractor. Each extractor is responsible for fetching and parsing data from a specific type of content (streams, channels, playlists, etc.).
    StreamingService service = NewPipe.getService("YouTube");
    StreamExtractor extractor = service.getStreamExtractor(
            "https://www.youtube.com/watch?v=dQw4w9WgXcQ");
    // Extractor created, but no data fetched yet
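The point that an extractor is created without fetching any data can be illustrated with a minimal, self-contained sketch of a lazy-fetch lifecycle. The classes `LazyExtractorSketch` and `FakeStreamExtractorSketch`, the `onFetchPage()` hook, and the fetch-once guard are hypothetical stand-ins written for this example; they are assumptions about how such a lifecycle could look, not the library's actual code.

```java
// Hypothetical stand-in for an extractor base class (not a NewPipe Extractor type).
abstract class LazyExtractorSketch {
    private boolean pageFetched = false;

    // Hypothetical hook: a subclass would perform the download and parsing here.
    protected abstract void onFetchPage();

    // Fetches at most once; later calls do nothing.
    public final void fetchPage() {
        if (pageFetched) {
            return;
        }
        onFetchPage();
        pageFetched = true;
    }

    // Guards getters so they fail loudly when called before fetchPage().
    protected final void assertPageFetched() {
        if (!pageFetched) {
            throw new IllegalStateException("Page is not fetched. Call fetchPage() first.");
        }
    }
}

// Hypothetical extractor that fakes the network call with a constant.
class FakeStreamExtractorSketch extends LazyExtractorSketch {
    int fetchCount = 0;
    private String title;

    @Override
    protected void onFetchPage() {
        fetchCount++;
        title = "Example title"; // pretend this came from a network response
    }

    public String getName() {
        assertPageFetched();
        return title;
    }
}
```

In this sketch, calling `getName()` before `fetchPage()` throws an `IllegalStateException`, and calling `fetchPage()` twice still triggers only one fetch.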
StreamExtractor extracts data from individual video or audio streams.
    // Signatures only; method bodies are omitted.
    public abstract class StreamExtractor extends Extractor {
        // Metadata
        public abstract List<Image> getThumbnails() throws ParsingException;
        public Description getDescription() throws ParsingException;
        public long getLength() throws ParsingException;
        public long getViewCount() throws ParsingException;
        public long getLikeCount() throws ParsingException;

        // Uploader info
        public abstract String getUploaderName() throws ParsingException;
        public abstract String getUploaderUrl() throws ParsingException;
        public List<Image> getUploaderAvatars() throws ParsingException;
        public boolean isUploaderVerified() throws ParsingException;

        // Media streams
        public abstract List<AudioStream> getAudioStreams() throws IOException, ExtractionException;
        public abstract List<VideoStream> getVideoStreams() throws IOException, ExtractionException;
        public abstract List<VideoStream> getVideoOnlyStreams() throws IOException, ExtractionException;

        // Additional features
        public abstract StreamType getStreamType() throws ParsingException;
        public List<SubtitlesStream> getSubtitlesDefault() throws IOException, ExtractionException;
        public InfoItemsCollector getRelatedItems() throws IOException, ExtractionException;
    }
ChannelExtractor extracts data from content creator channels.
    public abstract class ChannelExtractor extends Extractor {
        public abstract List<Image> getAvatars() throws ParsingException;
        public abstract List<Image> getBanners() throws ParsingException;
        public String getFeedUrl() throws ParsingException;
        public long getSubscriberCount() throws ParsingException;
        public String getDescription() throws ParsingException;
        public List<ListLinkHandler> getTabs() throws ParsingException;
    }
PlaylistExtractor extracts playlist content and metadata.
    public abstract class PlaylistExtractor extends ListExtractor {
        public List<Image> getThumbnails() throws ParsingException;
        public String getUploaderUrl() throws ParsingException;
        public String getUploaderName() throws ParsingException;
        public List<Image> getUploaderAvatars() throws ParsingException;
        public long getStreamCount() throws ParsingException;
    }
SearchExtractor extracts search results with filtering support.
    public abstract class SearchExtractor extends ListExtractor {
        protected final String searchString;

        public String getSearchString() {
            return searchString;
        }

        public boolean isCorrectedSearch() throws ParsingException;
        public List<MetaInfo> getMetaInfo() throws ParsingException;
    }
CommentsExtractor extracts comment threads and replies.
    public abstract class CommentsExtractor extends ListExtractor {
        public boolean isCommentsDisabled() throws ParsingException;
    }
    StreamExtractor extractor = service.getStreamExtractor(url);

    // Force German localization
    extractor.forceLocalization(new Localization("de", "DE"));
    extractor.forceContentCountry(new ContentCountry("DE"));
    extractor.fetchPage();

    String title = extractor.getName(); // Title in German if available
Many methods have default implementations that return a placeholder value (an empty string, -1, or null) when the data is not available:
    public String getCategory() throws ParsingException {
        return ""; // Empty string if not available
    }

    public long getLikeCount() throws ParsingException {
        return -1; // -1 if not available
    }
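Because these defaults are ordinary values, a caller has to treat `-1` and the empty string as "not available" rather than as real data. The following is a minimal sketch of one way to do that on the calling side; `SentinelDefaultsSketch` and its methods are hypothetical helpers written for this example and are not part of the library.

```java
import java.util.Optional;
import java.util.OptionalLong;

// Hypothetical caller-side helpers for interpreting placeholder defaults.
final class SentinelDefaultsSketch {
    private SentinelDefaultsSketch() { }

    // Treats any negative count (such as the -1 default) as "unknown".
    static OptionalLong knownCount(long rawCount) {
        return rawCount < 0 ? OptionalLong.empty() : OptionalLong.of(rawCount);
    }

    // Treats null and the empty string as "unknown".
    static Optional<String> knownText(String raw) {
        return raw == null || raw.isEmpty() ? Optional.empty() : Optional.of(raw);
    }

    // Example of rendering a like count without showing the placeholder.
    static String formatLikes(long rawCount) {
        OptionalLong likes = knownCount(rawCount);
        return likes.isPresent() ? likes.getAsLong() + " likes" : "likes hidden";
    }
}
```

Wrapping the raw values in `OptionalLong` and `Optional` is a design choice of this sketch: it forces the calling code to handle the missing case explicitly instead of accidentally displaying `-1`.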
Throw ParsingException for Required Data
If required data cannot be extracted, throw ParsingException:
    public String getName() throws ParsingException {
        assertPageFetched();
        String title = videoDetails.getString("title");
        if (title == null || title.isEmpty()) {
            throw new ParsingException("Could not extract title");
        }
        return title;
    }
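The null-or-empty check above tends to recur for every required field, so it can be factored into a small helper. The sketch below is self-contained and uses hypothetical names: `SketchParsingException` stands in for the library's `ParsingException`, and a plain `Map` stands in for the parsed response object.

```java
import java.util.Map;

// Hypothetical stand-in for the library's checked ParsingException.
class SketchParsingException extends Exception {
    SketchParsingException(String message) {
        super(message);
    }
}

// Hypothetical helper that enforces "required data must be present".
final class RequiredFieldSketch {
    private RequiredFieldSketch() { }

    // Returns the value for key, or throws if it is missing or empty.
    static String requireNonEmpty(Map<String, String> details, String key)
            throws SketchParsingException {
        String value = details.get(key);
        if (value == null || value.isEmpty()) {
            throw new SketchParsingException("Could not extract " + key);
        }
        return value;
    }
}
```

With a helper like this, a getter for required data reduces to a single call, while getters for optional data can keep returning a placeholder instead of throwing.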
Handle Content Availability
Check if content is available:
    public String getErrorMessage() {
        // Parse error from page if video is unavailable
        return null;
    }

    public ContentAvailability getContentAvailability() throws ParsingException {
        if (isPrivate) return ContentAvailability.PRIVATE;
        if (isDeleted) return ContentAvailability.REMOVED;
        return ContentAvailability.AVAILABLE;
    }
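On the calling side, the availability value and the optional error message can be combined into a single user-facing description. The sketch below is a self-contained illustration: `AvailabilitySketch` is a hypothetical enum that mirrors only the three constants used in the snippet above, and `AvailabilityReportSketch` is a hypothetical helper, so neither should be read as the library's API.

```java
// Hypothetical enum mirroring the three constants used in the snippet above.
enum AvailabilitySketch { AVAILABLE, PRIVATE, REMOVED }

// Hypothetical helper that turns availability plus error text into a message.
final class AvailabilityReportSketch {
    private AvailabilityReportSketch() { }

    // Same precedence as the snippet above: private is checked before deleted.
    static AvailabilitySketch classify(boolean isPrivate, boolean isDeleted) {
        if (isPrivate) return AvailabilitySketch.PRIVATE;
        if (isDeleted) return AvailabilitySketch.REMOVED;
        return AvailabilitySketch.AVAILABLE;
    }

    // Appends the error message only when one was parsed (it may be null).
    private static String withReason(String base, String errorMessage) {
        return errorMessage == null ? base : base + ": " + errorMessage;
    }

    static String describe(AvailabilitySketch availability, String errorMessage) {
        switch (availability) {
            case AVAILABLE:
                return "Content is available";
            case PRIVATE:
                return withReason("Content is private", errorMessage);
            case REMOVED:
                return withReason("Content was removed", errorMessage);
            default:
                throw new IllegalArgumentException("Unknown availability: " + availability);
        }
    }
}
```

Checking availability before requesting streams lets a client show a specific message instead of surfacing a generic extraction failure.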