Core Concepts
Link Handlers serve three primary purposes:
- URL Validation: Check if a URL belongs to a specific service and content type
- ID Extraction: Extract platform-specific identifiers from URLs
- URL Generation: Construct canonical URLs from IDs
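As a rough illustration of these three responsibilities, the following standalone sketch uses only the JDK. The class name `ExampleVideoLinks`, the host `video.example.com`, and the ID format are hypothetical and are not part of the extractor's API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Standalone sketch; the names and the URL layout are hypothetical. */
final class ExampleVideoLinks {
    // Hypothetical URL layout: https://video.example.com/watch/<id>
    private static final Pattern WATCH_URL = Pattern.compile(
            "^https?://video\\.example\\.com/watch/([A-Za-z0-9_-]+)(?:[/?#].*)?$");

    /** URL validation: does this URL belong to the (hypothetical) service? */
    static boolean acceptUrl(String url) {
        return url != null && WATCH_URL.matcher(url).matches();
    }

    /** ID extraction: pull the identifier out of an accepted URL. */
    static String getId(String url) {
        Matcher m = WATCH_URL.matcher(url);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported URL: " + url);
        }
        return m.group(1);
    }

    /** URL generation: build the canonical URL for an ID. */
    static String getUrl(String id) {
        return "https://video.example.com/watch/" + id;
    }

    public static void main(String[] args) {
        String input = "http://video.example.com/watch/abc123?t=30";
        System.out.println(acceptUrl(input));     // true
        System.out.println(getId(input));         // abc123
        System.out.println(getUrl(getId(input))); // https://video.example.com/watch/abc123
    }
}
```

Note that the canonical URL drops the `t=30` parameter and normalizes the scheme, which is what makes it usable for equality checks.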
LinkHandler Class
The LinkHandler class is a simple data container:
public class LinkHandler implements Serializable {
protected final String originalUrl; // URL as provided by user
protected final String url; // Canonical URL
protected final String id; // Extracted ID
public LinkHandler(String originalUrl, String url, String id) {
this.originalUrl = originalUrl;
this.url = url;
this.id = id;
}
public String getOriginalUrl() {
return originalUrl;
}
public String getUrl() {
return url;
}
public String getId() {
return id;
}
public String getBaseUrl() throws ParsingException {
return Utils.getBaseUrl(url);
}
}
Properties Explained
- Original URL: The URL exactly as provided by the user. Includes all query parameters and fragments.
"https://www.youtube.com/watch?v=dQw4w9WgXcQ&t=30s"
- Canonical URL: The cleaned, standardized URL. Used for equality checks and deduplication.
"https://www.youtube.com/watch?v=dQw4w9WgXcQ"
- ID: The platform-specific identifier. Used to construct API requests and URLs.
"dQw4w9WgXcQ"
- Base URL: The scheme and host portion of the URL. Useful for federated services (e.g., PeerTube instances).
"https://www.youtube.com"
LinkHandlerFactory
The factory pattern is used to create LinkHandler instances:
public abstract class LinkHandlerFactory {
// Abstract methods to implement
public abstract String getId(String url)
throws ParsingException;
public abstract String getUrl(String id)
throws ParsingException;
public abstract boolean onAcceptUrl(String url)
throws ParsingException;
// Optional: URL generation with base URL
public String getUrl(String id, String baseUrl)
throws ParsingException {
return getUrl(id);
}
// Factory methods (signatures only; the bodies are shown in the sections below)
public LinkHandler fromUrl(String url) throws ParsingException;
public LinkHandler fromId(String id) throws ParsingException;
public boolean acceptUrl(String url) throws ParsingException;
}
Factory Methods
Creating from URL
public LinkHandler fromUrl(String url) throws ParsingException {
if (Utils.isNullOrEmpty(url)) {
throw new IllegalArgumentException("The url is null or empty");
}
// Follow Google redirects
String polishedUrl = Utils.followGoogleRedirectIfNeeded(url);
String baseUrl = Utils.getBaseUrl(polishedUrl);
return fromUrl(polishedUrl, baseUrl);
}
public LinkHandler fromUrl(String url, String baseUrl)
throws ParsingException {
Objects.requireNonNull(url, "URL cannot be null");
if (!acceptUrl(url)) {
throw new ParsingException("URL not accepted: " + url);
}
String id = getId(url);
return new LinkHandler(url, getUrl(id, baseUrl), id);
}
Creating from ID
public LinkHandler fromId(String id) throws ParsingException {
Objects.requireNonNull(id, "ID cannot be null");
String url = getUrl(id);
return new LinkHandler(url, url, id);
}
public LinkHandler fromId(String id, String baseUrl)
throws ParsingException {
Objects.requireNonNull(id, "ID cannot be null");
String url = getUrl(id, baseUrl);
return new LinkHandler(url, url, id);
}
URL Validation
public boolean acceptUrl(String url) throws ParsingException {
return onAcceptUrl(url);
}
Implementation Example
Here’s how YouTube implements a stream link handler factory:
public class YoutubeStreamLinkHandlerFactory extends LinkHandlerFactory {
private static final YoutubeStreamLinkHandlerFactory INSTANCE =
new YoutubeStreamLinkHandlerFactory();
public static YoutubeStreamLinkHandlerFactory getInstance() {
return INSTANCE;
}
@Override
public String getId(String url) throws ParsingException {
// Handle various YouTube URL formats
if (url.contains("youtube.com/watch")) {
return Parser.matchGroup1("[?&]v=([a-zA-Z0-9_-]{11})", url);
} else if (url.contains("youtu.be/")) {
return Parser.matchGroup1("youtu\\.be/([a-zA-Z0-9_-]{11})", url);
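} else if (url.contains("youtube.com/embed/")) {
return Parser.matchGroup1("embed/([a-zA-Z0-9_-]{11})", url);
} else if (url.contains("youtube.com/shorts/")) {
// onAcceptUrl accepts shorts URLs, so getId must handle them as well
return Parser.matchGroup1("shorts/([a-zA-Z0-9_-]{11})", url);
}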
throw new ParsingException("Could not extract ID from URL: " + url);
}
@Override
public String getUrl(String id) throws ParsingException {
return "https://www.youtube.com/watch?v=" + id;
}
@Override
public boolean onAcceptUrl(String url) {
// Accept various YouTube URL patterns
return url.contains("youtube.com/watch") ||
url.contains("youtu.be/") ||
url.contains("youtube.com/embed/") ||
url.contains("youtube.com/shorts/");
}
}
Usage
LinkHandlerFactory factory = YoutubeStreamLinkHandlerFactory.getInstance();
// From URL
LinkHandler handler = factory.fromUrl(
"https://youtu.be/dQw4w9WgXcQ?t=30"
);
String id = handler.getId(); // "dQw4w9WgXcQ"
String canonical = handler.getUrl(); // "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
// From ID
LinkHandler handler2 = factory.fromId("dQw4w9WgXcQ");
String url = handler2.getUrl(); // "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
// Validation
boolean valid = factory.acceptUrl(
"https://www.youtube.com/watch?v=dQw4w9WgXcQ"
); // true
ListLinkHandler
ListLinkHandler extends LinkHandler with support for content filters and sorting:
public class ListLinkHandler extends LinkHandler {
protected final List<String> contentFilters;
protected final String sortFilter;
public ListLinkHandler(String originalUrl,
String url,
String id,
List<String> contentFilters,
String sortFilter) {
super(originalUrl, url, id);
this.contentFilters = Collections.unmodifiableList(contentFilters);
this.sortFilter = sortFilter;
}
public List<String> getContentFilters() {
return contentFilters;
}
public String getSortFilter() {
return sortFilter;
}
}
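One detail of the constructor above is worth keeping in mind: `Collections.unmodifiableList` returns a read-only view of the list it wraps, not a copy, so later changes to the caller's list remain visible through the handler. The sketch below shows the difference using plain JDK collections. The class and method names are illustrative only, and whether a defensive copy (`List.copyOf`, Java 10+) is preferable is a design choice this document does not prescribe.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Standalone sketch; the class and method names are hypothetical. */
final class FilterListSketch {
    /** Read-only view: reflects later changes to the backing list. */
    static List<String> viewOf(List<String> filters) {
        return Collections.unmodifiableList(filters);
    }

    /** Defensive copy: a snapshot that ignores later changes (Java 10+). */
    static List<String> snapshotOf(List<String> filters) {
        return List.copyOf(filters);
    }

    public static void main(String[] args) {
        List<String> filters = new ArrayList<>(List.of("videos"));
        List<String> view = viewOf(filters);
        List<String> snapshot = snapshotOf(filters);

        filters.add("playlists"); // the caller mutates its own list afterwards

        System.out.println(view);     // [videos, playlists]
        System.out.println(snapshot); // [videos]
    }
}
```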
Use Cases
Channel Tabs
Filter channel content by type (videos, playlists, live)
contentFilters: ["videos"]
sortFilter: "date"
Search Results
Filter search results by content type
contentFilters: ["video", "hd"]
sortFilter: "relevance"
Playlists
Apply sorting to playlist items
contentFilters: []
sortFilter: "popularity"
Comments
Sort comments by criteria
contentFilters: []
sortFilter: "top"
ListLinkHandlerFactory
public abstract class ListLinkHandlerFactory extends LinkHandlerFactory {
// Create from query with filters
public ListLinkHandler fromQuery(String id,
List<String> contentFilters,
String sortFilter)
throws ParsingException {
String url = getUrl(id, contentFilters, sortFilter);
return new ListLinkHandler(url, url, id, contentFilters, sortFilter);
}
// Override to include filters in URL
public abstract String getUrl(String id,
List<String> contentFilters,
String sortFilter)
throws ParsingException;
}
Example Implementation
public class YoutubeChannelLinkHandlerFactory
extends ListLinkHandlerFactory {
@Override
public String getUrl(String id,
List<String> contentFilters,
String sortFilter) throws ParsingException {
String url = "https://www.youtube.com/channel/" + id;
// Add tab filter
if (!contentFilters.isEmpty()) {
String tab = contentFilters.get(0);
if (tab.equals("videos")) {
url += "/videos";
} else if (tab.equals("playlists")) {
url += "/playlists";
}
}
// Add sort parameter
if (sortFilter != null && !sortFilter.isEmpty()) {
url += "?sort=" + sortFilter;
}
return url;
}
@Override
public String getId(String url) throws ParsingException {
// Extract channel ID from various URL formats
if (url.contains("/channel/")) {
return Parser.matchGroup1("/channel/([^/?]+)", url);
} else if (url.contains("/@")) {
// Handle @username format - requires an API lookup (resolveUsernameToId is not shown here)
String username = Parser.matchGroup1("/@([^/?]+)", url);
return resolveUsernameToId(username);
}
throw new ParsingException("Could not extract channel ID");
}
@Override
public boolean onAcceptUrl(String url) {
return url.contains("/channel/") ||
url.contains("/@");
}
}
SearchQueryHandler
SearchQueryHandler is specialized for search queries:
public class SearchQueryHandler extends ListLinkHandler {
protected final String searchString;
public SearchQueryHandler(String originalUrl,
String url,
String searchString,
List<String> contentFilters,
String sortFilter) {
super(originalUrl, url, searchString, contentFilters, sortFilter);
this.searchString = searchString;
}
public String getSearchString() {
return searchString;
}
}
SearchQueryHandlerFactory
public abstract class SearchQueryHandlerFactory
extends ListLinkHandlerFactory {
public SearchQueryHandler fromQuery(String query,
List<String> contentFilters,
String sortFilter)
throws ParsingException {
String url = getUrl(query, contentFilters, sortFilter);
return new SearchQueryHandler(
url, url, query, contentFilters, sortFilter
);
}
}
Example: YouTube Search
public class YoutubeSearchQueryHandlerFactory
extends SearchQueryHandlerFactory {
@Override
public String getUrl(String query,
List<String> contentFilters,
String sortFilter) throws ParsingException {
try {
String url = "https://www.youtube.com/results?search_query="
+ URLEncoder.encode(query, "UTF-8");
// Add content type filter
if (!contentFilters.isEmpty()) {
String filter = contentFilters.get(0);
if (filter.equals("video")) {
url += "&sp=EgIQAQ%253D%253D"; // Video filter
} else if (filter.equals("channel")) {
url += "&sp=EgIQAg%253D%253D"; // Channel filter
}
}
return url;
} catch (UnsupportedEncodingException e) {
throw new ParsingException("Could not encode query", e);
}
}
@Override
public String getId(String url) {
// For search, ID is the query itself
try {
return URLDecoder.decode(
Parser.matchGroup1("search_query=([^&]+)", url),
"UTF-8"
);
} catch (Exception e) {
return "";
}
}
@Override
public boolean onAcceptUrl(String url) {
return url.contains("youtube.com/results");
}
}
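The encode and decode steps above should be inverses of each other, which matters for queries containing characters such as `&` or `%`. The following standalone sketch checks that round trip with JDK classes only. `SearchQueryRoundTrip`, `toUrl`, and `toQuery` are hypothetical names, and the `Charset` overloads of `URLEncoder.encode` and `URLDecoder.decode` used here assume Java 10 or later.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Standalone sketch; the class and method names are hypothetical. */
final class SearchQueryRoundTrip {
    private static final String SEARCH_URL =
            "https://www.youtube.com/results?search_query=";
    // Capture the raw (still encoded) parameter value up to the next & or #
    private static final Pattern QUERY_PARAM =
            Pattern.compile("[?&]search_query=([^&#]*)");

    /** Builds a search URL with the query form-encoded as UTF-8. */
    static String toUrl(String query) {
        return SEARCH_URL + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    /** Recovers the original query string from a search URL. */
    static String toQuery(String url) {
        Matcher m = QUERY_PARAM.matcher(url);
        if (!m.find()) {
            throw new IllegalArgumentException("No search_query parameter in: " + url);
        }
        return URLDecoder.decode(m.group(1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String query = "rock & roll 50%";
        String url = toUrl(query);
        System.out.println(url); // https://www.youtube.com/results?search_query=rock+%26+roll+50%25
        System.out.println(toQuery(url).equals(query)); // true
    }
}
```

Unlike the `getId` shown above, this sketch throws instead of returning an empty string when the parameter is missing; which behavior fits better depends on how callers treat an empty ID.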
Best Practices
Use Singleton Pattern
Link handler factories should be singletons to avoid unnecessary instantiation:
private static final MyLinkHandlerFactory INSTANCE =
new MyLinkHandlerFactory();
public static MyLinkHandlerFactory getInstance() {
return INSTANCE;
}
Handle Multiple URL Formats
Support all variations of URLs your platform uses:
@Override
public String getId(String url) throws ParsingException {
// youtube.com/watch?v=ID
if (url.contains("youtube.com/watch")) {
return extractFromWatchUrl(url);
}
// youtu.be/ID
else if (url.contains("youtu.be/")) {
return extractFromShortUrl(url);
}
// youtube.com/embed/ID
else if (url.contains("/embed/")) {
return extractFromEmbedUrl(url);
}
throw new ParsingException("Unknown URL format");
}
Follow URL Redirects
The framework automatically handles Google redirects, but you may need to handle service-specific redirects:
public LinkHandler fromUrl(String url) throws ParsingException {
// Google redirects handled automatically
String polishedUrl = Utils.followGoogleRedirectIfNeeded(url);
// Handle service-specific redirects if needed
polishedUrl = followServiceRedirects(polishedUrl);
return super.fromUrl(polishedUrl);
}
Validate IDs
Ensure extracted IDs match expected format:
@Override
public String getId(String url) throws ParsingException {
String id = Parser.matchGroup1("v=([a-zA-Z0-9_-]{11})", url);
// Validate ID format
if (id == null || id.length() != 11) {
throw new ParsingException("Invalid video ID: " + id);
}
return id;
}
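A fixed-length quantifier such as `{11}` only guarantees that at least eleven valid characters are present; without a terminator, the first eleven characters of a longer token would still match. The sketch below, which assumes plain `java.util.regex` rather than the library's `Parser` helper, adds a lookahead so the ID must be followed by a delimiter or the end of the URL. `VideoIdSketch` and `extractId` are hypothetical names.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Standalone sketch; the class and method names are hypothetical. */
final class VideoIdSketch {
    // The ID must be followed by end of input, '&' or '#',
    // so a longer token is rejected instead of being truncated.
    private static final Pattern V_PARAM =
            Pattern.compile("[?&]v=([a-zA-Z0-9_-]{11})(?=$|[&#])");

    /** Returns the 11-character ID, or empty if none is present. */
    static Optional<String> extractId(String url) {
        Matcher m = V_PARAM.matcher(url);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extractId("https://www.youtube.com/watch?v=dQw4w9WgXcQ&t=30s"));
        System.out.println(extractId("https://www.youtube.com/watch?v=dQw4w9WgXcQXX"));
    }
}
```

The first call yields the ID and the second yields an empty `Optional`, because the token after `v=` is thirteen characters long.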
Support Federated Services
For federated platforms (PeerTube, Mastodon), include the base URL:
@Override
public String getUrl(String id, String baseUrl)
throws ParsingException {
if (baseUrl == null || baseUrl.isEmpty()) {
throw new ParsingException("Base URL required");
}
return baseUrl + "/videos/" + id;
}
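Plain concatenation as above produces a double slash when the instance URL is supplied with a trailing `/`. A minimal sketch of one way to guard against that follows. `FederatedUrlSketch`, `videoUrl`, and the host `peertube.example.org` are hypothetical, and an unchecked exception stands in for `ParsingException` to keep the example self-contained.

```java
/** Standalone sketch; the class and method names are hypothetical. */
final class FederatedUrlSketch {
    /** Joins an instance base URL and a video ID into one URL. */
    static String videoUrl(String baseUrl, String id) {
        if (baseUrl == null || baseUrl.isEmpty()) {
            throw new IllegalArgumentException("Base URL required");
        }
        // Strip trailing slashes so the result never contains "//videos/"
        String normalized = baseUrl.replaceAll("/+$", "");
        return normalized + "/videos/" + id;
    }

    public static void main(String[] args) {
        // Both calls print https://peertube.example.org/videos/abc
        System.out.println(videoUrl("https://peertube.example.org", "abc"));
        System.out.println(videoUrl("https://peertube.example.org/", "abc"));
    }
}
```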
Common Patterns
URL Pattern Matching
// Regex matching
String id = Parser.matchGroup1("pattern_here", url);
// String contains check
if (url.contains("/watch")) {
// ...
}
// Multiple patterns
for (String pattern : ACCEPTED_PATTERNS) {
try {
return Parser.matchGroup1(pattern, url);
} catch (Parser.RegexException e) {
// Try next pattern
}
}
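The multiple-pattern loop can also be written without using exceptions for control flow. The sketch below is one possible shape, assuming precompiled `java.util.regex` patterns instead of the library's `Parser.matchGroup1`. `MultiPatternSketch`, `firstMatch`, and the exact pattern list are illustrative assumptions.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Standalone sketch; the class, method, and pattern list are hypothetical. */
final class MultiPatternSketch {
    // Compiled once and tried in order; group 1 is always the ID
    private static final List<Pattern> ACCEPTED_PATTERNS = List.of(
            Pattern.compile("youtube\\.com/watch\\?(?:.*&)?v=([a-zA-Z0-9_-]{11})"),
            Pattern.compile("youtu\\.be/([a-zA-Z0-9_-]{11})"),
            Pattern.compile("youtube\\.com/embed/([a-zA-Z0-9_-]{11})"));

    /** Returns group 1 of the first pattern that matches, if any. */
    static Optional<String> firstMatch(String url) {
        for (Pattern pattern : ACCEPTED_PATTERNS) {
            Matcher m = pattern.matcher(url);
            if (m.find()) {
                return Optional.of(m.group(1));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("https://youtu.be/dQw4w9WgXcQ?t=30"));
        System.out.println(firstMatch("https://www.youtube.com/watch?t=30&v=dQw4w9WgXcQ"));
        System.out.println(firstMatch("https://example.com/video/1"));
    }
}
```

Returning `Optional` leaves it to the caller to decide whether a missing match is an error, for example by throwing a `ParsingException` with a descriptive message.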
Base URL Handling
// Extract base URL
String baseUrl = Utils.getBaseUrl(url);
// "https://www.youtube.com"
// Use in URL construction
String fullUrl = baseUrl + "/watch?v=" + id;
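For readers without the library at hand, the effect of base URL extraction can be approximated with `java.net.URI`. This is a sketch of the idea only, not the implementation of `Utils.getBaseUrl`; `BaseUrlSketch`, `baseUrlOf`, and the PeerTube host are hypothetical.

```java
import java.net.URI;

/** Standalone sketch; the class and method names are hypothetical. */
final class BaseUrlSketch {
    /** Returns scheme + "://" + authority, dropping path, query, and fragment. */
    static String baseUrlOf(String url) {
        URI uri = URI.create(url); // throws IllegalArgumentException on malformed input
        if (uri.getScheme() == null || uri.getAuthority() == null) {
            throw new IllegalArgumentException(
                    "Expected an absolute URL with a host: " + url);
        }
        return uri.getScheme() + "://" + uri.getAuthority();
    }

    public static void main(String[] args) {
        // prints https://www.youtube.com
        System.out.println(baseUrlOf("https://www.youtube.com/watch?v=dQw4w9WgXcQ&t=30s"));
        // prints https://peertube.example.org:8443
        System.out.println(baseUrlOf("https://peertube.example.org:8443/videos/watch/abc"));
    }
}
```

Keeping the port in the result matters for self-hosted instances, which often run on non-default ports.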
Error Messages
Provide clear error messages for debugging:
throw new ParsingException(
"Could not extract ID from URL: " + url +
". Expected format: youtube.com/watch?v=ID"
);
Related Documentation
Services
Learn about StreamingService
Extractors
Understand data extraction
Overview
Architecture overview