Overview
Target URLs define the content sources that Web Scraping Hub will scrape. Each target represents a category of content (movies, series, anime, etc.) and is configured in the TARGET_URLS list in config.py.
Default Configuration
The default target URLs are configured for the SoloLatino.net platform.
Target URL Structure
Each target URL is a dictionary with the following fields:
- The display name for this content category. This name appears in the UI and is used in API endpoints. Examples: "Películas", "Series", "Netflix".
- The full URL to scrape for this category, typically constructed using BASE_URL. Example: f"{BASE_URL}/peliculas".
Content Categories
Media Type Categories
These categories organize content by media type:

| Category | Name | URL Pattern |
|---|---|---|
| Movies | Películas | /peliculas |
| TV Series | Series | /series |
| Anime Series | Anime | /animes |
| Anime Movies | Peliculas de Anime | /genero/anime |
| Cartoons | Caricaturas | /genre_series/toons |
| Korean Dramas | K-Drama | /genre_series/kdramas/ |
Streaming Platform Categories
These categories filter content by streaming platform:

| Platform | URL Pattern |
|---|---|
| Amazon | /network/amazon |
| Apple TV | /network/apple-tv |
| Disney | /network/disney |
| HBO | /network/hbo |
| HBO Max | /network/hbo-max |
| Hulu | /network/hulu |
| Netflix | /network/netflix |
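Taken together, the two tables above suggest a configuration along the following lines. This is a minimal sketch, not the project's actual config.py: the dictionary keys `name` and `url`, the `BASE_URL` value, the helper lists, and the use of platform names as display names are all assumptions. Only the URL patterns and the media-type names come from the tables.

```python
# Hypothetical reconstruction of config.py; key names and BASE_URL are assumptions.
BASE_URL = "https://sololatino.net"  # assumed value; use the project's real BASE_URL

# (display name, URL pattern) pairs copied from the media type table.
MEDIA_TYPE_SECTIONS = [
    ("Películas", "/peliculas"),
    ("Series", "/series"),
    ("Anime", "/animes"),
    ("Peliculas de Anime", "/genero/anime"),
    ("Caricaturas", "/genre_series/toons"),
    ("K-Drama", "/genre_series/kdramas/"),  # note the trailing slash
]

# (display name, URL pattern) pairs copied from the streaming platform table.
PLATFORM_SECTIONS = [
    ("Amazon", "/network/amazon"),
    ("Apple TV", "/network/apple-tv"),
    ("Disney", "/network/disney"),
    ("HBO", "/network/hbo"),
    ("HBO Max", "/network/hbo-max"),
    ("Hulu", "/network/hulu"),
    ("Netflix", "/network/netflix"),
]

# One dictionary per category, with the full URL built from BASE_URL.
TARGET_URLS = [
    {"name": name, "url": f"{BASE_URL}{path}"}
    for name, path in MEDIA_TYPE_SECTIONS + PLATFORM_SECTIONS
]
```

Keeping the patterns separate from `BASE_URL` means a domain change only touches one line.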
Adding New Target URLs
To add a new content category, add an entry to the TARGET_URLS list in config.py, then restart the Flask backend server.
Pagination Handling
The application automatically handles pagination for target URLs. The K-Drama category uses a different URL pattern for pagination, without a leading slash.
URL Normalization
The backend normalizes section names so that matching is case-insensitive and accent-insensitive. For example, you can request "peliculas", "Películas", or "PELICULAS" and get the same results.
Custom Target URL Example
For a different scraping source, define the targets with the same structure, using that site's URLs.
API Integration
Target URLs are exposed through the API.
List All Sections
Get Content from Section
URL Validation
To test a target URL configuration, fetch the URL with the fetch_html() function and check the result.
Special URL Patterns
Search URLs
Search functionality uses a different URL pattern.
Content-Specific URLs
Specific content is accessed through content-specific URLs.
Troubleshooting
Section not appearing in API
- Verify the target URL is properly added to TARGET_URLS
- Restart the Flask backend server
- Check for syntax errors in config.py
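A quick structural check can rule out a malformed entry before restarting the server. The sketch below assumes each entry is a dictionary with `name` and `url` keys (an assumption, since the key names are not confirmed here); `validate_targets` is a hypothetical helper, not part of the project.

```python
from urllib.parse import urlparse


def validate_targets(targets):
    """Return a list of human-readable problems found in a TARGET_URLS-style list."""
    problems = []
    seen_names = set()
    for index, target in enumerate(targets):
        if not isinstance(target, dict):
            problems.append(f"entry {index}: expected a dict, got {type(target).__name__}")
            continue
        name = target.get("name")  # assumed key
        url = target.get("url")    # assumed key
        if not isinstance(name, str) or not name.strip():
            problems.append(f"entry {index}: missing or empty 'name'")
        elif name in seen_names:
            problems.append(f"entry {index}: duplicate name {name!r}")
        else:
            seen_names.add(name)
        # The URL must be absolute, with an http(s) scheme and a host.
        parsed = urlparse(url) if isinstance(url, str) else None
        if parsed is None or parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"entry {index}: 'url' is not an absolute http(s) URL")
    return problems
```

An empty list means no structural problems were found; it does not guarantee that the site responds or that the extractor matches the page.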
Empty results from target URL
- Verify the URL is accessible
- Check if the site structure has changed
- Ensure the extractor matches the HTML structure
- Test with the fetch_html() function
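For the first check in the list above, a standalone request can show whether the URL responds at all. This sketch uses only the Python standard library and is independent of the project's fetch_html() function, whose signature is not documented here; `check_url`, its `opener` parameter, and the User-Agent value are hypothetical choices made for illustration.

```python
import urllib.error
import urllib.request


def check_url(url, timeout=10.0, opener=urllib.request.urlopen):
    """Send a GET request and return (ok, detail) describing the outcome."""
    # Some sites reject requests without a browser-like User-Agent header.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with opener(request, timeout=timeout) as response:
            body = response.read()
            return True, f"HTTP {response.status}, {len(body)} bytes"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (404, 403, ...).
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, TimeoutError) as exc:
        # DNS failure, refused connection, timeout, and similar.
        return False, f"request failed: {exc}"
```

A successful response with a very small body can hint that the site is serving a block page or that its structure has changed.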
Pagination not working
- Check if the URL pattern matches the site’s pagination format
- Some sections may use different pagination patterns
- Verify the URL construction in app.py
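When checking URL construction, the usual culprit is a doubled or missing slash between the section URL and the page suffix. The sketch below assumes a `page/N` suffix, which is common on many sites but is not confirmed for this one, and `build_page_url` is a hypothetical helper rather than the code in app.py.

```python
def build_page_url(section_url: str, page: int) -> str:
    """Return the URL for a given page of a section (assumed 'page/N' format)."""
    if page < 1:
        raise ValueError("page must be 1 or greater")
    if page == 1:
        # The first page is the section URL itself.
        return section_url
    # Sections whose URL already ends in "/" (such as K-Drama) must not
    # get a second slash before the suffix.
    separator = "" if section_url.endswith("/") else "/"
    return f"{section_url}{separator}page/{page}"
```

Comparing the output of a helper like this against the URL the site actually uses for its second page is a quick way to confirm the pattern.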
Related Resources
- Backend Configuration: Configure the Flask backend server
- Extractors: Learn about content extraction logic