When you pass `lncrawl` a URL, it looks up the matching source and uses it to download content from that site.
## Source organization
Sources live in the `sources/` directory and are organized by language code. English sources are further split alphabetically into subdirectories.
### en/ — English

The largest group: hundreds of English-language light novel and manga sites, organized alphabetically (`a/`, `b/`, `c/`, …).

### zh/ — Chinese

Chinese-language novel sites (simplified and traditional).

### ja/ — Japanese

Japanese sources, including Syosetu and related platforms.

### ar/ — Arabic

Arabic novel sites such as arnovel.me and kolnovel.com.

### es/ — Spanish

Spanish-language novel sites.

### fr/ — French

French-language novel sites.

### id/ — Indonesian

Indonesian novel sites.

### pt/ — Portuguese

Portuguese-language novel sites.

### ru/ — Russian

Russian-language novel sites.

### tr/ — Turkish

Turkish novel sites.

### vi/ — Vietnamese

Vietnamese novel sites.

### multi/ — Multi-language

Sites that serve content in multiple languages (e.g., mtlnovels.com, foxaholic.com, wattpad.com).
## Source metadata fields
Every crawler class defines a set of metadata attributes. These are read by the source service to index and filter sources.

| Field | Type | Description |
|---|---|---|
| `base_url` | `str` \| `list[str]` | One or more URLs that this crawler handles |
| `language` | `str` | BCP-47 language code (e.g., `"en"`, `"zh"`) |
| `has_manga` | `bool` | `True` if the site serves manga, manhua, or manhwa |
| `has_mtl` | `bool` | `True` if the site serves machine-translated content |
| `version` | `int` | Monotonically increasing version number (bumped on each edit) |
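As a concrete illustration, a minimal crawler might declare the fields above like this. This is a standalone sketch: the class name and URL are invented, and a real crawler would extend lncrawl's `Crawler` base class rather than stand alone.

```python
class ExampleCrawler:  # hypothetical; a real crawler extends lncrawl's Crawler
    # One or more URLs this crawler handles.
    base_url = ["https://example-novels.com/"]
    # BCP-47 language code.
    language = "en"
    # Capability metadata read by the source service.
    has_manga = False
    has_mtl = False
    # Bumped on each edit to the crawler file.
    version = 1
```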
The source service also derives a set of capability flags for each crawler:

| Flag | Meaning |
|---|---|
| `can_search` | The crawler implements `search_novel()` |
| `can_login` | The crawler implements `login()` |
| `is_disabled` | The source's domain appears in the rejected list |
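The relationship between a crawler's methods and these flags can be sketched as follows. Everything here is an assumption for illustration: the real service computes the flags internally, and `REJECTED_DOMAINS`, `DummyCrawler`, and `derive_flags` are invented names.

```python
from urllib.parse import urlsplit

# Hypothetical rejected-domain list; the real list ships with lncrawl.
REJECTED_DOMAINS = {"badsite.example"}

class DummyCrawler:
    """Stand-in crawler: implements search_novel() but not login()."""
    base_url = ["https://example.com/"]

    def search_novel(self, query):
        return []

def derive_flags(crawler_cls):
    """Illustrate how the flags in the table above relate to a crawler."""
    return {
        # can_search: the crawler implements search_novel()
        "can_search": callable(getattr(crawler_cls, "search_novel", None)),
        # can_login: the crawler implements login()
        "can_login": callable(getattr(crawler_cls, "login", None)),
        # is_disabled: any of the crawler's domains is in the rejected list
        "is_disabled": any(
            urlsplit(url).hostname in REJECTED_DOMAINS
            for url in crawler_cls.base_url
        ),
    }
```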
## Feature icons
The README and web UI use icons to indicate source capabilities at a glance:

| Icon | Meaning |
|---|---|
| 🤖 | Contains machine-translated (MTL) content |
| 🔍 | Supports searching for novels by keyword |
| 🔑 | Requires a login / account to access content |
| 🖼️ | Serves manga, manhua, or manhwa (image-based content) |
## Managing sources
The `lncrawl sources` command is the entry point for all source-related tasks.
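For illustration, a session might look like the sketch below. Only `lncrawl sources list` is named verbatim elsewhere on this page; the `--help` convention is an assumption, so check the CLI's own help output for exact subcommand names and flags.

```shell
# Show the available source-related subcommands and flags
# (assumption: the CLI follows the usual --help convention).
lncrawl sources --help

# Print every supported source URL.
lncrawl sources list
```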
### List available sources
Print every supported source URL to stdout.

### Search for a source
Filter by domain name or keyword.

### Create a new source with AI assistance
Generate a crawler scaffold using ChatGPT.

## Source discovery mechanism
Sources are auto-discovered at startup. The `Sources` service (in `lncrawl/services/sources/service.py`) scans all `*.py` files under the configured source directories and imports them with a thread pool. Any class that extends `Crawler` and sets a `base_url` is automatically registered.
The discovery order is:
- Bundled sources — the `sources/` directory shipped with the package (loaded from the compiled `_index.json`)
- User sources — a user-configured directory for private or custom crawlers
- Remote updates — an optional background sync that downloads the latest source index from GitHub
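The scan-and-register idea can be sketched in standalone form. This is an approximation: the real service (in `lncrawl/services/sources/service.py`) checks that each class extends `Crawler`, while this sketch duck-types on a non-empty `base_url` so it can run on its own; the function names are invented.

```python
import importlib.util
import pathlib
from concurrent.futures import ThreadPoolExecutor

def _load_module(path: pathlib.Path):
    """Import a single source file by its path."""
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

def discover(source_dir: str) -> dict:
    """Map each base_url to the crawler class that handles it."""
    registry = {}
    files = sorted(pathlib.Path(source_dir).rglob("*.py"))
    # Mirror the service's use of a thread pool for imports.
    with ThreadPoolExecutor() as pool:
        for module in pool.map(_load_module, files):
            for obj in vars(module).values():
                # Duck-typed stand-in for "extends Crawler and sets base_url".
                if isinstance(obj, type) and getattr(obj, "base_url", None):
                    urls = obj.base_url
                    if isinstance(urls, str):
                        urls = [urls]
                    for url in urls:
                        registry[url] = obj
    return registry
```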
Sources are loaded lazily, so the source service does not block startup. If you call `lncrawl sources list` immediately after launch, the list is still being populated asynchronously and may be incomplete at first.