A source is a crawler that knows how to scrape a specific website. Each source is a Python module that understands a site’s structure: how to find the novel title and cover, how to list chapters, and how to extract chapter text. When you give lncrawl a URL, it looks up the matching source and uses it to download content from that site.

Source organization

Sources live in the sources/ directory and are organized by language code. English sources are further split alphabetically into subdirectories.

en/ — English

The largest group. Hundreds of English-language light novel and manga sites, organized alphabetically (a/, b/, c/, …).

zh/ — Chinese

Chinese-language novel sites (simplified and traditional).

ja/ — Japanese

Japanese sources, including Syosetu and related platforms.

ar/ — Arabic

Arabic novel sites such as arnovel.me and kolnovel.com.

es/ — Spanish

Spanish-language novel sites.

fr/ — French

French-language novel sites.

id/ — Indonesian

Indonesian novel sites.

pt/ — Portuguese

Portuguese-language novel sites.

ru/ — Russian

Russian-language novel sites.

tr/ — Turkish

Turkish novel sites.

vi/ — Vietnamese

Vietnamese novel sites.

multi/ — Multi-language

Sites that serve content in multiple languages (e.g., mtlnovels.com, foxaholic.com, wattpad.com).

Source metadata fields

Every crawler class defines a set of metadata attributes. These are read by the source service to index and filter sources.
| Field | Type | Description |
|---|---|---|
| `base_url` | `str \| list[str]` | One or more URLs that this crawler handles |
| `language` | `str` | BCP-47 language code (e.g., `"en"`, `"zh"`) |
| `has_manga` | `bool` | `True` if the site serves manga, manhua, or manhwa |
| `has_mtl` | `bool` | `True` if the site serves machine-translated content |
| `version` | `int` | Monotonically increasing version number (bumped on each edit) |
Additional flags are derived at runtime:
| Flag | Meaning |
|---|---|
| `can_search` | The crawler implements `search_novel()` |
| `can_login` | The crawler implements `login()` |
| `is_disabled` | The source's domain appears in the rejected list |
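As a hedged illustration of how these pieces fit together (the class and site below are hypothetical, not a real lncrawl source; the authoritative attribute list is in the crawler base class), a source's metadata and a derived-flag check might look like:

```python
# Hypothetical crawler sketch: the metadata attributes a source declares.
# Class name, site URL, and values are illustrative placeholders.
class ExampleNovelCrawler:
    base_url = ["https://example-novels.com/"]  # one or more handled URLs
    language = "en"        # BCP-47 language code
    has_manga = False      # site does not serve image-based content
    has_mtl = True         # site hosts machine-translated novels
    version = 3            # bumped on each edit to the source file

    def search_novel(self, query: str):
        """Defining this method is what makes the crawler searchable."""
        return []


# Derived flags can be computed by probing for the optional methods:
def derive_flags(crawler_cls) -> dict:
    return {
        "can_search": callable(getattr(crawler_cls, "search_novel", None)),
        "can_login": callable(getattr(crawler_cls, "login", None)),
    }
```

Here `derive_flags(ExampleNovelCrawler)` would report `can_search` as true and `can_login` as false, since only `search_novel()` is implemented.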

Feature icons

The README and web UI use icons to indicate source capabilities at a glance:
| Icon | Meaning |
|---|---|
| 🤖 | Contains machine-translated (MTL) content |
| 🔍 | Supports searching for novels by keyword |
| 🔑 | Requires a login / account to access content |
| 🖼️ | Serves manga, manhua, or manhwa (image-based content) |

Managing sources

The lncrawl sources command is the entry point for all source-related tasks.
$ lncrawl sources --help

List available sources

Print every supported source URL to stdout:
lncrawl sources list

Search for a source

Filter by domain name or keyword:
# Check whether a specific domain is supported
lncrawl sources list | grep "novelbin.com"

# Search by keyword by filtering the list output
lncrawl sources list | grep "wuxia"

Create a new source with AI assistance

Generate a crawler scaffold using ChatGPT:
lncrawl sources create
To add a source manually, copy sources/_examples/_01_general_soup.py into the correct sources/{lang}/ folder and implement parse_title, parse_cover, parse_chapter_list, and select_chapter_body. See the contributor guide for the full walkthrough.
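A minimal sketch of that manual scaffold, assuming the four method names above (verify the exact signatures and base class against sources/_examples/_01_general_soup.py before copying; the site URL here is a placeholder):

```python
# Hypothetical scaffold for a new source. Method names follow the docs'
# description; the real template may expect different signatures or a
# specific base class, so treat this as a structural sketch only.
class MyNewSourceCrawler:
    # Metadata read by the source service (placeholder values).
    base_url = ["https://my-novel-site.example/"]
    language = "en"
    has_mtl = False
    has_manga = False
    version = 1

    def parse_title(self, soup):
        """Return the novel title, e.g. from a heading on the novel page."""
        raise NotImplementedError

    def parse_cover(self, soup):
        """Return the absolute URL of the cover image."""
        raise NotImplementedError

    def parse_chapter_list(self, soup):
        """Yield chapter entries (title + URL) from the table of contents."""
        raise NotImplementedError

    def select_chapter_body(self, soup):
        """Return the element containing the chapter's text content."""
        raise NotImplementedError
```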

Source discovery mechanism

Sources are auto-discovered at startup. The Sources service (in lncrawl/services/sources/service.py) scans all *.py files under the configured source directories and imports them with a thread pool. Any class that extends Crawler and sets a base_url is automatically registered. The discovery order is:
  1. Bundled sources — the sources/ directory shipped with the package (loaded from the compiled _index.json)
  2. User sources — a user-configured directory for private or custom crawlers
  3. Remote updates — an optional background sync that downloads the latest source index from GitHub
Each source is indexed by host name so URL lookups are O(1). A full-text search (FTS) index is also built over file names, class names, and URLs to support keyword queries.
Sources are loaded lazily, so the source service does not block startup. If you run lncrawl sources list immediately after launch, the list may still be populating in the background.
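The host-keyed index described above can be sketched as follows (a simplified illustration with hypothetical function names; the real service in lncrawl/services/sources/service.py also handles versioning, rejection lists, and the FTS index):

```python
from urllib.parse import urlparse


# Simplified sketch of host-keyed source lookup: each crawler's base_url
# hosts become dict keys, so resolving a novel URL is one dict access.
def build_index(crawlers: list) -> dict:
    index = {}
    for crawler_cls in crawlers:
        urls = crawler_cls.base_url
        if isinstance(urls, str):  # base_url may be a str or list[str]
            urls = [urls]
        for url in urls:
            index[urlparse(url).netloc] = crawler_cls
    return index


def find_crawler(index: dict, novel_url: str):
    """O(1) lookup of the crawler responsible for a given URL."""
    return index.get(urlparse(novel_url).netloc)
```

Because the key is the URL's host name, any page on a registered site resolves to the same crawler, and unknown hosts simply return `None`.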
