A source is a crawler that knows how to scrape a specific website. Each source is a Python module that understands a site’s structure: how to find the novel title and cover, how to list chapters, and how to extract chapter text. When you give lncrawl a URL, it looks up the matching source and uses it to download content from that site.

Source organization

Sources live in the sources/ directory and are organized by language code. English sources are further split alphabetically into subdirectories.

en/ — English

The largest group. Hundreds of English-language light novel and manga sites, organized alphabetically (a/, b/, c/, …).

zh/ — Chinese

Chinese-language novel sites (simplified and traditional).

ja/ — Japanese

Japanese sources, including Syosetu and related platforms.

ar/ — Arabic

Arabic novel sites such as arnovel.me and kolnovel.com.

es/ — Spanish

Spanish-language novel sites.

fr/ — French

French-language novel sites.

id/ — Indonesian

Indonesian novel sites.

pt/ — Portuguese

Portuguese-language novel sites.

ru/ — Russian

Russian-language novel sites.

tr/ — Turkish

Turkish novel sites.

vi/ — Vietnamese

Vietnamese novel sites.

multi/ — Multi-language

Sites that serve content in multiple languages (e.g., mtlnovels.com, foxaholic.com, wattpad.com).

Source metadata fields

Every crawler class defines a set of metadata attributes. These are read by the source service to index and filter sources.
| Field | Type | Description |
|---|---|---|
| `base_url` | `str \| list[str]` | One or more URLs that this crawler handles |
| `language` | `str` | BCP-47 language code (e.g., `"en"`, `"zh"`) |
| `has_manga` | `bool` | `True` if the site serves manga, manhua, or manhwa |
| `has_mtl` | `bool` | `True` if the site serves machine-translated content |
| `version` | `int` | Monotonically increasing version number (bumped on each edit) |
Additional flags are derived at runtime:
| Flag | Meaning |
|---|---|
| `can_search` | The crawler implements `search_novel()` |
| `can_login` | The crawler implements `login()` |
| `is_disabled` | The source's domain appears in the rejected list |
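As a hedged illustration of how these pieces fit together (the class and site below are hypothetical, not a real lncrawl source; the authoritative attribute list is in the crawler base class), a source's metadata and a derived-flag check might look like:

```python
# Hypothetical crawler sketch: the metadata attributes a source declares.
# Class name, site URL, and values are illustrative placeholders.
class ExampleNovelCrawler:
    base_url = ["https://example-novels.com/"]  # one or more handled URLs
    language = "en"        # BCP-47 language code
    has_manga = False      # site does not serve image-based content
    has_mtl = True         # site hosts machine-translated novels
    version = 3            # bumped on each edit to the source file

    def search_novel(self, query: str):
        """Defining this method is what makes the crawler searchable."""
        return []


# Derived flags can be computed by probing for the optional methods:
def derive_flags(crawler_cls) -> dict:
    return {
        "can_search": callable(getattr(crawler_cls, "search_novel", None)),
        "can_login": callable(getattr(crawler_cls, "login", None)),
    }
```

Here `derive_flags(ExampleNovelCrawler)` would report `can_search` as true and `can_login` as false, since only `search_novel()` is implemented.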

Feature icons

The README and web UI use icons to indicate source capabilities at a glance:
| Icon | Meaning |
|---|---|
| 🤖 | Contains machine-translated (MTL) content |
| 🔍 | Supports searching for novels by keyword |
| 🔑 | Requires a login / account to access content |
| 🖼️ | Serves manga, manhua, or manhwa (image-based content) |

Managing sources

The lncrawl sources command is the entry point for all source-related tasks.
$ lncrawl sources --help

List available sources

Print every supported source URL to stdout:
lncrawl sources list

Search for a source

Filter by domain name or keyword:
# Check whether a specific domain is supported
lncrawl sources list | grep "novelbin.com"

# Search by keyword by filtering the list output
lncrawl sources list | grep "wuxia"

Create a new source with AI assistance

Generate a crawler scaffold using ChatGPT:
lncrawl sources create
To add a source manually, copy sources/_examples/_01_general_soup.py into the correct sources/{lang}/ folder and implement parse_title, parse_cover, parse_chapter_list, and select_chapter_body. See the contributor guide for the full walkthrough.
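A minimal sketch of that manual scaffold, assuming the four method names above (verify the exact signatures and base class against sources/_examples/_01_general_soup.py before copying; the site URL here is a placeholder):

```python
# Hypothetical scaffold for a new source. Method names follow the docs'
# description; the real template may expect different signatures or a
# specific base class, so treat this as a structural sketch only.
class MyNewSourceCrawler:
    # Metadata read by the source service (placeholder values).
    base_url = ["https://my-novel-site.example/"]
    language = "en"
    has_mtl = False
    has_manga = False
    version = 1

    def parse_title(self, soup):
        """Return the novel title, e.g. from a heading on the novel page."""
        raise NotImplementedError

    def parse_cover(self, soup):
        """Return the absolute URL of the cover image."""
        raise NotImplementedError

    def parse_chapter_list(self, soup):
        """Yield chapter entries (title + URL) from the table of contents."""
        raise NotImplementedError

    def select_chapter_body(self, soup):
        """Return the element containing the chapter's text content."""
        raise NotImplementedError
```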

Source discovery mechanism

Sources are auto-discovered at startup. The Sources service (in lncrawl/services/sources/service.py) scans all *.py files under the configured source directories and imports them with a thread pool. Any class that extends Crawler and sets a base_url is automatically registered. The discovery order is:
  1. Bundled sources — the sources/ directory shipped with the package (loaded from the compiled _index.json)
  2. User sources — a user-configured directory for private or custom crawlers
  3. Remote updates — an optional background sync that downloads the latest source index from GitHub
Each source is indexed by host name so URL lookups are O(1). A full-text search (FTS) index is also built over file names, class names, and URLs to support keyword queries.
Sources are loaded lazily, so the source service does not block startup. If you run lncrawl sources list immediately after launch, the list may still be populating in the background.
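The host-keyed index described above can be sketched as follows (a simplified illustration with hypothetical function names; the real service in lncrawl/services/sources/service.py also handles versioning, rejection lists, and the FTS index):

```python
from urllib.parse import urlparse


# Simplified sketch of host-keyed source lookup: each crawler's base_url
# hosts become dict keys, so resolving a novel URL is one dict access.
def build_index(crawlers: list) -> dict:
    index = {}
    for crawler_cls in crawlers:
        urls = crawler_cls.base_url
        if isinstance(urls, str):  # base_url may be a str or list[str]
            urls = [urls]
        for url in urls:
            index[urlparse(url).netloc] = crawler_cls
    return index


def find_crawler(index: dict, novel_url: str):
    """O(1) lookup of the crawler responsible for a given URL."""
    return index.get(urlparse(novel_url).netloc)
```

Because the key is the URL's host name, any page on a registered site resolves to the same crawler, and unknown hosts simply return `None`.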
