Templates are pre-built base classes that handle the boilerplate of read_novel_info() and download_chapter_body() for you. Instead of implementing the full crawl loop, you implement small, focused parsing methods and the template wires them together. All soup-based templates live in lncrawl/templates/soup/. For JavaScript-heavy sites there are matching browser templates in lncrawl/templates/browser/.
For most new sources, start with GeneralSoupTemplate (_01_general_soup.py). If the site has a search page, use SearchableSoupTemplate (_02_searchable_soup.py) instead.

Soup templates

GeneralSoupTemplate

Import: lncrawl.templates.soup.general
When to use: the default choice for any site that serves plain HTML. It handles the full novel info flow — title, cover, authors, genres, synopsis, chapter list — through small, focused methods you implement.
from lncrawl.templates.soup.general import GeneralSoupTemplate

Required methods

parse_title(soup)
Parse and return the novel title from the novel info page.
def parse_title(self, soup: PageSoup) -> str:
    return soup.select_one("h1.novel-title").text.strip()
  • soup (PageSoup) — Result of self.get_soup(self.novel_url).
Returns: str
parse_cover(soup)
Parse and return the absolute URL of the novel cover image.
def parse_cover(self, soup: PageSoup) -> Optional[str]:
    tag = soup.select_one(".cover img")
    return self.absolute_url(tag["src"]) if tag else None
  • soup (PageSoup) — Result of self.get_soup(self.novel_url).
Returns: Optional[str]
parse_chapter_list(soup)
Yield Chapter and/or Volume objects that make up the table of contents. Chapters must have a 1-based id. Volumes are optional.
def parse_chapter_list(
    self, soup: PageSoup
) -> Generator[Union[Chapter, Volume], None, None]:
    chap_id = 0
    for a in soup.select(".chapter-list li a"):
        chap_id += 1
        yield Chapter(
            id=chap_id,
            title=a.text.strip(),
            url=self.absolute_url(a["href"]),
        )
  • soup (PageSoup) — Result of self.get_soup(self.novel_url).
Returns: Generator[Union[Chapter, Volume], None, None]
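Volumes interleave with chapters in the same generator, and chapter ids stay 1-based across volume boundaries. A minimal, model-free sketch of that numbering (plain dicts stand in for the Chapter and Volume models, and the TOC data is made up):

```python
# Hypothetical table of contents; in a real crawler these values
# would come from soup.select(...) on the novel info page.
toc = [
    ("Volume 1", ["Chapter 1", "Chapter 2"]),
    ("Volume 2", ["Chapter 3"]),
]

entries = []
chap_id = 0
for vol_id, (vol_title, chap_titles) in enumerate(toc, start=1):
    entries.append({"type": "volume", "id": vol_id, "title": vol_title})
    for title in chap_titles:
        chap_id += 1  # chapter ids are global and 1-based, not per-volume
        entries.append({
            "type": "chapter",
            "id": chap_id,
            "volume": vol_id,
            "title": title,
        })
```

In a real parse_chapter_list() you would yield Chapter and Volume objects instead of appending dicts, but the id bookkeeping is the same.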
select_chapter_body(soup)
Return the single PageSoup tag that contains the chapter text. The template passes it to parse_chapter_body() which calls self.cleaner.extract_contents() to produce clean HTML.
def select_chapter_body(self, soup: PageSoup) -> PageSoup:
    return soup.select_one("div.chapter-content")
  • soup (PageSoup) — Result of self.get_soup(chapter.url).
Returns: PageSoup

Optional methods

parse_authors(soup)
Yield author names. Default yields nothing.
def parse_authors(self, soup: PageSoup) -> Generator[str, None, None]:
    for a in soup.select(".author-list a"):
        yield a.text.strip()
Returns: Generator[str, None, None]
parse_genres(soup)
Yield genre/category names. Default yields nothing.
Returns: Generator[str, None, None]
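A typical override mirrors parse_authors() above; the .genre-tags selector here is site-specific and only illustrative:

```python
def parse_genres(self, soup: PageSoup) -> Generator[str, None, None]:
    # Selector is hypothetical; adjust to the target site's markup
    for a in soup.select(".genre-tags a"):
        yield a.text.strip()
```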
parse_summary(soup)
Return the novel synopsis as a string. Default returns None.
Returns: Optional[str]
get_novel_soup()
Return the PageSoup for the novel info page. Default calls self.get_soup(self.novel_url). Override when you need to navigate to a different page, handle pagination, or pass custom headers.
def get_novel_soup(self) -> PageSoup:
    return self.get_soup(self.novel_url)
Returns: PageSoup
parse_chapter_body(tag)
Post-process the tag returned by select_chapter_body(). Default calls self.cleaner.extract_contents(tag). Override if you need custom HTML extraction logic.
  • tag (PageSoup) — The tag selected by select_chapter_body().
Returns: str
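For example, you might drop a site-specific junk element before handing the tag to the cleaner. A sketch, with a hypothetical .chapter-ads selector:

```python
def parse_chapter_body(self, tag: PageSoup) -> str:
    # Remove an in-content ad block (hypothetical selector) before cleaning
    for junk in tag.select(".chapter-ads"):
        junk.decompose()
    return self.cleaner.extract_contents(tag)
```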

Named site templates

Several popular site platforms have dedicated templates that implement all parsing logic. You only need to set base_url and any site-specific overrides.
Template class         Module                           Platform
MadaraTemplate         lncrawl.templates.madara         WordPress sites using the Madara theme
NovelFullTemplate      lncrawl.templates.novelfull      NovelFull-style sites
NovelPubTemplate       lncrawl.templates.novelpub       NovelPub and clones
NovelMTLTemplate       lncrawl.templates.novelmtl       NovelMTL and clones
MangaStreamTemplate    lncrawl.templates.mangastream    MangaStream-style sites
from lncrawl.templates.novelfull import NovelFullTemplate

class MyNovelFullSite(NovelFullTemplate):
    base_url = ["https://mynovelfullclone.com/"]
Platform templates set is_template = True internally. Do not set this on your own crawler classes.

Complete example: GeneralSoupTemplate

This example is based on sources/_examples/_01_general_soup.py.
import logging
from typing import Generator, Union

from lncrawl.core import PageSoup
from lncrawl.models import Chapter, Volume
from lncrawl.templates.soup.general import GeneralSoupTemplate

logger = logging.getLogger(__name__)


class MyCrawler(GeneralSoupTemplate):
    base_url = ["https://example.com/"]
    has_manga = False
    has_mtl = False

    def initialize(self) -> None:
        self.cleaner.bad_css.update([".ad-slot", ".donation-banner"])

    def parse_title(self, soup: PageSoup) -> str:
        return soup.select_one("h1.novel-title").text.strip()

    def parse_cover(self, soup: PageSoup) -> str:
        tag = soup.select_one(".cover-image img")
        if not tag:
            return ""
        return self.absolute_url(tag.get("data-src") or tag["src"])

    def parse_authors(self, soup: PageSoup) -> Generator[str, None, None]:
        for a in soup.select(".author-list a"):
            yield a.text.strip()

    def parse_genres(self, soup: PageSoup) -> Generator[str, None, None]:
        for a in soup.select(".genre-tags a"):
            yield a.text.strip()

    def parse_summary(self, soup: PageSoup) -> str:
        return self.cleaner.extract_contents(soup.select_one(".synopsis"))

    def parse_chapter_list(
        self, soup: PageSoup
    ) -> Generator[Union[Chapter, Volume], None, None]:
        chap_id = 0
        for a in soup.select(".chapter-list li a"):
            chap_id += 1
            yield Chapter(
                id=chap_id,
                title=a.text.strip(),
                url=self.absolute_url(a["href"]),
            )

    def select_chapter_body(self, soup: PageSoup) -> PageSoup:
        return soup.select_one("div.chapter-content")

Complete example: SearchableSoupTemplate

This example is based on sources/_examples/_02_searchable_soup.py.
import logging
from typing import Generator, Union
from urllib.parse import urlencode

from lncrawl.core import PageSoup
from lncrawl.models import Chapter, SearchResult, Volume
from lncrawl.templates.soup.searchable import SearchableSoupTemplate

logger = logging.getLogger(__name__)


class MyCrawler(SearchableSoupTemplate):
    base_url = ["https://example.com/"]

    def select_search_items(
        self, query: str
    ) -> Generator[PageSoup, None, None]:
        params = {"searchkey": query}
        soup = self.post_soup(f"{self.home_url}search?{urlencode(params)}")
        yield from soup.select(".col-content .con .txt h3 a")

    def parse_search_item(self, tag: PageSoup) -> SearchResult:
        return SearchResult(
            title=tag.get_text(strip=True),
            url=self.absolute_url(tag["href"]),
        )

    def parse_title(self, soup: PageSoup) -> str:
        return soup.select_one("h1.novel-title").text.strip()

    def parse_cover(self, soup: PageSoup) -> str:
        tag = soup.select_one(".cover img")
        return self.absolute_url(tag["src"]) if tag else ""

    def parse_chapter_list(
        self, soup: PageSoup
    ) -> Generator[Union[Chapter, Volume], None, None]:
        chap_id = 0
        for a in soup.select(".chapter-list li a"):
            chap_id += 1
            yield Chapter(
                id=chap_id,
                title=a.text.strip(),
                url=self.absolute_url(a["href"]),
            )

    def select_chapter_body(self, soup: PageSoup) -> PageSoup:
        return soup.select_one("div.chapter-text")

Testing your crawler

After writing your crawler, test it from a source checkout:
# Download the first 3 chapters from a novel URL
uv run python -m lncrawl -s "https://example.com/novel/my-novel" --first 3 -f

# Verify your source appears in the registry
uv run python -m lncrawl sources list | grep example.com
