What the parser does
The parser accepts wikitext (the [[link]], '''bold''', {{template}} syntax that editors write) and produces HTML output along with metadata about the page. Its primary entry point is Parser::parse(), which returns a ParserOutput object containing both the rendered HTML and all associated metadata.
From the Parser.php docblock:
PHP Parser — Processes wiki markup (which uses a more user-friendly syntax, such as [[link]] for making links), and provides a one-way transformation of that wiki markup into (X)HTML output / markup (which in turn the browser understands, and can display).
The seven main entry points into Parser are:
| Method | Purpose |
|---|---|
| Parser::parse() | Produces HTML output from wikitext |
| Parser::preSaveTransform() | Produces altered wikitext before saving |
| Parser::preprocess() | Removes HTML comments and expands templates |
| Parser::cleanSig() | Cleans a signature before saving to preferences |
| Parser::getSection() | Returns the content of a section for section editing |
| Parser::replaceSection() | Replaces a section by number inside an article |
| Parser::getPreloadText() | Removes <noinclude> sections and <includeonly> tags |
Parser stages
Parsing wikitext happens in distinct stages:
Preprocessing
The Preprocessor_Hash class handles the first pass. It strips HTML comments, expands templates by recursively fetching and substituting their content, processes <noinclude> and <includeonly> tags, and resolves parser function calls. The result is a preprocessed tree ready for the main parse.
Parsing
The main parse pass runs in Parser::parse(). It processes the preprocessed tree, handles wiki markup (bold, italics, links, headings, tables), invokes tag hooks for custom tags like <ref> or <syntaxhighlight>, and resolves magic words and variables.
ParserOutput: the result of parsing
ParserOutput is the object returned by the parser. It combines the rendered HTML with metadata collected during parsing:
HTML content: The rendered HTML of the page, accessible via getContentHolderText().
Categories: Categories the page belongs to, collected from [[Category:...]] links during parsing.
Internal links: All links to other wiki pages found in the content, used to maintain the pagelinks table.
Page properties: Arbitrary key-value data set by templates and parser functions via {{DISPLAYTITLE:}}, __NOINDEX__, etc.
Templates used: Every template transcluded during parsing, enabling cache invalidation when templates change.
Extension data: Arbitrary data stored by extensions using setExtensionData() / getExtensionData().
ParserOutput objects for page revisions are created by ParserOutputAccess, which automatically caches them via ParserCache.
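As an illustration of the extension data and HTML accessors mentioned above, the following is a minimal sketch rather than code taken from MediaWiki. The function names and the 'example-key' key are hypothetical, and the short ParserOutput class name is an assumption: recent releases namespace the parser classes, so adjust the type hint to your MediaWiki version.

```php
<?php
// Hypothetical helpers; only the three ParserOutput methods named in this
// section are used. 'example-key' is an illustrative extension data key.

/** Attach a value to the ParserOutput, e.g. from inside a parser hook. */
function tagParserOutput( ParserOutput $output, string $value ): void {
	$output->setExtensionData( 'example-key', $value );
}

/** Read the rendered HTML and the stored value back into a plain array. */
function summarizeParserOutput( ParserOutput $output ): array {
	return [
		'html' => $output->getContentHolderText(),
		'example' => $output->getExtensionData( 'example-key' ),
	];
}
```

Because the value travels with the ParserOutput, it is still available when the object is later served from the cache instead of being reparsed.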
Parsoid: the new parser
Parsoid is a rewrite of the MediaWiki parser that produces semantically rich HTML+RDFa output rather than the ad-hoc HTML produced by the PHP parser. It was developed to support VisualEditor and is now integrated into MediaWiki core as ParsoidParser.
Styles for Parsoid-rendered content are provided by the mediawiki.skinning.content.parsoid ResourceLoader module.
Parsoid has been declared @unstable since 1.41 while the long-term plan for full integration (tracked in T236809) is completed. For new extensions, prefer the existing PHP parser hooks while Parsoid stabilizes.
Parser hooks for extending the parser
Extensions extend the parser through two types of hooks registered during ParserFirstCallInit.
Tag hooks
Tag hooks handle custom XML-style tags. The core parser registers its own tags in CoreTagHooks::register().
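An extension can register a tag of its own with Parser::setHook(). The following is a minimal sketch under stated assumptions: the class name ExampleTagHooks, the <shout> tag, and the renderShout() method are hypothetical, and the short Parser and PPFrame class names may need namespace imports depending on the MediaWiki version.

```php
<?php
// Hypothetical extension class; names are illustrative, not from core.
class ExampleTagHooks {
	/** Handler for the ParserFirstCallInit hook. */
	public static function onParserFirstCallInit( Parser $parser ) {
		// Register <shout>...</shout> so the parser calls renderShout() for it.
		$parser->setHook( 'shout', [ self::class, 'renderShout' ] );
	}

	/**
	 * Tag hook callback.
	 *
	 * @param string|null $input Text between the tags, null for <shout/>
	 * @param array $args Tag attributes as name => value
	 * @param Parser $parser
	 * @param PPFrame $frame
	 * @return string HTML
	 */
	public static function renderShout( $input, array $args, Parser $parser, PPFrame $frame ) {
		// The return value is used as HTML, so the raw input is escaped here.
		return '<span class="shout">' . htmlspecialchars( $input ?? '' ) . '</span>';
	}
}
```

The callback receives the unparsed text between the tags, which is why the sketch escapes it before returning.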
Function hooks
Function hooks implement parser functions called as {{#myfunction: arg1 | arg2 }}. They are registered with Parser::setFunctionHook().
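A registration for the {{#myfunction: arg1 | arg2 }} call above might look like the sketch below. The class name ExampleFunctionHooks and the renderMyFunction() method are hypothetical, and the sketch assumes that a magic word with the ID myfunction has been defined for the extension, which setFunctionHook() requires. The short Parser class name may need a namespace import depending on the MediaWiki version.

```php
<?php
// Hypothetical extension class; names are illustrative, not from core.
class ExampleFunctionHooks {
	/** Handler for the ParserFirstCallInit hook. */
	public static function onParserFirstCallInit( Parser $parser ) {
		// 'myfunction' must match a magic word ID defined by the extension.
		$parser->setFunctionHook( 'myfunction', [ self::class, 'renderMyFunction' ] );
	}

	/**
	 * Parser function callback: the parser comes first, then one
	 * parameter per pipe-separated argument in the wikitext call.
	 *
	 * @return string Wikitext, which the parser processes further by default
	 */
	public static function renderMyFunction( Parser $parser, $arg1 = '', $arg2 = '' ) {
		return $arg1 . ' / ' . $arg2;
	}
}
```

Default values on the argument parameters keep the callback safe when editors omit arguments.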
The ParserFirstCallInit handler itself is declared in the extension's extension.json.
ParserCache
The ParserCache stores ParserOutput objects to avoid reparsing pages on every request. It uses a two-tiered cache backed by BagOStuff.
The cache key includes only the ParserOptions that actually influenced the output during a given parse. For example, if only dateformat and userlang were accessed, only those values are included in the cache key. A lookup with the same dateformat and userlang hits the same entry regardless of other options.
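The effect of keying on used options only can be modelled in a few lines of plain PHP. This is a toy sketch, not the ParserCache implementation: the function sketchCacheKey(), the key layout, the page identifier, and the thumbsize option in the example data are all assumptions made for illustration.

```php
<?php
/**
 * Toy model of an option-sensitive cache key.
 *
 * @param string $pageKey Hypothetical page identifier
 * @param array $options All option values for the request (name => value)
 * @param string[] $usedOptions Names of the options read during the parse
 */
function sketchCacheKey( string $pageKey, array $options, array $usedOptions ): string {
	sort( $usedOptions ); // Stable order, so equal sets produce equal keys
	$parts = [];
	foreach ( $usedOptions as $name ) {
		$parts[] = $name . '=' . ( $options[$name] ?? '' );
	}
	return $pageKey . '!' . implode( '!', $parts );
}

// Only dateformat and userlang were read while parsing this page.
$used = [ 'userlang', 'dateformat' ];
$a = sketchCacheKey( 'page-42', [ 'dateformat' => 'dmy', 'userlang' => 'en', 'thumbsize' => 2 ], $used );
$b = sketchCacheKey( 'page-42', [ 'dateformat' => 'dmy', 'userlang' => 'en', 'thumbsize' => 5 ], $used );
$c = sketchCacheKey( 'page-42', [ 'dateformat' => 'mdy', 'userlang' => 'en', 'thumbsize' => 2 ], $used );
```

Here $a and $b are identical because thumbsize was never read, while $c differs because dateformat changed. Requests that differ only in unused options can therefore share one cached entry.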
