CSS tokenizer and parser
CSS parsing begins with tokenization, converting the raw stylesheet text into tokens, followed by parsing those tokens according to CSS grammar rules.
Tokenization process
The CSS tokenizer breaks the input stream into meaningful tokens:
Read input stream
Process the stylesheet character by character, handling Unicode and escape sequences.
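As one illustration of the escape handling mentioned above, here is a minimal sketch in Python. The function name `decode_escapes` is hypothetical, and the sketch assumes the usual CSS escape forms: a backslash followed by one to six hex digits (with one optional trailing whitespace character), or a backslash followed by any other non-newline character. It is not meant as a complete input preprocessor.

```python
import re

# Backslash + 1-6 hex digits (+ one optional whitespace), or backslash + any other char.
_ESCAPE = re.compile(r"\\(?:([0-9a-fA-F]{1,6})[ \t\n]?|([^\n]))")


def decode_escapes(text: str) -> str:
    """Replace CSS escape sequences in `text` with the characters they denote.

    Hypothetical helper for illustration only.
    """
    def replace(match: re.Match) -> str:
        hex_digits, literal = match.group(1), match.group(2)
        if hex_digits is None:
            # Non-hex escape such as "\:" stands for the character itself.
            return literal
        code_point = int(hex_digits, 16)
        # Zero, surrogates, and out-of-range values become U+FFFD.
        if code_point == 0 or code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            return "\ufffd"
        return chr(code_point)

    return _ESCAPE.sub(replace, text)
```

For example, `decode_escapes(r"a\:b")` yields `a:b`, and `decode_escapes(r"\26 B")` yields `&B` because the space after the hex digits terminates the escape and is consumed.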
Recognize token patterns
Identify different token types based on character patterns:
- Identifiers (property names, selector names)
- Functions (e.g., rgb(), calc())
- Strings (quoted values)
- Numbers and dimensions (e.g., 10px, 1.5em)
- Delimiters (:, ;, {, }, ,)
- Operators (+, -, *, /)
- Hash tokens (#id, #color)
- At-keywords (@media, @keyframes)
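The token types listed above can be sketched with a small regex-driven tokenizer. This is a simplified illustration in Python, not a spec-complete tokenizer: the `Token` type, the `tokenize` function, and the kind names are all hypothetical, escapes and scientific notation are not handled, and delimiters and operators are lumped together as `delim`. Comments are dropped and whitespace is kept as its own token so that a later stage can decide whether it matters.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str


_IDENT = r"(?:--|-?[A-Za-z_])[A-Za-z0-9_-]*"
_NUMBER = r"[+-]?(?:\d+\.\d+|\.\d+|\d+)"

# Order matters: more specific patterns are tried before more general ones.
_TOKEN_SPEC = [
    ("comment", r"/\*.*?\*/"),
    ("whitespace", r"\s+"),
    ("string", r"\"[^\"\n]*\"|'[^'\n]*'"),
    ("at_keyword", "@" + _IDENT),
    ("hash", r"#[A-Za-z0-9_-]+"),
    ("dimension", _NUMBER + _IDENT),
    ("number", _NUMBER),
    ("function", _IDENT + r"\("),
    ("ident", _IDENT),
    ("delim", r"."),  # any other single character
]

_MASTER = re.compile(
    "|".join(f"(?P<{kind}>{pattern})" for kind, pattern in _TOKEN_SPEC),
    re.DOTALL,
)


def tokenize(css: str) -> Iterator[Token]:
    """Yield tokens for `css`, discarding comments."""
    for match in _MASTER.finditer(css):
        kind = match.lastgroup
        if kind == "comment":
            continue
        yield Token(kind, match.group())
```

Because the final `delim` pattern matches any single character, every input character ends up in some token, which keeps the sketch from failing on unexpected input.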
Handle whitespace and comments
CSS comments (/* ... */) are discarded during tokenization. Whitespace is significant in some contexts (e.g., selector combinators) but not in others.
Grammar implementation
The CSS parser constructs a structured representation (CSSOM - CSS Object Model) by following CSS grammar rules.
CSS uses a relatively simple grammar compared to programming languages, but it must handle error recovery gracefully since invalid CSS rules should be ignored rather than causing parsing to fail.
Stylesheet structure
A stylesheet consists of a list of rules and at-rules. At-rules include @import, @media, @keyframes, @font-face, etc.
Rule structure
Each rule consists of selectors and a declaration block. Example:
h1, h2 { color: blue; font-size: 2em; }
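To make the two parts of a rule concrete, here is a minimal Python sketch that separates the selector list from the declaration block. The `Rule` type and `split_rule` function are hypothetical names, and the sketch works on raw text rather than tokens. It assumes a single, well-formed rule, and the naive comma split would mishandle selectors that contain commas inside parentheses.

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    selectors: List[str]  # one entry per comma-separated selector
    block: str            # raw text between the braces


def split_rule(rule_text: str) -> Rule:
    """Split 'selectors { declarations }' into its two parts."""
    prelude, brace, rest = rule_text.partition("{")
    rest = rest.rstrip()
    if not brace or not rest.endswith("}"):
        raise ValueError("expected 'selectors { declarations }'")
    block = rest[:-1].strip()
    selectors = [selector.strip() for selector in prelude.split(",")]
    return Rule(selectors, block)
```

Applied to the example rule, this gives the selectors `h1` and `h2` and the block text `color: blue; font-size: 2em;`.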
Selector grammar
Selectors are composed of simple selectors and combinators. Examples:
- div.container > p (type and class selectors joined by a child combinator)
- a:hover::before (pseudo-class and pseudo-element)
- input[type="text"] (attribute selector)
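The split into simple selectors and combinators can be sketched as follows. This Python example is an illustration under simplifying assumptions: `parse_selector` is a hypothetical name, functional pseudo-classes such as `:nth-child(2n)` are not supported, and whitespace between compound selectors is treated as the descendant combinator.

```python
import re
from typing import List, Union

# One simple selector; alternatives are tried top to bottom.
_SIMPLE = re.compile(r"""
    ::[\w-]+        # pseudo-element
  | :[\w-]+         # pseudo-class
  | \.[\w-]+        # class
  | \#[\w-]+        # id
  | \[[^\]]*\]      # attribute
  | \*              # universal
  | [\w-]+          # type
""", re.VERBOSE)

# An explicit combinator with optional surrounding whitespace, or bare whitespace.
_COMBINATOR = re.compile(r"\s*([>+~])\s*|\s+")


def parse_selector(selector: str) -> List[Union[List[str], str]]:
    """Return compound selectors (lists of simple selectors) interleaved with combinators."""
    selector = selector.strip()
    parts: List[Union[List[str], str]] = []
    pos = 0
    while pos < len(selector):
        compound = []
        match = _SIMPLE.match(selector, pos)
        while match:
            compound.append(match.group())
            pos = match.end()
            match = _SIMPLE.match(selector, pos)
        if not compound:
            raise ValueError(f"unexpected character at offset {pos}: {selector[pos]!r}")
        parts.append(compound)
        match = _COMBINATOR.match(selector, pos)
        if match:
            # Bare whitespace has no captured group: descendant combinator.
            parts.append(match.group(1) or " ")
            pos = match.end()
    if parts and isinstance(parts[-1], str):
        raise ValueError("selector ends with a combinator")
    return parts
```

Here whitespace is significant: `ul li` parses as two compound selectors joined by a descendant combinator, while `ul.li` would be a single compound selector.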
Declaration grammar
Declarations consist of property-value pairs, such as color: blue. The value can be a keyword, number, color, function, or a combination of these.
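A rough Python sketch of declaration parsing with error recovery is shown below. `parse_declarations` is a hypothetical helper that checks syntax only (it does not know which properties exist), and it assumes values contain no semicolons inside strings or functions. Malformed declarations are skipped so that the remaining ones still apply, in line with the error-recovery behavior described earlier.

```python
import re
from typing import List, Tuple

# A plausible property name: optional leading dashes, then identifier characters.
_PROPERTY_NAME = re.compile(r"-{0,2}[A-Za-z_][\w-]*")


def parse_declarations(block: str) -> List[Tuple[str, str]]:
    """Parse 'name: value; ...' into (name, value) pairs, skipping malformed entries."""
    declarations = []
    for chunk in block.split(";"):
        if not chunk.strip():
            continue  # empty segment, e.g. after a trailing semicolon
        name, colon, value = chunk.partition(":")
        name, value = name.strip(), value.strip()
        if not colon or not value or not _PROPERTY_NAME.fullmatch(name):
            continue  # error recovery: drop this declaration, keep going
        declarations.append((name, value))
    return declarations
```

Returning a list of pairs rather than a dictionary keeps source order and repeated properties, leaving the decision about which declaration wins to a later stage.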
Media query evaluation
Media queries allow conditional application of styles based on device characteristics.
- Media types
- Media features
- Logical operators
- Range syntax
Basic media types, such as screen and print, target different output devices.
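Evaluating a query against a device can be sketched as below. This Python example is deliberately narrow and all names in it (`Device`, `matches`) are hypothetical: it assumes only the media types all, screen, and print, the min-width and max-width features in px, and the `and` operator. Comma-separated lists, `not`, `only`, and range syntax are left out.

```python
import re
from dataclasses import dataclass


@dataclass
class Device:
    media_type: str   # e.g. "screen" or "print"
    width_px: float   # viewport width in CSS pixels


_FEATURE = re.compile(r"\(\s*(min-width|max-width)\s*:\s*(\d+(?:\.\d+)?)px\s*\)")


def matches(query: str, device: Device) -> bool:
    """Return True if every 'and'-joined part of `query` holds for `device`."""
    parts = re.split(r"\s+and\s+", query.strip().lower())
    for part in parts:
        feature = _FEATURE.fullmatch(part)
        if feature:
            name, limit = feature.group(1), float(feature.group(2))
            if name == "min-width":
                ok = device.width_px >= limit
            else:
                ok = device.width_px <= limit
        elif part in ("all", "screen", "print"):
            ok = part == "all" or part == device.media_type
        else:
            return False  # unrecognized part: treat the query as not matching
        if not ok:
            return False
    return True
```

For instance, `matches("screen and (min-width: 600px)", Device("screen", 800))` is true, while the same query against a 400px-wide screen is false.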
You now understand the CSS engine’s tokenization and parsing phases. Next, we’ll explore how parsed CSS rules are matched to HTML elements through selector matching.