Lexical analysis, parsing, and Abstract Syntax Tree construction in rustc
The frontend of the Rust compiler transforms source code text into structured data that the rest of the compiler can analyze. This involves lexical analysis (tokenization), parsing, and AST construction.
The rustc_lexer crate performs tokenization and does not depend on any other part of the compiler.
The lexer is intentionally kept simple and standalone. It produces a stream of tokens with minimal semantic analysis, making it reusable for other tools like syntax highlighters.
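To make the idea of a standalone tokenizer concrete, here is a minimal sketch that uses only the standard library. It is not the rustc_lexer API: `TokenKind`, `Token`, `tokenize`, and `prefix_len` are hypothetical names chosen for illustration, and the four token classes are a deliberate oversimplification. What it does mirror is the design described above: each token carries only a kind and a length, and nothing in it knows about sessions, spans, or diagnostics.

```rust
/// Hypothetical token classes for this sketch; a real lexer distinguishes many more.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TokenKind {
    Whitespace,
    Ident,
    Number,
    Punct,
}

/// A token records only its kind and its length in bytes, not its text or position.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Token {
    pub kind: TokenKind,
    pub len: usize,
}

/// Splits `input` into tokens with no dependencies beyond the standard library.
pub fn tokenize(input: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut rest = input;

    while let Some(first) = rest.chars().next() {
        // Classify the token by its first character.
        let kind = if first.is_whitespace() {
            TokenKind::Whitespace
        } else if first.is_alphabetic() || first == '_' {
            TokenKind::Ident
        } else if first.is_ascii_digit() {
            TokenKind::Number
        } else {
            TokenKind::Punct
        };

        // Consume the longest run of characters that belongs to that class.
        let len = match kind {
            TokenKind::Whitespace => prefix_len(rest, char::is_whitespace),
            TokenKind::Ident => prefix_len(rest, |c| c.is_alphanumeric() || c == '_'),
            TokenKind::Number => prefix_len(rest, |c| c.is_ascii_digit()),
            // Punctuation is emitted one character at a time.
            TokenKind::Punct => first.len_utf8(),
        };

        tokens.push(Token { kind, len });
        rest = &rest[len..];
    }

    tokens
}

/// Byte length of the longest prefix of `s` whose characters all satisfy `pred`.
fn prefix_len(s: &str, pred: impl Fn(char) -> bool) -> usize {
    s.char_indices()
        .find(|&(_, c)| !pred(c))
        .map_or(s.len(), |(i, _)| i)
}
```

Calling `tokenize("let x = 42;")` yields eight tokens whose lengths add up to the length of the input. A consumer such as a syntax highlighter can recover each token's text by walking the source with a running offset, which is why the tokens themselves do not need to store positions.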
The rustc_parse crate provides the main parser interface:
    /// Creates a new parser from a source string
    pub fn new_parser_from_source_str(
        psess: &ParseSess,
        name: FileName,
        source: String,
        strip_tokens: StripTokens,
    ) -> Result<Parser<'_>, Vec<Diag<'_>>>

    /// Creates a new parser from a file
    pub fn new_parser_from_file(
        psess: &ParseSess,
        path: &Path,
        strip_tokens: StripTokens,
    ) -> Result<Parser<'_>, Vec<Diag<'_>>>
The rustc_ast crate contains the AST definitions, which represent the syntactic structure of Rust code. It holds things concerned purely with syntax: the AST (“abstract syntax tree”), token streams, definitions for tokens, and shared definitions for other AST-related parts of the compiler.
    /// A "Path" is essentially Rust's notion of a name.
    /// E.g., `std::cmp::PartialEq`
    pub struct Path {
        pub span: Span,
        /// The segments in the path: the things separated by `::`
        /// Global paths begin with `kw::PathRoot`
        pub segments: ThinVec<PathSegment>,
        pub tokens: Option<LazyAttrTokenStream>,
    }
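The segment structure of a path can be illustrated with a much smaller model. The sketch below is a simplified stand-in, not the rustc_ast type: `SimplePath`, `parse_simple_path`, and `is_ident` are hypothetical names, segments are plain strings with no generic arguments, and a leading `::` is recorded as a boolean flag rather than as a `kw::PathRoot` first segment.

```rust
/// Simplified stand-in for an AST path: no spans, generic arguments, or token streams.
#[derive(Debug, PartialEq, Eq)]
pub struct SimplePath {
    /// True when the text starts with `::`. This sketch uses a flag where the
    /// real AST would begin the segment list with `kw::PathRoot`.
    pub is_global: bool,
    /// The things separated by `::`.
    pub segments: Vec<String>,
}

/// Splits path text on `::`, returning `None` if any segment is not an identifier.
pub fn parse_simple_path(text: &str) -> Option<SimplePath> {
    let (is_global, rest) = match text.strip_prefix("::") {
        Some(rest) => (true, rest),
        None => (false, text),
    };

    let mut segments = Vec::new();
    for segment in rest.split("::") {
        if !is_ident(segment) {
            return None;
        }
        segments.push(segment.to_string());
    }

    Some(SimplePath { is_global, segments })
}

/// A rough identifier check: a letter or `_`, then letters, digits, or `_`.
fn is_ident(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if c.is_alphabetic() || c == '_' => {
            chars.all(|c| c.is_alphanumeric() || c == '_')
        }
        _ => false,
    }
}
```

With this model, `parse_simple_path("std::cmp::PartialEq")` produces three segments with `is_global` set to false, while `"::std::mem"` produces two segments with the flag set. Malformed input such as `"std::"` is rejected because the trailing segment is empty.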
    /// A "Lifetime" is an annotation of the scope in which a variable
    /// can be used, e.g. `'a` in `&'a i32`
    pub struct Lifetime {
        pub id: NodeId,
        pub ident: Ident,
    }

    /// A "Label" is an identifier of some point in sources,
    /// e.g. in `'outer: loop { break 'outer; }`
    pub struct Label {
        pub ident: Ident,
    }
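Lifetimes and labels share the same `'name` spelling in source code, which is easy to overlook when reading the definitions alone. The following ordinary Rust functions (hypothetical examples, unrelated to the compiler's own code) show the two constructs side by side: `'a` appears in a type position and is a lifetime, while `'outer` names a loop and is a label.

```rust
/// `'a` is a lifetime: it ties the returned reference to the input slice.
pub fn first_negative<'a>(values: &'a [i32]) -> Option<&'a i32> {
    values.iter().find(|v| **v < 0)
}

/// `'outer` is a label: it names the outer loop so that `break` can exit it
/// directly from inside the inner loop.
pub fn find_cell(grid: &[Vec<i32>], target: i32) -> Option<(usize, usize)> {
    let mut found = None;
    'outer: for (row, cells) in grid.iter().enumerate() {
        for (col, &cell) in cells.iter().enumerate() {
            if cell == target {
                found = Some((row, col));
                break 'outer;
            }
        }
    }
    found
}
```

Because the two constructs occur in different syntactic positions, a parser can tell them apart from context, which is consistent with the AST giving them separate node types.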