The parser module converts raw shell input strings into an Abstract Syntax Tree (AST) that the runtime can execute. It’s a hand-written recursive-descent parser with no external parser dependencies.
Module Structure
The parser is split into three files:
parser/
├── mod.rs # Parser entrypoint, recursive-descent implementation
├── ast.rs # AST type definitions (Expr, Word, WordPart, RedirectMode)
└── lexer.rs # Tokenizer (string → token stream)
The parser and runtime are completely decoupled. The parser produces an Expr tree and knows nothing about execution semantics.
Parsing Pipeline
Parsing happens in two phases:
┌──────────────────┐
│ Raw Input │ "echo hello | grep world"
└────────┬─────────┘
│
▼
┌──────────────────┐
│ Lexer │ Tokenize into symbols
│ (lexer.rs) │
└────────┬─────────┘
│
▼
┌──────────────────┐
│ Token Stream │ [Word("echo"), Word("hello"), Pipe, Word("grep"), Word("world"), Eof]
└────────┬─────────┘
│
▼
┌──────────────────┐
│ Parser │ Build AST tree
│ (mod.rs) │
└────────┬─────────┘
│
▼
┌──────────────────┐
│ AST │ Expr::Pipe {
│ (Expr tree) │ left: Expr::Command { name: "echo", args: ["hello"] },
└──────────────────┘ right: Expr::Command { name: "grep", args: ["world"] }
}
Lexer (lexer.rs)
The lexer converts a raw string into a flat token stream. It handles:
Whitespace skipping
Quote parsing (single and double)
Escape sequences (\n, \t, etc.)
Variable expansion ($VAR, ${VAR})
Command substitution ($(command))
Operator recognition (|, &&, ||, ;, (, ), >, >>, <)
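The escape-sequence step can be illustrated in isolation. This is a sketch only; the helper name `decode_escape` is hypothetical and not the lexer's actual API:

```rust
// Map the character after a backslash to its decoded form.
// Hypothetical helper; the real lexer may inline this logic.
fn decode_escape(c: char) -> char {
    match c {
        'n' => '\n',
        't' => '\t',
        '\\' => '\\',
        '\'' => '\'',
        '"' => '"',
        other => other, // unknown escapes pass through unchanged
    }
}

fn main() {
    println!("{:?} {:?}", decode_escape('n'), decode_escape('x'));
}
```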
Token Types
From src/parser/lexer.rs:10-34:
pub enum Token {
    Word(Word),     // A (possibly compound) word
    Pipe,           // |
    And,            // &&
    Or,             // ||
    Semi,           // ;
    LParen,         // (
    RParen,         // )
    RedirectOut,    // >
    RedirectAppend, // >>
    RedirectIn,     // <
    Eof,            // End of input
}
Tokenization Process
The lexer reads character by character, building Word tokens that may contain multiple parts:
// From src/parser/lexer.rs:46-119
pub fn tokenize(mut self) -> Result<Vec<Token>, ParseError> {
    let mut tokens = Vec::new();
    loop {
        self.skip_whitespace();
        if self.pos >= self.src.len() {
            tokens.push(Token::Eof);
            break;
        }
        let ch = self.current_char();
        // Handle operators
        let tok = match ch {
            '|' => { /* ... pipe or || ... */ }
            '&' => { /* ... && ... */ }
            ';' => Token::Semi,
            '>' => { /* ... > or >> ... */ }
            '<' => Token::RedirectIn,
            _ => {
                let word = self.read_word()?; // Read compound word
                Token::Word(word)
            }
        };
        tokens.push(tok);
    }
    Ok(tokens)
}
Quote Handling
Quotes are handled during word reading (src/parser/lexer.rs:156-227):
Single Quotes (no expansion)
'\'' => {
    self.advance(); // opening '
    while self.pos < self.src.len() && self.current_char() != '\'' {
        literal_buf.push(self.current_char());
        self.advance();
    }
    if self.pos >= self.src.len() {
        return Err(ParseError::Unmatched('\''));
    }
    self.advance(); // closing '
}
Single quotes disable all expansions: '$HOME' → literal $HOME
Double Quotes (expansion active)
'"' => {
    self.advance(); // opening "
    while self.pos < self.src.len() && self.current_char() != '"' {
        if self.current_char() == '$' {
            // Variable or command substitution
            if !literal_buf.is_empty() {
                parts.push(WordPart::Literal(std::mem::take(&mut literal_buf)));
            }
            let part = self.read_dollar()?;
            parts.push(part);
        } else {
            literal_buf.push(self.current_char());
            self.advance();
        }
    }
    self.advance(); // closing "
}
Double quotes allow $VAR and $(cmd) expansion: "Hello $USER" → Hello alice
Variable Expansion
The lexer recognizes three forms (src/parser/lexer.rs:229-261):
fn read_dollar(&mut self) -> Result<WordPart, ParseError> {
    self.advance(); // consume '$'
    match self.current_char() {
        '(' => {
            // Command substitution: $(cmd)
            self.advance();
            let sub_src = self.read_until_close_paren()?;
            let expr = super::parse(&sub_src)?; // Recursive parse!
            Ok(WordPart::CommandSubst(Box::new(expr)))
        }
        '{' => {
            // Braced variable: ${VAR}
            self.advance();
            let name = self.read_identifier();
            if self.current_char() == '}' { self.advance(); }
            Ok(WordPart::Variable(name))
        }
        c if c.is_alphabetic() || c == '_' => {
            // Simple variable: $VAR
            let name = self.read_identifier();
            Ok(WordPart::Variable(name))
        }
        _ => Ok(WordPart::Literal("$".to_string())),
    }
}
AST Types (ast.rs)
The AST represents the parsed shell command structure.
Expr — Top-Level Expression
From src/parser/ast.rs:30-57:
pub enum Expr {
    /// Simple command with arguments
    Command { name: Word, args: Vec<Word> },
    /// Pipe: left | right
    Pipe { left: Box<Expr>, right: Box<Expr> },
    /// Redirect: cmd > file, cmd < file, cmd >> file
    Redirect { expr: Box<Expr>, file: Word, mode: RedirectMode },
    /// Sequence: left ; right (always run both)
    Sequence { left: Box<Expr>, right: Box<Expr> },
    /// And: left && right (run right only if left succeeds)
    And { left: Box<Expr>, right: Box<Expr> },
    /// Or: left || right (run right only if left fails)
    Or { left: Box<Expr>, right: Box<Expr> },
    /// Subshell: ( expr )
    Subshell { expr: Box<Expr> },
}
Word — Compound Word
A word may contain multiple parts that need expansion (src/parser/ast.rs:3-17):
pub struct Word(pub Vec<WordPart>);

pub enum WordPart {
    Literal(String),         // Plain text or quoted string
    Variable(String),        // $VAR or ${VAR}
    CommandSubst(Box<Expr>), // $(command)
}
Example: "Hello $USER" becomes:
Word(vec![
    WordPart::Literal("Hello ".to_string()),
    WordPart::Variable("USER".to_string()),
])
RedirectMode
From src/parser/ast.rs:19-28:
pub enum RedirectMode {
    Overwrite, // > — truncate/create and write
    Append,    // >> — append to file
    Input,     // < — read from file
}
Parser (mod.rs)
The parser uses recursive descent to build the AST from tokens.
Parse Entry Point
From src/parser/mod.rs:30-39:
pub fn parse(input: &str) -> ParseResult<Expr> {
    let tokens = Lexer::new(input).tokenize()?;
    let mut parser = Parser::new(tokens);
    let expr = parser.parse_list()?;
    if !parser.is_done() {
        return Err(ParseError::UnexpectedToken(parser.peek().clone()));
    }
    Ok(expr)
}
Grammar Structure
The parser implements this grammar hierarchy (src/parser/mod.rs:75-229):
list = pipeline ( ('&&'|'||'|';') pipeline )*
pipeline = redirect ('|' redirect)*
redirect = command (redirection)*
command = subshell | simple_command
subshell = '(' list ')'
simple_command = word+
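The grammar levels above translate directly into mutually recursive functions, one per rule. The following is a self-contained sketch of that shape with heavily simplified types (tokens as `&str`, only `&&` and `|` handled); the real parser's internals may differ:

```rust
#[derive(Debug, PartialEq)]
enum Node {
    Cmd(Vec<String>),
    Pipe(Box<Node>, Box<Node>),
    And(Box<Node>, Box<Node>),
}

struct P<'a> { toks: &'a [&'a str], pos: usize }

impl<'a> P<'a> {
    fn peek(&self) -> Option<&str> { self.toks.get(self.pos).copied() }
    fn bump(&mut self) { self.pos += 1; }

    // list = pipeline ( '&&' pipeline )*
    fn parse_list(&mut self) -> Node {
        let mut left = self.parse_pipeline();
        while self.peek() == Some("&&") {
            self.bump();
            left = Node::And(Box::new(left), Box::new(self.parse_pipeline()));
        }
        left
    }

    // pipeline = command ('|' command)*
    fn parse_pipeline(&mut self) -> Node {
        let mut left = self.parse_command();
        while self.peek() == Some("|") {
            self.bump();
            left = Node::Pipe(Box::new(left), Box::new(self.parse_command()));
        }
        left
    }

    // simple_command = word+
    fn parse_command(&mut self) -> Node {
        let mut words = Vec::new();
        while let Some(t) = self.peek() {
            if t == "|" || t == "&&" { break; }
            words.push(t.to_string());
            self.bump();
        }
        Node::Cmd(words)
    }
}

fn main() {
    let toks = ["echo", "hi", "|", "wc", "&&", "true"];
    let mut p = P { toks: &toks, pos: 0 };
    // The pipe binds tighter than && simply because parse_pipeline
    // sits below parse_list in the call chain.
    println!("{:?}", p.parse_list());
}
```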
Operator Precedence
From tightest-binding to loosest:
Words — echo hello
Redirects — > file, < input
Pipes — |
List operators — &&, ||, ; (all handled at the same level, left-associatively, per the list rule above)
Example: Parse 'echo hello | grep h && wc -l'
Tokens: [Word("echo"), Word("hello"), Pipe, Word("grep"), Word("h"), And, Word("wc"), Word("-l"), Eof]
Parse tree:
Expr::And {
left: Expr::Pipe {
left: Expr::Command { name: "echo", args: ["hello"] },
right: Expr::Command { name: "grep", args: ["h"] }
},
right: Expr::Command { name: "wc", args: ["-l"] }
}
Because pipelines are parsed at a lower grammar level than list operators, the pipe binds more tightly than the &&.
Example Parsing Flow
Let’s trace parsing echo $USER > /tmp/out.txt:
// 1. Lexer tokenizes input
Tokens: [
    Word([Literal("echo")]),
    Word([Variable("USER")]),
    RedirectOut,
    Word([Literal("/tmp/out.txt")]),
    Eof,
]

// 2. Parser builds AST
Expr::Redirect {
    expr: Box::new(Expr::Command {
        name: Word([Literal("echo")]),
        args: vec![Word([Variable("USER")])],
    }),
    file: Word([Literal("/tmp/out.txt")]),
    mode: RedirectMode::Overwrite,
}
Testing
The parser module includes extensive tests (src/parser/mod.rs:232-431):
#[test]
fn test_pipe() {
    let expr = parse("cat file.txt | grep foo").unwrap();
    assert_eq!(
        expr,
        Expr::Pipe {
            left: Box::new(cmd("cat", &["file.txt"])),
            right: Box::new(cmd("grep", &["foo"])),
        }
    );
}

#[test]
fn test_variable_expansion() {
    let expr = parse("echo $HOME").unwrap();
    assert_eq!(
        expr,
        Expr::Command {
            name: Word(vec![WordPart::Literal("echo".to_string())]),
            args: vec![Word(vec![WordPart::Variable("HOME".to_string())])],
        }
    );
}
Key Design Decisions
1. Two-Phase Parsing
Separating lexing from parsing simplifies both:
Lexer handles quotes, escapes, and low-level string manipulation
Parser focuses on grammar and tree construction
2. Hand-Written Parser
No parser generator dependencies:
Full control over error messages
Easy to debug and extend
No macro magic or generated code
3. Compound Words
Words are Vec<WordPart> to support mixed literal and expansion:
// "Hello $USER!" becomes:
Word([
    Literal("Hello "),
    Variable("USER"),
    Literal("!"),
])
The runtime expands each part and concatenates the result.
Next Steps