The `sequence` rule type identifies specific patterns of tokens, optionally using part-of-speech (POS) tagging. It’s the most sophisticated rule type, enabling detection of complex grammatical patterns.
How It Works
The `sequence` rule defines a series of token requirements, optionally using POS tags for grammatical matching. It searches for an “anchor” token, then validates that the surrounding tokens match the sequence requirements.
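As a minimal sketch of the idea above, the rule below pairs one literal word with one POS-tagged token. The top-level keys `extends`, `message`, `level`, and `ignorecase` follow the usual layout of a Vale rule file and are assumptions here, as are the message text and the choice of words; only `tokens`, `pattern`, and `tag` come from this page.

```yaml
# Hypothetical rule file; message, level, and word choice are illustrative.
extends: sequence
message: "Consider rewording this 'there is' / 'there are' construction."
level: suggestion
ignorecase: true
tokens:
  # A token with a pattern: per the description above, this acts as the anchor.
  - pattern: there
  # The token right after the anchor must carry some verb tag.
  - tag: VB.*
```

Whether a given phrase triggers the rule depends on how the tagger labels each word, so a sketch like this is best checked against real sentences.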
Parameters
- `tokens`: An array of token specifications. Each token can have `pattern`, `tag`, `skip`, and `negate` properties.
- `ignorecase`: Makes pattern matching case-insensitive when set to `true`.
Token Properties
Each token in the `tokens` array can have:
- `pattern`: A regex pattern to match the token’s text content.
- `tag`: A POS tag or tag pattern to match (e.g., `NN` for noun, `VB.*` for any verb form).
- `skip`: Number of optional tokens that can appear before this token.
- `negate`: When `true`, matches tokens that DON’T match the pattern or tag.
POS Tags Reference
Common Penn Treebank POS tags:

| Tag | Description | Example |
|---|---|---|
| NN | Noun, singular | “dog”, “car” |
| NNS | Noun, plural | “dogs”, “cars” |
| NNP | Proper noun, singular | “John”, “London” |
| NNPS | Proper noun, plural | “Americans” |
| VB | Verb, base form | “run”, “go” |
| VBD | Verb, past tense | “ran”, “went” |
| VBG | Verb, gerund or present participle | “running”, “going” |
| VBN | Verb, past participle | “run”, “gone” |
| JJ | Adjective | “big”, “green” |
| RB | Adverb | “quickly”, “very” |
| PRP | Pronoun, personal | “he”, “she”, “it” |
| PRP$ | Pronoun, possessive | “his”, “her”, “its” |
| DT | Determiner | “the”, “a”, “an” |
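
Tags from the table can be combined around a literal anchor word. The sketch below places a noun tag on each side of “of”. It assumes that tag patterns accept regex alternation in the same way as `VB.*`; whether a tag pattern must match the whole tag is not stated on this page, so the exact set of tags matched is also an assumption. The message and level are illustrative.

```yaml
# Hypothetical rule; message and level are illustrative.
extends: sequence
message: "Consider a more concise phrasing than '<noun> of <noun>'."
level: suggestion
tokens:
  # Left-hand side: a noun tag (singular or plural).
  - tag: NN|NNS
  # The only token with a pattern, so it serves as the anchor.
  - pattern: of
  # Right-hand side: another noun tag.
  - tag: NN|NNS
```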
Examples
Ambiguous Pronoun Detection
Flag pronouns that appear too far from the noun they follow.
Passive Voice Detection
Identify passive voice constructions such as:
- “was written”
- “are being reviewed”
- “has been approved”
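
A rule along the following lines could flag the phrases listed above. It is a sketch, not a definitive passive-voice detector: the list of “be” forms, the message, and the level are assumptions, and the result depends on the tagger labelling the second word as a past participle (`VBN`).

```yaml
# Hypothetical rule; the word list and message are illustrative.
extends: sequence
message: "Possible passive voice; consider an active construction."
level: suggestion
ignorecase: true
tokens:
  # Anchor: any common form of "to be".
  - pattern: am|is|are|was|were|be|been|being
  # Immediately followed by a past participle.
  - tag: VBN
```

With this shape, “are being reviewed” would be caught through the pair “being reviewed” rather than through the full three-word phrase.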
Weak Modifiers
Detect weak writing with unnecessary modifiers.
Noun + Of + Noun
Suggest more concise phrasing.
Repeated Prepositions
Flag awkward constructions.
Complex Sentence Patterns
Detect multiple clauses that might be confusing.
Negated Pattern
Match tokens that DON’T have specific properties.
Use Cases
The `sequence` rule is ideal for:
- Detecting grammatical patterns (passive voice, nominalizations)
- Enforcing style preferences (active voice, conciseness)
- Identifying ambiguous pronoun references
- Catching weak or vague constructions
- Advanced style checking beyond simple pattern matching
Scope Behavior
Technical Details
Internally, the `sequence` rule (internal/check/sequence.go:246-292):
- Tokenizes the text using `nlp.TextToTokens` with POS tagging
- Searches for the first non-negated token with a pattern (the “anchor”)
- For each anchor match, validates the left-hand side tokens
- Validates the right-hand side tokens
- If the full sequence matches, creates an alert spanning all matched tokens
Skip Parameter
The `skip` parameter creates optional token slots. For example, allowing up to two skipped tokens between “was” and “written” matches:
- “was written” (0 tokens between)
- “was being written” (1 token between)
- “was completely being written” (2 tokens between)
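
The three phrases above could be produced by a token list like the one below. This is a sketch under the assumption that `skip: 2` on the second token allows zero, one, or two intervening tokens, as the list suggests; the message and level are illustrative.

```yaml
# Hypothetical rule; message and level are illustrative.
extends: sequence
message: "Possible passive voice; consider an active construction."
level: suggestion
tokens:
  # Anchor word.
  - pattern: was
  # Past participle, with up to two optional tokens allowed before it.
  - tag: VBN
    skip: 2
```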
Multiple Message Placeholders
The `%s` placeholders in messages are filled with the matched tokens.
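
As an illustration, the sketch below uses two placeholders for a two-token sequence. It assumes the placeholders are filled in token order, which this page does not state explicitly; the rule itself, its message, and its level are hypothetical.

```yaml
# Hypothetical rule; assumes one placeholder per matched token, in order.
extends: sequence
message: "'%s %s' may be a weak modifier; consider a stronger word."
level: suggestion
ignorecase: true
tokens:
  # First placeholder: the anchor word.
  - pattern: very
  # Second placeholder: the adjective that follows it.
  - tag: JJ
```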
Performance Considerations
The `sequence` rule uses NLP processing, which is computationally expensive:
- POS tagging is slower than regex matching
- Only use POS tags when necessary
- Use specific patterns where possible
- Consider limiting to specific scopes
- Test performance on large documents
For simple pattern matching, use `existence` or `substitution` instead.
Pattern vs Tag
A token can specify `pattern`, `tag`, or both. When both are given, the token must satisfy both.
Debugging Sequences
To understand why a sequence isn’t matching:
- Start with just the anchor token
- Add surrounding tokens one at a time
- Use `skip: 10` initially, then reduce
- Check POS tags using Vale’s debug mode
- Test patterns separately with simpler rules
Related Rule Types
- existence: Use for simple pattern matching without grammar
- conditional: Use for presence-based dependencies
- repetition: Use for consecutive repeated tokens
- substitution: Use for simple pattern-to-replacement mappings