Overview
This guide walks you through running the complete NL2FOL pipeline: from translating natural language to first-order logic, converting to SMT format, and evaluating results.Basic Workflow
Convert Natural Language to First-Order Logic
Use
nl_to_fol.py to process a dataset and generate logical formulasConvert FOL to SMT and Run Solver
Use
fol_to_cvc.py to convert formulas to SMT format and verify themStep 1: Natural Language to FOL
The main entry point isnl_to_fol.py, which processes datasets and generates first-order logic formulas.
Command Syntax
Example: Using GPT-4
GPT models (starting with “gpt-”) use the OpenAI API. Ensure your OPENAI_API_KEY environment variable is set.
Example: Using Llama
Available Datasets
Output
This generates a CSV file atresults/<run_name>.csv with columns:
- articles: Original natural language text
- Claim: Extracted claim from the sentence
- Implication: Extracted implication
- Referring Expressions - Claim/Implication: Entity references
- Property Implications: Property relationships
- Equal Entities: Entities identified as equal
- Subset Entities: Subset relationships between entities
- Claim Lfs: Logical form for claim
- Implication Lfs: Logical form for implication
- Logical Form: Final combined formula (version 1)
- Logical Form 2: Final combined formula (version 2)
- label: Ground truth (0=fallacy, 1=valid)
Step 2: FOL to SMT Verification
Convert the generated logical formulas to SMT format and verify them using CVC4/CVC5.Command Syntax
Example
- Read the FOL formulas from
results/my_first_run.csv - Generate SMT-LIB files in
results/my_first_run_smt/ - Run CVC4 on each formula
- Save results to
results/my_first_run_results.csv
SMT Results
The solver produces one of three results:Valid
Formula is unsatisfiable (negation fails) - the argument is logically valid
LF
Formula is satisfiable or unknown - the argument contains a logical fallacy
Empty
Processing error occurred
The code uses “Logical Form 2” by default, which applies different quantifier scoping heuristics than “Logical Form”.
Step 3: Evaluate Results
Calculate performance metrics on the results.Command Syntax
Example
Output Metrics
The script outputs:- Accuracy: 85% of predictions are correct
- Precision: 82% of predicted fallacies are actual fallacies
- Recall: 88% of actual fallacies are detected
- F1 Score: 85% harmonic mean of precision and recall
Complete Example
Here’s a complete end-to-end example:Understanding the NL2FOL Class
The core logic is implemented in theNL2FOL class from src/nl_to_fol.py:15. Here’s how it works:
Initialization
Conversion Pipeline
Theconvert_to_first_order_logic() method executes these steps:
Advanced Usage
Custom Sentence Processing
You can process individual sentences programmatically:Interpreting SMT Results
To understand why a formula was classified a certain way:Troubleshooting
Rate limits with OpenAI API
Rate limits with OpenAI API
Add delays between requests or reduce dataset size:
GPU memory issues
GPU memory issues
Use smaller models or CPU inference:
CVC4 timeouts
CVC4 timeouts
The code automatically enables finite model finding for “unknown” results. If still timing out, the formula may be too complex.
Empty results column
Empty results column
Check that:
- CVC4/CVC5 binary exists and is executable
- SMT files were generated in results/
<run_name>_smt/ - FOL formulas are well-formed in the CSV
Next Steps
API Reference
Explore detailed API documentation
Examples
View more complex usage examples
