
Overview

This guide walks you through running the complete NL2FOL pipeline: translating natural language to first-order logic, converting the formulas to SMT format, and evaluating the results.

Basic Workflow

1. Convert Natural Language to First-Order Logic: use nl_to_fol.py to process a dataset and generate logical formulas.
2. Convert FOL to SMT and Run Solver: use fol_to_cvc.py to convert the formulas to SMT format and verify them.
3. Evaluate Results: use get_metrics.py to calculate accuracy, precision, recall, and F1 score.

Step 1: Natural Language to FOL

The main entry point is nl_to_fol.py, which processes datasets and generates first-order logic formulas.

Command Syntax

python3 src/nl_to_fol.py \
  --model_name <model_name> \
  --nli_model_name <nli_model_name> \
  --run_name <run_name> \
  --dataset <logic|logicclimate|nli|folio> \
  --length <number_of_samples>

Example: Using GPT-4o

python3 src/nl_to_fol.py \
  --model_name gpt-4o \
  --nli_model_name roberta-large-mnli \
  --run_name my_first_run \
  --dataset logic \
  --length 10
GPT models (starting with “gpt-”) use the OpenAI API. Ensure your OPENAI_API_KEY environment variable is set.

Example: Using Llama

python3 src/nl_to_fol.py \
  --model_name meta-llama/Llama-2-7b-hf \
  --nli_model_name roberta-large-mnli \
  --run_name llama_run \
  --dataset logic \
  --length 10

Available Datasets

# General logical fallacies (3,227 examples)
python3 src/nl_to_fol.py \
  --model_name gpt-4o \
  --nli_model_name roberta-large-mnli \
  --run_name logic_test \
  --dataset logic \
  --length 50

Output

This generates a CSV file at results/<run_name>.csv with columns:
  • articles: Original natural language text
  • Claim: Extracted claim from the sentence
  • Implication: Extracted implication
  • Referring Expressions - Claim/Implication: Entity references
  • Property Implications: Property relationships
  • Equal Entities: Entities identified as equal
  • Subset Entities: Subset relationships between entities
  • Claim Lfs: Logical form for claim
  • Implication Lfs: Logical form for implication
  • Logical Form: Final combined formula (version 1)
  • Logical Form 2: Final combined formula (version 2)
  • label: Ground truth (0=fallacy, 1=valid)
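A quick way to sanity-check a finished run is to load the CSV and count labels. A minimal sketch using only the standard library; the two rows below are fabricated for illustration, not real pipeline output:

```python
import csv
import io

# Hypothetical two-row excerpt of results/<run_name>.csv
# (column subset shown for brevity; contents invented for illustration).
csv_text = """articles,Logical Form 2,label
"All birds can fly. Penguins are birds. Therefore, penguins can fly.","(exists a (canFly(a))) -> (forall b (canFly(b)))",0
"A sound argument.","(exists c (P(c)))",1
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))
fallacies = [r for r in rows if r["label"] == "0"]  # csv reads labels as strings
print(f"{len(fallacies)} of {len(rows)} examples are labeled as fallacies")
```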

Step 2: FOL to SMT Verification

Convert the generated logical formulas to SMT format and verify them using CVC4/CVC5.

Command Syntax

python3 src/fol_to_cvc.py <run_name>

Example

python3 src/fol_to_cvc.py my_first_run
This will:
  1. Read the FOL formulas from results/my_first_run.csv
  2. Generate SMT-LIB files in results/my_first_run_smt/
  3. Run CVC4 on each formula
  4. Save results to results/my_first_run_results.csv
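The generated files check validity by refutation: the negation of the formula is asserted, and an unsatisfiable result means the original formula holds. A hypothetical SMT-LIB fragment for the penguin example (the sort and predicate declarations are illustrative, not the exact encoding the script emits):

```
(set-logic ALL)
(declare-fun canFly (Int) Bool)
; Assert the NEGATION of the candidate formula:
; if this is unsat, the original implication is valid.
(assert (not (=> (exists ((a Int)) (canFly a))
                 (forall ((b Int)) (canFly b)))))
(check-sat)
```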

SMT Results

The solver produces one of three results:

• Valid: the negated formula is unsatisfiable, so the argument is logically valid.
• LF: the negated formula is satisfiable (or the solver returns unknown), so the argument is flagged as containing a logical fallacy.
• Empty: a processing error occurred.
The code uses “Logical Form 2” by default, which applies different quantifier scoping heuristics than “Logical Form”.
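The mapping from solver output to verdict can be sketched as follows (the function name is ours, not part of the codebase):

```python
def classify(solver_output: str) -> str:
    """Map a CVC4/CVC5 result line to the verdict recorded in the results CSV.

    The solver checks the NEGATION of the formula: 'unsat' means the
    original formula is valid; 'sat' or 'unknown' flags a potential fallacy.
    """
    verdict = solver_output.strip()
    if verdict == "unsat":
        return "Valid"
    if verdict in ("sat", "unknown"):
        return "LF"
    return ""  # anything else: processing error, leave the cell empty

print(classify("unsat"))    # Valid
print(classify("unknown"))  # LF
```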

Step 3: Evaluate Results

Calculate performance metrics on the results.

Command Syntax

python3 eval/get_metrics.py <results_csv_path>

Example

python3 eval/get_metrics.py results/my_first_run_results.csv

Output Metrics

The script outputs:
(accuracy, precision, recall, f1_score)
Example output:
(0.85, 0.82, 0.88, 0.85)
This represents:
  • Accuracy: 85% of predictions are correct
  • Precision: 82% of predicted fallacies are actual fallacies
  • Recall: 88% of actual fallacies are detected
  • F1 Score: 0.85, the harmonic mean of precision and recall
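For reference, all four metrics can be recomputed from raw predictions. A self-contained sketch with toy labels; note that treating label 0 (fallacy) as the positive class is our assumption about get_metrics.py's convention:

```python
def metrics(y_true, y_pred, positive=0):
    """Accuracy, precision, recall, and F1, treating `positive`
    (here 0 = fallacy) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    acc = sum(t == p for t, p in pairs) / len(pairs)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return acc, prec, rec, f1

# Toy example: 0 = fallacy, 1 = valid
y_true = [0, 0, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0]
acc, prec, rec, f1 = metrics(y_true, y_pred)
```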

Complete Example

Here’s a complete end-to-end example:
# Step 1: Generate FOL formulas from 20 examples
python3 src/nl_to_fol.py \
  --model_name gpt-4o \
  --nli_model_name roberta-large-mnli \
  --run_name quickstart_demo \
  --dataset logic \
  --length 20

# Step 2: Convert to SMT and verify
python3 src/fol_to_cvc.py quickstart_demo

# Step 3: Evaluate results
python3 eval/get_metrics.py results/quickstart_demo_results.csv
Processing can take several minutes per example due to multiple LLM calls. Start with small dataset sizes (10-20 examples) for testing.

Understanding the NL2FOL Class

The core logic is implemented in the NL2FOL class from src/nl_to_fol.py:15. Here’s how it works:

Initialization

from nl_to_fol import NL2FOL

nl2fol = NL2FOL(
    sentence="All birds can fly. Penguins are birds. Therefore, penguins can fly.",
    model_type='gpt',
    pipeline=None,
    tokenizer=None,
    nli_model=nli_model,
    nli_tokenizer=nli_tokenizer,
    debug=True
)

Conversion Pipeline

The convert_to_first_order_logic() method executes these steps:
1. Extract Claim and Implication: separates the premise (claim) from the conclusion (implication).

nl2fol.extract_claim_and_implication()
# claim: "All birds can fly. Penguins are birds."
# implication: "Penguins can fly."

2. Get Referring Expressions: identifies the entities mentioned in the text.

nl2fol.get_referring_expressions()
# claim_ref_exp: "birds, penguins"

3. Determine Entity Relations: uses NLI to find relationships between entities.

nl2fol.get_entity_relations()
# subset_entities: [("penguins", "birds")]

4. Map Entities to Variables: creates logical variable mappings.

nl2fol.get_entity_mapping()
# entity_mappings: {"birds": "a", "penguins": "b"}

5. Extract Properties: identifies predicates and properties.

nl2fol.get_properties()
# claim_properties: "canFly(a)"

6. Generate Logical Forms: creates FOL formulas for the claim and implication.

nl2fol.get_fol()
# claim_lf: "canFly(a) and isPenguin(b)"
# implication_lf: "canFly(b)"

7. Combine into Final Formula: applies quantifiers and creates the final implication.

nl2fol.get_final_lf()
# final_lf: "(exists a (canFly(a))) -> (forall b (canFly(b)))"
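The final step can be illustrated with plain string assembly. This is a simplification of what get_final_lf() does (the real method also produces the second quantifier-scoping variant), and the helper below is ours, not part of the codebase:

```python
def combine(claim_lf, implication_lf, claim_vars, impl_vars):
    """Wrap the claim in existentials and the implication in universals,
    then join them with '->' to form the final formula."""
    claim = claim_lf
    for v in claim_vars:
        claim = f"(exists {v} ({claim}))"
    impl = implication_lf
    for v in impl_vars:
        impl = f"(forall {v} ({impl}))"
    return f"{claim} -> {impl}"

final_lf = combine("canFly(a)", "canFly(b)", ["a"], ["b"])
print(final_lf)  # (exists a (canFly(a))) -> (forall b (canFly(b)))
```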

Advanced Usage

Custom Sentence Processing

You can process individual sentences programmatically:
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from nl_to_fol import NL2FOL
from openai import OpenAI

# Initialize NLI model
nli_tokenizer = AutoTokenizer.from_pretrained("roberta-large-mnli")
nli_model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")

# Process a sentence
sentence = "All politicians are corrupt. John is a politician. Therefore, John is corrupt."

nl2fol = NL2FOL(
    sentence=sentence,
    model_type='gpt',
    pipeline=None,
    tokenizer=None,
    nli_model=nli_model,
    nli_tokenizer=nli_tokenizer,
    debug=True
)

final_lf, final_lf2 = nl2fol.convert_to_first_order_logic()
print(f"Logical Form: {final_lf}")

Interpreting SMT Results

To understand why a formula was classified a certain way:
python3 src/interpret_smt_results.py \
  results/my_run_smt/5_out.txt \
  results/my_run.csv
This provides detailed explanations of the SMT solver’s reasoning.

Troubleshooting

• API rate limits: add delays between requests or reduce the dataset size (e.g. --length 5).
• Out-of-memory errors with local models: use smaller models or fall back to CPU inference (export CUDA_VISIBLE_DEVICES="").
• Solver timeouts: the code automatically enables finite model finding for "unknown" results. If it still times out, the formula may be too complex.
• Missing or empty results: check that the CVC4/CVC5 binary exists and is executable, that SMT files were generated in results/<run_name>_smt/, and that the FOL formulas in the CSV are well-formed.
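If a particular file keeps returning unknown, you can rerun it by hand with finite model finding enabled. A sketch using cvc5's command-line flags (the file path is illustrative):

```shell
cvc5 --finite-model-find --tlimit=10000 results/my_run_smt/5.smt2
```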

Next Steps

API Reference

Explore detailed API documentation

Examples

View more complex usage examples
