
Overview

The nl_to_fol.py script converts natural language arguments into first-order logic (FOL) formulas. This is the first and most critical step in the NL2FOL pipeline for detecting logical fallacies.

How It Works

The translation process follows a multi-stage pipeline:

1. Claim and Implication Extraction: the system identifies the claim and implication components of the argument using LLM prompts.
2. Referring Expression Identification: extracts referring expressions (entities) from both the claim and implication.
3. Entity Relationship Detection: determines relationships between entities (equal, subset, or unrelated) using NLI models or LLM reasoning.
4. Entity Mapping: maps entities to logical variables (a, b, c, etc.) for formal representation.
5. Property Extraction: identifies properties and predicates associated with each entity.
6. FOL Formula Generation: generates first-order logic formulas for the claim and implication with proper quantifiers.
7. Final Formula Assembly: combines the claim and implication into a final formula of the form (Claim) → (Implication)
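The stages above can be traced end to end on a toy argument. The sketch below is purely illustrative: every intermediate value (claim, implication, entity mapping, per-part formulas) is fabricated here, whereas the real script derives them via LLM prompts and NLI models.

```python
# Illustrative end-to-end trace of the pipeline stages for one argument.
# All intermediate values are made up; the real pipeline produces them with LLMs.

sentence = "It is raining, so the ground is wet."

# Stages 1-2: claim/implication and referring expressions (illustrative outputs)
claim = "It is raining"
implication = "the ground is wet"
entity_vars = {"rain": "a", "ground": "b"}  # stage 4: entity -> variable mapping

# Stage 6: per-part formulas over the mapped variables
claim_lf = "Raining(a)"
implication_lf = "Wet(b)"

# Stage 7: final assembly as (Claim) -> (Implication)
final_formula = f"({claim_lf}) -> ({implication_lf})"
print(final_formula)  # (Raining(a)) -> (Wet(b))
```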

Command Usage

Basic Command

python3 src/nl_to_fol.py \
  --model_name <your_model_name> \
  --nli_model_name <your_nli_model_name> \
  --run_name <run_name> \
  --dataset <dataset_name> \
  --length <number_of_datapoints>

Parameters

model_name
string
required
The LLM model to use for translation. Supports:
  • HuggingFace model paths (e.g., meta-llama/Llama-2-7b-hf)
  • OpenAI models (e.g., gpt-4o); model names prefixed with 'gpt' enable GPT mode
nli_model_name
string
The Natural Language Inference model for entity relationship detection. Example: microsoft/deberta-v2-xlarge-mnli
run_name
string
required
Name for this run. Output will be saved to results/<run_name>.csv
dataset
string
required
Dataset to use for testing. Options:
  • logic - Standard logical fallacy dataset
  • logicclimate - Climate-specific logical fallacy dataset
  • folio - FOLIO dataset for evaluation
  • nli - NLI-based dataset
length
integer
required
Number of datapoints to sample from the dataset for processing.
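The NLI model given by nli_model_name drives step 3 (entity relationship detection). One plausible decision rule, sketched here in plain Python, checks entailment in both directions; the MNLI label names are standard, but the exact rule is an assumption about the implementation, not the repo's code:

```python
# Hypothetical mapping from bidirectional NLI verdicts to entity relations.
# label_ab is the NLI label for "a is b"; label_ba is the label for "b is a".
def entity_relation(label_ab, label_ba):
    if label_ab == "ENTAILMENT" and label_ba == "ENTAILMENT":
        return "equal"       # each entails the other: same entity
    if label_ab == "ENTAILMENT":
        return "subset"      # a is a subset of b
    if label_ba == "ENTAILMENT":
        return "superset"    # b is a subset of a
    return "unrelated"

print(entity_relation("ENTAILMENT", "NEUTRAL"))  # subset
```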

Example Usage

python3 src/nl_to_fol.py \
  --model_name meta-llama/Llama-2-13b-hf \
  --nli_model_name microsoft/deberta-v2-xlarge-mnli \
  --run_name llama_experiment_1 \
  --dataset logic \
  --length 100

Output Format

The script generates a CSV file in results/<run_name>.csv with the following columns:
Column | Description
articles | Original natural language text
label | Ground truth label (0 = fallacy, 1 = valid)
Claim | Extracted claim statement
Implication | Extracted implication statement
Referring Expressions - Claim | Entities in the claim
Referring Expressions - Implication | Entities in the implication
Property Implications | Relationships between properties
Equal Entities | Entities determined to be equal
Subset Entities | Subset relationships between entities
Claim Lfs | Logical form of the claim
Implication Lfs | Logical form of the implication
Logical Form | Final FOL formula (version 1)
Logical Form 2 | Final FOL formula (version 2, with existential quantifiers)
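The results CSV can be consumed with the standard library. In this sketch the sample row is fabricated and only a subset of the columns is shown; the column names follow the table above:

```python
import csv
import io

# Minimal sketch of reading the results CSV. Normally you would open
# results/<run_name>.csv; here a fabricated row stands in for the file.
sample = io.StringIO(
    "articles,label,Claim,Implication,Logical Form,Logical Form 2\n"
    '"Everyone says it, so it must be true.",0,"Everyone says it","It must be true",'
    '"(Says(a)) -> (True(b))","(exists a. Says(a)) -> (exists b. True(b))"\n'
)
reader = csv.DictReader(sample)
for row in reader:
    # label 0 marks a fallacy, 1 a valid argument
    verdict = "fallacy" if row["label"] == "0" else "valid"
    print(verdict, "|", row["Logical Form 2"])
```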

Implementation Details

NL2FOL Class

The core translation logic is implemented in the NL2FOL class:
# pipeline/tokenizer and nli_model/nli_tokenizer are the previously loaded
# LLM and NLI components; model_type selects the prompting backend.
nl2fol = NL2FOL(
    sentence=text,
    model_type='gpt',  # or 'llama' for local LLaMA models
    pipeline=pipeline,
    tokenizer=tokenizer,
    nli_model=nli_model,
    nli_tokenizer=nli_tokenizer,
    debug=True  # print intermediate results
)

# Returns both quantifier-strategy variants of the final formula
final_lf1, final_lf2 = nl2fol.convert_to_first_order_logic()

Debug Mode

The script runs with debug=True by default, printing intermediate results:
  • Extracted claim and implication
  • Referring expressions
  • Entity relationships
  • Property assignments
  • Intermediate logical forms
Processing time varies based on:
  • Model size and type (LLaMA models require GPU)
  • Dataset length
  • Complexity of arguments
Expect ~30-60 seconds per datapoint for LLM-based processing.
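Given the ~30-60 seconds per datapoint above, total wall-clock time can be estimated directly (a back-of-the-envelope sketch, not a benchmark):

```python
# Rough wall-clock estimate from the per-datapoint range above (30-60 seconds).
length = 100  # the --length value
low_min = length * 30 / 60
high_min = length * 60 / 60
print(f"~{low_min:.0f}-{high_min:.0f} minutes for {length} datapoints")
# ~50-100 minutes for 100 datapoints
```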

GPU Requirements

LLaMA models require GPU acceleration. The script automatically detects CUDA availability and displays GPU information at startup. Minimum recommended: 16GB of GPU memory for 7B models, 24GB+ for 13B models.

Quantifier Strategies

The system generates two versions of the logical form.

Logical Form 1 uses heuristics for quantifier assignment based on entity relationships:
  • Subset entities: the subset gets exists, the superset gets forall
  • Property implications: universally quantified

Logical Form 2 takes a more conservative approach:
  • All entity variables get existential quantifiers
  • Property implications remain universally quantified

Logical Form 2 is typically used for SMT solving because it yields better satisfiability-checking results.
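The two strategies can be sketched as a small helper. This is a hypothetical re-implementation of the heuristics described above, not the repository's actual code:

```python
# Hypothetical sketch of the two quantifier-assignment strategies.
def assign_quantifiers(variables, subset_pairs, strategy):
    """Return a quantifier ('exists' or 'forall') for each entity variable."""
    quants = {}
    if strategy == 1:
        # LF1 heuristic: the subset member gets exists, the superset gets forall
        for sub, sup in subset_pairs:
            quants[sub] = "exists"
            quants[sup] = "forall"
        for v in variables:
            quants.setdefault(v, "exists")
    else:
        # LF2: conservative, every entity variable is existentially quantified
        quants = {v: "exists" for v in variables}
    return quants

vars_ = ["a", "b"]
subsets = [("a", "b")]  # a is a subset of b
print(assign_quantifiers(vars_, subsets, 1))  # {'a': 'exists', 'b': 'forall'}
print(assign_quantifiers(vars_, subsets, 2))  # {'a': 'exists', 'b': 'exists'}
```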

Prompt Engineering

The system uses specialized prompts located in prompts/:
  • prompt_nl_to_ci.txt - Claim/implication extraction
  • prompt_referring_expressions.txt - Entity identification
  • prompt_entity_relation.txt - Entity relationship detection
  • prompt_properties.txt - Property extraction
  • prompt_fol.txt - FOL formula generation
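Templates like the ones above can be loaded and filled with a few lines of standard-library code. The {sentence} placeholder is an assumed convention here, not necessarily the repo's actual template syntax:

```python
from pathlib import Path

# Minimal sketch of loading a prompt template and filling its slots.
# The placeholder syntax ({sentence}) is an assumption for illustration.
def load_prompt(prompt_dir, name, **fields):
    template = (Path(prompt_dir) / name).read_text()
    return template.format(**fields)

# e.g.: load_prompt("prompts", "prompt_nl_to_ci.txt",
#                   sentence="It rains, so the ground is wet.")
```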

Troubleshooting

Common Issues

CUDA Out of Memory
  • Use a smaller model or process fewer datapoints at a time
  • Use model quantization or CPU offloading
Empty Results
  • Check that prompts directory exists in working directory
  • Verify model has necessary permissions and access
NLI Model Errors
  • Ensure NLI model is compatible (sequence classification models)
  • Check tokenizer compatibility

Next Steps

After generating FOL formulas:
  1. Convert to SMT format using fol_to_cvc.py
  2. Run SMT solver for satisfiability checking
  3. Interpret results and evaluate performance
See FOL to SMT Conversion for the next step in the pipeline.
