Overview
NL2FOL provides utilities to process datasets of natural language statements for logical fallacy detection. This guide shows you how to work with built-in datasets and create your own.
Dataset Structure
Datasets in NL2FOL use a simple CSV format with the following columns:
- `articles`: The natural language statement or argument to analyze
- `label`: Binary label: 0 for logical fallacy, 1 for valid logical reasoning

Alternative text columns (such as `sentence`) can be used in place of `articles`; they are merged into the `articles` field when the dataset is loaded.
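To make the format concrete, here is a minimal DataFrame in this layout, with a couple of illustrative rows and the kind of sanity checks worth running before processing:

```python
import pandas as pd

# Illustrative rows in the NL2FOL dataset format
df = pd.DataFrame({
    'articles': [
        "All birds fly. Penguins are birds. Thus penguins fly.",
        "All mammals have lungs. Whales are mammals. Therefore whales have lungs.",
    ],
    'label': [0, 1],  # 0 = logical fallacy, 1 = valid reasoning
})

# Basic sanity checks: labels are binary, no empty statements
assert set(df['label'].unique()) <= {0, 1}
assert df['articles'].str.len().gt(0).all()
print(f"{len(df)} examples ready")
```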
Built-in Datasets
NL2FOL includes several pre-configured datasets in the data/ directory:
- Logic Fallacies (`data/fallacies.csv`): General logical fallacies from various domains with labeled fallacy types.
- Climate Fallacies (`data/fallacies_climate.csv`): Climate change-related arguments with identified logical fallacies.
- NLI Fallacies (`data/nli_fallacies_test.csv`): Natural Language Inference-based fallacious reasoning examples.
- NLI Entailments (`data/nli_entailments_test.csv`): Valid logical entailments for comparison and evaluation.
Using Built-in Datasets
The setup_dataset() function (defined in src/nl_to_fol.py:389) loads and prepares datasets:
```python
import pandas as pd
from nl_to_fol import setup_dataset

# Load a balanced dataset of fallacies and valid arguments
df = setup_dataset(fallacy_set='logic', length=100)

print(df.head())
print(f"Dataset shape: {df.shape}")
print(f"Label distribution:\n{df['label'].value_counts()}")
```
Available Dataset Types
`logic`: General logical fallacies dataset

```python
df = setup_dataset(fallacy_set='logic', length=100)
```

Loads from `data/fallacies.csv` and `data/nli_entailments_test.csv`, creating a balanced dataset with:
- 100 fallacious arguments (label=0)
- 100 valid arguments (label=1)

`logicclimate`: Climate change fallacies dataset

```python
df = setup_dataset(fallacy_set='logicclimate', length=50)
```

Loads climate-related arguments from `data/fallacies_climate.csv`:
- 50 climate fallacies (label=0)
- 50 valid climate arguments (label=1)

`nli`: NLI-based fallacies dataset

```python
df = setup_dataset(fallacy_set='nli', length=200)
```

Loads from `data/nli_fallacies_test.csv` and `data/nli_entailments_test.csv`:
- 200 NLI fallacies (label=0)
- 200 valid NLI entailments (label=1)

`folio`: FOLIO dataset

```python
df = setup_dataset(fallacy_set='folio', length=100)
```

Loads from `data/folio.csv` for first-order logic inference tasks.
Processing a Dataset
Here’s a complete example of processing a dataset:
```python
import pandas as pd
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from nl_to_fol import NL2FOL, setup_dataset

def process_custom_dataset():
    # Initialize models (GPT-4 example)
    model_type = 'gpt'
    pipeline = None
    tokenizer = None

    nli_model_name = "microsoft/deberta-large-mnli"
    nli_tokenizer = AutoTokenizer.from_pretrained(nli_model_name)
    nli_model = AutoModelForSequenceClassification.from_pretrained(nli_model_name)

    # Load dataset
    df = setup_dataset(fallacy_set='logic', length=10)

    # Storage for results
    claims = []
    implications = []
    final_lfs = []
    final_lfs2 = []

    # Process each row
    for i, row in df.iterrows():
        print(f"Processing {i + 1}/{len(df)}...")

        nl2fol = NL2FOL(
            sentence=row['articles'],
            model_type=model_type,
            pipeline=pipeline,
            tokenizer=tokenizer,
            nli_model=nli_model,
            nli_tokenizer=nli_tokenizer,
            debug=False  # Set to True for detailed output
        )

        lf1, lf2 = nl2fol.convert_to_first_order_logic()

        claims.append(nl2fol.claim)
        implications.append(nl2fol.implication)
        final_lfs.append(lf1)
        final_lfs2.append(lf2)

    # Add results to dataframe
    df['Claim'] = claims
    df['Implication'] = implications
    df['Logical Form 1'] = final_lfs
    df['Logical Form 2'] = final_lfs2

    # Save results
    df.to_csv('results/processed_dataset.csv', index=False)
    print("\nProcessing complete! Results saved to results/processed_dataset.csv")

    return df

if __name__ == "__main__":
    results = process_custom_dataset()
    print(results[['articles', 'label', 'Claim', 'Implication']].head())
```
The setup_dataset() function automatically balances the dataset by sampling an equal number of fallacies and valid arguments.
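The balancing boils down to taking `length` rows from each class and concatenating them; a rough sketch of that logic (not the actual implementation in `src/nl_to_fol.py`, which may sample differently) looks like:

```python
import pandas as pd

def balance_dataset(fallacies: pd.DataFrame, valid: pd.DataFrame, length: int) -> pd.DataFrame:
    """Take `length` rows from each class and combine them (illustrative sketch)."""
    sampled_fallacies = fallacies.head(length).assign(label=0)
    sampled_valid = valid.head(length).assign(label=1)
    return pd.concat([sampled_fallacies, sampled_valid], ignore_index=True)

# Example with toy data
fallacies = pd.DataFrame({'articles': [f"fallacy {i}" for i in range(5)]})
valid = pd.DataFrame({'articles': [f"valid {i}" for i in range(5)]})
df = balance_dataset(fallacies, valid, length=3)
print(df['label'].value_counts())  # three rows of each class
```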
Creating Custom Datasets
Prepare your CSV file
Create a CSV file with the required columns: `articles`, `label`, `fallacy_type`:

```csv
articles,label,fallacy_type
"All politicians lie. Sarah is a politician. Therefore Sarah lies.",0,hasty_generalization
"If it rains, the ground is wet. The ground is wet. Therefore it rained.",0,affirming_consequent
"All mammals have lungs. Whales are mammals. Therefore whales have lungs.",1,valid_syllogism
"Either we ban cars or pollution will kill us all.",0,false_dilemma
```
Load your dataset
Use pandas to load and prepare your data:

```python
import pandas as pd

# Load custom dataset
df = pd.read_csv('custom_fallacies.csv')

# Ensure required columns exist
if 'articles' not in df.columns:
    df['articles'] = df['sentence']  # or whichever column holds the text
if 'label' not in df.columns:
    df['label'] = 0  # default to fallacy if not specified

print(f"Loaded {len(df)} examples")
```
Process your dataset
Use the same processing loop as shown above:

```python
for i, row in df.iterrows():
    nl2fol = NL2FOL(
        sentence=row['articles'],
        model_type='gpt',
        pipeline=None,
        tokenizer=None,
        nli_model=nli_model,
        nli_tokenizer=nli_tokenizer
    )
    lf1, lf2 = nl2fol.convert_to_first_order_logic()
    # Store results...
```
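LLM calls can fail partway through a long run (rate limits, malformed model output), so it is worth wrapping each conversion and recording a placeholder instead of crashing. A sketch of the pattern, where `process_row` is a hypothetical stand-in for the `NL2FOL` call above:

```python
def process_row_safe(process_row, text):
    """Run one conversion, returning (lf1, lf2) or (None, None) on failure."""
    try:
        return process_row(text)
    except Exception as e:
        print(f"Skipping example due to error: {e}")
        return None, None

# Toy stand-in that fails on empty input (the real call would be
# NL2FOL(...).convert_to_first_order_logic())
def process_row(text):
    if not text:
        raise ValueError("empty sentence")
    return f"LF1({text})", f"LF2({text})"

results = [process_row_safe(process_row, t) for t in ["p -> q", ""]]
print(results)  # [('LF1(p -> q)', 'LF2(p -> q)'), (None, None)]
```

Rows that failed then show up as `NaN` in the output CSV, which is exactly what the statistics snippet below counts as failed conversions.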
Batch Processing with Command Line
For large datasets, use the command-line interface:
```bash
python src/nl_to_fol.py \
  --model_name gpt-4o \
  --nli_model_name microsoft/deberta-large-mnli \
  --run_name my_experiment \
  --length 500 \
  --dataset logic
```
Command-Line Arguments
- `--model_name`: Model name for text generation (gpt-4o, meta-llama/Llama-2-13b-hf, etc.)
- `--nli_model_name`: HuggingFace model name for NLI (e.g., microsoft/deberta-large-mnli)
- `--run_name`: Name for the output CSV file (saved to results/{run_name}.csv)
- `--length`: Number of examples to process from each class (the total will be 2x this)
- `--dataset`: Dataset type: logic, logicclimate, nli, or folio
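The flags above can be wired up roughly as in the following `argparse` sketch. This mirrors the documented interface rather than the exact source of `src/nl_to_fol.py`; the defaults shown here are assumptions:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Sketch of the documented command-line interface (defaults are assumptions)
    parser = argparse.ArgumentParser(description="Convert NL datasets to FOL")
    parser.add_argument('--model_name', default='gpt-4o',
                        help='Text-generation model (gpt-4o, meta-llama/Llama-2-13b-hf, ...)')
    parser.add_argument('--nli_model_name', default='microsoft/deberta-large-mnli',
                        help='HuggingFace model name for NLI')
    parser.add_argument('--run_name', required=True,
                        help='Output CSV name; saved to results/{run_name}.csv')
    parser.add_argument('--length', type=int, default=100,
                        help='Examples per class (total is 2x this)')
    parser.add_argument('--dataset', choices=['logic', 'logicclimate', 'nli', 'folio'],
                        default='logic', help='Dataset type')
    return parser

args = build_parser().parse_args(['--run_name', 'my_experiment', '--length', '500'])
print(args.run_name, args.length, args.dataset)  # my_experiment 500 logic
```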
Processed datasets are saved with the following additional columns:

- `Claim`: Extracted claim from the sentence
- `Implication`: Extracted implication
- `Referring Expressions - Claim`: Entities in the claim
- `Referring Expressions - Implication`: Entities in the implication
- `Property Implications`: Property relationships
- `Equal Entities`: Entity equivalences
- `Subset Entities`: Subset relationships
- `Claim Lfs`: Logical form of the claim
- `Implication Lfs`: Logical form of the implication
- `Logical Form`: Final first-order logic formula (method 1)
- `Logical Form 2`: Final first-order logic formula (method 2)
Example Output
```csv
articles,label
"All birds fly. Penguins are birds. Thus penguins fly.",0
```
Dataset Statistics
Analyze your processed dataset:
```python
import pandas as pd

df = pd.read_csv('results/my_experiment.csv')

print("Dataset Statistics:")
print(f"Total examples: {len(df)}")
print(f"Fallacies: {(df['label'] == 0).sum()}")
print(f"Valid arguments: {(df['label'] == 1).sum()}")
print(f"\nSuccessful conversions: {df['Logical Form'].notna().sum()}")
print(f"Failed conversions: {df['Logical Form'].isna().sum()}")

# Average formula complexity
df['formula_length'] = df['Logical Form'].str.len()
print(f"\nAverage formula length: {df['formula_length'].mean():.1f} characters")
```
Working with Multiple Datasets
Combine multiple datasets for comprehensive analysis:
```python
import pandas as pd
from nl_to_fol import setup_dataset

# Load multiple dataset types
logic_df = setup_dataset(fallacy_set='logic', length=100)
climate_df = setup_dataset(fallacy_set='logicclimate', length=50)
nli_df = setup_dataset(fallacy_set='nli', length=150)

# Add source labels
logic_df['source'] = 'logic'
climate_df['source'] = 'climate'
nli_df['source'] = 'nli'

# Combine
combined_df = pd.concat([logic_df, climate_df, nli_df], ignore_index=True)

print(f"Combined dataset size: {len(combined_df)}")
print(combined_df['source'].value_counts())
```
Processing large datasets can be time-consuming:
- GPT-4: ~15-20 seconds per example (API rate limits apply)
- Llama 13B: ~5-10 seconds per example (GPU-dependent)

For a 1000-example dataset:
- GPT-4: ~4-6 hours
- Llama 13B: ~2-3 hours
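These estimates follow directly from the per-example times, so a small helper makes them easy to recompute for your own dataset sizes:

```python
def estimate_hours(n_examples: int, seconds_per_example: float) -> float:
    """Rough wall-clock estimate; ignores retries and rate-limit backoff."""
    return n_examples * seconds_per_example / 3600

# 1000 examples at the per-example speeds quoted above
print(f"GPT-4:     ~{estimate_hours(1000, 15):.1f}-{estimate_hours(1000, 20):.1f} hours")
print(f"Llama 13B: ~{estimate_hours(1000, 5):.1f}-{estimate_hours(1000, 10):.1f} hours")
# GPT-4:     ~4.2-5.6 hours
# Llama 13B: ~1.4-2.8 hours
```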
Optimization Tips
Batch processing
Process datasets in smaller batches and save intermediate results:

```python
batch_size = 50
for i in range(0, len(df), batch_size):
    batch_df = df.iloc[i:i + batch_size]
    # Process batch...
    batch_df.to_csv(f'results/batch_{i}.csv', index=False)
```
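After a batched run, the intermediate CSVs can be stitched back into one file. A self-contained sketch, assuming the `results/batch_{i}.csv` naming from above (the two toy batches are written here only so the example runs on its own):

```python
import glob
import os
import pandas as pd

os.makedirs('results', exist_ok=True)

# Write two toy batches so this example is self-contained
pd.DataFrame({'articles': ['a', 'b'], 'label': [0, 1]}).to_csv('results/batch_0.csv', index=False)
pd.DataFrame({'articles': ['c'], 'label': [0]}).to_csv('results/batch_50.csv', index=False)

# Collect every saved batch and recombine into one frame
batch_files = sorted(glob.glob('results/batch_*.csv'))
combined = pd.concat((pd.read_csv(f) for f in batch_files), ignore_index=True)
combined.to_csv('results/processed_dataset.csv', index=False)
print(f"Merged {len(batch_files)} batches, {len(combined)} rows total")
```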
Enable multiprocessing
For Llama models, process multiple examples in parallel if you have multiple GPUs.
Cache results
Store intermediate results to avoid reprocessing on failures:

```python
import os

import pandas as pd

if os.path.exists('cache/intermediate.csv'):
    df = pd.read_csv('cache/intermediate.csv')
```
Use debug mode selectively
Only enable `debug=True` for troubleshooting, not for production runs.
Next Steps
- SMT Solving: Convert logical forms to SMT and verify with CVC5
- Evaluation: Measure accuracy and performance metrics
- Model Backends: Choose the right model for your dataset
- API Reference: Explore advanced configuration options