Overview
NL2FOL supports two model backends for the translation pipeline:
GPT-4 via OpenAI API (recommended for best accuracy)
Llama models via HuggingFace Transformers (for local/self-hosted deployments)
This guide shows you how to configure and use each backend.
Model Selection
The model backend is determined by the model_type parameter when initializing the NL2FOL class:
model_type = 'gpt' # Use GPT-4
# or
model_type = 'llama' # Use Llama
The model type is set in src/nl_to_fol.py:19 during initialization and affects how get_llm_result() processes prompts throughout the pipeline.
Using GPT-4 (OpenAI)
Set up OpenAI API
First, configure your OpenAI API key: export OPENAI_API_KEY="your-api-key-here"
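Before initializing the client, it can help to fail fast when the key is missing rather than getting an opaque authentication error mid-pipeline. A minimal check (`require_openai_key` is an illustrative helper, not part of NL2FOL):

```python
import os

def require_openai_key() -> str:
    """Return the OpenAI API key, or raise a clear error if it is missing.

    The OpenAI client reads OPENAI_API_KEY from the environment, so this
    only verifies the variable is set before any API call is attempted.
    """
    key = os.environ.get("OPENAI_API_KEY", "")
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before using the GPT backend."
        )
    return key
```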
Initialize with GPT backend
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from nl_to_fol import NL2FOL
from openai import OpenAI
client = OpenAI()
# GPT-4 configuration
model_type = 'gpt'
pipeline = None # Not needed for GPT
tokenizer = None # Not needed for GPT
# Initialize NLI model (still required for entity relations)
nli_model_name = "microsoft/deberta-large-mnli"
nli_tokenizer = AutoTokenizer.from_pretrained(nli_model_name)
nli_model = AutoModelForSequenceClassification.from_pretrained(nli_model_name)
Create NL2FOL instance
sentence = "If all mammals have lungs, and whales are mammals, then whales have lungs."
nl2fol = NL2FOL(
    sentence=sentence,
    model_type=model_type,
    pipeline=pipeline,
    tokenizer=tokenizer,
    nli_model=nli_model,
    nli_tokenizer=nli_tokenizer,
    debug=True
)
final_lf, final_lf2 = nl2fol.convert_to_first_order_logic()
GPT Implementation Details
When using GPT, the get_llm_result() method (defined in src/nl_to_fol.py:49-67) calls the OpenAI API:
if model_type == 'gpt':
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": prompt}
        ]
    )
    return completion.choices[0].message.content
GPT-4 provides the most accurate results for complex logical reasoning tasks. The system uses gpt-4o by default.
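API calls can fail transiently (rate limits, network errors), so it is worth wrapping the completion call in a retry loop with exponential backoff. A minimal sketch — `call_with_retries` is an illustrative helper, not part of the NL2FOL codebase; you would pass it a zero-argument callable wrapping `client.chat.completions.create(...)`:

```python
import time

def call_with_retries(make_request, max_attempts=3, base_delay=1.0):
    """Call make_request(), retrying with exponential backoff on failure.

    make_request is any zero-argument callable; the delay doubles after
    each failed attempt, and the last failure is re-raised unchanged.
    """
    for attempt in range(max_attempts):
        try:
            return make_request()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
```

For example, `call_with_retries(lambda: client.chat.completions.create(...))` retries the request up to three times before giving up.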
Using Llama Models
Install dependencies
Ensure you have the required packages: pip install torch transformers accelerate
Initialize Llama pipeline
import torch
import transformers
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from nl_to_fol import NL2FOL
# Llama model configuration
model_name = "meta-llama/Llama-2-13b-hf" # or your preferred Llama variant
model_type = 'llama'
# Initialize text generation pipeline
tokenizer = AutoTokenizer.from_pretrained(model_name)
pipeline = transformers.pipeline(
    "text-generation",
    model=model_name,
    torch_dtype=torch.float16,
    max_length=1024,
    device_map="auto",
)
# Initialize NLI model
nli_model_name = "microsoft/deberta-large-mnli"
nli_tokenizer = AutoTokenizer.from_pretrained(nli_model_name)
nli_model = AutoModelForSequenceClassification.from_pretrained(nli_model_name)
Create NL2FOL instance
sentence = "All humans are mortal. Socrates is a human. Therefore, Socrates is mortal."
nl2fol = NL2FOL(
    sentence=sentence,
    model_type=model_type,
    pipeline=pipeline,
    tokenizer=tokenizer,
    nli_model=nli_model,
    nli_tokenizer=nli_tokenizer,
    debug=True
)
final_lf, final_lf2 = nl2fol.convert_to_first_order_logic()
Llama Implementation Details
For Llama models, the get_llm_result() method uses HuggingFace pipelines:
if model_type == 'llama':
    sequences = self.pipeline(
        prompt,
        do_sample=False,
        num_return_sequences=1,
        eos_token_id=self.tokenizer.eos_token_id
    )
    return sequences[0]["generated_text"].removeprefix(prompt)
Llama models require significant GPU memory. A 13B parameter model needs ~26GB GPU memory with float16 precision.
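The ~26GB figure follows directly from the parameter count: float16 stores 2 bytes per parameter, so weights alone need roughly 2 bytes x 13B. A back-of-the-envelope estimator (weights only; activations, KV cache, and framework overhead add more on top, so treat this as a lower bound):

```python
def estimate_vram_gb(num_params_billion: float, bytes_per_param: int = 2) -> float:
    """Rough VRAM needed to hold the model weights alone.

    bytes_per_param defaults to 2 (float16); use 4 for float32
    or 1 for int8 quantization.
    """
    return num_params_billion * 1e9 * bytes_per_param / 1e9
```

For example, `estimate_vram_gb(13)` gives 26.0 GB, matching the figure above, and `estimate_vram_gb(70)` gives 140.0 GB.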
Command-Line Usage
You can specify the model backend when running the main script:
python src/nl_to_fol.py \
    --model_name gpt-4o \
    --nli_model_name microsoft/deberta-large-mnli \
    --run_name experiment_gpt4 \
    --length 100 \
    --dataset logic
The script automatically detects GPT models when model_name starts with 'gpt' (see src/nl_to_fol.py:446). To run with a Llama backend instead:
python src/nl_to_fol.py \
    --model_name meta-llama/Llama-2-13b-hf \
    --nli_model_name microsoft/deberta-large-mnli \
    --run_name experiment_llama \
    --length 100 \
    --dataset logic
Any model name that doesn’t start with 'gpt' is treated as a Llama model and loaded via HuggingFace.
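The detection rule above can be sketched as a one-line predicate (an illustration of the described behavior, not the exact code at src/nl_to_fol.py:446):

```python
def infer_model_type(model_name: str) -> str:
    """Map a model name to a backend: names starting with 'gpt'
    use the OpenAI API, everything else is loaded via HuggingFace."""
    return 'gpt' if model_name.startswith('gpt') else 'llama'
```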
Model Comparison
GPT-4 Pros:
Highest accuracy for complex reasoning
No local GPU required
Faster setup
Better prompt following
Cons:
Requires API key and internet
Per-token costs
Rate limits apply
Llama Pros:
Self-hosted (no API costs)
Data privacy
Unlimited usage
Customizable
Cons:
Requires GPU infrastructure
Lower accuracy than GPT-4
Slower inference
More complex setup
Entity Relation Detection
Both backends use the same NLI model for entity relation detection. The model choice only affects the main translation steps:
Claim and implication extraction (extract_claim_and_implication())
Referring expressions (get_referring_expressions())
Property extraction (get_properties())
First-order logic generation (get_fol())
Property entailment checking (check_entailment() in src/nl_to_fol.py:154) always uses GPT, regardless of the main model backend, for consistency.
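For reference, microsoft/deberta-large-mnli emits logits over three NLI labels. A minimal sketch of turning raw logits into a label — the label order below comes from that model's configuration, and NL2FOL's own post-processing may differ:

```python
# Label order in microsoft/deberta-large-mnli's id2label mapping
LABELS = ("CONTRADICTION", "NEUTRAL", "ENTAILMENT")

def nli_label(logits) -> str:
    """Pick the argmax label from a sequence of three raw NLI logits."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return LABELS[best]
```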
GPU Requirements
GPT-4 backend:
GPU: Not required (API-based)
RAM: 8GB+ (for the NLI model)
NLI Model: ~1.4GB VRAM
Llama-2 7B:
GPU: 1x A100 40GB or 2x RTX 3090
VRAM: ~14GB with float16
RAM: 16GB+
Llama-2 13B:
GPU: 1x A100 80GB or 2x A6000
VRAM: ~26GB with float16
RAM: 32GB+
Llama-2 70B:
GPU: 4x A100 80GB or 8x A6000
VRAM: ~140GB with float16
RAM: 64GB+
Switching Models at Runtime
You can switch backends per call by passing a model_type argument to get_llm_result():
# Force a specific call to use GPT even if the instance uses Llama
result = nl2fol.get_llm_result(prompt, model_type='gpt')
# Or use the instance's default model
result = nl2fol.get_llm_result(prompt)
This is useful for hybrid approaches where you want to use Llama for most operations but GPT-4 for critical reasoning steps.
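One way to organize such a hybrid setup is a small step-to-backend policy table; the step names below are illustrative assumptions, not NL2FOL identifiers:

```python
# Illustrative hybrid policy: local Llama for routine extraction steps,
# GPT for the final logic generation where accuracy matters most.
HYBRID_POLICY = {
    "referring_expressions": "llama",
    "properties": "llama",
    "fol_generation": "gpt",
}

def backend_for(step: str, default: str = "llama") -> str:
    """Return the backend to use for a pipeline step, falling back
    to the instance's default for unlisted steps."""
    return HYBRID_POLICY.get(step, default)
```

Each call site then becomes `nl2fol.get_llm_result(prompt, model_type=backend_for(step))`.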
Best Practices
Start with GPT-4
Use GPT-4 for initial experiments to establish a quality baseline.
Optimize prompts
Fine-tune your prompts with GPT-4 before trying Llama models.
Test with smaller Llama
If moving to Llama, start with the 7B model to validate your pipeline.
Scale up as needed
Move to 13B or 70B Llama models only if 7B performance is insufficient.
Next Steps
Basic Usage Get started with a simple example
Custom Datasets Process your own data
Translation Pipeline Understand the conversion process
API Reference Explore the NL2FOL class methods