
Overview

NL2FOL supports two model backends for the translation pipeline:
  • GPT-4 via OpenAI API (recommended for best accuracy)
  • Llama models via HuggingFace Transformers (for local/self-hosted deployments)
This guide shows you how to configure and use each backend.

Model Selection

The model backend is determined by the model_type parameter when initializing the NL2FOL class:
model_type = 'gpt'   # Use GPT-4
# or
model_type = 'llama' # Use Llama
The model type is set in src/nl_to_fol.py:19 during initialization and affects how get_llm_result() processes prompts throughout the pipeline.
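The branching can be pictured as a simple dispatch on model_type. This is a schematic sketch, not the actual source; both backend calls are stubbed out:

```python
# Schematic sketch of how model_type selects a backend (stubbed, hypothetical helpers).
def call_gpt(prompt: str) -> str:
    return f"gpt-result for: {prompt}"    # stands in for an OpenAI API call

def call_llama(prompt: str) -> str:
    return f"llama-result for: {prompt}"  # stands in for a HuggingFace pipeline call

def get_llm_result(prompt: str, model_type: str = 'gpt') -> str:
    if model_type == 'gpt':
        return call_gpt(prompt)
    elif model_type == 'llama':
        return call_llama(prompt)
    raise ValueError(f"unknown model_type: {model_type}")
```

The real implementations of each branch are shown in the backend-specific sections below.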

Using GPT-4 (OpenAI)

1. Set up OpenAI API

First, configure your OpenAI API key:
export OPENAI_API_KEY="your-api-key-here"
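If you prefer to configure the key from Python (for example, in a notebook), something like the following works; the OpenAI client reads OPENAI_API_KEY from the environment by default:

```python
import os

# Set the key programmatically if it is not already exported in the shell.
# Replace the placeholder with your real key.
os.environ.setdefault("OPENAI_API_KEY", "your-api-key-here")

assert os.environ.get("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"
```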
2. Initialize with GPT backend

from transformers import AutoModelForSequenceClassification, AutoTokenizer
from nl_to_fol import NL2FOL
from openai import OpenAI

client = OpenAI()

# GPT-4 configuration
model_type = 'gpt'
pipeline = None  # Not needed for GPT
tokenizer = None  # Not needed for GPT

# Initialize NLI model (still required for entity relations)
nli_model_name = "microsoft/deberta-large-mnli"
nli_tokenizer = AutoTokenizer.from_pretrained(nli_model_name)
nli_model = AutoModelForSequenceClassification.from_pretrained(nli_model_name)
3. Create NL2FOL instance

sentence = "If all mammals have lungs, and whales are mammals, then whales have lungs."

nl2fol = NL2FOL(
    sentence=sentence,
    model_type=model_type,
    pipeline=pipeline,
    tokenizer=tokenizer,
    nli_model=nli_model,
    nli_tokenizer=nli_tokenizer,
    debug=True
)

final_lf, final_lf2 = nl2fol.convert_to_first_order_logic()

GPT Implementation Details

When using GPT, the get_llm_result() method (defined in src/nl_to_fol.py:49-67) calls the OpenAI API:
if model_type == 'gpt':
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": prompt}
        ]
    )
    return completion.choices[0].message.content
GPT-4 provides the most accurate results for complex logical reasoning tasks. The system uses gpt-4o by default.
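API calls can fail transiently (rate limits, network errors). A minimal retry wrapper with exponential backoff, sketched here as an illustrative helper rather than part of NL2FOL, looks like:

```python
import time

def with_retries(call, prompt, max_attempts=3, base_delay=1.0):
    """Retry a backend call with exponential backoff (illustrative helper)."""
    for attempt in range(max_attempts):
        try:
            return call(prompt)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

You could wrap the chat.completions.create call in such a helper when running long batches against the API.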

Using Llama Models

1. Install dependencies

Ensure you have the required packages:
pip install torch transformers accelerate
2. Initialize Llama pipeline

import torch
import transformers
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from nl_to_fol import NL2FOL

# Llama model configuration
model_name = "meta-llama/Llama-2-13b-hf"  # or your preferred Llama variant
model_type = 'llama'

# Initialize text generation pipeline
tokenizer = AutoTokenizer.from_pretrained(model_name)
pipeline = transformers.pipeline(
    "text-generation",
    model=model_name,
    torch_dtype=torch.float16,
    max_length=1024,
    device_map="auto",
)

# Initialize NLI model
nli_model_name = "microsoft/deberta-large-mnli"
nli_tokenizer = AutoTokenizer.from_pretrained(nli_model_name)
nli_model = AutoModelForSequenceClassification.from_pretrained(nli_model_name)
3. Create NL2FOL instance

sentence = "All humans are mortal. Socrates is a human. Therefore, Socrates is mortal."

nl2fol = NL2FOL(
    sentence=sentence,
    model_type=model_type,
    pipeline=pipeline,
    tokenizer=tokenizer,
    nli_model=nli_model,
    nli_tokenizer=nli_tokenizer,
    debug=True
)

final_lf, final_lf2 = nl2fol.convert_to_first_order_logic()

Llama Implementation Details

For Llama models, the get_llm_result() method uses HuggingFace pipelines:
if model_type == 'llama':
    sequences = self.pipeline(
        prompt,
        do_sample=False,
        num_return_sequences=1,
        eos_token_id=self.tokenizer.eos_token_id
    )
    return sequences[0]["generated_text"].removeprefix(prompt)
Llama models require significant GPU memory. A 13B parameter model needs ~26GB GPU memory with float16 precision.
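The ~26GB figure follows from float16 using two bytes per parameter. A rough back-of-the-envelope estimate for weights only (activations and KV cache add more on top):

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Rough weight memory in GB: parameters x bytes per parameter (float16 = 2)."""
    return num_params * bytes_per_param / 1e9

print(weight_memory_gb(13e9))  # 13B model -> 26.0 GB
print(weight_memory_gb(70e9))  # 70B model -> 140.0 GB
```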

Command-Line Usage

You can specify the model backend when running the main script:
python src/nl_to_fol.py \
  --model_name gpt-4o \
  --nli_model_name microsoft/deberta-large-mnli \
  --run_name experiment_gpt4 \
  --length 100 \
  --dataset logic
The script automatically detects GPT models when model_name starts with 'gpt' (see src/nl_to_fol.py:446).
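The detection amounts to a prefix check on the model name; sketched here for illustration (see the referenced line for the actual code):

```python
def detect_model_type(model_name: str) -> str:
    # GPT backends are identified purely by the name prefix.
    return 'gpt' if model_name.startswith('gpt') else 'llama'

print(detect_model_type("gpt-4o"))                     # gpt
print(detect_model_type("meta-llama/Llama-2-13b-hf"))  # llama
```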

Model Comparison

GPT-4

Pros:
  • Highest accuracy for complex reasoning
  • No local GPU required
  • Faster setup
  • Better prompt following
Cons:
  • Requires API key and internet
  • Per-token costs
  • Rate limits apply

Llama

Pros:
  • Self-hosted (no API costs)
  • Data privacy
  • Unlimited usage
  • Customizable
Cons:
  • Requires GPU infrastructure
  • Lower accuracy than GPT-4
  • Slower inference
  • More complex setup

Entity Relation Detection

Both backends use the same NLI model for entity relation detection. The model choice only affects the main translation steps:
  1. Claim and implication extraction (extract_claim_and_implication())
  2. Referring expressions (get_referring_expressions())
  3. Property extraction (get_properties())
  4. First-order logic generation (get_fol())
Property entailment checking (check_entailment() in src/nl_to_fol.py:154) always uses GPT regardless of the main model backend for consistency.
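For reference, an NLI model such as deberta-large-mnli emits three logits per premise/hypothesis pair; turning them into a label is just a softmax and argmax. The pure-Python sketch below uses example logits, and the label order is an assumption based on the model's id2label config — verify it against your checkpoint:

```python
import math

# Label order assumed from the model config (id2label); verify for your checkpoint.
NLI_LABELS = ["CONTRADICTION", "NEUTRAL", "ENTAILMENT"]

def nli_label(logits):
    """Softmax over raw NLI logits, then pick the highest-probability label."""
    exps = [math.exp(x - max(logits)) for x in logits]
    probs = [e / sum(exps) for e in exps]
    return NLI_LABELS[probs.index(max(probs))], max(probs)

# Example logits as they might come from nli_model(**inputs).logits[0]
label, prob = nli_label([-2.1, 0.3, 3.5])
print(label)  # ENTAILMENT
```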

GPU Requirements

GPT-4 (API):
  • GPU: Not required (API-based)
  • RAM: 8GB+ (for NLI model)
  • NLI Model: ~1.4GB VRAM

Llama 7B:
  • GPU: 1x A100 40GB or 2x RTX 3090
  • VRAM: ~14GB with float16
  • RAM: 16GB+

Llama 13B:
  • GPU: 1x A100 80GB or 2x A6000
  • VRAM: ~26GB with float16
  • RAM: 32GB+

Llama 70B:
  • GPU: 4x A100 80GB or 8x A6000
  • VRAM: ~140GB with float16
  • RAM: 64GB+

Switching Models at Runtime

You can override the backend for an individual call by passing model_type directly to get_llm_result():
# Force a specific method to use GPT even if instance uses Llama
result = nl2fol.get_llm_result(prompt, model_type='gpt')

# Or use the instance's default model
result = nl2fol.get_llm_result(prompt)
This is useful for hybrid approaches where you want to use Llama for most operations but GPT-4 for critical reasoning steps.
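A hybrid setup can be sketched as a thin router that sends only designated steps to GPT. The step names and routing choice below are illustrative, and both backends are stubbed:

```python
# Hypothetical hybrid router: most steps use the local model, critical ones use GPT.
CRITICAL_STEPS = {"get_fol"}  # steps routed to GPT (illustrative choice)

def run_step(step_name, prompt, llama_call, gpt_call):
    call = gpt_call if step_name in CRITICAL_STEPS else llama_call
    return call(prompt)

# Stub backends for illustration
gpt = lambda p: f"gpt:{p}"
llama = lambda p: f"llama:{p}"

print(run_step("get_properties", "x", llama, gpt))  # llama:x
print(run_step("get_fol", "x", llama, gpt))         # gpt:x
```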

Best Practices

1. Start with GPT-4

Use GPT-4 for initial experiments to establish a quality baseline.
2. Optimize prompts

Fine-tune your prompts with GPT-4 before trying Llama models.
3. Test with smaller Llama

If moving to Llama, start with the 7B model to validate your pipeline.
4. Scale up as needed

Move to 13B or 70B Llama models only if 7B performance is insufficient.

Next Steps

Basic Usage

Get started with a simple example

Custom Datasets

Process your own data

Translation Pipeline

Understand the conversion process

API Reference

Explore the NL2FOL class methods
