
Overview

NL2FOL supports two model backends for the translation pipeline:
  • GPT-4 via OpenAI API (recommended for best accuracy)
  • Llama models via HuggingFace Transformers (for local/self-hosted deployments)
This guide shows you how to configure and use each backend.

Model Selection

The model backend is determined by the model_type parameter when initializing the NL2FOL class:
model_type = 'gpt'   # Use GPT-4
# or
model_type = 'llama' # Use Llama
The model type is set in src/nl_to_fol.py:19 during initialization and affects how get_llm_result() processes prompts throughout the pipeline.
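The branching can be pictured as a simple dispatch on model_type. This is a schematic sketch, not the actual source; both backend calls are stubbed out:

```python
# Schematic sketch of how model_type selects a backend (stubbed, hypothetical helpers).
def call_gpt(prompt: str) -> str:
    return f"gpt-result for: {prompt}"    # stands in for an OpenAI API call

def call_llama(prompt: str) -> str:
    return f"llama-result for: {prompt}"  # stands in for a HuggingFace pipeline call

def get_llm_result(prompt: str, model_type: str = 'gpt') -> str:
    if model_type == 'gpt':
        return call_gpt(prompt)
    elif model_type == 'llama':
        return call_llama(prompt)
    raise ValueError(f"unknown model_type: {model_type}")
```

The real implementations of each branch are shown in the backend-specific sections below.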

Using GPT-4 (OpenAI)

1. Set up OpenAI API

First, configure your OpenAI API key:
export OPENAI_API_KEY="your-api-key-here"
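If you prefer to configure the key from Python (for example, in a notebook), something like the following works; the OpenAI client reads OPENAI_API_KEY from the environment by default:

```python
import os

# Set the key programmatically if it is not already exported in the shell.
# Replace the placeholder with your real key.
os.environ.setdefault("OPENAI_API_KEY", "your-api-key-here")

assert os.environ.get("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"
```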
2. Initialize with GPT backend

from transformers import AutoModelForSequenceClassification, AutoTokenizer
from nl_to_fol import NL2FOL
from openai import OpenAI

client = OpenAI()

# GPT-4 configuration
model_type = 'gpt'
pipeline = None  # Not needed for GPT
tokenizer = None  # Not needed for GPT

# Initialize NLI model (still required for entity relations)
nli_model_name = "microsoft/deberta-large-mnli"
nli_tokenizer = AutoTokenizer.from_pretrained(nli_model_name)
nli_model = AutoModelForSequenceClassification.from_pretrained(nli_model_name)
3. Create NL2FOL instance

sentence = "If all mammals have lungs, and whales are mammals, then whales have lungs."

nl2fol = NL2FOL(
    sentence=sentence,
    model_type=model_type,
    pipeline=pipeline,
    tokenizer=tokenizer,
    nli_model=nli_model,
    nli_tokenizer=nli_tokenizer,
    debug=True
)

final_lf, final_lf2 = nl2fol.convert_to_first_order_logic()

GPT Implementation Details

When using GPT, the get_llm_result() method (defined in src/nl_to_fol.py:49-67) calls the OpenAI API:
if model_type == 'gpt':
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": prompt}
        ]
    )
    return completion.choices[0].message.content
GPT-4 provides the most accurate results for complex logical reasoning tasks. The system uses gpt-4o by default.
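API calls can fail transiently (rate limits, network errors). A minimal retry wrapper with exponential backoff, sketched here as an illustrative helper rather than part of NL2FOL, looks like:

```python
import time

def with_retries(call, prompt, max_attempts=3, base_delay=1.0):
    """Retry a backend call with exponential backoff (illustrative helper)."""
    for attempt in range(max_attempts):
        try:
            return call(prompt)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

You could wrap the chat.completions.create call in such a helper when running long batches against the API.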

Using Llama Models

1. Install dependencies

Ensure you have the required packages:
pip install torch transformers accelerate
2. Initialize Llama pipeline

import torch
import transformers
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from nl_to_fol import NL2FOL

# Llama model configuration
model_name = "meta-llama/Llama-2-13b-hf"  # or your preferred Llama variant
model_type = 'llama'

# Initialize text generation pipeline
tokenizer = AutoTokenizer.from_pretrained(model_name)
pipeline = transformers.pipeline(
    "text-generation",
    model=model_name,
    torch_dtype=torch.float16,
    max_length=1024,
    device_map="auto",
)

# Initialize NLI model
nli_model_name = "microsoft/deberta-large-mnli"
nli_tokenizer = AutoTokenizer.from_pretrained(nli_model_name)
nli_model = AutoModelForSequenceClassification.from_pretrained(nli_model_name)
3. Create NL2FOL instance

sentence = "All humans are mortal. Socrates is a human. Therefore, Socrates is mortal."

nl2fol = NL2FOL(
    sentence=sentence,
    model_type=model_type,
    pipeline=pipeline,
    tokenizer=tokenizer,
    nli_model=nli_model,
    nli_tokenizer=nli_tokenizer,
    debug=True
)

final_lf, final_lf2 = nl2fol.convert_to_first_order_logic()

Llama Implementation Details

For Llama models, the get_llm_result() method uses HuggingFace pipelines:
if model_type == 'llama':
    sequences = self.pipeline(
        prompt,
        do_sample=False,
        num_return_sequences=1,
        eos_token_id=self.tokenizer.eos_token_id
    )
    return sequences[0]["generated_text"].removeprefix(prompt)
Llama models require significant GPU memory. A 13B parameter model needs ~26GB GPU memory with float16 precision.
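The ~26GB figure follows from float16 using two bytes per parameter. A rough back-of-the-envelope estimate for weights only (activations and KV cache add more on top):

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Rough weight memory in GB: parameters x bytes per parameter (float16 = 2)."""
    return num_params * bytes_per_param / 1e9

print(weight_memory_gb(13e9))  # 13B model -> 26.0 GB
print(weight_memory_gb(70e9))  # 70B model -> 140.0 GB
```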

Command-Line Usage

You can specify the model backend when running the main script:
python src/nl_to_fol.py \
  --model_name gpt-4o \
  --nli_model_name microsoft/deberta-large-mnli \
  --run_name experiment_gpt4 \
  --length 100 \
  --dataset logic
The script automatically detects GPT models when model_name starts with 'gpt' (see src/nl_to_fol.py:446).
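The detection amounts to a prefix check on the model name; sketched here for illustration (see the referenced line for the actual code):

```python
def detect_model_type(model_name: str) -> str:
    # GPT backends are identified purely by the name prefix.
    return 'gpt' if model_name.startswith('gpt') else 'llama'

print(detect_model_type("gpt-4o"))                     # gpt
print(detect_model_type("meta-llama/Llama-2-13b-hf"))  # llama
```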

Model Comparison

GPT-4

Pros:
  • Highest accuracy for complex reasoning
  • No local GPU required
  • Faster setup
  • Better prompt following
Cons:
  • Requires API key and internet
  • Per-token costs
  • Rate limits apply

Llama

Pros:
  • Self-hosted (no API costs)
  • Data privacy
  • Unlimited usage
  • Customizable
Cons:
  • Requires GPU infrastructure
  • Lower accuracy than GPT-4
  • Slower inference
  • More complex setup

Entity Relation Detection

Both backends use the same NLI model for entity relation detection. The model choice only affects the main translation steps:
  1. Claim and implication extraction (extract_claim_and_implication())
  2. Referring expressions (get_referring_expressions())
  3. Property extraction (get_properties())
  4. First-order logic generation (get_fol())
Property entailment checking (check_entailment() in src/nl_to_fol.py:154) always uses GPT regardless of the main model backend for consistency.
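For reference, an NLI model such as deberta-large-mnli emits three logits per premise/hypothesis pair; turning them into a label is just a softmax and argmax. The pure-Python sketch below uses example logits, and the label order is an assumption based on the model's id2label config — verify it against your checkpoint:

```python
import math

# Label order assumed from the model config (id2label); verify for your checkpoint.
NLI_LABELS = ["CONTRADICTION", "NEUTRAL", "ENTAILMENT"]

def nli_label(logits):
    """Softmax over raw NLI logits, then pick the highest-probability label."""
    exps = [math.exp(x - max(logits)) for x in logits]
    probs = [e / sum(exps) for e in exps]
    return NLI_LABELS[probs.index(max(probs))], max(probs)

# Example logits as they might come from nli_model(**inputs).logits[0]
label, prob = nli_label([-2.1, 0.3, 3.5])
print(label)  # ENTAILMENT
```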

GPU Requirements

GPT-4 (API):
  • GPU: Not required (API-based)
  • RAM: 8GB+ (for NLI model)
  • NLI Model: ~1.4GB VRAM

Llama 7B:
  • GPU: 1x A100 40GB or 2x RTX 3090
  • VRAM: ~14GB with float16
  • RAM: 16GB+

Llama 13B:
  • GPU: 1x A100 80GB or 2x A6000
  • VRAM: ~26GB with float16
  • RAM: 32GB+

Llama 70B:
  • GPU: 4x A100 80GB or 8x A6000
  • VRAM: ~140GB with float16
  • RAM: 64GB+

Switching Models at Runtime

You can override the backend for an individual call by passing model_type directly to get_llm_result():
# Force a specific method to use GPT even if instance uses Llama
result = nl2fol.get_llm_result(prompt, model_type='gpt')

# Or use the instance's default model
result = nl2fol.get_llm_result(prompt)
This is useful for hybrid approaches where you want to use Llama for most operations but GPT-4 for critical reasoning steps.
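A hybrid setup can be sketched as a thin router that sends only designated steps to GPT. The step names and routing choice below are illustrative, and both backends are stubbed:

```python
# Hypothetical hybrid router: most steps use the local model, critical ones use GPT.
CRITICAL_STEPS = {"get_fol"}  # steps routed to GPT (illustrative choice)

def run_step(step_name, prompt, llama_call, gpt_call):
    call = gpt_call if step_name in CRITICAL_STEPS else llama_call
    return call(prompt)

# Stub backends for illustration
gpt = lambda p: f"gpt:{p}"
llama = lambda p: f"llama:{p}"

print(run_step("get_properties", "x", llama, gpt))  # llama:x
print(run_step("get_fol", "x", llama, gpt))         # gpt:x
```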

Best Practices

1. Start with GPT-4

Use GPT-4 for initial experiments to establish a quality baseline.
2. Optimize prompts

Fine-tune your prompts with GPT-4 before trying Llama models.
3. Test with smaller Llama

If moving to Llama, start with the 7B model to validate your pipeline.
4. Scale up as needed

Move to 13B or 70B Llama models only if 7B performance is insufficient.

Next Steps

Basic Usage

Get started with a simple example

Custom Datasets

Process your own data

Translation Pipeline

Understand the conversion process

API Reference

Explore the NL2FOL class methods
