Helper Functions

Overview

The helpers module provides utility functions for processing strings, extracting symbols from logical formulas, and manipulating predicate expressions. These functions are used throughout the NL2FOL pipeline.

String Processing

label_values

def label_values(input_string, map)

Labels values in a comma-separated string with their mapped characters.

Parameters:

input_string (str): Comma-separated string of values
map (dict | str): Mapping dictionary or string representation of dict

Returns: str - Labeled string in format "value1: a, value2: b"

mapping = {"cats": "a", "animals": "b"}  # named mapping to avoid shadowing the built-in map()
result = label_values("cats, animals", mapping)
print(result)
# Output: "cats: a, animals: b"
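The labeling step can be sketched as follows. This is a minimal illustration that assumes a dict mapping and skips values with no entry; label_values_sketch is a hypothetical name, and the module's actual implementation may treat missing values differently.

```python
def label_values_sketch(input_string, mapping):
    """Pair each comma-separated value with its label from `mapping` (a dict)."""
    # Split on commas and drop surrounding whitespace and empty entries.
    values = [v.strip() for v in input_string.split(",") if v.strip()]
    # Assumption: values without a mapping entry are silently skipped.
    return ", ".join(f"{v}: {mapping[v]}" for v in values if v in mapping)


print(label_values_sketch("cats, animals", {"cats": "a", "animals": "b"}))
# cats: a, animals: b
```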

first_non_empty_line

def first_non_empty_line(string)

Extracts the first non-empty line from a multi-line string.

Parameters:

string (str): Input string with potential multiple lines

Returns: str | None - First non-empty line, or None if all lines are empty

text = "\n\nThis is the first line\nThis is second"
result = first_non_empty_line(text)
print(result)
# Output: "This is the first line"

This function is particularly useful for extracting clean output from LLM responses that may include leading whitespace or empty lines.
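One way to get this behavior is sketched below. It assumes that whitespace-only lines count as empty and that the returned line is stripped; first_non_empty_line_sketch is a hypothetical stand-in, not the module's code.

```python
def first_non_empty_line_sketch(string):
    """Return the first line that contains non-whitespace text, or None."""
    for line in string.splitlines():
        stripped = line.strip()
        if stripped:
            # Assumption: the line is returned without surrounding whitespace.
            return stripped
    return None


print(first_non_empty_line_sketch("\n\nThis is the first line\nThis is second"))
# This is the first line
```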

remove_text_after_last_parenthesis

def remove_text_after_last_parenthesis(input_string)

Removes all text after the last closing parenthesis in a string.

Parameters:

input_string (str): String to process

Returns: str - String up to and including the last ), or original string if no ) found

formula = "forall x (P(x)) some extra text"
cleaned = remove_text_after_last_parenthesis(formula)
print(cleaned)
# Output: "forall x (P(x))"
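The documented behavior matches a simple search from the right with str.rfind. The sketch below uses a hypothetical function name and is not taken from the module.

```python
def remove_text_after_last_parenthesis_sketch(input_string):
    """Cut the string right after its last ')', if there is one."""
    last = input_string.rfind(")")
    if last == -1:
        # No closing parenthesis: return the input unchanged.
        return input_string
    return input_string[: last + 1]


print(remove_text_after_last_parenthesis_sketch("forall x (P(x)) some extra text"))
# forall x (P(x))
```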

Logical Formula Processing

extract_propositional_symbols

def extract_propositional_symbols(logical_form)

Extracts all single lowercase letter variables from a logical formula.

Parameters:

logical_form (str): Logical formula string

Returns: set[str] - Set of single-character variable names (a-z)

formula = "exists a (P(a) and forall b (Q(b) -> R(a, b)))"
symbols = extract_propositional_symbols(formula)
print(symbols)
# Output: {'a', 'b'}

This function only extracts single lowercase letters. Multi-character variable names or uppercase variables are ignored.
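The single-letter rule can be expressed with a word-boundary regular expression. The sketch below is one plausible approach under that assumption; extract_symbols_sketch is a hypothetical name and the module may use a different pattern.

```python
import re


def extract_symbols_sketch(logical_form):
    """Collect every standalone lowercase letter in the formula."""
    # \b on both sides rejects letters that are part of a longer word,
    # such as the letters inside "exists" or "forall".
    return set(re.findall(r"\b[a-z]\b", logical_form))


symbols = extract_symbols_sketch("exists a (P(a) and forall b (Q(b) -> R(a, b)))")
print(sorted(symbols))
# ['a', 'b']
```

Uppercase predicate names such as P and Q fall outside the [a-z] class, so they are not reported as variables.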

split_string_except_in_brackets

def split_string_except_in_brackets(string, delimiter)

Splits a string by delimiter, but ignores delimiters inside parentheses.

Parameters:

string (str): String to split
delimiter (str): Delimiter character

Returns: list[str] - List of split segments

text = "P(a, b), Q(c, d), R(e)"
parts = split_string_except_in_brackets(text, ',')
print(parts)
# Output: ['P(a, b)', ' Q(c, d)', ' R(e)']

This function is essential for parsing comma-separated predicates where commas inside predicate arguments should not trigger splits.
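A common way to implement this kind of split is to track parenthesis depth while scanning the string. The sketch below assumes a single-character delimiter and uses a hypothetical function name; it is an illustration, not the module's source.

```python
def split_outside_parens_sketch(string, delimiter):
    """Split on `delimiter` only where parenthesis depth is zero."""
    parts, current, depth = [], [], 0
    for ch in string:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)  # tolerate unbalanced ')'
        if ch == delimiter and depth == 0:
            # Top-level delimiter: close the current segment.
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


print(split_outside_parens_sketch("P(a, b), Q(c, d), R(e)", ","))
# ['P(a, b)', ' Q(c, d)', ' R(e)']
```

Segments keep their leading spaces, as in the documented output, so callers that need clean predicates can strip each part.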

Predicate Manipulation

fix_inconsistent_arities

def fix_inconsistent_arities(clauses1, clauses2)

Ensures predicates have consistent arities across two lists of clauses by truncating extra arguments.

Parameters:

clauses1 (list[str]): First list of predicate clauses
clauses2 (list[str]): Second list of predicate clauses

Returns: tuple[str, str] - Two comma-separated strings with consistent arities

Algorithm:

Extract all predicates and their arities from both lists
For predicates with multiple arities, keep the minimum
Truncate arguments in clauses that exceed the minimum arity

clauses1 = ["P(a, b, c)", "Q(x)"]
clauses2 = ["P(d)", "Q(y, z)"]

fixed1, fixed2 = fix_inconsistent_arities(clauses1, clauses2)
print(fixed1)  # Output: "P(a), Q(x)"
print(fixed2)  # Output: "P(d), Q(y)"
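The three algorithm steps can be sketched as below. This version assumes flat predicates of the form Name(arg, ...) with no nested parentheses; fix_arities_sketch and PREDICATE are hypothetical names, and the real function may parse clauses differently.

```python
import re

# Assumption: predicates look like Name(arg1, arg2) with no nested parentheses.
PREDICATE = re.compile(r"(\w+)\(([^()]*)\)")


def fix_arities_sketch(clauses1, clauses2):
    # Step 1: record the smallest arity seen for each predicate name.
    min_arity = {}
    for clause in clauses1 + clauses2:
        for name, raw_args in PREDICATE.findall(clause):
            arity = len([a for a in raw_args.split(",") if a.strip()])
            min_arity[name] = min(arity, min_arity.get(name, arity))

    # Steps 2 and 3: keep only the first min_arity[name] arguments.
    def truncate(match):
        name = match.group(1)
        args = [a.strip() for a in match.group(2).split(",") if a.strip()]
        kept = ", ".join(args[: min_arity[name]])
        return f"{name}({kept})"

    fixed1 = ", ".join(PREDICATE.sub(truncate, c) for c in clauses1)
    fixed2 = ", ".join(PREDICATE.sub(truncate, c) for c in clauses2)
    return fixed1, fixed2


print(fix_arities_sketch(["P(a, b, c)", "Q(x)"], ["P(d)", "Q(y, z)"]))
# ('P(a), Q(x)', 'P(d), Q(y)')
```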

replace_variables

def replace_variables(mapping, clause)

Replaces variable letters in a clause with their corresponding entity names.

Parameters:

mapping (dict): Dictionary mapping entity names to variable letters
clause (str): Predicate clause with variables

Returns: str - Clause with variables replaced by entity names

Process:

Reverses the mapping (variable → entity)
Parses predicate name and arguments
Replaces variables in arguments with entity names
Reconstructs the clause

mapping = {"cats": "a", "animals": "b"}
clause = "IsSubsetOf(a, b)"

result = replace_variables(mapping, clause)
print(result)
# Output: "IsSubsetOf(cats, animals)"
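The four process steps map onto a short function. The sketch below assumes a single flat predicate per clause and leaves anything else unchanged; replace_variables_sketch is a hypothetical name rather than the module's implementation.

```python
import re


def replace_variables_sketch(mapping, clause):
    # Step 1: reverse the mapping so it reads variable -> entity.
    reverse = {var: entity for entity, var in mapping.items()}
    # Step 2: split the clause into predicate name and argument list.
    match = re.fullmatch(r"\s*(\w+)\((.*)\)\s*", clause)
    if match is None:
        return clause  # assumption: non-predicate input is returned unchanged
    name, raw_args = match.groups()
    # Step 3: swap each argument for its entity name when one is known.
    args = [reverse.get(a.strip(), a.strip()) for a in raw_args.split(",")]
    # Step 4: rebuild the clause.
    joined = ", ".join(args)
    return f"{name}({joined})"


print(replace_variables_sketch({"cats": "a", "animals": "b"}, "IsSubsetOf(a, b)"))
# IsSubsetOf(cats, animals)
```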

substitute_variables

def substitute_variables(clause1, clause2, start_char)

Substitutes variables in two clauses with new variable names starting from a given character.

Parameters:

clause1 (str): First clause
clause2 (str): Second clause
start_char (str): Starting character for new variable names

Returns: tuple[str, str, str] - (substituted_clause1, substituted_clause2, next_char)

Process:

Creates a mapping for variables encountered
Assigns new variables starting from start_char
Replaces variables in both clauses
Returns next available character for further use

clause1 = "P(x, y)"
clause2 = "Q(x, z)"

c1, c2, next_char = substitute_variables(clause1, clause2, 'a')
print(c1)        # Output: "P(a, b)"
print(c2)        # Output: "Q(a, c)"
print(next_char) # Output: "d"

Variables that appear in both clauses are assigned the same new variable name, preserving their identity across clauses.
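The shared-mapping behavior can be sketched with one dictionary reused across both clauses. This version assumes variables are standalone lowercase letters; substitute_variables_sketch is a hypothetical name and the module's own parsing may differ.

```python
import re


def substitute_variables_sketch(clause1, clause2, start_char):
    renamed = {}                 # old variable -> new variable, shared by both clauses
    next_code = ord(start_char)  # code point of the next unused letter

    def rename(match):
        nonlocal next_code
        var = match.group(0)
        if var not in renamed:
            renamed[var] = chr(next_code)
            next_code += 1
        return renamed[var]

    # Assumption: a variable is a single lowercase letter standing alone.
    pattern = re.compile(r"\b[a-z]\b")
    new1 = pattern.sub(rename, clause1)
    new2 = pattern.sub(rename, clause2)
    return new1, new2, chr(next_code)


print(substitute_variables_sketch("P(x, y)", "Q(x, z)", "a"))
# ('P(a, b)', 'Q(a, c)', 'd')
```

Because each clause is rewritten in a single regex pass, a freshly assigned letter is never renamed a second time within that clause.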

Usage in Pipeline

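The excerpt below shows a typical call sequence. The names claim_ref_exp, entity_mappings, and get_llm_result come from the surrounding pipeline code, not from the helpers module: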

from helpers import label_values, first_non_empty_line

# Label referring expressions with variables
labeled = label_values(claim_ref_exp, entity_mappings)
prompt = f"Extract properties for: {labeled}"

# Get clean LLM response
response = get_llm_result(prompt)
properties = first_non_empty_line(response)

Type Handling

String vs Dict Parameters

Several functions (like label_values) accept both string and dictionary representations of mappings:

# Both work:
label_values("a, b", {"a": "x", "b": "y"})
label_values("a, b", "{'a': 'x', 'b': 'y'}")

The functions use ast.literal_eval() to safely parse string representations.
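A minimal sketch of how a dict-or-string argument could be normalized with ast.literal_eval is shown below. The helper name as_mapping_sketch and the ValueError for non-dict input are assumptions for illustration, not part of the module.

```python
import ast


def as_mapping_sketch(mapping):
    """Return a dict whether `mapping` is a dict or its string representation."""
    if isinstance(mapping, dict):
        return mapping
    # literal_eval parses Python literals only, so it does not execute code.
    parsed = ast.literal_eval(mapping)
    if not isinstance(parsed, dict):
        raise ValueError("expected a dict literal")
    return parsed


print(as_mapping_sketch("{'a': 'x', 'b': 'y'}"))
# {'a': 'x', 'b': 'y'}
```

ast.literal_eval raises ValueError or SyntaxError on malformed input, so callers that pass LLM-generated strings may want to catch both.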

None Returns

Some functions return None when there is no usable result. For example, first_non_empty_line returns None when every line is empty:

result = first_non_empty_line("")  # Returns None

Always check for None before using return values.

Best Practices

Arity Consistency: Always use fix_inconsistent_arities() before comparing properties across the claim and the implication
Variable Extraction: Use extract_propositional_symbols() to find quantifiable variables in generated formulas
Predicate Parsing: Use split_string_except_in_brackets() when splitting comma-separated predicates to avoid breaking arguments
Clean LLM Output: Apply first_non_empty_line() to LLM responses to get clean, usable text
Formula Cleanup: Use remove_text_after_last_parenthesis() to remove trailing text from LLM-generated formulas

Dependencies

The module requires only the Python standard library:

ast: For safe evaluation of string-represented dictionaries
re: For regular expression pattern matching
