
Overview

The helpers module provides utility functions for processing strings, extracting symbols from logical formulas, and manipulating predicate expressions. These functions are used throughout the NL2FOL pipeline.

String Processing

label_values

def label_values(input_string, map)
Labels values in a comma-separated string with their mapped characters.
Parameters:
  • input_string (str): Comma-separated string of values
  • map (dict | str): Mapping dictionary or string representation of dict
Returns: str - Labeled string in format "value1: a, value2: b"
mapping = {"cats": "a", "animals": "b"}  # named "mapping" to avoid shadowing the built-in map()
result = label_values("cats, animals", mapping)
print(result)
# Output: "cats: a, animals: b"
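To illustrate how both accepted forms of the mapping argument could be handled, here is a minimal sketch. The name label_values_sketch is hypothetical, and skipping values that are missing from the mapping is an assumption; the actual helper may behave differently.

```python
import ast

def label_values_sketch(input_string, mapping):
    """Hypothetical stand-in for label_values, not the module's own code."""
    # Accept either a dict or its string form, e.g. "{'cats': 'a'}".
    if isinstance(mapping, str):
        mapping = ast.literal_eval(mapping)
    values = [value.strip() for value in input_string.split(",")]
    # Assumption: values absent from the mapping are skipped.
    return ", ".join(f"{v}: {mapping[v]}" for v in values if v in mapping)

print(label_values_sketch("cats, animals", "{'cats': 'a', 'animals': 'b'}"))
```

ast.literal_eval parses only Python literals, which is why it is a safer choice than eval for a dictionary that arrives as text.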

first_non_empty_line

def first_non_empty_line(string)
Extracts the first non-empty line from a multi-line string.
Parameters:
  • string (str): Input string with potential multiple lines
Returns: str | None - First non-empty line, or None if all lines are empty
text = "\n\nThis is the first line\nThis is second"
result = first_non_empty_line(text)
print(result)
# Output: "This is the first line"
This function is particularly useful for extracting clean output from LLM responses that may include leading whitespace or empty lines.
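The behavior described above can be approximated in a few lines. This is a sketch under assumptions: the name first_non_empty_line_sketch is hypothetical, and stripping surrounding whitespace from the returned line is a guess at what "clean output" means here.

```python
def first_non_empty_line_sketch(string):
    """Hypothetical stand-in for first_non_empty_line."""
    for line in string.splitlines():
        stripped = line.strip()
        # Lines that contain only whitespace count as empty.
        if stripped:
            return stripped
    return None

print(first_non_empty_line_sketch("\n\nThis is the first line\nThis is second"))
```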

remove_text_after_last_parenthesis

def remove_text_after_last_parenthesis(input_string)
Removes all text after the last closing parenthesis in a string.
Parameters:
  • input_string (str): String to process
Returns: str - String up to and including the last ), or the original string if no ) is found
formula = "forall x (P(x)) some extra text"
cleaned = remove_text_after_last_parenthesis(formula)
print(cleaned)
# Output: "forall x (P(x))"
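One simple way to get this behavior is str.rfind, which returns -1 when the character is absent. The function name below is hypothetical and the code is only a sketch of the documented contract, not the module's implementation.

```python
def remove_text_after_last_parenthesis_sketch(input_string):
    """Hypothetical stand-in for remove_text_after_last_parenthesis."""
    last = input_string.rfind(")")
    if last == -1:
        # No closing parenthesis: return the input unchanged.
        return input_string
    return input_string[: last + 1]

print(remove_text_after_last_parenthesis_sketch("forall x (P(x)) some extra text"))
```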

Logical Formula Processing

extract_propositional_symbols

def extract_propositional_symbols(logical_form)
Extracts every variable written as a single lowercase letter from a logical formula.
Parameters:
  • logical_form (str): Logical formula string
Returns: set[str] - Set of single-character variable names (a-z)
formula = "exists a (P(a) and forall b (Q(b) -> R(a, b)))"
symbols = extract_propositional_symbols(formula)
print(symbols)
# Output: {'a', 'b'} (set element order may vary)
This function only extracts single lowercase letters. Multi-character variable names or uppercase variables are ignored.
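A word-boundary regular expression is one way to get exactly this filtering, because keywords such as exists and forall contain no standalone single letter. The sketch below uses a hypothetical name and an assumed pattern; the module's actual pattern is not shown in this reference.

```python
import re

def extract_propositional_symbols_sketch(logical_form):
    """Hypothetical stand-in for extract_propositional_symbols."""
    # \b[a-z]\b matches a lowercase letter that is a whole word on its own.
    return set(re.findall(r"\b[a-z]\b", logical_form))

formula = "exists a (P(a) and forall b (Q(b) -> R(a, b)))"
print(sorted(extract_propositional_symbols_sketch(formula)))
```

Sorting before printing gives a stable display order, since sets are unordered.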

split_string_except_in_brackets

def split_string_except_in_brackets(string, delimiter)
Splits a string by delimiter, but ignores delimiters inside parentheses.
Parameters:
  • string (str): String to split
  • delimiter (str): Delimiter character
Returns: list[str] - List of split segments
text = "P(a, b), Q(c, d), R(e)"
parts = split_string_except_in_brackets(text, ',')
print(parts)
# Output: ['P(a, b)', ' Q(c, d)', ' R(e)']
This function is essential for parsing comma-separated predicates where commas inside predicate arguments should not trigger splits.
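The usual technique for this kind of split is a nesting-depth counter: a delimiter triggers a split only while the depth is zero. The sketch below assumes a single-character delimiter and uses a hypothetical function name; it is not taken from the module.

```python
def split_string_except_in_brackets_sketch(string, delimiter):
    """Hypothetical stand-in; assumes a single-character delimiter."""
    parts, current, depth = [], [], 0
    for ch in string:
        if ch == "(":
            depth += 1
        elif ch == ")":
            # Clamp at zero so a stray ")" does not disable later splits.
            depth = max(depth - 1, 0)
        if ch == delimiter and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts

print(split_string_except_in_brackets_sketch("P(a, b), Q(c, d), R(e)", ","))
```

As in the documented output, segments keep their leading spaces, so callers may want to strip each part.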

Predicate Manipulation

fix_inconsistent_arities

def fix_inconsistent_arities(clauses1, clauses2)
Ensures predicates have consistent arities across two lists of clauses by truncating extra arguments.
Parameters:
  • clauses1 (list[str]): First list of predicate clauses
  • clauses2 (list[str]): Second list of predicate clauses
Returns: tuple[str, str] - Two comma-separated strings with consistent arities
Algorithm:
  1. Extract all predicates and their arities from both lists
  2. For predicates with multiple arities, keep the minimum
  3. Truncate arguments in clauses that exceed the minimum arity
clauses1 = ["P(a, b, c)", "Q(x)"]
clauses2 = ["P(d)", "Q(y, z)"]

fixed1, fixed2 = fix_inconsistent_arities(clauses1, clauses2)
print(fixed1)  # Output: "P(a), Q(x)"
print(fixed2)  # Output: "P(d), Q(y)"
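The three algorithm steps can be sketched as follows. This is an illustration only: the names are hypothetical, and the sketch assumes every clause is a flat predicate such as "P(a, b)" with no nested parentheses, which the real helper may not require.

```python
import re

# Assumption: clauses are flat predicates like "P(a, b)" with no nesting.
_PREDICATE = re.compile(r"\s*(\w+)\s*\(([^()]*)\)\s*")

def _parse(clause):
    match = _PREDICATE.fullmatch(clause)
    if match is None:
        raise ValueError(f"not a flat predicate: {clause!r}")
    name, raw_args = match.groups()
    args = [arg.strip() for arg in raw_args.split(",") if arg.strip()]
    return name, args

def fix_inconsistent_arities_sketch(clauses1, clauses2):
    """Hypothetical stand-in for fix_inconsistent_arities."""
    parsed1 = [_parse(c) for c in clauses1]
    parsed2 = [_parse(c) for c in clauses2]

    # Steps 1 and 2: record the smallest arity seen for each predicate name.
    min_arity = {}
    for name, args in parsed1 + parsed2:
        min_arity[name] = min(min_arity.get(name, len(args)), len(args))

    # Step 3: drop any arguments beyond that minimum.
    def render(parsed):
        clauses = []
        for name, args in parsed:
            kept = ", ".join(args[: min_arity[name]])
            clauses.append(f"{name}({kept})")
        return ", ".join(clauses)

    return render(parsed1), render(parsed2)

print(fix_inconsistent_arities_sketch(["P(a, b, c)", "Q(x)"], ["P(d)", "Q(y, z)"]))
```

Truncating to the minimum arity discards information, so it is worth checking that the dropped arguments are not needed downstream.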

replace_variables

def replace_variables(mapping, clause)
Replaces variable letters in a clause with their corresponding entity names.
Parameters:
  • mapping (dict): Dictionary mapping entity names to variable letters
  • clause (str): Predicate clause with variables
Returns: str - Clause with variables replaced by entity names
Process:
  1. Reverses the mapping (variable → entity)
  2. Parses predicate name and arguments
  3. Replaces variables in arguments with entity names
  4. Reconstructs the clause
mapping = {"cats": "a", "animals": "b"}
clause = "IsSubsetOf(a, b)"

result = replace_variables(mapping, clause)
print(result)
# Output: "IsSubsetOf(cats, animals)"
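The four process steps map onto code fairly directly. The sketch below is an assumption-laden illustration: the function name is hypothetical, clauses are assumed to be flat predicates, and arguments without a mapping are left unchanged, which the real helper may handle differently.

```python
import re

def replace_variables_sketch(mapping, clause):
    """Hypothetical stand-in for replace_variables."""
    # Step 1: reverse the mapping so it reads variable -> entity.
    reverse = {variable: entity for entity, variable in mapping.items()}

    # Step 2: split the clause into a predicate name and its arguments.
    match = re.fullmatch(r"\s*(\w+)\s*\((.*)\)\s*", clause)
    if match is None:
        return clause  # Assumption: unparseable clauses pass through.
    name, raw_args = match.groups()
    args = [arg.strip() for arg in raw_args.split(",")]

    # Step 3: swap each known variable for its entity name.
    replaced = [reverse.get(arg, arg) for arg in args]

    # Step 4: rebuild the clause.
    return f"{name}({', '.join(replaced)})"

print(replace_variables_sketch({"cats": "a", "animals": "b"}, "IsSubsetOf(a, b)"))
```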

substitute_variables

def substitute_variables(clause1, clause2, start_char)
Substitutes variables in two clauses with new variable names starting from a given character.
Parameters:
  • clause1 (str): First clause
  • clause2 (str): Second clause
  • start_char (str): Starting character for new variable names
Returns: tuple[str, str, str] - (substituted_clause1, substituted_clause2, next_char)
Process:
  1. Creates a mapping for variables encountered
  2. Assigns new variables starting from start_char
  3. Replaces variables in both clauses
  4. Returns next available character for further use
clause1 = "P(x, y)"
clause2 = "Q(x, z)"

c1, c2, next_char = substitute_variables(clause1, clause2, 'a')
print(c1)        # Output: "P(a, b)"
print(c2)        # Output: "Q(a, c)"
print(next_char) # Output: "d"
Variables that appear in both clauses are assigned the same new variable name, preserving their identity across clauses.
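A shared mapping across both clauses is what keeps a variable's identity intact. The sketch below shows one way to do that with a single-pass re.sub callback; the function name is hypothetical, and treating every standalone lowercase letter as a variable is an assumption.

```python
import re

def substitute_variables_sketch(clause1, clause2, start_char):
    """Hypothetical stand-in for substitute_variables."""
    mapping = {}                 # old variable -> new variable
    next_code = ord(start_char)  # code point of the next unused letter

    def rename(match):
        nonlocal next_code
        variable = match.group(0)
        if variable not in mapping:
            mapping[variable] = chr(next_code)
            next_code += 1
        return mapping[variable]

    # One mapping serves both clauses, so a variable shared between
    # them receives the same new name in each.
    new1 = re.sub(r"\b[a-z]\b", rename, clause1)
    new2 = re.sub(r"\b[a-z]\b", rename, clause2)
    return new1, new2, chr(next_code)

print(substitute_variables_sketch("P(x, y)", "Q(x, z)", "a"))
```

Renaming inside the callback happens in one pass per clause, which avoids the chained-replacement problem where a freshly assigned name is itself renamed again.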

Usage in Pipeline

The excerpt below shows two of the helpers in a pipeline step; claim_ref_exp, entity_mappings, and get_llm_result come from the surrounding pipeline code:
from helpers import label_values, first_non_empty_line

# Label referring expressions with variables
labeled = label_values(claim_ref_exp, entity_mappings)
prompt = f"Extract properties for: {labeled}"

# Get clean LLM response
response = get_llm_result(prompt)
properties = first_non_empty_line(response)

Type Handling

Best Practices

  1. Arity Consistency: Always use fix_inconsistent_arities() before comparing properties across the claim and the implication
  2. Variable Extraction: Use extract_propositional_symbols() to find quantifiable variables in generated formulas
  3. Predicate Parsing: Use split_string_except_in_brackets() when splitting comma-separated predicates to avoid breaking arguments
  4. Clean LLM Output: Apply first_non_empty_line() to LLM responses to get clean, usable text
  5. Formula Cleanup: Use remove_text_after_last_parenthesis() to remove trailing text from LLM-generated formulas

Dependencies

The module requires only the Python standard library:
  • ast: For safe evaluation of string-represented dictionaries
  • re: For regular expression pattern matching
