Semantic functions provide LLM-based operations for extracting, transforming, and analyzing unstructured data. All functions are available via fc.semantic.*.

map

Applies a generation prompt to one or more columns, enabling rich summarization and generation tasks.
fc.semantic.map(
    prompt: str,
    /,
    *,
    strict: bool = True,
    examples: Optional[MapExampleCollection] = None,
    response_format: Optional[type[BaseModel]] = None,
    model_alias: Optional[Union[str, ModelAlias]] = None,
    temperature: float = 0.0,
    max_output_tokens: int = 512,
    request_timeout: TimeoutParam = None,
    **columns: Column,
) -> Column
prompt
str
required
A Jinja2 template for the generation prompt. References column values using {{ column_name }} syntax. Each placeholder is replaced with the corresponding value from the current row during execution.
strict
bool
default:"True"
If True, when any of the provided columns has a None value for a row, the entire row’s output will be None (template is not rendered). If False, None values are handled using Jinja2’s null rendering behavior.
examples
Optional[MapExampleCollection]
Optional few-shot examples to guide the model’s output format and style.
response_format
Optional[type[BaseModel]]
Optional Pydantic model to enforce structured output. Must include descriptions for each field.
model_alias
Optional[Union[str, ModelAlias]]
Optional language model alias. If None, uses the default model.
temperature
float
default:"0.0"
Language model temperature.
max_output_tokens
int
default:"512"
Maximum tokens to generate.
request_timeout
TimeoutParam
Optional timeout in seconds for a single LLM request. If None, uses the default timeout (120 seconds).
**columns
Column
required
Named column arguments that correspond to template variables. Keys must match the variable names used in the template.
return
Column
A column expression representing the semantic mapping operation.

Examples

fc.semantic.map(
    "Write a compelling one-line description for {{ name }}: {{ details }}",
    name=fc.col("name"),
    details=fc.col("details")
)
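The strict flag's null handling can be pictured with a plain-Python sketch. This simulates the documented behavior locally and does not call fc.semantic.map; string.Template stands in for Jinja2.

```python
from string import Template

def render_row(template: str, row: dict, strict: bool = True):
    """Simulate strict-mode null handling for one row (illustrative,
    not fenic internals). With strict=True, any None input makes the
    whole output None and the template is never rendered; with
    strict=False, None renders as the string "None", mirroring
    Jinja2's default rendering of a None value."""
    if strict and any(v is None for v in row.values()):
        return None
    # string.Template's $name stands in for Jinja2's {{ name }} syntax
    return Template(template).safe_substitute({k: str(v) for k, v in row.items()})

rows = [
    {"name": "Widget", "details": "A wireless widget"},
    {"name": "Gadget", "details": None},
]
template = "Describe $name: $details"
# strict=True: the second row collapses to None before rendering
```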

extract

Extracts structured information from unstructured text using a provided Pydantic model schema.
fc.semantic.extract(
    column: ColumnOrName,
    response_format: type[BaseModel],
    max_output_tokens: int = 1024,
    temperature: float = 0.0,
    model_alias: Optional[Union[str, ModelAlias]] = None,
    request_timeout: TimeoutParam = None,
) -> Column
column
ColumnOrName
required
Column containing text to extract from.
response_format
type[BaseModel]
required
A Pydantic model type that defines the output structure with descriptions for each field.
max_output_tokens
int
default:"1024"
Optional parameter to constrain the model to generate at most this many tokens.
temperature
float
default:"0.0"
Optional temperature parameter for the language model.
model_alias
Optional[Union[str, ModelAlias]]
Optional alias for the language model to use for the extraction. If None, will use the language model configured as the default.
request_timeout
TimeoutParam
Optional timeout in seconds for a single LLM request. If None, uses the default timeout (120 seconds).
return
Column
A new column with structured values (a struct) based on the provided schema.

Example

Extracting knowledge graph triples
from typing import List

from pydantic import BaseModel, Field

class Triple(BaseModel):
    subject: str = Field(description="The subject of the triple")
    predicate: str = Field(description="The predicate or relation")
    object: str = Field(description="The object of the triple")

class KGResult(BaseModel):
    triples: List[Triple] = Field(description="List of extracted knowledge graph triples")
    entities: List[str] = Field(description="Flat list of all detected named entities")

df.select(fc.semantic.extract("blurb", KGResult))
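The returned struct mirrors the schema field for field. A plain-Python stand-in for what one row's extracted value looks like under the KGResult schema above (illustrative data, not actual model output):

```python
# One row's extracted value under the KGResult schema above
# (illustrative data, not actual model output)
row_value = {
    "triples": [
        {"subject": "Ada Lovelace", "predicate": "wrote", "object": "the first program"},
    ],
    "entities": ["Ada Lovelace"],
}
# Struct fields are addressable by name, like the schema's fields
first_subject = row_value["triples"][0]["subject"]
```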

predicate

Applies a boolean predicate to one or more columns, typically used for filtering.
fc.semantic.predicate(
    predicate: str,
    /,
    *,
    strict: bool = True,
    examples: Optional[PredicateExampleCollection] = None,
    model_alias: Optional[Union[str, ModelAlias]] = None,
    temperature: float = 0.0,
    request_timeout: TimeoutParam = None,
    **columns: Column,
) -> Column
predicate
str
required
A Jinja2 template containing a yes/no question or boolean claim. Should reference column values using {{ column_name }} syntax. The model will evaluate this condition for each row and return True or False.
strict
bool
default:"True"
If True, when any of the provided columns has a None value for a row, the entire row’s output will be None.
examples
Optional[PredicateExampleCollection]
Optional few-shot examples showing how to evaluate the predicate.
model_alias
Optional[Union[str, ModelAlias]]
Optional language model alias. If None, uses the default model.
temperature
float
default:"0.0"
Language model temperature.
request_timeout
TimeoutParam
Optional timeout in seconds for a single LLM request.
**columns
Column
required
Named column arguments that correspond to template variables.
return
Column
A boolean column expression.

Example

Filtering wireless products
wireless_products = df.filter(
    fc.semantic.predicate(
        """Product: {{ description }}
        Is this product wireless or battery-powered?""",
        description=fc.col("product_description")
    )
)
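Conceptually, the model answers the yes/no question per row and the answer is coerced to a boolean. A hedged sketch of that coercion (illustrative only, not fenic internals):

```python
def to_bool(answer: str):
    """Coerce a model's yes/no answer to a boolean (an illustrative
    sketch of closed-set answer parsing, not fenic internals)."""
    normalized = answer.strip().lower()
    if normalized in {"yes", "true"}:
        return True
    if normalized in {"no", "false"}:
        return False
    return None  # unparseable answers propagate as nulls
```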

reduce

Aggregate function: reduces a set of strings in a column to a single string using a natural language instruction.
fc.semantic.reduce(
    prompt: str,
    column: ColumnOrName,
    *,
    group_context: Optional[Dict[str, Column]] = None,
    order_by: Optional[List[ColumnOrName]] = None,
    model_alias: Optional[Union[str, ModelAlias]] = None,
    temperature: float = 0,
    max_output_tokens: int = 512,
    request_timeout: TimeoutParam = None,
) -> Column
prompt
str
required
A string containing the semantic.reduce prompt. The instruction can optionally include Jinja2 template variables (e.g., {{variable}}) that reference columns from the group_context parameter.
column
ColumnOrName
required
The column containing documents/strings to reduce.
group_context
Optional[Dict[str, Column]]
Optional dictionary mapping variable names to columns. These columns provide context for each group and can be referenced in the instruction template.
order_by
Optional[List[ColumnOrName]]
Optional list of columns to sort grouped documents by before reduction.
model_alias
Optional[Union[str, ModelAlias]]
Optional alias for the language model to use. If None, uses the default model.
temperature
float
default:"0.0"
Temperature parameter for the language model.
max_output_tokens
int
default:"512"
Maximum tokens the model can generate.
request_timeout
TimeoutParam
Optional timeout in seconds for a single LLM request.
return
Column
A column expression representing the semantic reduction operation.

Examples

df.group_by("category").agg(
    fc.semantic.reduce("Summarize the documents", fc.col("document_text"))
)
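A plain-Python sketch of what a grouped reduction does conceptually: sort each group's documents (the order_by analogue), render per-group context into the instruction (the group_context analogue), and collapse to one value per group. String concatenation stands in for the model's summarization step.

```python
from collections import defaultdict

rows = [
    {"category": "tech", "date": 2, "document_text": "doc B"},
    {"category": "tech", "date": 1, "document_text": "doc A"},
    {"category": "food", "date": 1, "document_text": "doc C"},
]

# Group rows by key
groups = defaultdict(list)
for r in rows:
    groups[r["category"]].append(r)

results = {}
for category, members in groups.items():
    members.sort(key=lambda r: r["date"])  # order_by analogue
    instruction = f"Summarize the {category} documents"  # group_context analogue
    docs = " | ".join(m["document_text"] for m in members)
    results[category] = f"{instruction}: {docs}"
```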

classify

Classifies a string column into one of the provided classes.
fc.semantic.classify(
    column: ColumnOrName,
    classes: Union[List[str], List[ClassDefinition]],
    examples: Optional[ClassifyExampleCollection] = None,
    model_alias: Optional[Union[str, ModelAlias]] = None,
    temperature: float = 0,
    request_timeout: TimeoutParam = None,
) -> Column
column
ColumnOrName
required
Column or column name containing text to classify.
classes
Union[List[str], List[ClassDefinition]]
required
List of class labels or ClassDefinition objects defining the available classes. Use ClassDefinition objects to provide descriptions for the classes.
examples
Optional[ClassifyExampleCollection]
Optional collection of example classifications to guide the model.
model_alias
Optional[Union[str, ModelAlias]]
Optional alias for the language model to use.
temperature
float
default:"0.0"
Optional temperature parameter for the language model.
request_timeout
TimeoutParam
Optional timeout in seconds for a single LLM request.
return
Column
Expression containing the classification results.

Examples

fc.semantic.classify("message", ["Account Access", "Billing Issue", "Technical Problem"])

analyze_sentiment

Analyzes the sentiment of a string column. Returns one of ‘positive’, ‘negative’, or ‘neutral’.
fc.semantic.analyze_sentiment(
    column: ColumnOrName,
    model_alias: Optional[Union[str, ModelAlias]] = None,
    temperature: float = 0,
    request_timeout: TimeoutParam = None,
) -> Column
column
ColumnOrName
required
Column or column name containing text for sentiment analysis.
model_alias
Optional[Union[str, ModelAlias]]
Optional alias for the language model to use.
temperature
float
default:"0.0"
Optional temperature parameter for the language model.
request_timeout
TimeoutParam
Optional timeout in seconds for a single LLM request.
return
Column
Expression containing sentiment results (‘positive’, ‘negative’, or ‘neutral’).

Example

fc.semantic.analyze_sentiment(fc.col('user_comment'))

embed

Generate embeddings for the specified string column.
fc.semantic.embed(
    column: ColumnOrName,
    model_alias: Optional[Union[str, ModelAlias]] = None,
) -> Column
column
ColumnOrName
required
Column or column name containing the values to generate embeddings for.
model_alias
Optional[Union[str, ModelAlias]]
Optional alias for the embedding model to use. If None, will use the embedding model configured as the default.
return
Column
A Column expression that represents the embeddings for each value in the input column.

Example

df.select(fc.semantic.embed(fc.col("text_column")).alias("text_embeddings"))
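Embeddings are typically consumed downstream, e.g. for similarity search. A minimal cosine-similarity helper over plain Python lists (the actual embedding vectors would come from fc.semantic.embed):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Parallel vectors score 1.0; orthogonal vectors score 0.0
```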

summarize

Summarizes strings from a column.
fc.semantic.summarize(
    column: ColumnOrName,
    format: Union[KeyPoints, Paragraph, None] = None,
    temperature: float = 0,
    model_alias: Optional[Union[str, ModelAlias]] = None,
    request_timeout: TimeoutParam = None,
) -> Column
column
ColumnOrName
required
Column or column name containing text for summarization.
format
Union[KeyPoints, Paragraph, None]
Format of the summary to generate. Can be either KeyPoints or Paragraph. If None, will default to Paragraph with a maximum of 120 words.
temperature
float
default:"0.0"
Optional temperature parameter for the language model.
model_alias
Optional[Union[str, ModelAlias]]
Optional alias for the language model to use for the summarization.
request_timeout
TimeoutParam
Optional timeout in seconds for a single LLM request.
return
Column
Expression containing the summarized string.

Example

fc.semantic.summarize(fc.col('user_comment'))

parse_pdf

Parses a column of PDF paths into markdown.
fc.semantic.parse_pdf(
    column: ColumnOrName,
    model_alias: Optional[Union[str, ModelAlias]] = None,
    page_separator: Optional[str] = None,
    describe_images: bool = False,
    max_output_tokens: Optional[int] = None,
    request_timeout: TimeoutParam = None,
) -> Column
column
ColumnOrName
required
Column or column name containing the PDF to parse.
model_alias
Optional[Union[str, ModelAlias]]
Optional alias for the language model to use for the parsing.
page_separator
Optional[str]
Optional page separator to use for the parsing. If the separator includes the {page} placeholder, the model will replace it with the current page number.
describe_images
bool
default:"False"
Flag to describe images in the PDF. If True, the prompt will ask the model to include a description of the image in the markdown output. If False, the prompt asks the model to ignore images that aren’t tables or charts.
max_output_tokens
Optional[int]
Optional maximum number of output tokens per ~3 pages of PDF (does not include reasoning tokens). If None, don’t constrain the model’s output.
request_timeout
TimeoutParam
Optional timeout in seconds for a single LLM request.
return
Column
A column with markdown strings for each PDF file.
For Gemini models, this function uses the Google File API, uploading PDF files to Google’s file store and deleting them after each request.

Examples

fc.semantic.parse_pdf(fc.col("pdf_path")).show()
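The {page} placeholder in page_separator makes per-page post-processing straightforward. A stdlib sketch of splitting such output back into pages; the separator string here is an assumed example, not a documented default:

```python
import re

# Hypothetical output produced with page_separator="<<< page {page} >>>"
markdown = "Page one text\n<<< page 1 >>>\nPage two text\n<<< page 2 >>>\n"

# Split on the rendered separator, dropping empty fragments
pages = [p.strip() for p in re.split(r"<<< page \d+ >>>", markdown) if p.strip()]
```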
