Semantic functions provide LLM-based operations for extracting, transforming, and analyzing unstructured data. All functions are available via fc.semantic.*.

map

Applies a generation prompt to one or more columns, enabling rich summarization and generation tasks.
fc.semantic.map(
    prompt: str,
    /,
    *,
    strict: bool = True,
    examples: Optional[MapExampleCollection] = None,
    response_format: Optional[type[BaseModel]] = None,
    model_alias: Optional[Union[str, ModelAlias]] = None,
    temperature: float = 0.0,
    max_output_tokens: int = 512,
    request_timeout: TimeoutParam = None,
    **columns: Column,
) -> Column
prompt
str
required
A Jinja2 template for the generation prompt. References column values using {{ column_name }} syntax. Each placeholder is replaced with the corresponding value from the current row during execution.
strict
bool
default:"True"
If True, when any of the provided columns has a None value for a row, the entire row’s output will be None (template is not rendered). If False, None values are handled using Jinja2’s null rendering behavior.
examples
Optional[MapExampleCollection]
Optional few-shot examples to guide the model’s output format and style.
response_format
Optional[type[BaseModel]]
Optional Pydantic model to enforce structured output. Must include descriptions for each field.
model_alias
Optional[Union[str, ModelAlias]]
Optional language model alias. If None, uses the default model.
temperature
float
default:"0.0"
Language model temperature.
max_output_tokens
int
default:"512"
Maximum tokens to generate.
request_timeout
TimeoutParam
Optional timeout in seconds for a single LLM request. If None, uses the default timeout (120 seconds).
**columns
Column
required
Named column arguments that correspond to template variables. Keys must match the variable names used in the template.
return
Column
A column expression representing the semantic mapping operation.

Examples

fc.semantic.map(
    "Write a compelling one-line description for {{ name }}: {{ details }}",
    name=fc.col("name"),
    details=fc.col("details")
)
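The strict flag's null handling can be pictured with a plain-Python sketch. This simulates the documented behavior locally and does not call fc.semantic.map; string.Template stands in for Jinja2.

```python
from string import Template

def render_row(template: str, row: dict, strict: bool = True):
    """Simulate strict-mode null handling for one row (illustrative,
    not fenic internals). With strict=True, any None input makes the
    whole output None and the template is never rendered; with
    strict=False, None renders as the string "None", mirroring
    Jinja2's default rendering of a None value."""
    if strict and any(v is None for v in row.values()):
        return None
    # string.Template's $name stands in for Jinja2's {{ name }} syntax
    return Template(template).safe_substitute({k: str(v) for k, v in row.items()})

rows = [
    {"name": "Widget", "details": "A wireless widget"},
    {"name": "Gadget", "details": None},
]
template = "Describe $name: $details"
# strict=True: the second row collapses to None before rendering
```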

extract

Extracts structured information from unstructured text using a provided Pydantic model schema.
fc.semantic.extract(
    column: ColumnOrName,
    response_format: type[BaseModel],
    max_output_tokens: int = 1024,
    temperature: float = 0.0,
    model_alias: Optional[Union[str, ModelAlias]] = None,
    request_timeout: TimeoutParam = None,
) -> Column
column
ColumnOrName
required
Column containing text to extract from.
response_format
type[BaseModel]
required
A Pydantic model type that defines the output structure with descriptions for each field.
max_output_tokens
int
default:"1024"
Optional parameter to constrain the model to generate at most this many tokens.
temperature
float
default:"0.0"
Optional temperature parameter for the language model.
model_alias
Optional[Union[str, ModelAlias]]
Optional alias for the language model to use for the extraction. If None, will use the language model configured as the default.
request_timeout
TimeoutParam
Optional timeout in seconds for a single LLM request. If None, uses the default timeout (120 seconds).
return
Column
A new column with structured values (a struct) based on the provided schema.

Example

Extracting knowledge graph triples
from typing import List

from pydantic import BaseModel, Field

class Triple(BaseModel):
    subject: str = Field(description="The subject of the triple")
    predicate: str = Field(description="The predicate or relation")
    object: str = Field(description="The object of the triple")

class KGResult(BaseModel):
    triples: List[Triple] = Field(description="List of extracted knowledge graph triples")
    entities: List[str] = Field(description="Flat list of all detected named entities")

df.select(fc.semantic.extract("blurb", KGResult))
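The returned struct mirrors the schema field for field. A plain-Python stand-in for what one row's extracted value looks like under the KGResult schema above (illustrative data, not actual model output):

```python
# One row's extracted value under the KGResult schema above
# (illustrative data, not actual model output)
row_value = {
    "triples": [
        {"subject": "Ada Lovelace", "predicate": "wrote", "object": "the first program"},
    ],
    "entities": ["Ada Lovelace"],
}
# Struct fields are addressable by name, like the schema's fields
first_subject = row_value["triples"][0]["subject"]
```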

predicate

Applies a boolean predicate to one or more columns, typically used for filtering.
fc.semantic.predicate(
    predicate: str,
    /,
    *,
    strict: bool = True,
    examples: Optional[PredicateExampleCollection] = None,
    model_alias: Optional[Union[str, ModelAlias]] = None,
    temperature: float = 0.0,
    request_timeout: TimeoutParam = None,
    **columns: Column,
) -> Column
predicate
str
required
A Jinja2 template containing a yes/no question or boolean claim. Should reference column values using {{ column_name }} syntax. The model will evaluate this condition for each row and return True or False.
strict
bool
default:"True"
If True, when any of the provided columns has a None value for a row, the entire row’s output will be None.
examples
Optional[PredicateExampleCollection]
Optional few-shot examples showing how to evaluate the predicate.
model_alias
Optional[Union[str, ModelAlias]]
Optional language model alias. If None, uses the default model.
temperature
float
default:"0.0"
Language model temperature.
request_timeout
TimeoutParam
Optional timeout in seconds for a single LLM request.
**columns
Column
required
Named column arguments that correspond to template variables.
return
Column
A boolean column expression.

Example

Filtering wireless products
wireless_products = df.filter(
    fc.semantic.predicate(
        """Product: {{ description }}
        Is this product wireless or battery-powered?""",
        description=fc.col("product_description")
    )
)
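Conceptually, the model answers the yes/no question per row and the answer is coerced to a boolean. A hedged sketch of that coercion (illustrative only, not fenic internals):

```python
def to_bool(answer: str):
    """Coerce a model's yes/no answer to a boolean (an illustrative
    sketch of closed-set answer parsing, not fenic internals)."""
    normalized = answer.strip().lower()
    if normalized in {"yes", "true"}:
        return True
    if normalized in {"no", "false"}:
        return False
    return None  # unparseable answers propagate as nulls
```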

reduce

Aggregate function: reduces a set of strings in a column to a single string using a natural language instruction.
fc.semantic.reduce(
    prompt: str,
    column: ColumnOrName,
    *,
    group_context: Optional[Dict[str, Column]] = None,
    order_by: Optional[List[ColumnOrName]] = None,
    model_alias: Optional[Union[str, ModelAlias]] = None,
    temperature: float = 0,
    max_output_tokens: int = 512,
    request_timeout: TimeoutParam = None,
) -> Column
prompt
str
required
A string containing the semantic.reduce prompt. The instruction can optionally include Jinja2 template variables (e.g., {{variable}}) that reference columns from the group_context parameter.
column
ColumnOrName
required
The column containing documents/strings to reduce.
group_context
Optional[Dict[str, Column]]
Optional dictionary mapping variable names to columns. These columns provide context for each group and can be referenced in the instruction template.
order_by
Optional[List[ColumnOrName]]
Optional list of columns to sort grouped documents by before reduction.
model_alias
Optional[Union[str, ModelAlias]]
Optional alias for the language model to use. If None, uses the default model.
temperature
float
default:"0.0"
Temperature parameter for the language model.
max_output_tokens
int
default:"512"
Maximum tokens the model can generate.
request_timeout
TimeoutParam
Optional timeout in seconds for a single LLM request.
return
Column
A column expression representing the semantic reduction operation.

Examples

df.group_by("category").agg(
    fc.semantic.reduce("Summarize the documents", fc.col("document_text"))
)
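A plain-Python sketch of what a grouped reduction does conceptually: sort each group's documents (the order_by analogue), render per-group context into the instruction (the group_context analogue), and collapse to one value per group. String concatenation stands in for the model's summarization step.

```python
from collections import defaultdict

rows = [
    {"category": "tech", "date": 2, "document_text": "doc B"},
    {"category": "tech", "date": 1, "document_text": "doc A"},
    {"category": "food", "date": 1, "document_text": "doc C"},
]

# Group rows by key
groups = defaultdict(list)
for r in rows:
    groups[r["category"]].append(r)

results = {}
for category, members in groups.items():
    members.sort(key=lambda r: r["date"])  # order_by analogue
    instruction = f"Summarize the {category} documents"  # group_context analogue
    docs = " | ".join(m["document_text"] for m in members)
    results[category] = f"{instruction}: {docs}"
```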

classify

Classifies a string column into one of the provided classes.
fc.semantic.classify(
    column: ColumnOrName,
    classes: Union[List[str], List[ClassDefinition]],
    examples: Optional[ClassifyExampleCollection] = None,
    model_alias: Optional[Union[str, ModelAlias]] = None,
    temperature: float = 0,
    request_timeout: TimeoutParam = None,
) -> Column
column
ColumnOrName
required
Column or column name containing text to classify.
classes
Union[List[str], List[ClassDefinition]]
required
List of class labels or ClassDefinition objects defining the available classes. Use ClassDefinition objects to provide descriptions for the classes.
examples
Optional[ClassifyExampleCollection]
Optional collection of example classifications to guide the model.
model_alias
Optional[Union[str, ModelAlias]]
Optional alias for the language model to use.
temperature
float
default:"0.0"
Optional temperature parameter for the language model.
request_timeout
TimeoutParam
Optional timeout in seconds for a single LLM request.
return
Column
Expression containing the classification results.

Examples

fc.semantic.classify("message", ["Account Access", "Billing Issue", "Technical Problem"])

analyze_sentiment

Analyzes the sentiment of a string column. Returns one of ‘positive’, ‘negative’, or ‘neutral’.
fc.semantic.analyze_sentiment(
    column: ColumnOrName,
    model_alias: Optional[Union[str, ModelAlias]] = None,
    temperature: float = 0,
    request_timeout: TimeoutParam = None,
) -> Column
column
ColumnOrName
required
Column or column name containing text for sentiment analysis.
model_alias
Optional[Union[str, ModelAlias]]
Optional alias for the language model to use.
temperature
float
default:"0.0"
Optional temperature parameter for the language model.
request_timeout
TimeoutParam
Optional timeout in seconds for a single LLM request.
return
Column
Expression containing sentiment results (‘positive’, ‘negative’, or ‘neutral’).

Example

fc.semantic.analyze_sentiment(fc.col('user_comment'))

embed

Generate embeddings for the specified string column.
fc.semantic.embed(
    column: ColumnOrName,
    model_alias: Optional[Union[str, ModelAlias]] = None,
) -> Column
column
ColumnOrName
required
Column or column name containing the values to generate embeddings for.
model_alias
Optional[Union[str, ModelAlias]]
Optional alias for the embedding model to use. If None, will use the embedding model configured as the default.
return
Column
A Column expression that represents the embeddings for each value in the input column.

Example

df.select(fc.semantic.embed(fc.col("text_column")).alias("text_embeddings"))
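Embeddings are typically consumed downstream, e.g. for similarity search. A minimal cosine-similarity helper over plain Python lists (the actual embedding vectors would come from fc.semantic.embed):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Parallel vectors score 1.0; orthogonal vectors score 0.0
```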

summarize

Summarizes strings from a column.
fc.semantic.summarize(
    column: ColumnOrName,
    format: Union[KeyPoints, Paragraph, None] = None,
    temperature: float = 0,
    model_alias: Optional[Union[str, ModelAlias]] = None,
    request_timeout: TimeoutParam = None,
) -> Column
column
ColumnOrName
required
Column or column name containing text for summarization.
format
Union[KeyPoints, Paragraph, None]
Format of the summary to generate. Can be either KeyPoints or Paragraph. If None, will default to Paragraph with a maximum of 120 words.
temperature
float
default:"0.0"
Optional temperature parameter for the language model.
model_alias
Optional[Union[str, ModelAlias]]
Optional alias for the language model to use for the summarization.
request_timeout
TimeoutParam
Optional timeout in seconds for a single LLM request.
return
Column
Expression containing the summarized string.

Example

fc.semantic.summarize(fc.col('user_comment'))

parse_pdf

Parses a column of PDF paths into markdown.
fc.semantic.parse_pdf(
    column: ColumnOrName,
    model_alias: Optional[Union[str, ModelAlias]] = None,
    page_separator: Optional[str] = None,
    describe_images: bool = False,
    max_output_tokens: Optional[int] = None,
    request_timeout: TimeoutParam = None,
) -> Column
column
ColumnOrName
required
Column or column name containing the PDF to parse.
model_alias
Optional[Union[str, ModelAlias]]
Optional alias for the language model to use for the parsing.
page_separator
Optional[str]
Optional page separator to use for the parsing. If the separator includes the {page} placeholder, the model will replace it with the current page number.
describe_images
bool
default:"False"
Flag to describe images in the PDF. If True, the prompt will ask the model to include a description of the image in the markdown output. If False, the prompt asks the model to ignore images that aren’t tables or charts.
max_output_tokens
Optional[int]
Optional maximum number of output tokens per ~3 pages of PDF (does not include reasoning tokens). If None, don’t constrain the model’s output.
request_timeout
TimeoutParam
Optional timeout in seconds for a single LLM request.
return
Column
A column with markdown strings for each PDF file.
For Gemini models, this function uses the Google File API, uploading PDF files to Google’s file store and deleting them after each request.

Examples

fc.semantic.parse_pdf(fc.col("pdf_path")).show()
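The {page} placeholder in page_separator makes per-page post-processing straightforward. A stdlib sketch of splitting such output back into pages; the separator string here is an assumed example, not a documented default:

```python
import re

# Hypothetical output produced with page_separator="<<< page {page} >>>"
markdown = "Page one text\n<<< page 1 >>>\nPage two text\n<<< page 2 >>>\n"

# Split on the rendered separator, dropping empty fragments
pages = [p.strip() for p in re.split(r"<<< page \d+ >>>", markdown) if p.strip()]
```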
