Skip to main content
This example demonstrates how to use MotherDuck’s AI prompt() function in a dbt pipeline to transform unstructured text data (reviews, feedback) into structured data.

Overview

The prompt() function allows you to extract structured information from text using AI, directly in your SQL queries. This is particularly useful for processing product reviews, customer feedback, or any unstructured text data.

Installation

1

Install dependencies

Install dbt-duckdb and DuckDB:
pip install dbt-duckdb duckdb==v1.4.3
2

Set MotherDuck token

Configure your MotherDuck authentication:
export MOTHERDUCK_TOKEN='your_token_here'
3

Run dbt

Execute your dbt pipeline:
dbt run
dbt test
dbt show --select reviews_attributes_by_product

Configuration

profiles.yml

Configure dbt to connect to MotherDuck:
profiles.yml
dbt_ai_prompt:
  outputs:
    dev:
      type: duckdb
      schema: main
      path: 'md:'
      threads: 1
  target: dev

dbt_project.yml

Configure model materialization:
dbt_project.yml
name: 'dbt_ai_prompt'
version: '1.0.0'
profile: 'dbt_ai_prompt'

model-paths: ["models"]

models:
  dbt_ai_prompt:
    reviews:
      +materialized: table
      +database: my_db

Models

Extract Review Attributes

The reviews_attributes model uses the prompt() function to extract structured data from review text:
models/reviews/reviews_attributes.sql
{{ config(materialized="table") }}

select parent_asin, prompt_struct_response.*
from
    (
        select
            parent_asin,
            prompt(
                'You are a very helpful assistant. You are given a product review title and test.\n'
                || 'You are required to extract information from the review.\n'
                || 'Here is the title of the review:'
                || '```'
                || title
                || '```'
                || 'Here is the review text:'
                || '```'
                || text
                || '```',
                struct := {
                    -- Sentiment
                    sentiment:'VARCHAR',
                    -- Feature mentions
                    product_features:'VARCHAR[]',
                    pros:'VARCHAR[]',
                    cons:'VARCHAR[]',
                    -- Quality indicators
                    has_size_info:'BOOLEAN',
                    mentions_price:'BOOLEAN',
                    mentions_shipping:'BOOLEAN',
                    mentions_packaging:'BOOLEAN',
                    -- Comparative analysis
                    competitor_mentions:'VARCHAR[]',
                    previous_version_comparison:'BOOLEAN',
                    -- Usage context
                    use_case:'VARCHAR[]',
                    purchase_reason:'VARCHAR[]',
                    time_owned:'VARCHAR',
                    -- Issues and concerns
                    reported_issues:'VARCHAR[]',
                    quality_concerns:'VARCHAR[]',
                    -- Customer service interaction
                    customer_service_interaction:'BOOLEAN',
                    customer_service_sentiment:'VARCHAR'
                },
                struct_descr := {
                    sentiment:'the sentiment of the review, can only take values `positive`, `neutral` or `negative`',
                    product_features:'a list of features mentioned in the review, if none mentioned return empty array',
                    pros:'a list of pros or positive aspects mentioned in the review, if none mentioned return empty array',
                    cons:'a list of cons or negative aspects mentioned in the review, if none mentioned return empty array',
                    has_size_info:'indicates if the review mentions size information',
                    mentions_price:'indicates if the review mentions price information',
                    mentions_shipping:'indicates if the review mentions shipping information',
                    mentions_packaging:'indicates if the review mentions packaging information',
                    competitor_mentions:'a list of competitors mentioned in the review, if none mentioned return empty array',
                    previous_version_comparison:'indicates if the review compares the product to a previous version',
                    use_case:'a list of use cases mentioned in the review, if none return empty array',
                    purchase_reason:'a list of purchase reasons mentioned in the review, if none return empty array',
                    time_owned:'the time the reviewer has owned the product, if mentioned return the time what ever was written in text, if not mentioned return empty string',
                    reported_issues:'a list of issues reported in the review, if none return empty array',
                    quality_concerns:'a list of quality concerns mentioned in the review, if none return empty array',
                    customer_service_interaction:'indicates if the review mentions customer service interaction',
                    customer_service_sentiment:'the sentiment of the customer service interaction, can only take values `positive`, `neutral` or `negative`'
                }
            ) as prompt_struct_response
        from 
        (select * from {{ source('reviews', 'reviews_raw') }} limit 10)
    )
The prompt() function uses the struct parameter to define the output schema and struct_descr to provide descriptions that guide the AI in extracting the right information.

Aggregate Attributes by Product

The reviews_attributes_by_product model aggregates extracted attributes:
models/reviews/reviews_attributes_by_product.sql
{{ config(materialized="view") }}

with
    unnested_array_attributes as (
        select
            parent_asin,
            unnest(product_features) as product_features,
            unnest(pros) as pros,
            unnest(cons) as cons,
            unnest(competitor_mentions) as competitor_mentions,
            unnest(use_case) as use_case,
            unnest(purchase_reason) as purchase_reason,
            unnest(reported_issues) as reported_issues,
            unnest(quality_concerns) as quality_concerns
        from {{ ref("reviews_attributes") }}
    )
select
    parent_asin,
    array_distinct(array_agg(product_features)) as product_features,
    array_distinct(array_agg(pros)) as pros,
    array_distinct(array_agg(cons)) as cons,
    array_distinct(array_agg(competitor_mentions)) as competitor_mentions,
    array_distinct(array_agg(use_case)) as use_case,
    array_distinct(array_agg(purchase_reason)) as purchase_reason,
    array_distinct(array_agg(reported_issues)) as reported_issues,
    array_distinct(array_agg(quality_concerns)) as quality_concerns
from unnested_array_attributes
group by parent_asin

Calculate Sentiment Scores

The reviews_attributes_sentiment_by_product model computes sentiment metrics:
models/reviews/reviews_attributes_sentiment_by_product.sql
{{ config(materialized="view") }}

SELECT 
    parent_asin,
    COUNT(CASE WHEN sentiment = 'positive' THEN 1 END) as positive_count,
    COUNT(CASE WHEN sentiment = 'neutral' THEN 1 END) as neutral_count,
    COUNT(CASE WHEN sentiment = 'negative' THEN 1 END) as negative_count,
    (positive_count - negative_count)::FLOAT / NULLIF(positive_count + neutral_count + negative_count, 0) as sentiment_score,
    COUNT(CASE WHEN customer_service_sentiment = 'positive' THEN 1 END) as positive_service_count,
    COUNT(CASE WHEN customer_service_sentiment = 'neutral' THEN 1 END) as neutral_service_count,
    COUNT(CASE WHEN customer_service_sentiment = 'negative' THEN 1 END) as negative_service_count,
    (positive_service_count - negative_service_count)::FLOAT / NULLIF(positive_service_count + neutral_service_count + negative_service_count, 0) as service_sentiment_score,
FROM {{ ref("reviews_attributes") }}
GROUP BY parent_asin

How It Works

The prompt() function takes three key parameters:
  • Prompt text: Instructions for the AI on what to extract from the text
  • struct: Defines the output schema with field names and types
  • struct_descr: Provides descriptions for each field to guide extraction
The AI processes the input text and returns a struct matching your specified schema.
  1. Raw review data is sourced from a table
  2. The prompt() function extracts structured attributes from each review
  3. Results are aggregated by product to get consolidated views
  4. Sentiment scores are calculated for overall and service sentiment

Use Cases

  • Extract sentiment from customer reviews
  • Identify product features mentioned in feedback
  • Analyze customer service interactions
  • Track competitive mentions
  • Understand purchase motivations and use cases
The prompt() function is particularly powerful for normalizing unstructured text data at scale, making it queryable and analyzable in your data warehouse.

Build docs developers (and LLMs) love