SpeechGraph

Overview

SpeechGraph is a scraping pipeline that fetches a web page or local HTML files, answers a given prompt from the extracted content, and generates an audio file from that answer. It combines web scraping with text-to-speech.

Class Signature

class SpeechGraph(AbstractGraph):
    def __init__(
        self,
        prompt: str,
        source: str,
        config: dict,
        schema: Optional[Type[BaseModel]] = None,
    )

Constructor Parameters

prompt

str

required

The natural language prompt describing what information to extract and convert to speech.

source

str

required

The source to scrape. Can be:

A URL starting with http:// or https://
A local directory path for offline HTML files

config

dict

required

Configuration parameters for the graph. Must include:

llm: LLM configuration (e.g., {"model": "openai/gpt-4o"})
tts_model: Text-to-speech model configuration

Optional parameters:

output_path (str): Path to save the audio file (default: "output.mp3")
verbose (bool): Enable detailed logging
headless (bool): Run browser in headless mode
additional_info (str): Extra context for the LLM
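
The required and optional keys listed above can be checked before the graph is constructed. The following is a minimal sketch and not part of ScrapeGraphAI: `prepare_speech_config` and `REQUIRED_KEYS` are hypothetical names. The only behavior taken from this page is that `llm` and `tts_model` are required and that `output_path` defaults to "output.mp3".

```python
REQUIRED_KEYS = ("llm", "tts_model")  # required according to the list above


def prepare_speech_config(config: dict) -> dict:
    """Hypothetical helper: validate required keys and fill the documented default."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"Missing required config keys: {', '.join(missing)}")
    prepared = dict(config)  # shallow copy so the caller's dict is not mutated
    prepared.setdefault("output_path", "output.mp3")
    return prepared


config = prepare_speech_config({
    "llm": {"model": "openai/gpt-4o"},
    "tts_model": {"model": "tts-1", "voice": "alloy"},
})
print(config["output_path"])
```

Failing early with a clear message tends to be easier to debug than an error raised midway through a scrape.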

schema

Type[BaseModel]

default: None

Optional Pydantic model defining the expected output structure.

Attributes

prompt

str

The user’s extraction prompt.

source

str

The source URL or local directory path.

config

dict

Configuration dictionary for the graph.

schema

Type[BaseModel]

Optional output schema for structured data extraction.

llm_model

object

The configured language model instance.

input_key

str

Either "url" or "local_dir", based on the source type.
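
How `input_key` follows from `source` can be illustrated with a short sketch. This is an assumption about the selection rule based on the source description above (URLs start with http:// or https://, anything else is treated as a local directory). `infer_input_key` is a hypothetical name, not a library function.

```python
def infer_input_key(source: str) -> str:
    """Return "url" for web sources and "local_dir" for everything else."""
    # str.startswith accepts a tuple of prefixes.
    return "url" if source.startswith(("http://", "https://")) else "local_dir"


print(infer_input_key("https://en.wikipedia.org/wiki/Chioggia"))
print(infer_input_key("./saved_pages"))
```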

Methods

run()

Executes the scraping process, generates audio, and returns the text answer.

def run(self) -> str

Returns

str

The extracted information as a string. The audio file is saved to disk.

Raises:

ValueError: If no audio was generated from the text.

Basic Usage

from scrapegraphai.graphs import SpeechGraph

graph_config = {
    "llm": {
        "model": "openai/gpt-4o",
        "api_key": "your-openai-key"
    },
    "tts_model": {
        "api_key": "your-openai-key",
        "model": "tts-1",
        "voice": "alloy"
    },
    "output_path": "summary.mp3"
}

speech_graph = SpeechGraph(
    prompt="List all the attractions in Chioggia and generate an audio summary.",
    source="https://en.wikipedia.org/wiki/Chioggia",
    config=graph_config
)

result = speech_graph.run()
print(result)  # Prints the text
# Audio is saved to summary.mp3

Text-to-Speech Configuration

Using OpenAI TTS

config = {
    "llm": {
        "model": "openai/gpt-4o",
        "api_key": "your-api-key"
    },
    "tts_model": {
        "api_key": "your-api-key",
        "model": "tts-1",        # or "tts-1-hd" for higher quality
        "voice": "alloy"         # alloy, echo, fable, onyx, nova, shimmer
    },
    "output_path": "output.mp3"
}

speech_graph = SpeechGraph(
    prompt="Summarize the key points",
    source="https://example.com/article",
    config=config
)

Available Voices

OpenAI TTS offers six voice options:

alloy: Neutral and balanced
echo: Male voice
fable: British accent
onyx: Deep male voice
nova: Female voice
shimmer: Soft female voice
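
A typo in the voice or model name would otherwise surface only when the TTS request is made. The sketch below validates both up front. `build_tts_config` is a hypothetical helper, and the allowed values are simply the voices and models named on this page, so the sets may need updating if the provider adds options.

```python
OPENAI_TTS_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
OPENAI_TTS_MODELS = {"tts-1", "tts-1-hd"}


def build_tts_config(api_key: str, voice: str = "alloy", model: str = "tts-1") -> dict:
    """Hypothetical helper: return a tts_model dict after checking voice and model."""
    if voice not in OPENAI_TTS_VOICES:
        raise ValueError(f"Unknown voice {voice!r}; expected one of {sorted(OPENAI_TTS_VOICES)}")
    if model not in OPENAI_TTS_MODELS:
        raise ValueError(f"Unknown model {model!r}; expected one of {sorted(OPENAI_TTS_MODELS)}")
    return {"api_key": api_key, "model": model, "voice": voice}


tts_config = build_tts_config("your-api-key", voice="nova")
print(tts_config)
```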

Advanced Usage

Custom Output Path

import os
from datetime import datetime

from scrapegraphai.graphs import SpeechGraph

# Generate timestamped filename
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
output_path = f"./audio_summaries/summary_{timestamp}.mp3"

# Ensure directory exists
os.makedirs("./audio_summaries", exist_ok=True)

config = {
    "llm": {"model": "openai/gpt-4o"},
    "tts_model": {"model": "tts-1", "voice": "nova"},
    "output_path": output_path
}

speech_graph = SpeechGraph(
    prompt="Create a brief audio summary",
    source="https://example.com",
    config=config
)

result = speech_graph.run()
print(f"Audio saved to: {output_path}")

High-Quality Audio

config = {
    "llm": {"model": "openai/gpt-4o"},
    "tts_model": {
        "model": "tts-1-hd",     # High-definition audio
        "voice": "onyx",
        "speed": 1.0             # Playback speed (0.25 to 4.0)
    },
    "output_path": "hq_summary.mp3"
}

Graph Workflow

The SpeechGraph uses the following node pipeline:

FetchNode → ParseNode → GenerateAnswerNode → TextToSpeechNode

FetchNode: Fetches the web page content
ParseNode: Parses and chunks the content
GenerateAnswerNode: Extracts information based on the prompt
TextToSpeechNode: Converts the answer to audio
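
The four-stage flow above can be pictured as functions that pass a shared state dictionary along. The sketch below uses stub functions only; it does not call ScrapeGraphAI, the network, or any TTS service. The state keys "doc" and "chunks" and all function names are hypothetical, and "user_prompt" and "url" are assumed input keys. Only the node order and the "answer" and "audio" keys come from this page.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Node = Callable[[State], State]


def fetch_node(state: State) -> State:
    # Stand-in for FetchNode: a real node would download the page.
    state["doc"] = f"<p>Placeholder content for {state['url']}</p>"
    return state


def parse_node(state: State) -> State:
    # Stand-in for ParseNode: split the document into fixed-size chunks.
    doc = state["doc"]
    state["chunks"] = [doc[i:i + 40] for i in range(0, len(doc), 40)]
    return state


def generate_answer_node(state: State) -> State:
    # Stand-in for GenerateAnswerNode: a real node would call the LLM.
    count = len(state["chunks"])
    state["answer"] = f"Stub answer to {state['user_prompt']!r} from {count} chunk(s)"
    return state


def text_to_speech_node(state: State) -> State:
    # Stand-in for TextToSpeechNode: real audio bytes would come from the TTS model.
    state["audio"] = state["answer"].encode("utf-8")
    return state


def run_pipeline(state: State, nodes: List[Node]) -> State:
    """Run each node in order, feeding the state from one node into the next."""
    for node in nodes:
        state = node(state)
    return state


final_state = run_pipeline(
    {"user_prompt": "Summarize the key points", "url": "https://example.com/article"},
    [fetch_node, parse_node, generate_answer_node, text_to_speech_node],
)
print(sorted(final_state))
```

Because every node reads from and writes to the same dictionary, later stages can use anything produced earlier, which is why the final state holds both the text answer and the audio.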

Use Cases

Accessibility: Convert web content to audio for visually impaired users
Learning: Create audio summaries of educational content
News Briefings: Generate audio news summaries
Podcast Generation: Create podcast episodes from articles
Audiobooks: Convert written content to audio format

Example: News Briefing

from typing import List

from pydantic import BaseModel, Field
from scrapegraphai.graphs import SpeechGraph

class NewsBrief(BaseModel):
    headline: str = Field(description="Main headline")
    summary: str = Field(description="Brief summary in 2-3 sentences")
    key_points: List[str] = Field(description="3-5 key points")

config = {
    "llm": {"model": "openai/gpt-4o"},
    "tts_model": {
        "model": "tts-1",
        "voice": "nova"
    },
    "output_path": "news_brief.mp3",
    "additional_info": "Create a concise news briefing suitable for audio"
}

speech_graph = SpeechGraph(
    prompt="Create a news briefing with headline, summary, and key points",
    source="https://example.com/news-article",
    config=config,
    schema=NewsBrief
)

result = speech_graph.run()
print("Text version:", result)
print("Audio saved to: news_brief.mp3")

Example: Educational Summary

config = {
    "llm": {"model": "openai/gpt-4o"},
    "tts_model": {
        "model": "tts-1",
        "voice": "alloy"
    },
    "output_path": "lesson.mp3",
    "additional_info": "Explain concepts clearly as if teaching a student"
}

speech_graph = SpeechGraph(
    prompt="Explain the key concepts of quantum computing in simple terms",
    source="https://example.com/quantum-computing-intro",
    config=config
)

result = speech_graph.run()
print("Lesson audio created: lesson.mp3")

Accessing Results

result = speech_graph.run()

# Get the text answer
print("Text:", result)

# Access full state
final_state = speech_graph.get_state()
text_answer = final_state.get("answer")
audio_bytes = final_state.get("audio")

print(f"Text length: {len(text_answer)} characters")
print(f"Audio size: {len(audio_bytes)} bytes")

# Execution info
exec_info = speech_graph.get_execution_info()
for node_info in exec_info:
    print(f"{node_info['node_name']}: {node_info['exec_time']:.2f}s")
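
The per-node timings can also be condensed into a short summary. The sketch below assumes each entry is a dict with the `node_name` and `exec_time` keys used in the loop above. `summarize_exec_info` and the sample data are hypothetical. If the list returned by the library contains an aggregate entry, filter it out before calling the helper.

```python
from typing import Any, Dict, List


def summarize_exec_info(exec_info: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: total execution time and the slowest node."""
    if not exec_info:
        raise ValueError("exec_info is empty")
    total = sum(float(item["exec_time"]) for item in exec_info)
    slowest = max(exec_info, key=lambda item: item["exec_time"])
    return {"total_time": total, "slowest_node": slowest["node_name"]}


# Made-up timings, for illustration only.
sample_exec_info = [
    {"node_name": "FetchNode", "exec_time": 1.0},
    {"node_name": "ParseNode", "exec_time": 0.5},
    {"node_name": "GenerateAnswerNode", "exec_time": 2.0},
    {"node_name": "TextToSpeechNode", "exec_time": 0.5},
]
summary = summarize_exec_info(sample_exec_info)
print(f"{summary['slowest_node']} was slowest; total {summary['total_time']:.2f}s")
```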

Error Handling

try:
    result = speech_graph.run()
    print(f"Success! Text: {result}")
    print(f"Audio saved to: {config['output_path']}")

except ValueError as e:
    # run() raises ValueError when no audio was generated from the text
    if "No audio generated" in str(e):
        print("Failed to generate audio from text")
    else:
        raise

except Exception as e:
    print(f"Error during processing: {e}")

Cost Considerations

OpenAI TTS pricing (as of 2024):

tts-1: $0.015 per 1,000 characters
tts-1-hd: $0.030 per 1,000 characters

# Estimate cost
text_length = len(result)
cost_per_char = 0.015 / 1000  # for tts-1
estimated_cost = text_length * cost_per_char
print(f"Estimated TTS cost: ${estimated_cost:.4f}")
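
The same arithmetic can be wrapped in a small function that covers both models. This is a sketch using the 2024 prices quoted above. `estimate_tts_cost` and `TTS_PRICE_PER_1K_CHARS` are hypothetical names, and the prices should be checked against current provider pricing before relying on the result.

```python
# USD per 1,000 characters, as quoted above (as of 2024).
TTS_PRICE_PER_1K_CHARS = {"tts-1": 0.015, "tts-1-hd": 0.030}


def estimate_tts_cost(text: str, model: str = "tts-1") -> float:
    """Estimate the TTS cost in USD from the character count of the text."""
    if model not in TTS_PRICE_PER_1K_CHARS:
        raise ValueError(f"No price known for model {model!r}")
    return len(text) * TTS_PRICE_PER_1K_CHARS[model] / 1000


sample_text = "x" * 2500  # stand-in for the text returned by run()
for model_name in TTS_PRICE_PER_1K_CHARS:
    print(f"{model_name}: ${estimate_tts_cost(sample_text, model_name):.4f}")
```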

Performance Tips

Use tts-1 for faster generation and lower cost
Use tts-1-hd for higher quality audio when needed
Ask for concise answers in the prompt: TTS cost scales with the length of the generated text
Use additional_info to guide the LLM toward audio-friendly output
Test different voices to find the best fit for your use case

Limitations

Audio generation adds processing time and cost
Maximum text length depends on TTS provider limits
Audio quality depends on the TTS model used
Requires OpenAI API key with TTS access
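
One way to work within a provider's text-length limit is to split a long answer into smaller pieces and synthesize each piece separately. The sketch below is a generic splitter, not something SpeechGraph does for you. `split_for_tts` is a hypothetical name, and the default of 4096 characters is an assumption based on OpenAI's documented speech input limit at the time of writing; check the provider's documentation for the current value.

```python
import re
from typing import List


def split_for_tts(text: str, max_chars: int = 4096) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    # Break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any single sentence that is longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


long_text = "One. Two! Three? " * 10
pieces = split_for_tts(long_text, max_chars=20)
print(len(pieces), "chunks, longest:", max(len(piece) for piece in pieces))
```

Each chunk could then be sent to the TTS model on its own and the resulting audio segments concatenated, at the cost of possible audible seams between segments.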

Related Graphs

SmartScraperGraph: Text-only extraction
SearchGraph: Search and scrape multiple sources
OmniScraperGraph: Include image analysis
