DashScope provides cloud-hosted Qwen3-TTS APIs for fast, scalable, and production-ready text-to-speech generation. Skip infrastructure management and start generating speech in minutes.

Overview

The DashScope API offers:
  • Managed infrastructure: No GPU setup or maintenance required
  • Low latency: Optimized cloud deployment for fast generation
  • High availability: Production-grade reliability and uptime
  • Real-time streaming: Stream audio as it’s generated
  • All model variants: Access to CustomVoice, VoiceClone, and VoiceDesign models

API Types

DashScope provides three API endpoints corresponding to the three Qwen3-TTS model types:

1. Custom Voice API

Generate speech using predefined speaker voices with optional instruction control.

Features:
  • 9 premium speaker voices (Vivian, Serena, Ryan, Aiden, etc.)
  • Multi-language support (Chinese, English, Japanese, Korean, and more)
  • Natural language instruction control (emotion, tone, speaking style)
  • Streaming and non-streaming modes

2. Voice Clone API

Clone any voice from a reference audio sample and generate new speech in that voice.

Features:
  • 3-second rapid voice cloning
  • High-fidelity speaker similarity
  • Supports reference audio + transcript
  • Cross-lingual voice cloning

3. Voice Design API

Create custom voices from natural language descriptions.

Features:
  • Design voices with text descriptions (age, gender, tone, emotion)
  • Fine-grained control over voice characteristics
  • Generate unique voice profiles on-demand
  • Support for creative and expressive voice styles

Quick Start

1. Get API Credentials

  1. Sign up for a DashScope account
  2. Create an API key in the console
  3. Note your API endpoint and key
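
Once you have a key, one common way to keep it out of source code is to read it from an environment variable. The sketch below assumes the variable is named `DASHSCOPE_API_KEY`; that name and the helper `load_api_key` are illustrative choices, so confirm the expected variable name in the official documentation.

```python
import os


def load_api_key(env_var: str = "DASHSCOPE_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    Raises RuntimeError if the variable is unset or blank, so a missing
    credential fails at startup rather than on the first request.
    The default variable name is an assumption, not taken from this page.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key.")
    return key
```

Failing early keeps a misconfigured deployment from surfacing as an authentication error deep inside a request handler.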

2. Make Your First Request

Refer to the official DashScope documentation for the complete API reference, authentication details, and code examples.

API Comparison

| Feature | Custom Voice | Voice Clone | Voice Design |
|---|---|---|---|
| Predefined speakers | ✅ 9 speakers | | |
| Custom voice from audio | | | |
| Voice from description | | | |
| Instruction control | | | |
| Streaming | | | |
| Languages | 10 languages | 10 languages | 10 languages |
| Best for | Production apps with consistent voices | Voice cloning, personalization | Creative voice design |

Self-Hosted vs Cloud API

| Aspect | Self-Hosted (qwen-tts package) | DashScope API |
|---|---|---|
| Setup | Install package, download models | Sign up, get API key |
| Infrastructure | Requires GPU (16-24 GB VRAM) | Managed cloud infrastructure |
| Scaling | Manual scaling, load balancing | Auto-scaling, built-in load balancing |
| Cost | GPU hardware/cloud compute costs | Pay-per-use API pricing |
| Latency | Local: lowest; Cloud: depends on network | Optimized cloud infrastructure |
| Maintenance | Self-maintained, model updates | Fully managed, automatic updates |
| Customization | Full control, fine-tuning support | API parameters only |
| Data privacy | Complete control, data stays local | Data sent to cloud (check privacy policy) |

When to Use Self-Hosted

  • Need complete data privacy/control
  • High-volume processing with predictable load
  • Custom fine-tuning requirements
  • Network-isolated environments
  • Long-term cost optimization for stable workloads

When to Use DashScope API

  • Fast prototyping and development
  • Variable or unpredictable traffic
  • No GPU infrastructure available
  • Need high availability without ops overhead
  • Small to medium scale deployments
  • Want automatic model updates and optimizations

Pricing

For pricing information, contact Alibaba Cloud or refer to the official documentation.

Rate Limits

API rate limits vary by account tier. Check the official documentation for current limits.

Regional Availability

DashScope APIs are available in multiple regions:
  • China: Beijing, Shanghai, Hangzhou, Shenzhen
  • International: Singapore, US, Europe
Select the region closest to your users for optimal latency.

Best Practices

Error Handling

  • Implement retry logic with exponential backoff
  • Handle rate limit errors gracefully
  • Log API errors for debugging
  • Validate inputs before API calls
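
The retry and logging points above can be sketched as a small wrapper. This is a generic illustration, not DashScope SDK code: `RetryableError` and `call_with_backoff` are hypothetical names, and you would need to map the real client's rate-limit and timeout errors onto the retryable case yourself.

```python
import logging
import random
import time

logger = logging.getLogger(__name__)


class RetryableError(Exception):
    """Hypothetical marker for failures worth retrying (rate limits, timeouts)."""


def call_with_backoff(fn, max_attempts=5, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call fn() and retry on RetryableError with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RetryableError as exc:
            logger.warning("attempt %d/%d failed: %s", attempt, max_attempts, exc)
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # The delay ceiling doubles each attempt (0.5 s, 1 s, 2 s, ...) up to max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: sleep a random time below the ceiling so that
            # many clients do not retry in lockstep.
            sleep(random.uniform(0, ceiling))
```

Only errors that are likely to be transient should be retried; invalid input or authentication failures are better raised immediately.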

Performance Optimization

  • Use streaming mode for real-time applications
  • Batch requests when possible
  • Cache frequently generated audio
  • Choose nearest regional endpoint
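
As an illustration of the streaming point, the sketch below consumes audio chunks as they arrive and appends them to a WAV file instead of waiting for the full response. It assumes the stream yields raw 16-bit mono PCM at 24 kHz; the actual chunk format and sample rate depend on the API and should be checked in the official documentation. `write_pcm_stream` is a hypothetical helper name.

```python
import wave
from typing import Iterable


def write_pcm_stream(chunks: Iterable[bytes], path: str, sample_rate: int = 24000) -> int:
    """Write raw 16-bit mono PCM chunks to a WAV file as they arrive.

    Returns the number of audio frames written.
    """
    frames = 0
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono (assumed)
        wav.setsampwidth(2)            # 2 bytes per sample, i.e. 16-bit (assumed)
        wav.setframerate(sample_rate)  # 24 kHz by default (assumed)
        for chunk in chunks:
            wav.writeframes(chunk)     # written as soon as the chunk is received
            frames += len(chunk) // 2  # one mono frame is 2 bytes
    return frames
```

In a real-time application the same loop would hand each chunk to an audio player or a network socket rather than a file.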

Cost Optimization

  • Cache common phrases and responses
  • Use appropriate audio quality settings
  • Monitor usage and set alerts
  • Batch process when real-time isn’t required
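
Caching common phrases can be sketched as a lookup keyed on every input that affects the audio. The example below is a minimal on-disk cache; `synthesize` stands in for whatever function actually calls the API, and all names here are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cache_key(text: str, speaker: str, language: str) -> str:
    """Derive a stable key from every input that changes the generated audio."""
    # The unit-separator character keeps ("ab", "c") distinct from ("a", "bc").
    payload = "\x1f".join((text, speaker, language)).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def cached_synthesize(
    text: str,
    speaker: str,
    language: str,
    synthesize: Callable[[str, str, str], bytes],
    cache_dir: Path,
) -> bytes:
    """Return cached audio if present; otherwise synthesize once and store it."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{cache_key(text, speaker, language)}.bin"
    if path.exists():
        return path.read_bytes()  # cache hit: no API call, no charge
    audio = synthesize(text, speaker, language)
    path.write_bytes(audio)
    return audio
```

If you also pass an instruction or audio quality setting to the API, include it in the key so that different settings never share a cache entry.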

Security

  • Never expose API keys in client-side code
  • Rotate API keys regularly
  • Use environment variables for credentials
  • Implement rate limiting on your application layer
  • Validate and sanitize user inputs
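
Application-layer rate limiting is often implemented as a token bucket. The sketch below is one generic way to do it, assuming a single process; `TokenBucket` is a hypothetical class, not part of any DashScope SDK, and a multi-server deployment would need shared state instead.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)  # start with a full bucket
        self._last = clock()

    def allow(self) -> bool:
        """Consume one token and return True, or return False if the bucket is empty."""
        now = self._clock()
        # Add the tokens earned since the last call, capped at capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False
```

Checking `allow()` per user or per API key before forwarding a request keeps one client from exhausting your account's quota.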

Migration Guide

From Self-Hosted to API

Migrating from the qwen-tts package to the DashScope API:

Before (Self-Hosted):

```python
import torch
from qwen_tts import Qwen3TTSModel

# Load the CustomVoice checkpoint onto the first GPU in bfloat16.
model = Qwen3TTSModel.from_pretrained(
    "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
    device_map="cuda:0",
    dtype=torch.bfloat16,
)

# Generate speech with a predefined speaker; returns the audio and its sample rate.
wavs, sr = model.generate_custom_voice(
    text="Hello world",
    language="English",
    speaker="Ryan",
)
```

After (DashScope API): Refer to the official API documentation for equivalent code.

From API to Self-Hosted

To move from the API back to self-hosting for data privacy or cost reasons:
  1. Set up GPU infrastructure
  2. Install qwen-tts package
  3. Download model weights
  4. Adapt API calls to library calls
  5. Implement caching and optimization
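
One way to approach step 4 is to hide the TTS provider behind a small interface so application code does not change when the backend does. The sketch below is an assumption about how you might structure this: `TTSBackend`, `SelfHostedBackend`, and `speak` are hypothetical names, and only `generate_custom_voice` comes from the self-hosted example above.

```python
from typing import Any, Protocol, Tuple


class TTSBackend(Protocol):
    """Anything that can turn text into (audio, sample_rate)."""

    def synthesize(self, text: str, language: str, speaker: str) -> Tuple[Any, int]: ...


class SelfHostedBackend:
    """Adapts a loaded qwen-tts model to the TTSBackend interface."""

    def __init__(self, model: Any):
        self._model = model  # e.g. the object returned by Qwen3TTSModel.from_pretrained

    def synthesize(self, text: str, language: str, speaker: str) -> Tuple[Any, int]:
        # Forward to the library call used in the self-hosted example.
        return self._model.generate_custom_voice(
            text=text, language=language, speaker=speaker
        )


def speak(backend: TTSBackend, text: str) -> Tuple[Any, int]:
    """Application code depends only on the interface, not on a provider."""
    return backend.synthesize(text, language="English", speaker="Ryan")
```

An API-backed class with the same `synthesize` method could then be swapped in or out without touching callers such as `speak`.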

Support

For DashScope API support:
  • Documentation: DashScope API Docs
  • Technical Support: Alibaba Cloud support portal
  • Community: Qwen GitHub Issues and Discord
For the complete API reference, authentication details, request/response formats, and code examples, refer to the official DashScope documentation.

