Overview
The Music Generation API allows you to create original music from simple text prompts or detailed composition plans. Generate songs with vocals or instrumental tracks, control structure and style, and separate audio into stems.

Methods
compose()

Generate a song from a text prompt or composition plan.

Parameters:

- A simple text prompt to generate a song from. Describes the desired style, mood, instruments, and characteristics. Cannot be used together with composition_plan. Example prompts:
  - “upbeat pop song with guitar and drums”
  - “relaxing ambient piano music”
  - “energetic rock with electric guitar solo”
- A detailed composition plan to guide music generation. Provides precise control over song structure, sections, instruments, and timing. Cannot be used together with prompt. Use the composition plan API to create and refine plans before generation.
- The length of the song to generate in milliseconds. Used only with prompt. Must be between 3000 ms (3 seconds) and 600000 ms (10 minutes). If not provided, the model chooses a length based on the prompt.
- Output format of the generated audio, formatted as codec_sample_rate_bitrate:
  - mp3_44100_128: MP3 at 44.1 kHz, 128 kbps (recommended)
  - mp3_22050_32: MP3 at 22.05 kHz, 32 kbps
  - pcm_16000: PCM at 16 kHz
  - pcm_22050: PCM at 22.05 kHz
  - pcm_44100: PCM at 44.1 kHz (requires Pro tier or above)
  - ulaw_8000: μ-law at 8 kHz
- The model to use for generation. Currently only music_v1 is available.
- Random seed to initialize the music generation process. Providing the same seed with the same parameters can help achieve more consistent results, though exact reproducibility is not guaranteed and outputs may change across system updates. Cannot be used with prompt.
- If True, guarantees the generated song will be instrumental (no vocals). If False, the song may or may not have vocals, depending on the prompt. Can only be used with prompt.
- Controls how strictly section durations in the composition_plan are enforced. Only used with composition_plan.
  - True: the model precisely respects each section’s duration_ms from the plan
  - False: the model may adjust individual section durations for better quality and latency, while preserving the total song duration
- Whether to store the generated song for inpainting. Only available to enterprise clients with access to the inpainting API.
- Whether to sign the generated song with C2PA content credentials. Applicable only to MP3 files. Adds cryptographic proof of AI generation.
- Request-specific configuration, including chunk_size and other customizations.

Returns: An iterator yielding audio data chunks in the specified output format.
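The mutual-exclusion and length rules above are easy to check client-side before spending a request. A minimal sketch of such a pre-flight check; the helper name and constants are illustrative, not part of the SDK:

```python
# Illustrative pre-flight validation mirroring the documented compose() rules:
# exactly one of prompt/composition_plan, and music_length_ms only with prompt,
# within the documented 3 s to 10 min range.

MIN_LENGTH_MS = 3_000      # documented minimum (3 seconds)
MAX_LENGTH_MS = 600_000    # documented maximum (10 minutes)

def validate_compose_args(prompt=None, composition_plan=None, music_length_ms=None):
    """Raise ValueError if the documented compose() constraints are violated."""
    if (prompt is None) == (composition_plan is None):
        raise ValueError("Provide exactly one of prompt or composition_plan")
    if music_length_ms is not None:
        if composition_plan is not None:
            raise ValueError("music_length_ms is only used with prompt")
        if not MIN_LENGTH_MS <= music_length_ms <= MAX_LENGTH_MS:
            raise ValueError("music_length_ms must be between 3000 and 600000")

validate_compose_args(prompt="relaxing ambient piano music", music_length_ms=30_000)
```

Catching these errors locally gives faster feedback than waiting for the API to reject the request.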
compose_detailed()

Generate a song with detailed metadata about the composition.

Parameters:

- Text prompt describing the desired music.
- Detailed composition plan for structured generation.
- Song length in milliseconds (3000-600000).
- Audio output format.
- Model to use (music_v1).
- Random seed for generation.
- Force instrumental output (no vocals).
- Store for future inpainting operations.
- Whether to return word-level timestamps for vocal tracks.
- Sign with C2PA content credentials.
- Request-specific configuration.

Returns: A response object containing:

- json (dict): Metadata including the composition plan and song metadata:
  - compositionPlan: detailed composition structure
  - songMetadata: title, description, genres, languages, explicit flag
- audio (bytes): The generated audio file
- filename (str): Suggested filename for the audio
- song_id (str): Unique identifier (if store_for_inpainting=True)
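A sketch of unpacking the documented response fields. The sample dict below is fabricated purely to mirror the field names listed above; in real use these values come back from the API:

```python
# Fabricated sample mirroring the documented compose_detailed() response shape.
# All values here are placeholders for illustration only.
sample_response = {
    "json": {
        "compositionPlan": {"sections": []},
        "songMetadata": {
            "title": "Night Drive",               # illustrative metadata
            "description": "Moody synth instrumental",
            "genres": ["synthwave"],
            "languages": [],
            "explicit": False,
        },
    },
    "audio": b"",                                  # MP3 bytes in a real response
    "filename": "night_drive.mp3",
}

# Pull the song metadata out of the nested json payload.
meta = sample_response["json"]["songMetadata"]
title, genres = meta["title"], meta["genres"]
```

The audio bytes can be written straight to disk under the suggested filename.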
stream()

Stream music generation from a prompt or composition plan.

Parameters:

- Text prompt for music generation.
- Detailed composition plan.
- Song length in milliseconds.
- Audio output format.
- Model to use.
- Random seed.
- Force instrumental output.
- Store for inpainting.
- Request-specific configuration.

Returns: An iterator yielding streaming audio data chunks as they are generated.
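Chunks from a streaming call are typically forwarded to a player or file as they arrive rather than buffered whole. The generator below is a stub standing in for a real stream() call so the consumption pattern is runnable:

```python
def fake_stream():
    """Stub for a streaming call: yields audio byte chunks as they are produced."""
    yield b"\x00\x01"
    yield b"\x02\x03"

buffer = bytearray()
for chunk in fake_stream():
    # In a real-time application, hand each chunk to the audio player here
    # instead of accumulating the whole song in memory.
    buffer.extend(chunk)

audio = bytes(buffer)
```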
separate_stems()

Separate an audio file into individual instrument stems.

Parameters:

- The audio file to separate into stems. Supports common formats such as MP3, WAV, and M4A.
- Output format for the separated stems. Each stem is saved in this format.
- The ID of the stem variation to use. Different variations provide different stem separations (e.g., vocals, drums, bass, other).
- Whether to sign the separated stems with C2PA. Applicable only to MP3 files.
- Request-specific configuration.

Returns: An iterator yielding a ZIP archive containing the separated audio stems. Each stem is provided as a separate audio file in the requested output format.
Async Methods
All methods have async equivalents.

Advanced Usage
Using Composition Plans
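The reference above mentions sections with a duration_ms each. The dict below sketches one plausible plan shape; the exact schema is defined by the composition plan API, so treat keys other than duration_ms as assumptions:

```python
# Hypothetical composition plan shape: a list of sections, each with a
# duration_ms. Only duration_ms appears in this reference; the other keys
# (name, instruments) are assumed for illustration.
plan = {
    "sections": [
        {"name": "intro", "duration_ms": 8_000, "instruments": ["piano"]},
        {"name": "verse", "duration_ms": 20_000, "instruments": ["piano", "strings"]},
        {"name": "outro", "duration_ms": 7_000, "instruments": ["piano"]},
    ],
}

# With respect_sections_durations=False the model may rebalance individual
# sections, but the documented guarantee is that the total duration holds.
total_ms = sum(section["duration_ms"] for section in plan["sections"])
```

Building the plan as plain data like this makes it easy to inspect and refine before generation.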
Processing Stems
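separate_stems() yields a ZIP archive as byte chunks. The sketch below assembles the chunks and lists each stem using the standard zipfile module; the archive is built in memory with illustrative filenames so the pattern is runnable without an API call:

```python
import io
import zipfile

def fake_stems_iterator():
    """Stub for separate_stems(): yields a ZIP archive in byte chunks."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # Illustrative stem names matching the documented separation
        # (vocals, drums, bass, other); contents are placeholder bytes.
        for name in ("vocals.mp3", "drums.mp3", "bass.mp3", "other.mp3"):
            zf.writestr(name, b"\x00")
    data = buf.getvalue()
    yield data[: len(data) // 2]          # delivered as two chunks here
    yield data[len(data) // 2 :]

# Reassemble the chunks, then open the archive and list the stem files.
archive = b"".join(fake_stems_iterator())
with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    stems = zf.namelist()                 # one entry per separated stem
```

Each entry can then be extracted with zf.read(name) and fed into a mixing or remixing pipeline.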
Use Cases
- Content creation: Generate background music for videos and podcasts
- Game development: Create dynamic game soundtracks
- Marketing: Produce custom music for advertisements
- Music production: Generate ideas and backing tracks
- Audio post-production: Separate and remix existing tracks
- Personalization: Create custom music based on user preferences
Best Practices
Prompt Engineering
- Be specific about style, instruments, and mood
- Mention tempo (e.g., “fast”, “slow”, “moderate”)
- Specify instrumentation (e.g., “piano and strings”)
- Include mood descriptors (e.g., “energetic”, “calm”, “dramatic”)
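Since prompts are plain free text, the checklist above can be folded into a single string. This tiny helper is purely illustrative, not part of the SDK:

```python
def build_prompt(mood, tempo, style, instruments):
    """Combine mood, tempo, style, and instrumentation into one prompt string."""
    return f"{mood} {tempo} {style} with {' and '.join(instruments)}"

prompt = build_prompt("energetic", "fast", "rock", ["electric guitar", "drums"])
# "energetic fast rock with electric guitar and drums"
```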
Quality Optimization
- Use higher bitrate formats (128kbps or higher) for final output
- Use compose_detailed() to get metadata and composition information
- Set respect_sections_durations=False for better quality with composition plans
- Use C2PA signing for content authentication
Performance
- Use stream() for real-time applications
- Cache generated songs with song_id for future reference
- Use appropriate music_length_ms to balance quality and generation time