Introduction

The Moonshine C API is the low-level interface that all other language bindings (Python, Swift, Java, etc.) use to interact with the Moonshine Voice library. It provides direct access to transcription, streaming, and intent recognition.
Most developers should use the binding for their platform rather than calling the C API directly. The C API is primarily intended for:
  • Creating new language bindings
  • Porting to new platforms
  • Low-level optimization and debugging

Key Features

  • Thread-safe: All API calls are thread-safe and can be called from multiple threads concurrently
  • Flexible input: Supports audio input of any length at various sample rates (16kHz recommended)
  • Streaming support: Incremental transcription with caching for low-latency real-time applications
  • Multiple languages: English, Spanish, Mandarin, Japanese, Korean, Vietnamese, Ukrainian, and Arabic
  • Intent recognition: Semantic matching for voice command interfaces

Architecture Overview

The Moonshine library processes audio through these main components:
  1. Transcriber: Core engine that loads models and manages transcription
  2. Streams: Handlers for individual audio input sources (one transcriber can manage multiple streams)
  3. Transcripts: Collections of transcript lines representing detected speech segments
  4. Intent Recognizer: Semantic matching engine for command recognition

Basic Workflow

Non-Streaming Transcription

For transcribing complete audio files or recordings:
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#include "moonshine-c-api.h"

int main(int argc, char *argv[]) {
  // Load the transcriber
  int32_t transcriber_handle = moonshine_load_transcriber_from_files(
    "path/to/models", MOONSHINE_MODEL_ARCH_BASE, NULL, 0,
    MOONSHINE_HEADER_VERSION);
  if (transcriber_handle < 0) {
    fprintf(stderr, "Failed to load transcriber\n");
    return 1;
  }

  // Prepare audio data (16kHz float PCM, values between -1.0 and 1.0).
  // This placeholder buffer holds two seconds of silence.
  float audio_data[32000] = {0};
  size_t audio_length = 32000;
  int32_t sample_rate = 16000;
  
  // Transcribe the audio
  transcript_t *transcript = NULL;
  int32_t error = moonshine_transcribe_without_streaming(transcriber_handle,
    audio_data, audio_length, sample_rate, 0, &transcript);
  if (error != MOONSHINE_ERROR_NONE) {
    fprintf(stderr, "Failed to transcribe: %s\n",
      moonshine_error_to_string(error));
    // Release the transcriber on the error path as well
    moonshine_free_transcriber(transcriber_handle);
    return 1;
  }
  
  // Process results (the transcript is owned by the transcriber)
  for (size_t i = 0; i < transcript->line_count; i++) {
    printf("Line %zu at %f seconds: %s\n", i, transcript->lines[i].start_time,
      transcript->lines[i].text);
  }
  
  // Clean up
  moonshine_free_transcriber(transcriber_handle);
  return 0;
}

Streaming Transcription

For real-time transcription from microphones or live audio sources:
// Load transcriber
int32_t transcriber_handle = moonshine_load_transcriber_from_files(
    "path/to/models", MOONSHINE_MODEL_ARCH_BASE_STREAMING, NULL, 0,
    MOONSHINE_HEADER_VERSION);

// Create and start a stream
int32_t stream_handle = moonshine_create_stream(transcriber_handle, 0);
moonshine_start_stream(transcriber_handle, stream_handle);

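// Feed audio chunks as they become available.
// get_audio_from_microphone() and print_transcript() are placeholders for
// application code; they are not part of the Moonshine API.
float* latest_audio_data;
size_t latest_audio_data_length;
int32_t microphone_sample_rate = 16000;  // set to your capture device's rate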
while (get_audio_from_microphone(&latest_audio_data, &latest_audio_data_length)) {
  moonshine_transcribe_add_audio_to_stream(transcriber_handle,
    stream_handle, latest_audio_data, latest_audio_data_length,
    microphone_sample_rate, 0);
  
  // Get updated transcript periodically
  transcript_t *partial_transcript = NULL;
  moonshine_transcribe_stream(transcriber_handle,
    stream_handle, 0, &partial_transcript);
  print_transcript(partial_transcript);
}

// Stop and get final results
moonshine_stop_stream(transcriber_handle, stream_handle);
transcript_t *final_transcript = NULL;
moonshine_transcribe_stream(transcriber_handle, stream_handle, 0,
  &final_transcript);

// Clean up
moonshine_free_stream(transcriber_handle, stream_handle);
moonshine_free_transcriber(transcriber_handle);
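
When no microphone is available, a pre-recorded buffer can stand in for live audio by delivering it in small chunks. The following is a minimal sketch of that idea, not part of the Moonshine API: the callback type `chunk_callback_t` and the helpers `feed_in_chunks` and `count_samples` are hypothetical names introduced here for illustration. In a real program, the callback is where `moonshine_transcribe_add_audio_to_stream()` would be called.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type (not part of Moonshine): receives one chunk of
 * samples. Returns 0 to continue, non-zero to stop early. */
typedef int (*chunk_callback_t)(const float *samples, size_t count,
                                void *user_data);

/* Splits `audio` into chunks of at most `chunk_size` samples and passes each
 * one to `on_chunk`. Returns the number of chunks delivered successfully. */
static size_t feed_in_chunks(const float *audio, size_t length,
                             size_t chunk_size, chunk_callback_t on_chunk,
                             void *user_data) {
  assert(chunk_size > 0);
  assert(on_chunk != NULL);
  size_t delivered = 0;
  for (size_t offset = 0; offset < length; offset += chunk_size) {
    size_t remaining = length - offset;
    size_t count = remaining < chunk_size ? remaining : chunk_size;
    if (on_chunk(audio + offset, count, user_data) != 0) {
      break; /* the callback asked to stop */
    }
    delivered++;
  }
  return delivered;
}

/* Example callback that only tallies samples. A real callback would forward
 * the chunk to the stream instead. */
static int count_samples(const float *samples, size_t count, void *user_data) {
  (void)samples;
  *(size_t *)user_data += count;
  return 0;
}
```

With 16kHz audio, a `chunk_size` of 1600 corresponds to 100 ms per chunk. The final chunk may be shorter than `chunk_size` when the buffer length is not an exact multiple.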

Audio Format Requirements

  • format (float PCM): Audio data must be floating-point PCM values between -1.0 and 1.0
  • channels (mono): Only mono (single channel) audio is supported
  • sample_rate (16000 Hz recommended): While the library supports various sample rates, 16kHz is recommended to avoid resampling overhead
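
Many capture APIs and WAV files deliver interleaved 16-bit integer samples rather than mono float PCM. The sketch below shows one common way to convert such data into the format described above, by averaging the channels of each frame and scaling by 32768. The function name `pcm16_to_mono_float` is hypothetical and not part of the Moonshine API; averaging is only one of several possible downmixing choices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Converts interleaved 16-bit PCM frames to mono float samples.
 * `channels` is the number of interleaved channels per frame; the channels
 * of each frame are averaged into one output sample.
 * `output` must have room for `frame_count` floats. */
static void pcm16_to_mono_float(const int16_t *input, size_t frame_count,
                                int channels, float *output) {
  assert(channels > 0);
  for (size_t frame = 0; frame < frame_count; frame++) {
    int32_t sum = 0;
    for (int ch = 0; ch < channels; ch++) {
      sum += input[frame * (size_t)channels + (size_t)ch];
    }
    /* Dividing by 32768 maps the int16 range [-32768, 32767]
     * into [-1.0, 1.0). */
    output[frame] = ((float)sum / (float)channels) / 32768.0f;
  }
}
```

For input that is already mono, pass `channels = 1`; the function then only rescales the samples.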

Model Files

The transcriber expects three files in the model directory:
  • encoder_model.ort - Quantized ONNX encoder model
  • decoder_model_merged.ort - Quantized ONNX decoder model
  • tokenizer.bin - Token-to-character mapping in binary format
Use the Python package’s download script to obtain these files:
python -m moonshine_voice.download --language en

Error Handling

Most functions return error codes that can be converted to human-readable strings:
int32_t result = moonshine_transcribe_without_streaming(...);
if (result != MOONSHINE_ERROR_NONE) {
  const char* error_msg = moonshine_error_to_string(result);
  fprintf(stderr, "Error: %s\n", error_msg);
}

Common Error Codes

Code  Constant                          Description
0     MOONSHINE_ERROR_NONE              Success
-1    MOONSHINE_ERROR_UNKNOWN           Unknown error
-2    MOONSHINE_ERROR_INVALID_HANDLE    Invalid transcriber or stream handle
-3    MOONSHINE_ERROR_INVALID_ARGUMENT  Invalid function argument
Thread Safety

All API calls are thread-safe. However, calculations on a single transcriber are serialized: concurrent calls to the same transcriber from multiple threads are processed one at a time, so each caller may have to wait for earlier calls to finish. For best performance with multiple audio sources:
  • Use multiple streams on a single transcriber (shares model resources)
  • Or create separate transcribers for truly parallel processing

Memory Management

Transcript data returned by the library is owned by the transcriber and remains valid only until the first of these events:
  • The next call to that transcriber
  • The transcriber is freed with moonshine_free_transcriber()
Make copies of any data you need to retain beyond these points.
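
One way to retain a line beyond those points is to deep-copy its text into memory the application owns. The following is a minimal sketch under that assumption: `saved_line_t`, `save_line`, and `free_saved_line` are hypothetical application-side names, not part of the Moonshine API, and only the `text` and `start_time` fields used in the transcription example are copied.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical application-side record; not a Moonshine type. */
typedef struct {
  char *text;        /* heap-allocated copy owned by the application */
  double start_time; /* start of the line, in seconds */
} saved_line_t;

/* Deep-copies one line into `out`.
 * Returns 0 on success, -1 if memory allocation fails. */
static int save_line(const char *text, double start_time, saved_line_t *out) {
  assert(text != NULL && out != NULL);
  size_t size = strlen(text) + 1; /* include the terminating NUL */
  char *copy = malloc(size);
  if (copy == NULL) {
    return -1;
  }
  memcpy(copy, text, size);
  out->text = copy;
  out->start_time = start_time;
  return 0;
}

/* Releases the copy made by save_line(). */
static void free_saved_line(saved_line_t *line) {
  free(line->text);
  line->text = NULL;
}
```

A call such as `save_line(transcript->lines[i].text, transcript->lines[i].start_time, &saved)` would be made before the next call to the same transcriber, with `free_saved_line(&saved)` once the copy is no longer needed.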

Next Steps

Function Reference

Detailed documentation for all C API functions

Python API

Higher-level Python interface (recommended for most users)
