curl --request POST \
--url https://app.altheastudio.ai/api/voices \
--header 'Content-Type: multipart/form-data' \
--header 'X-API-Key: <your-api-key>' \
--form 'file=@/path/to/voice.mp3' \
--form 'name=My Custom Voice' \
--form 'description=Voice recorded on Jan 1, 2024'
{
  "voiceId": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "name": "<string>",
  "languageLabel": "<string>",
  "previewUrl": "<string>",
  "ownership": "public",
  "billingStyle": "VOICE_BILLING_STYLE_INCLUDED",
  "provider": "<string>",
  "definition": {
    "elevenLabs": {
      "voiceId": "<string>",
      "model": "<string>",
      "speed": 123,
      "useSpeakerBoost": true,
      "style": 123,
      "similarityBoost": 123,
      "stability": 123,
      "pronunciationDictionaries": [
        {
          "dictionaryId": "<string>",
          "versionId": "<string>"
        }
      ],
      "optimizeStreamingLatency": 123,
      "maxSampleRate": 123
    },
    "cartesia": {
      "voiceId": "<string>",
      "model": "<string>",
      "speed": 123,
      "emotion": "<string>",
      "emotions": [
        "<string>"
      ],
      "generationConfig": {
        "volume": 123,
        "speed": 123,
        "emotion": "<string>",
        "pronunciationDictId": "<string>"
      }
    },
    "lmnt": {
      "voiceId": "<string>",
      "model": "<string>",
      "speed": 123,
      "conversational": true
    },
    "google": {
      "voiceId": "<string>",
      "speakingRate": 123
    },
    "inworld": {
      "voiceId": "<string>",
      "modelId": "<string>",
      "speakingRate": 123,
      "temperature": 123,
      "applyTextNormalization": true
    },
    "respeecher": {
      "voiceId": "<string>",
      "seed": 123,
      "temperature": 123,
      "topK": 123,
      "topP": 123,
      "minP": 123,
      "presencePenalty": 123,
      "repetitionPenalty": 123,
      "frequencyPenalty": 123
    },
    "generic": {
      "url": "<string>",
      "headers": {},
      "body": {},
      "responseSampleRate": 123,
      "responseWordsPerMinute": 123,
      "responseMimeType": "<string>",
      "jsonAudioFieldPath": "<string>",
      "jsonByteEncoding": "JSON_BYTE_ENCODING_UNSPECIFIED"
    }
  },
  "description": "<string>",
  "primaryLanguage": "<string>"
}
Voices you create are private to your account. The request uses multipart/form-data encoding to supply the name of the voice along with an audio file containing a sample of the voice to clone.

Authorizations

X-API-Key
string
header
required

API key

Body

file
file
required

An audio file containing a sample of the voice to clone.

name
string
required

Name for the cloned voice. Must be unique within your account.

Example:

"My Custom Voice"

description
string

Optional description for the voice. If not provided, a default description will be generated.

Example:

"Voice recorded on Jan 1, 2024"

language
string
default:en

BCP47 language code for the language used in the recording.

Example:

"en-US"
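
The body fields above can also be assembled into a multipart/form-data payload by hand. The following is a minimal Python sketch using only the standard library, not an official client. The helper names `build_form_fields` and `encode_multipart` are hypothetical, the 40-character check on `name` is an assumption borrowed from the limit documented on the response, and the `audio/mpeg` part type assumes an MP3 sample. Nothing is sent over the network here.

```python
import uuid

API_URL = "https://app.altheastudio.ai/api/voices"  # endpoint from the curl example


def build_form_fields(name, description=None, language=None):
    """Collect the text fields of the request body, omitting unset optional ones."""
    if not name:
        raise ValueError("name is required")
    # Assumption: the 40-character limit documented on the response's `name`
    # also applies to the request; the source does not state this explicitly.
    if len(name) > 40:
        raise ValueError("name is longer than 40 characters")
    fields = {"name": name}
    if description is not None:
        fields["description"] = description
    if language is not None:
        fields["language"] = language  # BCP47 code such as "en-US"
    return fields


def encode_multipart(fields, file_name, file_bytes, file_type="audio/mpeg"):
    """Encode text fields plus one `file` part as multipart/form-data.

    Returns a (content_type_header, body_bytes) tuple.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for key, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{key}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    # The audio sample goes in the part named `file`, as in the curl example.
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        f'Content-Type: {file_type}\r\n\r\n'.encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return f"multipart/form-data; boundary={boundary}", b"".join(parts)


# Placeholder bytes stand in for a real recording.
fields = build_form_fields("My Custom Voice", language="en-US")
content_type, body = encode_multipart(fields, "voice.mp3", b"\x00\x01fake-audio")
```

To actually send the request, `body` could be passed as `data` to `urllib.request.Request(API_URL, data=body, method="POST")` with `Content-Type` set to `content_type` and `X-API-Key` set to your key. Generating the boundary in code is the reason the header is built here rather than hard-coded.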

Response

201 - application/json
voiceId
string<uuid>
required
name
string
required
Maximum string length: 40
languageLabel
string | null
required

Human-readable language label with flag emoji and English name (e.g., '🇺🇸 English (United States)').

previewUrl
string<uri>
required
ownership
enum<string>
required
Available options:
public,
private
billingStyle
enum<string>
required

How billing works for this voice.

VOICE_BILLING_STYLE_INCLUDED - The cost of this voice is included in the call cost. There are no additional charges for it.

VOICE_BILLING_STYLE_EXTERNAL - This voice requires an API key for its provider, who will bill for usage separately.

Available options:
VOICE_BILLING_STYLE_INCLUDED,
VOICE_BILLING_STYLE_EXTERNAL
provider
string | null
required
definition
object
required

A voice not known to Althea Realtime that can nonetheless be used for a call. Such voices are significantly less validated than normal voices and you'll be responsible for your own TTS-related errors. Exactly one field must be set.

description
string | null
Maximum string length: 240
primaryLanguage
string | null

BCP47 language code for the primary language supported by this voice.

Maximum string length: 10
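
Because exactly one field of `definition` must be set, a client can find out which provider backs a voice by looking for the single populated key. The following is a minimal Python sketch, assuming the 201 response has already been parsed into a dict and that unset providers are either absent or null (the source does not say how they are represented; the sample response above populates every provider only to illustrate the schema). The helper names `active_provider` and `is_billed_externally` are hypothetical.

```python
# Provider keys as they appear under `definition` in the sample response.
PROVIDER_KEYS = (
    "elevenLabs", "cartesia", "lmnt", "google",
    "inworld", "respeecher", "generic",
)

BILLING_STYLES = (
    "VOICE_BILLING_STYLE_INCLUDED",
    "VOICE_BILLING_STYLE_EXTERNAL",
)


def active_provider(voice):
    """Return (provider_key, settings) for the one provider set in `definition`."""
    definition = voice.get("definition") or {}
    populated = [key for key in PROVIDER_KEYS if definition.get(key) is not None]
    if len(populated) != 1:
        raise ValueError(f"expected exactly one provider, found {populated}")
    key = populated[0]
    return key, definition[key]


def is_billed_externally(voice):
    """True when the provider bills separately and needs its own API key."""
    style = voice["billingStyle"]
    if style not in BILLING_STYLES:
        raise ValueError(f"unknown billingStyle: {style}")
    return style == "VOICE_BILLING_STYLE_EXTERNAL"


# Illustrative response fragment; the values are made up for this sketch.
sample = {
    "voiceId": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "name": "My Custom Voice",
    "ownership": "private",
    "billingStyle": "VOICE_BILLING_STYLE_INCLUDED",
    "definition": {"cartesia": {"voiceId": "abc", "model": "example-model"}},
}

provider, settings = active_provider(sample)
```

With the sample above, `provider` is `"cartesia"` and `is_billed_externally(sample)` is `False`. Raising on zero or multiple populated keys surfaces an unexpected response early instead of silently picking one provider.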