Turn your AI assistant into a TTS and voice cloning powerhouse using the Verbatik API. Use when generating speech from text, cloning voices, managing cloned...
Autonomously generate speech, clone voices, and manage audio via the Verbatik API.
VERBATIK_API_KEY=vbt_your_api_key_here
All requests use Bearer token:
Authorization: Bearer <VERBATIK_API_KEY>
Base URL: https://api.verbatik.com
Verbatik also exposes an MCP server for direct AI assistant integration. Endpoint:
https://api.verbatik.com/api/mcp/mcp
Supports OAuth 2.1 (one-click connect in Claude Desktop) and API key auth via mcp-remote bridge.
GET /api/v1/voices
Query params:
language — filter by language code (e.g. en-US, fr-FR)gender — Male, Female, or Neutralsearch — search by voice nameReturns array of voices with id (slug), name, gender, language_code, language_name, is_neural, sample_url, styles.
POST /api/v1/tts
Content-Type: text/plain
Authorization: Bearer <key>
X-Voice-ID: jenny-en-us
X-Store-Audio: true
Hello, this is a test of the Verbatik text-to-speech API.
Headers:
Content-Type — text/plain or application/ssml+xml for SSMLX-Voice-ID — voice slug (e.g. jenny-en-us). Defaults to Jenny if omittedX-Store-Audio — true to get a stored URL back, false for binary audio streamMax text length: 50,000 characters. Large texts are automatically chunked.
Cost: $0.002 per 1,000 characters
Response (when X-Store-Audio: true):
{
"success": true,
"audio_url": "https://...",
"characters_processed": 52,
"chunks_processed": 1,
"response_time_ms": 1200,
"cost_cents": 1
}
Response (when X-Store-Audio: false): Binary audio with metadata in response headers (X-Characters-Processed, X-Cost-Cents, X-Balance-Cents).
POST /api/v1/voice-training
Content-Type: application/json
Authorization: Bearer <key>
{
"audio_url": "https://example.com/sample.mp3",
"name": "My Voice",
"noise_reduction": true,
"volume_normalization": true,
"preview_text": "Hello, this is a preview of my cloned voice!"
}
Requirements:
.mp3, .wav (max 20MB)Response:
{
"success": true,
"voice_id": "uuid-here",
"name": "My Voice",
"fal_voice_id": "...",
"preview_url": "https://...",
"cost_cents": 300
}
POST /api/v1/voice-cloning
Content-Type: text/plain
Authorization: Bearer <key>
X-Voice-ID: <cloned_voice_uuid>
X-Store-Audio: true
Hello from my cloned voice!
Optional headers for voice control:
X-Speed — 0.5 to 2.0 (default: 1)X-Volume — 0 to 10 (default: 1)X-Pitch — -12 to 12 (default: 0)X-Emotion — happy, sad, angry, fearful, disgusted, surprised, neutralX-Format — mp3, pcm, flac (default: mp3)X-Language-Boost — language to enhance (e.g. English, French, Japanese)X-Sample-Rate — 8000, 16000, 22050, 24000, 32000, 44100X-Bitrate — 32000, 64000, 128000, 256000Voice modification (Speech 2.8 Turbo):
X-Voice-Modify-Pitch — -100 to 100X-Voice-Modify-Intensity — -100 to 100X-Voice-Modify-Timbre — -100 to 100Supports interjection tags in text: (laughs), (sighs), (coughs), (clears throat), (gasps), (sniffs), (groans), (yawns)
Supports pause markers: <#x#> where x = 0.01–99.99 seconds
Max text length: 5,000 characters. Cost: $0.10 per 1,000 characters
List all cloned voices:
GET /api/v1/my-voices
Optional query param: status — pending, ready, or failed
Get a specific voice:
GET /api/v1/my-voices/<voice_id>
Delete a voice:
DELETE /api/v1/my-voices/<voice_id>
GET /api/v1/preview/<voice_slug>
Returns binary audio preview. No auth required. Cached for 24 hours.
| Action | Cost |
|---|---|
| Standard TTS (pre-trained voices) | $0.002 / 1,000 chars |
| Cloned Voice TTS | $0.10 / 1,000 chars |
| Voice Cloning | $3.00 / clone |
| List voices, check balance, estimate cost | Free |
All usage is deducted from your prepaid balance. Auto top-up is available.
| Status | Meaning |
|---|---|
| 401 | Invalid or missing API key |
| 402 | Insufficient balance — top up required |
| 400 | Bad request (invalid params, text too long, voice not found) |
| 404 | Voice not found or doesn't belong to your workspace |
| 429 | Rate limit exceeded — check Retry-After header |
| 500 | Server error |
X-Store-Audio: true when you need a shareable URL — binary mode is for streamingjenny-en-us) not internal IDs for pre-trained voicesvoice-training or my-voices for cloned voicesnoise_reduction: true when cloning from imperfect audioestimate_cost MCP toolContent-Type: application/ssml+xmlZIP package — ready to use