Deepgram Voice Workflow

Overview

Use this skill for a complete speech workflow:

transcribe audio to text with Deepgram STT
optionally synthesize a spoken reply with Deepgram TTS
return structured outputs that can feed chat or agent pipelines

This skill is the right choice when the task goes beyond plain transcription and needs an audio-in, audio-out pipeline.

Quick Start

Transcribe only

{baseDir}/scripts/deepgram-transcribe.sh /path/to/audio.ogg

Generate speech from text

{baseDir}/scripts/deepgram-tts.sh "Hello, I'm Neko."

Run the full pipeline

{baseDir}/scripts/neko-voice-pipeline.sh /path/to/audio.ogg --reply "Got it, this is a voice reply test."

Environment

Set DEEPGRAM_API_KEY before use.

If the variable is not set, the bundled scripts fall back to reading it from:

/root/.openclaw/.env
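The key lookup order above can be sketched in shell as follows. This is an illustration, not the bundled scripts' code: `resolve_deepgram_key` is a hypothetical helper, and it assumes the fallback file holds a plain `DEEPGRAM_API_KEY=value` line (optionally quoted or prefixed with `export`).

```shell
# Sketch only: resolve_deepgram_key is a hypothetical helper.
# Assumption: the fallback file uses KEY=value lines, optionally quoted
# and optionally prefixed with "export".
resolve_deepgram_key() {
  local env_file="${1:-/root/.openclaw/.env}"
  local line value
  # 1. Prefer the environment variable when it is already set.
  if [ -n "${DEEPGRAM_API_KEY:-}" ]; then
    printf '%s\n' "$DEEPGRAM_API_KEY"
    return 0
  fi
  # 2. Otherwise take the last DEEPGRAM_API_KEY= line from the fallback file.
  if [ -r "$env_file" ]; then
    line=$(grep -E '^[[:space:]]*(export[[:space:]]+)?DEEPGRAM_API_KEY=' "$env_file" | tail -n 1)
    value=${line#*=}                       # drop everything up to the first '='
    value=${value%\"}; value=${value#\"}   # strip optional double quotes
    value=${value%\'}; value=${value#\'}   # strip optional single quotes
    if [ -n "$value" ]; then
      printf '%s\n' "$value"
      return 0
    fi
  fi
  echo "DEEPGRAM_API_KEY is not set and no key was found in $env_file" >&2
  return 1
}
```

A caller could run `key=$(resolve_deepgram_key) && export DEEPGRAM_API_KEY="$key"` before invoking the scripts. Parsing the one line instead of `source`-ing the file avoids executing anything else the file might contain.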

Workflow Decision

Use `deepgram-transcribe.sh` when

only text transcription is needed
the downstream system will generate its own reply
the task is speech-to-text only

Use `deepgram-tts.sh` when

text already exists
only an MP3 spoken response is needed
the workflow is text-to-speech only

Use `neko-voice-pipeline.sh` when

the task begins with an audio file
a transcript is needed
an optional spoken reply should be generated in the same flow
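The three rules can be condensed into a small routing helper. `pick_voice_script` is hypothetical and only chooses a script name; it assumes that an audio file without reply text means transcription only, which matches the case where the downstream system generates its own reply.

```shell
# Sketch only: pick_voice_script is a hypothetical helper.
# $1 = audio file path (may be empty), $2 = reply text (may be empty).
pick_voice_script() {
  local audio="${1:-}" reply="${2:-}"
  if [ -n "$audio" ] && [ -n "$reply" ]; then
    echo "neko-voice-pipeline.sh"   # audio in, transcript and spoken reply out
  elif [ -n "$audio" ]; then
    echo "deepgram-transcribe.sh"   # speech-to-text only
  elif [ -n "$reply" ]; then
    echo "deepgram-tts.sh"          # text-to-speech only
  else
    echo "nothing to do: pass an audio file, reply text, or both" >&2
    return 1
  fi
}
```

Each script still takes its own arguments, as shown in Quick Start; the helper only decides which one to call.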

Outputs

STT output

deepgram-transcribe.sh writes:

transcript text file
raw API JSON file next to it
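For orientation, a transcription request against Deepgram's pre-recorded `/v1/listen` endpoint looks roughly like the sketch below. It is not the bundled script: `stt_url` and `transcribe_file` are hypothetical helpers, the `Content-Type: audio/ogg` header assumes Ogg input like the Quick Start example, and curl and jq are assumed to be installed.

```shell
# Sketch only: not the bundled deepgram-transcribe.sh; flags and file
# naming here are assumptions. Requires curl and jq.

# Build the /v1/listen URL using the documented defaults (nova-2, zh).
stt_url() {
  local model="${1:-nova-2}" language="${2:-zh}"
  printf 'https://api.deepgram.com/v1/listen?model=%s&language=%s\n' "$model" "$language"
}

# transcribe_file AUDIO OUT_PREFIX  (hypothetical helper)
# Writes OUT_PREFIX.json (raw API response) and OUT_PREFIX.txt (transcript).
transcribe_file() {
  local audio="$1" out="$2" status
  if [ -z "${DEEPGRAM_API_KEY:-}" ]; then
    echo "DEEPGRAM_API_KEY is not set" >&2
    return 1
  fi
  # Send the raw audio bytes; keep the response body even on HTTP errors.
  status=$(curl -sS -o "${out}.json" -w '%{http_code}' \
    -H "Authorization: Token ${DEEPGRAM_API_KEY}" \
    -H "Content-Type: audio/ogg" \
    --data-binary "@${audio}" \
    "$(stt_url)") || return 1
  if [ "$status" != "200" ]; then
    echo "Deepgram returned HTTP ${status}; see ${out}.json" >&2
    return 1
  fi
  # The transcript is in the first alternative of the first channel.
  jq -r '.results.channels[0].alternatives[0].transcript' "${out}.json" > "${out}.txt"
}
```

For example, `transcribe_file /path/to/audio.ogg /tmp/voice1` would write `/tmp/voice1.json` and `/tmp/voice1.txt`. Saving the body before checking the status keeps error responses available for debugging.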

TTS output

deepgram-tts.sh writes:

MP3 output file
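Similarly, here is a minimal sketch of a `/v1/speak` request that writes an MP3 file. `tts_url` and `speak_to_mp3` are hypothetical helpers; the sketch sends no `encoding` parameter and assumes the API's default output is MP3, and it assumes curl and jq are installed.

```shell
# Sketch only: not the bundled deepgram-tts.sh. Requires curl and jq.
# Assumption: with no encoding parameter, the API returns MP3 audio.

# Build the /v1/speak URL; aura-2-luna-en is the documented default voice.
tts_url() {
  local model="${1:-aura-2-luna-en}"
  printf 'https://api.deepgram.com/v1/speak?model=%s\n' "$model"
}

# speak_to_mp3 TEXT OUT_MP3 [MODEL]  (hypothetical helper)
speak_to_mp3() {
  local text="$1" out="$2" model="${3:-aura-2-luna-en}" status
  if [ -z "${DEEPGRAM_API_KEY:-}" ]; then
    echo "DEEPGRAM_API_KEY is not set" >&2
    return 1
  fi
  # jq builds the {"text": ...} body so quotes and newlines are escaped safely.
  status=$(jq -n --arg text "$text" '{text: $text}' |
    curl -sS -o "$out" -w '%{http_code}' \
      -H "Authorization: Token ${DEEPGRAM_API_KEY}" \
      -H "Content-Type: application/json" \
      --data-binary @- \
      "$(tts_url "$model")") || return 1
  if [ "$status" != "200" ]; then
    echo "Deepgram returned HTTP ${status}; error body saved in ${out}" >&2
    return 1
  fi
}
```

Passing a different voice model as the third argument corresponds to the voice override mentioned under Notes.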

Pipeline output

neko-voice-pipeline.sh prints JSON with:

out_dir
transcript_path
transcript
reply_audio_path

This makes it easy to wire into scripts or adapters.
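As an example of wiring the pipeline into an adapter, the sketch below reads fields from the printed JSON with jq. `read_pipeline_field`, `on_voice_message`, and the `SKILL_DIR` variable are hypothetical; the sketch also assumes the script prints only the JSON object on stdout and that `reply_audio_path` is null or empty when no reply was requested.

```shell
# Sketch only: hypothetical adapter helpers. Requires jq.

# read_pipeline_field JSON KEY
# Prints the value of KEY; "// empty" maps null or a missing key to "".
read_pipeline_field() {
  jq -r --arg key "$2" '.[$key] // empty' <<<"$1"
}

# on_voice_message AUDIO REPLY_TEXT
# SKILL_DIR is an assumed variable pointing at the skill's {baseDir}.
on_voice_message() {
  local result transcript reply_audio
  if [ -z "${SKILL_DIR:-}" ]; then
    echo "set SKILL_DIR to the skill directory" >&2
    return 1
  fi
  result=$("$SKILL_DIR/scripts/neko-voice-pipeline.sh" "$1" --reply "$2") || return 1
  transcript=$(read_pipeline_field "$result" transcript)
  reply_audio=$(read_pipeline_field "$result" reply_audio_path)
  printf 'transcript: %s\n' "$transcript"
  if [ -n "$reply_audio" ]; then
    printf 'reply audio: %s\n' "$reply_audio"
  fi
}
```

A bot handler would call `on_voice_message` with the downloaded voice file and then upload the reply audio, if any.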

Typical Uses

Prefer this skill for:

transcribing Telegram/QQ/OneBot voice messages
generating MP3 replies to short voice prompts
building bot-side voice input/output automation
testing speech pipelines from the shell without introducing a full SDK

Notes

Defaults are tuned for lightweight practical use, not maximal configurability.
deepgram-transcribe.sh defaults to model=nova-2 and language=zh.
deepgram-tts.sh defaults to model=aura-2-luna-en; override the model when a different voice is preferred.
Inspect the raw JSON transcript response when debugging recognition quality or API errors.
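When debugging, a quick summary of the raw JSON can help separate recognition problems (low confidence, few words) from API errors. `inspect_stt_json` is a hypothetical helper; it assumes jq is installed and treats any response without a top-level `results` object as an error body.

```shell
# Sketch only: inspect_stt_json is a hypothetical helper. Requires jq.
# inspect_stt_json RAW_JSON_FILE
inspect_stt_json() {
  local file="$1"
  # jq -e fails when .results is null/missing or the file is not valid JSON.
  if ! jq -e '.results' "$file" >/dev/null 2>&1; then
    echo "no results field; raw response follows:" >&2
    cat "$file" >&2
    return 1
  fi
  # Confidence, word count, and transcript of the first alternative.
  jq -r '.results.channels[0].alternatives[0]
         | "confidence: \(.confidence)\nwords: \(.words | length)\ntranscript: \(.transcript)"' "$file"
}
```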

References

Read these files when needed:

references/stt-notes.md for transcription details
references/tts-notes.md for speech synthesis details
references/pipeline-notes.md for end-to-end pipeline behavior
