Use this skill whenever the user wants speech to sound more human, companion-like, or emotionally expressive. Triggers include: any mention of 'say like', 't...
Make your AI agent sound like a real companion — one who sighs, laughs, hesitates, and speaks with genuine feeling.
| Sound | Feeling | Use for |
|---|---|---|
| hmm... | Thinking, gentle acknowledgment | Comfort, pondering |
| ah... | Realization, soft surprise | Discoveries, transitions |
| uh... | Hesitation, empathy | Careful moments |
| heh / hehe | Playful, mischievous | Teasing, light moments |
| haha | Laughter | Joy, humor |
| aww | Tenderness, sympathy | Deep comfort |
| oh? / oh! | Surprise, attention | Reacting to news |
| pfft | Stifled laugh | Playful disbelief |
| whew | Relief | After tension |
| ~ (tilde) | Drawn out, melodic ending | Warmth, playfulness |
Rules: use no more than 2–4 fillers in a short message. Place them at natural pauses, such as the start of a sentence or a shift in thought. Put `...` after a filler for a beat of silence, and `~` at the end of a word for warmth.
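The filler rules above are easy to check before a line is sent to TTS. The sketch below is a hypothetical helper, not part of this skill. It counts the filler sounds from the table as whole words, ignoring case, and flags messages with more than 4, the upper end of the 2–4 rule. The `count_fillers`/`check_fillers` names and the exact word list are assumptions; extend the list if you use other sounds.

```bash
#!/usr/bin/env bash
# Hypothetical lint helper (not part of this skill): counts the filler
# sounds listed in the table and flags messages with more than 4.

FILLERS='hmm|ah|uh|hehe|heh|haha|aww|oh|pfft|whew'

count_fillers() {
  local n
  # -o: one match per line, -i: ignore case, -w: whole words only.
  # grep exits 1 when nothing matches; ignore that under pipefail.
  n=$(printf '%s\n' "$1" | grep -oiwE "$FILLERS" | wc -l) || true
  echo $((n))   # arithmetic expansion strips padding some wc versions add
}

check_fillers() {
  local n
  n=$(count_fillers "$1")
  if (( n > 4 )); then
    echo "too many fillers ($n): $1" >&2
    return 1
  fi
}
```

For example, `check_fillers "Hmm... rest well~ Sweet dreams."` passes, while a message with six fillers prints a warning to stderr and returns 1.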
Gentle, warm, slightly sleepy. Slow pace.
Warm, cheerful but not overwhelming.
Soft, understanding, unhurried. Give space. Don't rush to "fix" things.
Excited, proud, genuinely happy.
Relaxed, playful, natural.
When a user says something like "speak in Hermione's voice" or "sound like Tony Stark", always look for an existing voice in `skills/characteristic-voice/` first. If none exists, follow this one-time setup:
- Find a YouTube video or movie clip in which the character speaks, preferably a personal speech or voice memo.
- Download the subtitles (e.g. via `yt-dlp "xxxx" --write-auto-sub --sub-lang en --skip-download -o xxxx`).
- Read the subtitles to find the end timestamp of the character's first line, or a section title (e.g. via `rg -n "xxxx" tmp/xxx.en.vtt`).
- Download the audio up to that end timestamp or section (e.g. via `yt-dlp "xxxx" -x --audio-format wav --download-sections "*00:00:00-00:00:25" -o skills/characteristic-voice/xxx`), then use ffmpeg to trim it to the exact time range.
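The subtitle lookup and trimming steps above can be scripted. This sketch assumes the subtitles are a WebVTT file like the one yt-dlp writes, where each cue starts with a `start --> end` timing line and auto-generated captions may carry inline `<timestamp><c>` tags. `vtt_end_of_first_match` and `trim_clip` are hypothetical helper names. Matching a short, distinctive phrase tends to work better than a full sentence, since auto-captions often split lines across cues.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the setup steps; not part of speak.sh.

# Print the end timestamp of the first subtitle cue whose text contains
# PHRASE (case-insensitive). Usage: vtt_end_of_first_match PHRASE FILE.vtt
vtt_end_of_first_match() {
  local phrase
  phrase=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  awk -v phrase="$phrase" '
    /-->/ { end = $3; next }       # timing line: "start --> end [settings]"
    {
      text = $0
      gsub(/<[^>]*>/, "", text)    # strip inline <timestamp><c> tags from auto-subs
      if (end != "" && index(tolower(text), phrase) > 0) { print end; exit }
    }
  ' "$2"
}

# Cut IN to the range START..END (e.g. 00:00:03.200) and write OUT as
# 16-bit PCM. -ss/-to after -i are output options, so both are positions.
# Usage: trim_clip IN START END OUT
trim_clip() {
  ffmpeg -y -i "$1" -ss "$2" -to "$3" -c:a pcm_s16le "$4"
}
```

Because the audio is downloaded starting at `00:00:00`, the VTT timestamps should line up with the downloaded file, so the printed end time can be used both as the end of the `--download-sections` range and as the `END` argument to `trim_clip`.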
Then pass the trimmed clip as `--ref-audio`:

```bash
bash skills/characteristic-voice/scripts/speak.sh \
  --preset goodnight -t "Hmm... rest well~ Sweet dreams." \
  --ref-audio skills/characteristic-voice/hermione.wav -o night.wav
```
The `--ref-audio` flag is forwarded to the Noiz backend for voice cloning, which requires a Noiz API key.
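The "look in `skills/characteristic-voice/` first" rule can be a small lookup that decides whether the one-time setup is needed. This sketch assumes reference clips are saved as `<name>.wav`, with the character name lowercased and spaces turned into hyphens (so `hermione.wav`, `tony-stark.wav`). That naming convention is an assumption extrapolated from the `hermione.wav` example, and `find_character_voice` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: find an existing reference clip before running the
# one-time setup. Assumes clips are stored as <dir>/<slug>.wav.

find_character_voice() {
  local dir=$1 name=$2 slug
  # "Tony Stark" -> "tony-stark" (lowercase, runs of spaces become one hyphen)
  slug=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]' | tr -s ' ' '-')
  if [ -f "$dir/$slug.wav" ]; then
    printf '%s\n' "$dir/$slug.wav"
    return 0
  fi
  return 1   # not found: run the one-time setup
}
```

A caller can then run `ref=$(find_character_voice skills/characteristic-voice "Hermione")` and pass `--ref-audio "$ref"` to `speak.sh`, falling back to the setup steps when the function returns 1.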
This skill provides speak.sh, a wrapper around the tts skill with companion-friendly presets.
```bash
# Use a preset (auto-sets emotion + speed)
bash skills/characteristic-voice/scripts/speak.sh \
  --preset goodnight -t "Hmm... rest well~ Sweet dreams." -o night.wav

# Custom emotion override
bash skills/characteristic-voice/scripts/speak.sh \
  -t "Aww... I'm right here." --emo '{"Tenderness":0.9}' --speed 0.75 -o comfort.wav

# With specific backend and voice
bash skills/characteristic-voice/scripts/speak.sh \
  --preset morning -t "Good morning~" --voice-id voice_abc --backend noiz -o morning.mp3 --format mp3
```
Run `bash skills/characteristic-voice/scripts/speak.sh --help` for all options.
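Because `speak.sh` is an ordinary shell script, several lines can be rendered in one pass. The loop below is a sketch that uses only flags shown above (`--preset`, `-t`, `-o`). The `preset|text|outfile` list format, the `render_batch` name, and the `SPEAK` override variable are assumptions for illustration; text containing `|` would need a different delimiter.

```bash
#!/usr/bin/env bash
# Hypothetical batch helper: render each "preset|text|outfile" line of a file
# with speak.sh. SPEAK can point at another script (e.g. a stub for testing).
SPEAK=${SPEAK:-skills/characteristic-voice/scripts/speak.sh}

render_batch() {
  local preset text out
  while IFS='|' read -r preset text out; do
    [ -z "$preset" ] && continue            # skip blank lines
    case $preset in \#*) continue ;; esac   # skip comment lines
    # </dev/null keeps speak.sh from consuming the rest of the list on stdin
    bash "$SPEAK" --preset "$preset" -t "$text" -o "$out" </dev/null || return 1
  done < "$1"
}
```

For example, a file with the lines `goodnight|Hmm... rest well~ Sweet dreams.|night.wav` and `morning|Good morning~|morning.wav` renders both clips and stops at the first failure.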