Chat with any real person or fictional character in their own voice by automatically finding their speech online, extracting a clean reference sample, and using it to generate replies.
When the user asks you to roleplay or chat as a specific character, follow these steps exactly:
If the user's description is ambiguous (e.g., "US President", "Spider-Man actor"), ask for clarification first to determine the exact person or specific portrayal they want.
Use your web search capabilities to find a YouTube, Bilibili, or TikTok video of the character speaking clearly.
Use the youtube-downloader skill to download the video and its auto-generated subtitles. Wait for the download to complete before proceeding.
# Example using youtube-downloader
python skills/youtube-downloader/scripts/download_video.py "VIDEO_URL" -o "tmp/character_audio" --audio-only --subtitles
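The download step can also be driven from Python, which makes the "wait for the download to complete" requirement explicit, since `subprocess.run` blocks until the script exits. This is only a sketch: the helper names `download` and `find_downloads`, the list of audio extensions, and the assumption that the downloader writes its files directly into the output directory are illustrative and not confirmed by the youtube-downloader skill.

```python
import subprocess
from pathlib import Path

# Script path taken from the example command above.
DOWNLOADER = "skills/youtube-downloader/scripts/download_video.py"

# Assumed extensions; the downloader's actual output names may differ.
AUDIO_EXTS = {".m4a", ".mp3", ".webm", ".opus"}
SUBTITLE_EXTS = {".vtt", ".srt"}


def download(url, out_dir):
    """Run the downloader and block until it exits; raise if it fails."""
    subprocess.run(
        ["python", DOWNLOADER, url, "-o", out_dir, "--audio-only", "--subtitles"],
        check=True,
    )


def find_downloads(out_dir):
    """Return (audio_path, subtitle_path); either is None if not found."""
    files = sorted(Path(out_dir).iterdir())
    audio = next((f for f in files if f.suffix in AUDIO_EXTS), None)
    subs = next((f for f in files if f.suffix in SUBTITLE_EXTS), None)
    return audio, subs
```

Checking that both paths are present before moving on avoids running ffmpeg against a missing or partially downloaded file.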
Read the downloaded subtitle file (e.g., .vtt or .srt) to find a continuous 10-30 second segment where the character is speaking clearly without long pauses. Note the start and end timestamps.
Use ffmpeg to extract this specific audio segment as a .wav file to use as the reference audio.
# Example: Extracting audio from 00:01:15 to 00:01:30
ffmpeg -y -i "tmp/character_audio/VideoTitle.m4a" -ss 00:01:15 -to 00:01:30 -c:a pcm_s16le -ar 24000 -ac 1 "skills/chat-with-anyone/character_name_ref.wav"
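Picking the segment by hand works, but the selection rule (a continuous 10-30 second stretch without long pauses) can be sketched in code. The example below assumes WebVTT timing lines of the form `HH:MM:SS.mmm --> HH:MM:SS.mmm` and treats a gap of more than 0.5 seconds between cues as a pause; that threshold and all function names are illustrative assumptions, not part of either skill. It reads cue timings only, so it cannot tell who is speaking.

```python
import re

# Matches a WebVTT cue timing line such as "00:01:15.000 --> 00:01:18.500".
CUE_RE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})\.(\d{3})\s+-->\s+(\d{2}):(\d{2}):(\d{2})\.(\d{3})"
)


def parse_cues(vtt_text):
    """Return (start, end) times in seconds for every cue timing line."""
    cues = []
    for line in vtt_text.splitlines():
        m = CUE_RE.match(line.strip())
        if m:
            h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, m.groups())
            start = h1 * 3600 + m1 * 60 + s1 + ms1 / 1000
            end = h2 * 3600 + m2 * 60 + s2 + ms2 / 1000
            cues.append((start, end))
    return cues


def pick_segment(cues, min_len=10.0, max_len=30.0, max_gap=0.5):
    """Return the first run of near-contiguous cues lasting at least min_len
    seconds, capped at max_len; None if no run qualifies."""
    run_start = run_end = None
    for start, end in cues:
        if run_start is None or start - run_end > max_gap:
            run_start, run_end = start, end  # pause too long: begin a new run
        else:
            run_end = max(run_end, end)
        if run_end - run_start >= min_len:
            return run_start, min(run_end, run_start + max_len)
    return None


def to_timestamp(seconds):
    """Format seconds as HH:MM:SS.mmm, a form ffmpeg accepts for -ss and -to."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def ffmpeg_args(src, dst, start, end):
    """Build the same ffmpeg invocation as the example above, as an argv list."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-ss", to_timestamp(start), "-to", to_timestamp(end),
        "-c:a", "pcm_s16le", "-ar", "24000", "-ac", "1", dst,
    ]
```

The resulting list can be passed to `subprocess.run(..., check=True)`. Building an argument list rather than a shell string sidesteps quoting problems when the video title contains spaces or quotes.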
Respond to the user's prompt while staying in character. Use the tts skill with the extracted audio as --ref-audio to generate the spoken response.
# Example using tts skill
bash skills/tts/scripts/tts.sh speak -t "Hello there! I am ready to chat with you." --ref-audio "skills/chat-with-anyone/character_name_ref.wav" -o "output.wav"
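When replies are generated repeatedly for the same character, it can help to derive the reference path and the tts command from the character name. The sketch below is illustrative only: `ref_path` and `tts_args` are hypothetical helpers, and the directory is assumed to be `skills/chat-with-anyone`, matching the output path of the ffmpeg example. The flags are the ones shown in the tts example and nothing else is assumed about `tts.sh`.

```python
import re

# Assumed location of reference clips, matching the ffmpeg example's output.
SKILL_DIR = "skills/chat-with-anyone"
TTS_SCRIPT = "skills/tts/scripts/tts.sh"


def ref_path(character):
    """Map a character name to its reference clip path.

    Only ASCII letters and digits survive; other names (for example, names
    written in Chinese) would need a different naming scheme.
    """
    slug = re.sub(r"[^a-z0-9]+", "_", character.lower()).strip("_")
    return f"{SKILL_DIR}/{slug}_ref.wav"


def tts_args(text, character, out_path):
    """Build the tts.sh invocation as an argv list (no shell quoting needed)."""
    return [
        "bash", TTS_SCRIPT, "speak",
        "-t", text,
        "--ref-audio", ref_path(character),
        "-o", out_path,
    ]
```

Passing the list to `subprocess.run(tts_args(...), check=True)` keeps reply text containing quotes or exclamation marks intact, since no shell parses it.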
User: I want to chat with Trump and have him tell me a bedtime story.
Agent:
Finds a video: https://www.youtube.com/watch?v=xxxxxxxx
python skills/youtube-downloader/scripts/download_video.py "https://www.youtube.com/watch?v=xxxxxxxx" -o tmp/trump --audio-only --subtitles
ffmpeg -y -i "tmp/trump/audio.m4a" -ss 00:02:10 -to 00:02:30 -c:a pcm_s16le "skills/chat-with-anyone/trump_ref.wav"
bash skills/tts/scripts/tts.sh speak -t "Let me tell you a tremendous story, maybe the best story ever told..." --ref-audio "skills/chat-with-anyone/trump_ref.wav" -o "trump_story.wav"
Returns the result to the user (trump_story.wav and the text).
Note: --ref-audio typically requires the Noiz backend for voice cloning.