Extract transcripts/subtitles from video URLs and deliver as .docx files. Use this skill whenever the user provides a video link (YouTube, Bilibili, or any y...
Extract subtitles from a video URL and deliver clean .docx transcript files. For non-Chinese videos, produce two files: original language + Chinese translation. For Chinese videos, produce one file.
Check whether the docx package is installed under node_modules in the scripts directory. If not, run:
$skillScripts = "$env:USERPROFILE\.agents\skills\video-transcript\scripts"
if (-not (Test-Path "$skillScripts\node_modules\docx")) {
Push-Location $skillScripts
npm install
Pop-Location
}
Run in a temp directory (use $env:TEMP\vt_<random> on Windows):
$tmp = "$env:TEMP\vt_$(Get-Random)"
New-Item -ItemType Directory -Force $tmp | Out-Null
YouTube:
# Try manual subs first, fall back to auto-generated
yt-dlp --skip-download --write-subs --write-auto-subs --sub-langs "en,zh-Hans,zh-Hant,zh" --convert-subs srt -o "$tmp\sub" "<URL>"
Bilibili:
yt-dlp --skip-download --write-subs --sub-langs "zh-Hans,zh,ai-zh" --convert-subs srt -o "$tmp\sub" "<URL>"
Other platforms: Use the same YouTube command — yt-dlp handles most platforms automatically.
After download, list what was fetched:
Get-ChildItem $tmp -Filter "*.srt"
Pick the best available file:
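Prefer manual subtitles over auto-generated ones (.en.srt over .en-orig.srt or .en-auto.srt)
For Chinese, prefer zh-Hans or ai-zh
If NO subtitle file was downloaded: inform the user and suggest Whisper as an alternative.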
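For the selection step, including the case where nothing was downloaded, a minimal sketch might look like the following. The helper names (`languageTag`, `pickSubtitleFile`) and the two preference arrays are hypothetical; they only encode the ordering stated above and assume yt-dlp names files as `sub.<lang>.srt`.

```javascript
// Hypothetical helper: extract the language tag from a name like "sub.en-orig.srt".
function languageTag(fileName) {
  const match = /\.([^.]+)\.srt$/i.exec(fileName);
  return match ? match[1] : null;
}

// Hypothetical helper: return the first file whose tag appears earliest in
// preferredTags, or null when no subtitle file matches (the "no subtitles" case).
function pickSubtitleFile(fileNames, preferredTags) {
  for (const tag of preferredTags) {
    const hit = fileNames.find((name) => languageTag(name) === tag);
    if (hit) return hit;
  }
  return null;
}

// Assumed orderings, taken from the preferences listed in this document.
const ENGLISH_ORDER = ["en", "en-orig", "en-auto"];
const CHINESE_ORDER = ["zh-Hans", "ai-zh"];
```

A `null` result is the signal to stop and tell the user that no subtitles were available.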
Read the .srt file and strip all timing/index lines. Keep only the spoken text lines.
SRT format to strip:
1
00:00:01,000 --> 00:00:03,500
This is the spoken text.
2
00:00:04,000 --> 00:00:06,000
More text here.
Output: plain text only. Join the text lines of consecutive subtitle blocks and remove the blank lines between blocks, so the result reads as flowing paragraphs. Also strip HTML tags like <i>, <b>, <font ...>.
Do this parsing yourself by reading the file content — no external tool needed.
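Although the parsing is meant to be done by reading the file directly, the stripping rules can be made concrete with a short sketch. The function name `srtToPlainText` and the `joiner` parameter are hypothetical; joining with a space suits space-delimited languages, and an empty string is probably a better choice for Chinese.

```javascript
// Hypothetical helper: reduce SRT content to the spoken text only.
function srtToPlainText(srt, joiner = " ") {
  // Matches timing lines such as "00:00:01,000 --> 00:00:03,500".
  const timing = /^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}/;
  // Drop a leading byte-order mark, then split on LF or CRLF.
  const lines = srt.replace(/^\uFEFF/, "").split(/\r?\n/);
  const kept = [];
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i].trim();
    if (line === "") continue; // blank separator between blocks
    if (timing.test(line)) continue; // timing line
    // An index line is digits only AND is followed by a timing line;
    // a spoken line that happens to be a number is kept.
    const nextIsTiming = i + 1 < lines.length && timing.test(lines[i + 1].trim());
    if (/^\d+$/.test(line) && nextIsTiming) continue;
    // Strip HTML-style tags such as <i>, <b>, <font ...>.
    const text = line.replace(/<[^>]+>/g, "").trim();
    if (text !== "") kept.push(text);
  }
  return kept.join(joiner);
}
```

Checking that the next line is a timing line before discarding a digits-only line avoids dropping spoken text such as "42".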
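Detect the subtitle language: if it is Chinese, skip translation and produce one file; otherwise translate the text into Chinese and produce two files.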
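One rough way to make that call programmatically is to measure how many of the letters in the text are Han characters. This is only a heuristic sketch: the function name `isMostlyChinese` and the 0.3 threshold are assumptions, and Japanese text containing many kanji could be misclassified, so the language tag in the subtitle file name remains the more reliable signal.

```javascript
// Hypothetical heuristic: treat text as Chinese when Han characters make up
// at least `threshold` of all letters. The default threshold is an assumption.
function isMostlyChinese(text, threshold = 0.3) {
  const letters = text.match(/\p{L}/gu) || [];
  if (letters.length === 0) return false; // nothing to judge
  const han = text.match(/\p{Script=Han}/gu) || [];
  return han.length / letters.length >= threshold;
}
```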
Translation guidelines:
Use the bundled script. The script path is:
~/.agents/skills/video-transcript/scripts/make_docx.js
Get the video title from yt-dlp output or use a sanitized version of the URL as fallback.
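What "sanitized" means for the URL fallback is not spelled out here, so the following is only one plausible reading: drop the scheme, replace characters that are awkward in titles and Windows file names, and cap the length. The name `sanitizeTitle`, the 80-character limit, and the "untitled" default are all assumptions.

```javascript
// Hypothetical helper: turn a URL (or raw title) into a safe title string.
function sanitizeTitle(raw, maxLength = 80) {
  const cleaned = raw
    .replace(/^https?:\/\//i, "") // drop the URL scheme
    .replace(/[<>:"\/\\|?*\x00-\x1f]/g, "_") // characters invalid in Windows file names
    .replace(/_+/g, "_") // collapse runs of underscores
    .replace(/^[_.\s]+|[_.\s]+$/g, ""); // trim separators at both ends
  return cleaned.slice(0, maxLength) || "untitled";
}
```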
For non-Chinese video (two files):
# Original ("~" is not expanded inside a quoted argument, so use $skillScripts from the setup step)
node "$skillScripts\make_docx.js" "$tmp\transcript_original.docx" "<VideoTitle> - Original" "<plain_text>"
# Chinese translation
node "$skillScripts\make_docx.js" "$tmp\transcript_zh.docx" "<VideoTitle> - 中文译稿" "<chinese_text>"
For Chinese video (one file):
node "$skillScripts\make_docx.js" "$tmp\transcript_zh.docx" "<VideoTitle> - 讲稿" "<plain_text>"
For long texts, write content to a temp .txt file first and pipe it:
$plain | Out-File -Encoding utf8 "$tmp\content.txt"
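# Windows PowerShell 5.1 pipes text to native programs as ASCII by default; switch to UTF-8 so non-ASCII text survives
$OutputEncoding = [System.Text.Encoding]::UTF8
Get-Content "$tmp\content.txt" -Raw -Encoding utf8 | node "$skillScripts\make_docx.js" "$tmp\transcript_original.docx" "<title>"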
Tell the user the output file paths clearly:
"C:\...\transcript_original.docx"
"C:\...\transcript_zh.docx"
Optionally show a short preview (first 200 chars) of the extracted text so the user can verify quality.
| Situation | Action |
|---|---|
| No subtitles found | Inform user, suggest Whisper as alternative |
| yt-dlp not found | Run yt-dlp --version to check; tell the user to install it if missing |
| node/docx error | Show error, check if docx npm package is installed globally |
| Private/geo-blocked video | Inform user the video is inaccessible |
Clean up the $tmp dir after delivering files, or leave it and tell the user the path
The make_docx.js script resolves docx from global npm automatically