Smart web content fetcher - articles and videos from WeChat, Feishu, Bilibili, Zhihu, Toutiao, YouTube, etc. Triggers: '抓取文章', '下载网页', '保存文章', 'fetch URL', '...
Smart web content fetcher for Claude Code. Automatically detects platform and uses the best strategy to fetch articles or download videos.
# Fetch an article
python3 {SKILL_DIR}/fetcher.py "URL" -o ~/docs/
# Download a video
python3 {SKILL_DIR}/fetcher.py "https://b23.tv/xxx" -o ~/videos/
# Batch fetch from file
python3 {SKILL_DIR}/fetcher.py --urls-file urls.txt -o ~/docs/
Install only what you need — dependencies are checked at runtime:
| Dependency | Purpose | Install |
|---|---|---|
| scrapling | Article fetching (HTTP + browser) | pip install scrapling |
| yt-dlp | Video download | pip install yt-dlp |
| camoufox | Anti-detection browser (Xiaohongshu, Weibo) | pip install camoufox && python3 -m camoufox fetch |
| html2text | HTML to Markdown conversion | pip install html2text |
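A runtime dependency check of the kind described above could look roughly like the following. This is a minimal sketch, not the fetcher's actual code: the `INSTALL_HINTS` mapping and the `missing_dependencies` helper are hypothetical names, and the import names (`yt_dlp` for yt-dlp) are the packages' usual module names.

```python
import importlib.util

# Import name -> install command, mirroring the dependency table.
# This mapping is an assumption for illustration, not taken from fetcher.py.
INSTALL_HINTS = {
    "scrapling": "pip install scrapling",
    "yt_dlp": "pip install yt-dlp",
    "camoufox": "pip install camoufox && python3 -m camoufox fetch",
    "html2text": "pip install html2text",
}


def missing_dependencies(modules):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Report only what is missing, with the matching install command.
    for name in missing_dependencies(INSTALL_HINTS):
        print(f"{name} not found; install with: {INSTALL_HINTS[name]}")
```

Checking with `find_spec` avoids importing heavy packages just to learn whether they are installed.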
The fetcher automatically detects the platform from the URL:
| Platform | Method | Notes |
|---|---|---|
| mp.weixin.qq.com | scrapling | Extracts data-src images, handles SVG placeholders |
| *.feishu.cn | Virtual scroll | Collects all blocks via scrolling, downloads images with cookies |
| zhuanlan.zhihu.com | scrapling | .Post-RichText selector |
| www.zhihu.com | scrapling | .RichContent selector |
| www.toutiao.com | scrapling | Handles toutiaoimg.com base64 placeholders |
| www.xiaohongshu.com | camoufox | Anti-bot protection requires stealth browser |
| www.weibo.com | camoufox | Anti-bot protection requires stealth browser |
| bilibili.com / b23.tv | yt-dlp | Video download, supports quality selection |
| youtube.com / youtu.be | yt-dlp | Video download |
| douyin.com | yt-dlp | Video download |
| Unknown URLs | scrapling | Generic fetch with fallback tiers |
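The table above can be read as a host-suffix lookup. The sketch below illustrates that idea only; `ROUTES` and `detect_method` are hypothetical names, and the real `lib.router.route` returns a richer dict (type, selector, post-processing) than the single method string used here.

```python
from urllib.parse import urlparse

# (host suffix, method) pairs copied from the platform table.
# Method names follow the values accepted by --method.
ROUTES = [
    ("mp.weixin.qq.com", "scrapling"),
    ("feishu.cn", "feishu"),
    ("zhuanlan.zhihu.com", "scrapling"),
    ("zhihu.com", "scrapling"),
    ("toutiao.com", "scrapling"),
    ("xiaohongshu.com", "camoufox"),
    ("weibo.com", "camoufox"),
    ("bilibili.com", "ytdlp"),
    ("b23.tv", "ytdlp"),
    ("youtube.com", "ytdlp"),
    ("youtu.be", "ytdlp"),
    ("douyin.com", "ytdlp"),
]


def detect_method(url):
    """Pick a fetch method from the URL's host; unknown hosts fall back to scrapling."""
    host = (urlparse(url).hostname or "").lower()
    for suffix, method in ROUTES:
        # Match the host itself or any subdomain of it, but not lookalike domains.
        if host == suffix or host.endswith("." + suffix):
            return method
    return "scrapling"
```

Matching on `"." + suffix` rather than a bare `endswith(suffix)` keeps a host such as `notfeishu.cn` from being routed as Feishu.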
python3 {SKILL_DIR}/fetcher.py [URL] [OPTIONS]

Arguments:
  url                      URL to fetch

Options:
  -o, --output DIR         Output directory (default: current)
  -q, --quality N          Video quality, e.g. 1080, 720 (default: 1080)
  --method METHOD          Force method: scrapling, camoufox, ytdlp, feishu
  --selector CSS           Force CSS selector for content extraction
  --urls-file FILE         File with URLs (one per line, # for comments)
  --audio-only             Extract audio only (video downloads)
  --no-images              Skip image download (articles)
  --cookies-browser NAME   Browser for cookies (e.g., chrome, firefox)
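
The `--urls-file` format (one URL per line, `#` for comments) can be parsed along these lines. This is a sketch under assumptions: `read_urls` and `read_urls_file` are hypothetical helpers, and only whole-line comments are skipped, since a `#` inside a URL may be a legitimate fragment.

```python
from pathlib import Path


def read_urls(text):
    """Return the URLs in text, skipping blank lines and lines starting with '#'."""
    urls = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        urls.append(line)
    return urls


def read_urls_file(path):
    """Read a URL list file (UTF-8 assumed) and return its URLs in order."""
    return read_urls(Path(path).read_text(encoding="utf-8"))
```

Each returned URL could then be passed to the fetcher one at a time, in file order.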
- data-src attribute with mmbiz.qpic.cn URLs
- <img> tags contain SVG placeholders (lazy loading)
- Referer: https://mp.weixin.qq.com/ header
- [data-block-id] elements
- --cookies-browser chrome
- -q

| Problem | Solution |
|---|---|
| scrapling not found | pip install scrapling |
| yt-dlp not found | pip install yt-dlp |
| Article content too short | Try --method camoufox for JS-heavy pages |
| Feishu returns login page | The doc may require authentication |
| Bilibili 403 | Use --cookies-browser chrome |
| Image download fails | Check network; WeChat images need Referer header (auto-handled) |
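
The WeChat image handling mentioned in the table (real image URLs in `data-src`, downloads needing a Referer header) could be sketched as follows. This is not the fetcher's own implementation: `DataSrcCollector` and `wechat_image_requests` are hypothetical names, and the sketch assumes every `data-src` value is an absolute URL with a scheme.

```python
from html.parser import HTMLParser
from urllib.request import Request

WECHAT_REFERER = "https://mp.weixin.qq.com/"


class DataSrcCollector(HTMLParser):
    """Collect data-src values from <img> tags, ignoring the placeholder src."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        data_src = dict(attrs).get("data-src")
        if data_src:
            self.urls.append(data_src)


def wechat_image_requests(html):
    """Build one request per lazy-loaded image, each carrying the WeChat Referer."""
    collector = DataSrcCollector()
    collector.feed(html)
    return [Request(url, headers={"Referer": WECHAT_REFERER}) for url in collector.urls]
```

Each request can be passed to `urllib.request.urlopen` to download the image; images without a `data-src` attribute are skipped here.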
When the CLI doesn't fit your needs, use the modules directly:
from lib.router import route, check_dependency
from lib.article import fetch_article
from lib.video import fetch_video
from lib.feishu import fetch_feishu
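# Route a URL to its fetch configuration
url = "https://mp.weixin.qq.com/s/xxx"
r = route(url)
# {'type': 'article', 'method': 'scrapling', 'selector': '#js_content', 'post': 'wx_images'}
# Fetch the article using the routed configuration
fetch_article(url, output_dir="/tmp/out", route_config=r)
# Download a video (same placeholder link as the CLI example)
video_url = "https://b23.tv/xxx"
fetch_video(video_url, output_dir="/tmp/out", quality="720")
# Fetch a Feishu doc (here url must be a *.feishu.cn document URL)
fetch_feishu(url, output_dir="/tmp/out")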