Web scraping, crawling, and search via Firecrawl API. Converts web pages to clean markdown/JSON optimized for AI consumption. Use when you need to extract co...
Professional web scraping powered by Firecrawl API. Converts websites to clean, AI-ready markdown or structured JSON.
Scrape a single page:
python3 scripts/scrape.py https://example.com
Crawl a website:
python3 scripts/scrape.py --crawl https://docs.example.com --depth 2 --limit 10
Search and scrape:
python3 scripts/scrape.py --search "AI agent frameworks" --limit 5
Check crawl status:
python3 scripts/scrape.py --crawl-status abc123
Extract content from a single URL:
python3 scripts/scrape.py <url> [options]
Options:
- `--formats markdown,html,screenshot` — Output formats (default: markdown)
- `--full` — Include full page (no main content extraction)
- `--json` — Output raw JSON response

Examples:
# Basic scrape
python3 scripts/scrape.py https://docs.example.com
# Get HTML and markdown
python3 scripts/scrape.py https://site.com --formats markdown,html
# Full page (no content filtering)
python3 scripts/scrape.py https://site.com --full
# JSON output
python3 scripts/scrape.py https://site.com --json
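When `--json` is used, the script prints the raw Firecrawl response. A minimal sketch of consuming that output — the `data.markdown` / `data.metadata` field names are an assumption based on Firecrawl's v1 scrape response, and the sample payload below is made up:

```python
import json

# Hypothetical --json output from a single-page scrape; the
# data.markdown / data.metadata shape is an assumption.
raw = '''{
  "success": true,
  "data": {
    "markdown": "# Example Domain\\n\\nThis domain is for examples.",
    "metadata": {"title": "Example Domain", "sourceURL": "https://example.com"}
  }
}'''

resp = json.loads(raw)
if resp.get("success"):
    page = resp["data"]
    title = page["metadata"]["title"]   # page title from scraped metadata
    body = page["markdown"]             # the clean, LLM-ready markdown
```

In practice you would pipe the script's stdout into `json.loads` instead of the hardcoded sample.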
Systematically crawl and scrape multiple pages:
python3 scripts/scrape.py --crawl <url> [options]
Options:
- `--depth N` — Maximum crawl depth (default: 2)
- `--limit N` — Maximum pages to crawl (default: 10)
- `--json` — Output raw JSON response

Examples:
# Basic crawl
python3 scripts/scrape.py --crawl https://docs.site.com
# Deep crawl with limit
python3 scripts/scrape.py --crawl https://blog.com --depth 3 --limit 50
# Shallow crawl
python3 scripts/scrape.py --crawl https://site.com --depth 1 --limit 5
Note: Crawl returns a job ID. Use --crawl-status to check progress and retrieve results.
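The job lifecycle above can be sketched as a small polling helper. This is an illustration, not the script's actual code: it assumes `--crawl-status` prints a JSON object with a top-level `"status"` field carrying the `scraping` / `completed` / `failed` values documented below.

```python
import json
import subprocess
import time

def crawl_finished(status: dict) -> bool:
    # "scraping" / "completed" / "failed" are the status values
    # described in this document's crawl-status section.
    return status.get("status") in ("completed", "failed")

def wait_for_crawl(job_id: str, poll_seconds: int = 10, max_polls: int = 30) -> dict:
    """Poll `scrape.py --crawl-status` until the crawl job finishes.

    The JSON shape of the status output is an assumption.
    """
    for _ in range(max_polls):
        out = subprocess.run(
            ["python3", "scripts/scrape.py", "--crawl-status", job_id],
            capture_output=True, text=True, check=True,
        ).stdout
        status = json.loads(out)
        if crawl_finished(status):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"crawl {job_id} still running after {max_polls} polls")
```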
Search the web and get scraped content from results:
python3 scripts/scrape.py --search <query> [options]
Options:
- `--limit N` — Number of results (default: 5)
- `--json` — Output raw JSON response

Examples:
# Search and scrape
python3 scripts/scrape.py --search "WordPress security best practices"
# More results
python3 scripts/scrape.py --search "AI agents 2026" --limit 10
# JSON output
python3 scripts/scrape.py --search "casino bonuses" --json
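A hedged sketch of iterating over `--search --json` output. The `data` list of `{url, title, markdown}` objects is an assumption about the response shape, and the sample payload is made up:

```python
import json

# Made-up sample of a --search --json response; the data/url/title/markdown
# field names are assumptions, not confirmed by the script's docs.
raw = '''{
  "success": true,
  "data": [
    {"url": "https://a.example", "title": "Post A", "markdown": "# A"},
    {"url": "https://b.example", "title": "Post B", "markdown": "# B"}
  ]
}'''

results = json.loads(raw).get("data", [])
index = {r["url"]: r["title"] for r in results}  # quick URL -> title lookup
```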
Check status of a crawl job:
python3 scripts/scrape.py --crawl-status <job-id>
Returns JSON with the job status: `scraping`, `completed`, or `failed`.

Output formats:
- Markdown (default): Clean, LLM-ready text with preserved structure
- HTML: Full HTML source (useful for parsing specific elements)
- Screenshot: Base64-encoded PNG of rendered page
- JSON: Structured data extraction (custom schemas supported)
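Since the screenshot format arrives base64-encoded, it has to be decoded before use. A minimal sketch — the tiny payload here is a stand-in for a real encoded PNG:

```python
import base64

# Stand-in for a base64-encoded screenshot; a real response would carry
# a full PNG, not just the 8-byte PNG magic header used here.
fake_png_bytes = b"\x89PNG\r\n\x1a\n"
encoded = base64.b64encode(fake_png_bytes).decode()

decoded = base64.b64decode(encoded)
with open("screenshot.png", "wb") as fh:  # write the decoded image to disk
    fh.write(decoded)
```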
Features:
- Smart Content Extraction: strips navigation, ads, and boilerplate to keep the main content
- JavaScript Support: renders dynamic, JS-heavy pages before extraction
- Anti-Bot Handling: manages proxies and bot detection so most sites scrape reliably
- Caching: recently scraped pages can be served from cache for faster responses
The script looks for the Firecrawl API key in:
1. `workspace/secrets/firecrawl_api_key` (OpenClaw workspace)
2. `secrets/firecrawl_api_key` (relative to current directory)
3. `FIRECRAWL_API_KEY` environment variable

Current key is stored at: `workspace/secrets/firecrawl_api_key`
Free tier: 500 credits
Paid plans: Starting at $16/month (3,000 credits)
Documentation Extraction:
python3 scripts/scrape.py --crawl https://docs.framework.com --depth 2 --limit 50
Competitive Research:
python3 scripts/scrape.py --search "top casino affiliate sites" --limit 10
Content Migration:
python3 scripts/scrape.py https://old-site.com/page1 --formats markdown
News Monitoring:
python3 scripts/scrape.py --search "WordPress security updates" --limit 5
Blog Scraping:
python3 scripts/scrape.py --crawl https://blog.site.com --depth 1 --limit 20
Tips:
- Start with small `--limit` values to test
- Use `--depth 1` for blog homepages (gets all posts)
- Use `--depth 2-3` for documentation sites
- `--search` is faster than manual crawling for research
- Check `--crawl-status` regularly for long crawls
- Use `--json` for programmatic processing

vs web_fetch tool: Firecrawl handles JavaScript rendering, anti-bot measures, and multi-page crawls; prefer it when a plain fetch fails or you need many pages.
vs browser tool: scraping is faster and cheaper for pure content extraction; use the browser only when you need interaction (clicks, forms, logins).