Extract structured data from websites using browser automation. Use when scraping product listings, articles, contact info, prices, or any web content. Supports JavaScript-rendered content, pagination, and infinite scroll.
Professional web scraping skill using agent-browser. Extract structured data from any website with support for JavaScript-rendered content, pagination, and complex selectors.
python scripts/scrape_page.py \
--url "https://example.com/products" \
--fields "title=h2.title,price=.price,link=a.href" \
--output products.csv
python scripts/scrape_paginated.py \
--url "https://example.com/products?page={page}" \
--pages 10 \
--fields "title,price,description" \
--output all_products.csv
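The `{page}` placeholder in the URL pattern can be expanded with Python's `str.format`. A minimal sketch of that expansion (a hypothetical helper, not the actual internals of `scrape_paginated.py`):

```python
def expand_page_urls(pattern: str, pages: int, start: int = 1) -> list:
    """Expand a URL pattern containing {page} into concrete page URLs."""
    return [pattern.format(page=n) for n in range(start, start + pages)]

# expand_page_urls("https://example.com/products?page={page}", 3)
# -> ["https://example.com/products?page=1", ..., "...?page=3"]
```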
Scrape a single page or static list.
Arguments:
- `--url` - Target URL
- `--fields` - Field definitions (name=selector format, comma-separated)
- `--output` - Output file (CSV, JSON, or XLSX)
- `--format` - Output format (csv, json, xlsx)
- `--wait` - Wait time for dynamic content (seconds)

Field Definition Format:
fieldname=css_selector
Examples:
title=h1.product-title
price=.price-tag
description=.product-description
image=img.product-image.src
link=a.product-link.href
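Based on the examples above, a trailing `.src` or `.href` token selects an attribute rather than text content. A sketch of how such a `--fields` string could be parsed (a hypothetical parser; the real scripts may handle this differently):

```python
# Trailing tokens treated as attributes, inferred from the examples (assumption).
ATTRS = {"src", "href"}

def parse_fields(spec):
    """Map field name -> (css_selector, attribute or None for text content)."""
    fields = {}
    for part in spec.split(","):
        name, _, selector = part.strip().partition("=")
        attr = None
        base, _, tail = selector.rpartition(".")
        if tail in ATTRS:  # e.g. "a.product-link.href" -> ("a.product-link", "href")
            selector, attr = base, tail
        fields[name] = (selector, attr)
    return fields
```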
Scrape multiple pages with pagination.
Arguments:
- `--url` - URL pattern (use {page} for page number)
- `--pages` - Number of pages to scrape
- `--fields` - Field definitions
- `--output` - Output file
- `--delay` - Delay between pages (seconds)
- `--next-selector` - CSS selector for "next page" button (alternative to URL pattern)

Scrape pages with infinite scroll loading.
Arguments:
- `--url` - Target URL
- `--scrolls` - Number of scroll actions
- `--fields` - Field definitions
- `--output` - Output file
- `--scroll-delay` - Delay between scrolls (ms)

Scrape JavaScript-heavy sites with custom interactions.
Arguments:
- `--url` - Target URL
- `--actions` - JSON file with interaction sequence
- `--fields` - Field definitions
- `--output` - Output file

Example actions file:

{
  "actions": [
    {"type": "click", "selector": "#load-more"},
    {"type": "wait", "ms": 2000},
    {"type": "scroll", "direction": "down", "pixels": 500},
    {"type": "fill", "selector": "#search", "value": "keyword"},
    {"type": "press", "key": "Enter"}
  ]
}
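An actions file like the one above can be sanity-checked before a run. A sketch that validates each action against the per-type keys shown in the example (the schema and function names are assumptions, not the scripts' actual API):

```python
import json

# Required keys per action type, inferred from the example file (assumption).
SCHEMA = {
    "click": {"selector"},
    "wait": {"ms"},
    "scroll": {"direction", "pixels"},
    "fill": {"selector", "value"},
    "press": {"key"},
}

def validate_actions(doc):
    """Return a list of problems; an empty list means the actions look well-formed."""
    problems = []
    for i, action in enumerate(doc.get("actions", [])):
        kind = action.get("type")
        if kind not in SCHEMA:
            problems.append(f"action {i}: unknown type {kind!r}")
            continue
        missing = SCHEMA[kind] - action.keys()
        if missing:
            problems.append(f"action {i}: missing {sorted(missing)}")
    return problems

def load_and_check(path):
    """Load an actions JSON file, raising if it is malformed."""
    with open(path) as fh:
        doc = json.load(fh)
    problems = validate_actions(doc)
    if problems:
        raise ValueError("; ".join(problems))
    return doc
```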
CSV:
title,price,link,url
"Product A",29.99,https://...,https://...
"Product B",39.99,https://...,https://...
JSON:
[
  {
    "title": "Product A",
    "price": "29.99",
    "link": "https://...",
    "scraped_at": "2026-03-07T16:00:00"
  }
]
Excel (XLSX):
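The CSV and JSON shapes above can be produced with the standard library; XLSX typically requires a third-party package such as openpyxl (an assumption, since the scripts' dependencies are not shown). A sketch for the first two formats, stamping each record with `scraped_at`:

```python
import csv
import io
import json
from datetime import datetime, timezone

def write_records(records, fmt):
    """Serialize a list of dicts to CSV or JSON text, adding a scraped_at stamp."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    rows = [{**r, "scraped_at": r.get("scraped_at", stamp)} for r in records]
    if fmt == "json":
        return json.dumps(rows, indent=2)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```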
python scripts/scrape_paginated.py \
--url "https://example.com/shop?page={page}" \
--pages 5 \
--fields "name=.product-name,price=.price,rating=.stars,reviews=.review-count,url=a.href" \
--output products.csv \
--delay 3
python scripts/scrape_page.py \
--url "https://news-site.com/latest" \
--fields "headline=h2.article-title,summary=.article-summary,author=.byline,date=.publish-date,url=a.read-more.href" \
--output articles.json \
--format json
python scripts/scrape_infinite_scroll.py \
--url "https://jobs-site.com/search" \
--scrolls 10 \
--fields "title=.job-title,company=.company-name,location=.location,salary=.salary,posted=.date-posted,url=a.job-link.href" \
--output jobs.csv \
--scroll-delay 1500
python scripts/scrape_paginated.py \
--url "https://realestate.com/listings?page={page}" \
--pages 20 \
--fields "address=.property-address,price=.listing-price,beds=.bedrooms,baths=.bathrooms,sqft=.square-feet,url=a.property-link.href" \
--output listings.xlsx \
--format xlsx \
--delay 5
Some sites employ anti-scraping measures:
| Measure | Countermeasure |
|---|---|
| IP blocking | Use proxies, rotate IPs |
| CAPTCHA | Manual solving or CAPTCHA services |
| Rate limiting | Increase delays, randomize timing |
| JavaScript challenges | Use browser automation (agent-browser) |
| Honeypot traps | Avoid hidden fields, validate selectors |
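The "randomize timing" countermeasure above can be sketched as a jittered delay between requests (a hypothetical helper, not part of the shipped scripts):

```python
import random
import time

def polite_sleep(base, jitter=0.5):
    """Sleep for base seconds +/- a random jitter fraction; return the delay used."""
    delay = base * random.uniform(1 - jitter, 1 + jitter)
    time.sleep(delay)
    return delay

# e.g. polite_sleep(3) sleeps somewhere between 1.5 and 4.5 seconds,
# making request timing harder to fingerprint than a fixed --delay.
```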
Disclaimer: This skill is for educational purposes. Users are responsible for compliance with applicable laws and website terms.
See references/css-selectors.md for comprehensive selector examples.
See references/website-patterns.md for common HTML structures and selectors.