Use this skill whenever the user asks to scrape a website, extract structured data from web pages, handle anti-bot/Cloudflare pages, crawl multiple pages, or...
Use Scrapling to extract web data with minimal selector breakage and better anti-bot resilience.
Prefer this skill when users ask for:
- Scraping a website or extracting structured data from web pages
- Handling anti-bot or Cloudflare-protected pages
- Crawling multiple pages or whole sites
Before scraping, always set up the environment first. All dependencies should live under D:\clawtest.
Recommended setup commands:

```shell
python -m venv D:\clawtest\.venv
D:\clawtest\.venv\Scripts\python -m pip install -U pip
D:\clawtest\.venv\Scripts\python -m pip install "scrapling[fetchers]"
D:\clawtest\.venv\Scripts\scrapling install
```
Notes:
- `pip install scrapling` alone is enough for basic fetching; `scrapling install` is only needed for the browser-based fetchers.

Choose the lightest option that works:
- `Fetcher`: plain HTTP requests; the lightest and fastest option.
- `StealthyFetcher`: stealth browser fetching for anti-bot/Cloudflare-protected pages.
- `DynamicFetcher`: full browser fetching for JS-rendered content.
- `Spider`: crawling across multiple pages.
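The escalation order above can be sketched as a small helper. The `fetchers` argument here is a hypothetical ordered list of fetch callables (stand-ins for the Scrapling fetchers, lightest first), not Scrapling API:

```python
# Try each fetch strategy in order, lightest first, until one succeeds.
def fetch_with_escalation(url, fetchers):
    last_error = None
    for fetch in fetchers:
        try:
            page = fetch(url)
            if page is not None:
                return page
        except Exception as exc:  # a blocked or failed fetch triggers escalation
            last_error = exc
    raise RuntimeError(f"All fetchers failed for {url}") from last_error
```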
Escalate only when needed (Fetcher -> StealthyFetcher -> DynamicFetcher).

```python
from scrapling.fetchers import StealthyFetcher

StealthyFetcher.adaptive = True

url = "https://example.com/products"
page = StealthyFetcher.fetch(url, headless=True, network_idle=True, timeout=45000)

items = []
for card in page.css(".product-card", auto_save=True):
    items.append({
        "title": card.css("h2::text").get(default="").strip(),
        "price": card.css(".price::text").get(default="").strip(),
        "url": card.css("a::attr(href)").get(default=""),
    })
print(items)
```
```python
# First run stores fingerprints:
products = page.css(".product-card", auto_save=True)

# A future run can recover after layout drift:
products = page.css(".product-card", adaptive=True)
```
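Rows collected like `items` in the fetch example can be persisted with the standard library. A minimal sketch, assuming the same three fields as above (the sample data here is hypothetical):

```python
import csv

# Example rows shaped like the extraction loop above (hypothetical data).
items = [
    {"title": "Widget", "price": "$9.99", "url": "/products/widget"},
]

# Write the rows to CSV with a header derived from the field names.
with open("products.csv", "w", newline="", encoding="utf-8") as fh:
    writer = csv.DictWriter(fh, fieldnames=["title", "price", "url"])
    writer.writeheader()
    writer.writerows(items)
```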
```python
from scrapling.spiders import Spider, Response


class ProductSpider(Spider):
    name = "product_spider"
    start_urls = ["https://example.com/catalog"]

    async def parse(self, response: Response):
        for card in response.css(".product-card"):
            yield {
                "title": card.css("h2::text").get(default="").strip(),
                "price": card.css(".price::text").get(default="").strip(),
            }
        for href in response.css("a.next::attr(href)").all():
            yield response.follow(href, callback=self.parse)


if __name__ == "__main__":
    ProductSpider().start()
```
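If `response.follow` accepts relative links as its Scrapy-style counterparts do, the resolution step can be pictured with `urllib.parse.urljoin`. A simplified sketch of that behavior, not Scrapling internals:

```python
from urllib.parse import urljoin

# Resolve a (possibly relative) next-page href against the current page URL,
# as follow-style helpers typically do before scheduling the request.
def resolve_next(page_url, href):
    return urljoin(page_url, href)

print(resolve_next("https://example.com/catalog?page=1", "/catalog?page=2"))
```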
When executing a user task with this skill, respond with the code you ran and the extracted results.
If extraction fails:
- Escalate the fetcher: Fetcher -> StealthyFetcher, then DynamicFetcher for JS-rendered content.
- Re-run selectors with auto_save=True first, then adaptive=True to recover from layout drift.