Crawlee

Crawlee

Python web crawler and browser automation library

Unified HTTP and headless browser crawling interface
Automatic parallel crawling based on system resources
Python type prompts enhance the development experience
Automatic error retry and anti blocking function
Integrate proxy rotation and session management
Configurable request routing and persistent URL queue
Support multiple data and file storage methods
Robust error handling mechanism

Product Details

Crawlee is a Python web crawler and browser automation library used to build reliable crawlers, extracting data for AI, LLMs, RAG, or GPTs. It provides a unified interface to handle HTTP and headless browser crawling tasks, supports automatic parallel crawling, and adjusts based on system resources. Crawlee is written in Python and includes type hints, enhancing the development experience and reducing errors. It has features such as automatic retry, integrated proxy rotation and session management, configurable request routing, persistent URL queue, and pluggable storage options. Compared to Scrapy, Crawle provides native support for headless browser crawling, has a simple and elegant interface, and is completely based on standard asynchronous IO.

Product Details

Related Projects

Kipps.AI

Airtable Cobuilder

Tempest AI

llama-agentic-system