Crawlee

Crawlee

Python web crawler and browser automation library

  • Unified HTTP and headless browser crawling interface
  • Automatic parallel crawling based on system resources
  • Python type prompts enhance the development experience
  • Automatic error retry and anti blocking function
  • Integrate proxy rotation and session management
  • Configurable request routing and persistent URL queue
  • Support multiple data and file storage methods
  • Robust error handling mechanism

Product Details

Crawlee is a Python web crawler and browser automation library used to build reliable crawlers, extracting data for AI, LLMs, RAG, or GPTs. It provides a unified interface to handle HTTP and headless browser crawling tasks, supports automatic parallel crawling, and adjusts based on system resources. Crawlee is written in Python and includes type hints, enhancing the development experience and reducing errors. It has features such as automatic retry, integrated proxy rotation and session management, configurable request routing, persistent URL queue, and pluggable storage options. Compared to Scrapy, Crawle provides native support for headless browser crawling, has a simple and elegant interface, and is completely based on standard asynchronous IO.