
crawlee


Project Description

Crawlee is a web scraping and browser automation library for Node.js, used to build reliable crawlers in JavaScript and TypeScript. It can extract data for AI, LLMs, RAG, or GPTs, and download HTML, PDF, JPG, PNG, and other files from websites. It works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP, runs in both headful and headless mode, and supports proxy rotation.


Project Title

crawlee — A comprehensive web scraping and browser automation library for Node.js

Overview

Crawlee is a web scraping and browser automation library for Node.js, aimed at building reliable crawlers. It works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP, and can drive browsers in both headful and headless modes. It is also designed to make crawlers appear human-like, so that they are less likely to be blocked by modern bot protections.

Key Features

  • Supports multiple scraping and automation tools (Puppeteer, Playwright, Cheerio, JSDOM, raw HTTP)
  • Capable of headless and headful browser operations
  • Proxy rotation, so that requests can be spread across multiple proxies
  • Human-like behavior intended to reduce blocking by bot protections
  • Supports data extraction for AI, LLMs, RAG, or GPTs
  • Downloads various file types (HTML, PDF, JPG, PNG, etc.)
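
To make the proxy rotation feature concrete, here is a minimal conceptual sketch of round-robin rotation in plain TypeScript. It is not Crawlee's implementation or API: the `ProxyRotator` class and the `example.com` proxy URLs are hypothetical and exist only to illustrate the idea of cycling through a list of proxies.

```typescript
// Conceptual sketch only: this is NOT Crawlee's proxy implementation.
// ProxyRotator and the example.com proxy URLs are hypothetical.
class ProxyRotator {
  private readonly proxyUrls: string[];
  private nextIndex = 0;

  constructor(proxyUrls: string[]) {
    if (proxyUrls.length === 0) {
      throw new Error('At least one proxy URL is required');
    }
    this.proxyUrls = proxyUrls;
  }

  // Return the next proxy in round-robin order, wrapping at the end.
  next(): string {
    const url = this.proxyUrls[this.nextIndex];
    this.nextIndex = (this.nextIndex + 1) % this.proxyUrls.length;
    return url;
  }
}

const rotator = new ProxyRotator([
  'http://proxy-1.example.com:8000',
  'http://proxy-2.example.com:8000',
]);

// Each outgoing request would ask the rotator which proxy to use.
const picks = [rotator.next(), rotator.next(), rotator.next()];
```

With two proxies, the third pick wraps around to the first one. A real crawler would usually add more than this, for example retiring proxies that start returning errors.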

Use Cases

  • Data extraction for AI and machine learning models
  • Building reliable web crawlers for scraping data from websites
  • Automating browser tasks for testing or data collection
  • Downloading files from the web for further processing or storage
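
As a rough illustration of what building a reliable crawler involves, the sketch below shows a tiny URL frontier that skips pages it has already queued. It is a simplified assumption about how such a queue can work, not Crawlee's own request queue: the `UrlFrontier` class, its fragment-stripping rule, and the `example.com` URLs are all hypothetical.

```typescript
// Conceptual sketch only: this is NOT Crawlee's request queue.
// UrlFrontier and the example.com URLs are hypothetical.
class UrlFrontier {
  private readonly seen = new Set<string>();
  private readonly pending: string[] = [];

  // Drop the #fragment so that page anchors do not count as new pages.
  private static normalize(url: string): string {
    const hashIndex = url.indexOf('#');
    return hashIndex === -1 ? url : url.slice(0, hashIndex);
  }

  // Queue a URL unless an equivalent one was already added.
  add(url: string): boolean {
    const key = UrlFrontier.normalize(url);
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    this.pending.push(key);
    return true;
  }

  // Take the oldest pending URL (breadth-first order), if any.
  next(): string | undefined {
    return this.pending.shift();
  }
}

const frontier = new UrlFrontier();
const added = [
  frontier.add('https://example.com/'),
  frontier.add('https://example.com/#top'), // same page, different anchor
  frontier.add('https://example.com/docs'),
];
```

Here the second `add` call is rejected because it points at the same page as the first. Deduplication of this kind keeps a crawl from looping over pages that link back to each other.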

Advantages

  • High configurability to suit various project needs
  • Fast and efficient, with the ability to fly under the radar of bot protections
  • Supports both headless and headful modes for different use cases
  • Comprehensive documentation and community support

Limitations / Considerations

  • Requires Node.js 16 or higher
  • May have a steeper learning curve for those unfamiliar with Node.js or web scraping tools
  • Performance may vary depending on the complexity of the website being scraped
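
The Node.js requirement listed above can be checked before a crawler starts. The following is a minimal sketch, assuming the version is available as a plain string; the `meetsMinimumNode` helper and `MIN_NODE_MAJOR` constant are hypothetical names, not part of Crawlee.

```typescript
// Minimum major version, taken from the requirement listed above.
const MIN_NODE_MAJOR = 16;

// Return true when a version string such as "18.17.1" or "v16.0.0"
// has a major version of at least `minMajor`.
function meetsMinimumNode(version: string, minMajor: number = MIN_NODE_MAJOR): boolean {
  const major = Number.parseInt(version.replace(/^v/, '').split('.')[0], 10);
  return Number.isInteger(major) && major >= minMajor;
}

// Check a few sample version strings against the default minimum.
const checks = ['14.21.3', '16.0.0', 'v20.11.1'].map((v) => meetsMinimumNode(v));
```

In a real project the string would typically come from `process.versions.node`, and a failed check would stop the program with a clear message rather than letting it fail later.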

Similar / Related Projects

  • Puppeteer: A high-level API for controlling headless Chrome or Chromium browsers, similar to Crawlee in terms of browser automation capabilities but without the same level of scraping features.
  • Scrapy: An open-source and collaborative framework for extracting the data you need from websites, differing from Crawlee in that it is Python-based and has a different set of tools and libraries.
  • Playwright: A Node library to automate Chromium, Firefox, and WebKit with a single API, which Crawlee integrates with for browser automation tasks.

Basic Information


📊 Project Information

  • Project Name: crawlee
  • GitHub URL: https://github.com/apify/crawlee
  • Programming Language: TypeScript
  • ⭐ Stars: 19,370
  • 🍴 Forks: 982
  • 📅 Created: 2016-08-26
  • 🔄 Last Updated: 2025-09-08

🏷️ Project Topics

Topics: apify, automation, crawler, crawling, headless, headless-chrome, javascript, nodejs, npm, playwright, puppeteer, scraper, scraping, typescript, web-crawler, web-crawling, web-scraping




This article is automatically generated by AI based on GitHub project information and README content analysis


Project Information

Created on 8/26/2016
Updated on 9/10/2025