Crawler Laravel Package

spatie/crawler

PHP web crawler that discovers links concurrently via Guzzle, with optional JavaScript rendering powered by Chrome/Puppeteer. Configure depth, internal-only rules, and callbacks for per-page handling, plus a fake mode to test crawl logic without real HTTP requests.

Product Decisions This Supports

  • Website Audits & SEO Optimization: Automate discovery of all internal links, broken links, or duplicate content for SEO audits or content migration projects.
  • Data Extraction & Scraping: Build scalable crawlers for extracting structured data from internal websites (e.g., product catalogs, documentation, or legacy systems).
  • Content Migration: Inventory all pages on a legacy system before migrating to a new platform (e.g., WordPress → Laravel).
  • Build vs. Buy: Avoid reinventing a crawler from scratch; leverage this package for internal tools where crawling is a core feature.
  • Roadmap for Analytics Tools: Integrate into a larger analytics platform to track page reach, link equity, or crawlability metrics.
  • Testing & Validation: Use the fake() method to unit-test crawlers without hitting production APIs or external services.

When to Consider This Package

  • Internal-Only Crawling: Ideal for crawling your own websites (e.g., staging, production, or legacy systems). Not designed for large-scale public web scraping (e.g., competitor sites).
  • JavaScript-Rendered Content: Your site relies on client-side rendering (SPAs, React, Vue, etc.), so pages must be rendered through the optional Chrome/Puppeteer integration before links can be discovered.
  • Controlled Depth/Limit: Need to crawl only up to a certain depth (e.g., 3 levels deep) or limit the number of URLs.
  • Testing-First Development: Prefer writing tests with mocked responses before deploying real crawls.
  • Laravel/PHP Ecosystem: Already using Laravel or PHP; avoids context-switching to Python/Node.js tools.
  • Avoid Overkill: Don’t need advanced features like proxy rotation, CAPTCHA solving, or distributed crawling (consider Scrapy or Apify instead).
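
To make the internal-only and depth/limit criteria above concrete, here is a minimal sketch in plain PHP. It does not call the package's API: the function name crawlInternal, its parameters, and the example.test link map are all hypothetical, and the crawl runs over an in-memory array rather than real HTTP requests. It only illustrates the kind of rules such a crawler applies.

```php
<?php

/**
 * Breadth-first crawl over an in-memory link map (hypothetical data, no HTTP).
 *
 * @param array<string, string[]> $links    page URL => URLs linked from that page
 * @param string                  $start    URL to start from
 * @param int                     $maxDepth how many link hops to follow from $start
 * @param int                     $maxUrls  hard cap on the number of pages visited
 * @return string[] URLs visited, in crawl order
 */
function crawlInternal(array $links, string $start, int $maxDepth, int $maxUrls): array
{
    $host = parse_url($start, PHP_URL_HOST);
    $visited = [];
    $seen = [$start => true];
    $queue = [[$start, 0]]; // each entry is [url, depth]

    while ($queue !== [] && count($visited) < $maxUrls) {
        [$url, $depth] = array_shift($queue);
        $visited[] = $url;

        if ($depth >= $maxDepth) {
            continue; // depth limit reached: visit the page but do not follow its links
        }

        foreach ($links[$url] ?? [] as $next) {
            // Internal-only rule: skip other hosts, and skip URLs already queued.
            if (parse_url($next, PHP_URL_HOST) !== $host || isset($seen[$next])) {
                continue;
            }
            $seen[$next] = true;
            $queue[] = [$next, $depth + 1];
        }
    }

    return $visited;
}

// Hypothetical site: one external link and one page beyond the depth limit.
$site = [
    'https://example.test/' => ['https://example.test/docs', 'https://other.test/'],
    'https://example.test/docs' => ['https://example.test/docs/install', 'https://example.test/'],
    'https://example.test/docs/install' => ['https://example.test/docs/install/linux'],
];

print_r(crawlInternal($site, 'https://example.test/', 2, 10));
```

With a depth of 2, the sketch visits the home page, /docs, and /docs/install, skips the external host, and never reaches /docs/install/linux. Lowering the URL cap stops the crawl earlier regardless of depth.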

Look Elsewhere If:

  • You need to crawl external, large-scale public websites (rate limits, anti-bot measures).
  • You require distributed crawling (e.g., across multiple servers).
  • You need headless browser automation beyond crawling (e.g., form submission, screenshots).
  • Your budget allows for managed scraping services (e.g., ScraperAPI, Bright Data).

How to Pitch It (Stakeholders)

For Executives: "This package lets us build a self-service crawler for internal websites: think of it as a ‘Google for our own content.’ We can automate audits for broken links, duplicate pages, or SEO issues that QA teams would otherwise check by hand. It’s lightweight, runs in PHP (our stack), and can handle JavaScript-heavy sites. For example, we could use it to inventory all pages before a major migration, or to power a recurring content monitor. There is no licence fee: it’s open-source and maintained by Spatie."

For Engineers: "Spatie’s crawler is a batteries-included solution for PHP/Laravel projects needing to crawl internal sites. Key advantages:

  • Concurrent requests (via Guzzle) for speed.
  • JavaScript support (via Puppeteer/Chrome) for SPAs.
  • Fake mode for testing without hitting real APIs.
  • Observers & callbacks for extensibility (e.g., logging, analytics).
  • Depth/limit controls to avoid crawling too wide/deep.

It’s not a replacement for Scrapy or Scrapinghub (now Zyte), but it’s a good fit for:

  • SEO audits (findUrls() + status code checks).
  • Data extraction (e.g., scraping product pages from a legacy system).
  • Pre-migration inventorying (e.g., ‘What pages exist on our old WordPress site?’).

Tradeoffs:

  • No built-in proxy rotation (add a custom layer, e.g., a Guzzle-level proxy setup).
  • Not for aggressive public scraping (use Apify or Scrapy instead).
  • Requires PHP (but the team already uses it); Laravel itself is optional.

Example Use Case:

‘We’re migrating our documentation from Confluence to Laravel. Before we lift-and-shift, we can use this crawler to:

  1. Inventory all pages (including nested ones).
  2. Check for broken links.
  3. Extract metadata (titles, last updated) for our new CMS.

All in a script we can run locally or in CI.’"
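
The three steps in this use case can be sketched in plain PHP over canned responses, in the spirit of testing crawl logic without real HTTP requests. This is an illustration under stated assumptions, not the package's API: the function auditPages, the response array shape, and the docs.example.test URLs are all hypothetical. In a real script the responses would come from the crawler's per-page callbacks.

```php
<?php

/**
 * Summarise canned crawl results (hypothetical data standing in for real responses).
 *
 * @param array<string, array{status: int, body: string}> $responses URL => response
 * @return array{pages: array<string, ?string>, broken: string[]}
 */
function auditPages(array $responses): array
{
    $pages = [];  // inventory: URL => page title (null when no <title> is found)
    $broken = []; // URLs that answered with a 4xx or 5xx status

    foreach ($responses as $url => $response) {
        if ($response['status'] >= 400) {
            $broken[] = $url;
            continue;
        }

        // Extract the <title> text as a simple piece of metadata.
        $hasTitle = preg_match('~<title>(.*?)</title>~is', $response['body'], $m) === 1;
        $pages[$url] = $hasTitle ? trim($m[1]) : null;
    }

    return ['pages' => $pages, 'broken' => $broken];
}

// Hypothetical canned responses: two healthy pages and one broken link.
$fake = [
    'https://docs.example.test/' => ['status' => 200, 'body' => '<html><title>Home</title></html>'],
    'https://docs.example.test/setup' => ['status' => 200, 'body' => '<html><title> Setup Guide </title></html>'],
    'https://docs.example.test/old' => ['status' => 404, 'body' => 'Not Found'],
];

$report = auditPages($fake);
print_r($report);
```

Because the input is a plain array, the same function can be exercised in CI with fixtures and then fed real crawl results later.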