Crawler Laravel Package

spatie/crawler

Fast, concurrent web crawler for PHP. Crawl sites, collect internal URLs with depth limits, and hook into crawl events. Can execute JavaScript via Chrome/Puppeteer for rendered pages. Includes fakes for testing crawl logic without real HTTP requests.


Product Decisions This Supports

  • Web Scraping & Data Collection: Enables building scalable, PHP-based web crawlers for extracting structured data from websites (e.g., competitor pricing, product catalogs, or news aggregation).
  • SEO & Site Audits: Facilitates crawling internal/external links to identify broken links, duplicate content, or crawlability issues (e.g., for SEO tools or site health dashboards).
  • Content Syndication: Powers automated content discovery for platforms like newsletters, aggregators, or social media curation tools.
  • Build vs. Buy: Avoids reinventing crawling logic (e.g., no need to build custom queues, concurrency, or JavaScript rendering from scratch).
  • Testing & Mocking: Accelerates development by allowing fake HTTP responses for unit/integration tests (e.g., simulating API failures or dynamic content).
  • Roadmap Priorities:
    • Phase 1: MVP for internal link extraction (e.g., foundUrls()).
    • Phase 2: Add JavaScript rendering for dynamic sites (e.g., SPAs).
    • Phase 3: Integrate with analytics (e.g., track crawl progress in dashboards).
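
The Phase 1 goal above (internal link extraction with a depth limit) can be pictured as a breadth-first walk over same-host links. The following is a minimal sketch in plain PHP, not the spatie/crawler API: the function name `collectInternalUrls`, the `$fetch` callable, and the `example.test` pages are all hypothetical and exist only for illustration. Passing canned HTML through `$fetch` mirrors the idea of testing crawl logic without real HTTP requests. The sketch only follows absolute links; relative links would need to be resolved first.

```php
<?php

declare(strict_types=1);

/**
 * Breadth-first collection of same-host URLs up to $maxDepth link hops.
 *
 * $fetch is a hypothetical callable that returns the HTML for a URL, or null
 * on failure, so tests can supply canned pages instead of doing real HTTP.
 *
 * @param callable(string): ?string $fetch
 * @return string[] internal URLs in discovery order, start URL first
 */
function collectInternalUrls(string $startUrl, int $maxDepth, callable $fetch): array
{
    $host  = parse_url($startUrl, PHP_URL_HOST);
    $seen  = [$startUrl => true];
    $queue = [[$startUrl, 0]];

    while ($queue !== []) {
        [$url, $depth] = array_shift($queue);

        // Pages at the depth limit are recorded but their links are not followed.
        if ($depth >= $maxDepth) {
            continue;
        }

        $html = $fetch($url);
        if ($html === null) {
            continue; // fetch failed; nothing to extract
        }

        // A regex is enough for a sketch; a DOM parser is more robust on real HTML.
        preg_match_all('/<a\s[^>]*href="([^"]+)"/i', $html, $matches);

        foreach ($matches[1] as $link) {
            // Skip external hosts, relative links (host is null), and repeats.
            if (parse_url($link, PHP_URL_HOST) !== $host || isset($seen[$link])) {
                continue;
            }
            $seen[$link] = true;
            $queue[]     = [$link, $depth + 1];
        }
    }

    return array_keys($seen);
}

// Canned pages standing in for a real site (hypothetical data).
$pages = [
    'https://example.test/'  => '<a href="https://example.test/a">A</a> <a href="https://other.test/">ext</a>',
    'https://example.test/a' => '<a href="https://example.test/b">B</a>',
    'https://example.test/b' => '<a href="https://example.test/c">C</a>',
];
$fetch = fn (string $url): ?string => $pages[$url] ?? null;

$urls = collectInternalUrls('https://example.test/', 2, $fetch);
print_r($urls);
```

With a depth limit of 2, the sketch returns the start page, `/a`, and `/b`. The external `other.test` link is ignored, and `/c` is never discovered because `/b` sits at the depth limit. In a real project the package's own crawl profile and depth settings would replace this loop.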

When to Consider This Package

  • Adopt if:

    • Your use case requires PHP-based crawling (e.g., Laravel apps).
    • You need concurrency (Guzzle promises) or JavaScript rendering (Puppeteer).
    • Your team lacks experience with non-PHP scraping stacks (e.g., Python's Scrapy or Node's Puppeteer).
    • You want built-in testing (fake responses) to avoid flaky CI pipelines.
    • Your crawl scope is moderate (not distributed/cloud-scale; see alternatives like Scrapy for big data).
  • Look elsewhere if:

    • You need distributed crawling (e.g., Scrapy + Redis or Scrapy Cloud).
    • Your target sites block scrapers (consider proxies/rotating user agents).
    • You require headless browser automation beyond crawling (e.g., form submission; use Puppeteer directly).
    • Your stack is non-PHP (e.g., Python/Ruby/Node; use native libraries).
    • You need real-time streaming (e.g., Kafka integration; this package buffers results).

How to Pitch It (Stakeholders)

For Executives:

*"This package lets us build a scalable web crawler in PHP, with no need for external services or custom engineering. Key benefits:

  • Cost-effective: Open-source (MIT license), no vendor lock-in.
  • Fast iteration: Test crawls locally with fake responses before production.
  • JavaScript support: Crawl dynamic sites (e.g., React/Angular) without Python/Node.
  • Use cases: Competitor monitoring, SEO audits, or content aggregation, all owned in-house.

Example: Launch a pricing tracker in 2 weeks vs. 2 months with a custom solution."*

For Engineering:

*"Spatie’s Crawler gives us:

  • Concurrent requests (Guzzle promises) for speed.
  • Observers for structured crawl logic (e.g., log failures, parse HTML).
  • Progress tracking (e.g., urlsCrawled, FinishReason) for monitoring.
  • Zero HTTP overhead in tests (fake responses).

Trade-offs: Limited to PHP; for distributed crawling, we’d need to extend it (e.g., add Redis queues).

Proposal: Start with foundUrls() for internal link audits, then add JavaScript rendering for dynamic sites."*