Crawler Laravel Package

contextualcode/crawler


Getting Started

Minimal Setup

  1. Installation

    composer require contextualcode/crawler
    

    Ensure your project has a database configured in .env; MySQL, PostgreSQL, and SQLite are supported.

  2. Publish Config & Migrations

    php artisan vendor:publish --provider="ContextualCode\Crawler\CrawlerServiceProvider"
    php artisan migrate
    

    This sets up the crawls, pages, and links tables.

  3. First Crawl. Define a seed URL and start a crawl:

    use ContextualCode\Crawler\Crawler;
    
    $crawler = new Crawler();
    $crawler->addSeed('https://example.com')
            ->setDepth(2) // Limit crawl depth
            ->setConcurrency(5) // Parallel requests
            ->crawl();
    
  4. Access Results. Fetch crawled pages via Eloquent:

    use ContextualCode\Crawler\Models\Page;
    
    $pages = Page::where('url', 'like', '%example.com%')->get();
    

Implementation Patterns

Workflow: Structured Crawling

  1. Define Rules. Use filters to control what gets crawled:

    $crawler->addFilter(function ($url) {
        return strpos($url, 'admin') === false; // Skip admin pages
    });
    
  2. Extract Data. Process pages with a callback:

    $crawler->onPage(function ($page) {
        $title = $page->title;
        $content = $page->content;
        // Store in DB or process further
    });
    
  3. Resumable Crawls. Pause and resume a crawl:

    $crawl = $crawler->start(); // Returns a Crawl model
    $crawl->pause();
    $crawl->resume();
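The filter in step 1 is a plain PHP predicate, so it can be exercised standalone before wiring it into the crawler. A minimal sketch (the URL list is purely illustrative):

```php
<?php
// Standalone illustration of the filter predicate from step 1 above.
// A filter returns true to keep a URL and false to skip it.
$skipAdmin = function (string $url): bool {
    return strpos($url, 'admin') === false; // skip anything containing "admin"
};

$urls = [
    'https://example.com/blog/post-1',
    'https://example.com/admin/login',
    'https://example.com/about',
];

// array_filter applies the same predicate the crawler would.
$kept = array_values(array_filter($urls, $skipAdmin));
// $kept now holds only the two non-admin URLs.
```

Testing predicates this way keeps crawl-rule bugs out of long-running crawl jobs.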
    

Integration Tips

  • Queue Crawls. Dispatch crawls to Laravel queues for background processing:

    CrawlerJob::dispatch('https://example.com')->delay(now()->addMinutes(5));
    
  • Custom Storage. Laravel's make:model command has no --extends flag, so to add custom fields, generate a model and change its parent class by hand:

    php artisan make:model PageExtension
    

    Then edit the generated class so it extends ContextualCode\Crawler\Models\Page instead of the default Eloquent Model.
  • Rate Limiting. Configure a delay between requests:

    $crawler->setDelay(1000); // delay in milliseconds (here, 1 second)
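CrawlerJob in the Queue Crawls tip is not defined by any snippet above, so it is presumably a user-defined job. Under that assumption, a minimal sketch reusing the Crawler API from Getting Started might look like:

```php
<?php
// Hypothetical job class (CrawlerJob is not shipped by the package in
// the snippets above); a sketch of dispatching a crawl from a queue.
namespace App\Jobs;

use ContextualCode\Crawler\Crawler;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;

class CrawlerJob implements ShouldQueue
{
    use Dispatchable, Queueable;

    public function __construct(private string $seedUrl)
    {
    }

    public function handle(): void
    {
        // Reuses the fluent API shown in Getting Started.
        (new Crawler())
            ->addSeed($this->seedUrl)
            ->setDepth(2)
            ->crawl();
    }
}
```

Run `php artisan queue:work` so a worker picks up dispatched jobs.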
    

Gotchas and Tips

Pitfalls

  1. Database Locks. High concurrency can cause deadlocks. Use transactions sparingly, or lower the value passed to setConcurrency().

  2. Duplicate URLs. The package deduplicates URLs, but make sure your filters and callbacks don't re-queue URLs that have already been crawled.

  3. Dynamic Content. JavaScript-rendered pages won't be crawled by default. Use a headless browser (e.g., Puppeteer) via a custom PageFetcher.
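For pitfall 2, one way to keep filters and callbacks from re-queuing near-duplicate URLs is to normalize them first. A minimal sketch in plain PHP (normalizeUrl is a hypothetical helper, not part of the package, and ignores query strings):

```php
<?php
// Hypothetical helper: collapse trivially different URL forms
// (scheme/host case, trailing slash) to one canonical string.
function normalizeUrl(string $url): string
{
    $parts  = parse_url($url);
    $scheme = strtolower($parts['scheme'] ?? 'http');
    $host   = strtolower($parts['host'] ?? '');
    $path   = rtrim($parts['path'] ?? '/', '/') ?: '/';

    return "{$scheme}://{$host}{$path}";
}

$urls = [
    'https://Example.com/about/',
    'https://example.com/about',
    'HTTPS://example.com/about',
];

// All three collapse to the same canonical form.
$unique = array_values(array_unique(array_map('normalizeUrl', $urls)));
```

Normalizing before the crawler's own deduplication sees a URL keeps the crawl frontier small.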

Debugging

  • Log Crawl Status. Enable logging in config/crawler.php:

    'log' => [
        'enabled' => true,
        'channel' => 'single',
    ],
    
  • Inspect Failed Requests. Check the failed_requests table, or register an error handler:

    $crawler->onError(function ($url, $exception) {
        \Log::error("Crawl error on {$url}: " . $exception->getMessage());
    });
    

Extension Points

  1. Custom Fetchers. Implement ContextualCode\Crawler\Contracts\PageFetcher to crawl non-HTTP sources (e.g., APIs).

  2. Post-Processing. Use the onPage or onLink events to transform data before it is stored.

  3. Sitemap Generation. Export crawled URLs to a sitemap:

    $urls = Page::pluck('url');
    // Use a package like spatie/sitemap to generate XML
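If you'd rather not add a dependency, the same export can be done with plain string building. A minimal sketch (buildSitemap is a hypothetical helper; the $urls array stands in for Page::pluck('url')):

```php
<?php
// Hypothetical helper: render a URL list as sitemap XML by hand,
// as a dependency-free alternative to spatie/sitemap.
function buildSitemap(array $urls): string
{
    $items = '';
    foreach ($urls as $url) {
        // Escape &, <, > etc. so the XML stays well-formed.
        $loc = htmlspecialchars($url, ENT_XML1);
        $items .= "  <url><loc>{$loc}</loc></url>\n";
    }

    return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        . "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n"
        . $items
        . "</urlset>\n";
}

$sitemap = buildSitemap([
    'https://example.com/',
    'https://example.com/about?page=1&sort=asc',
]);
// e.g. file_put_contents(public_path('sitemap.xml'), $sitemap);
```

spatie/sitemap remains the better choice once you need lastmod, priority, or sitemap indexes.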
    