Crawler Detect Laravel Package

jaybizzle/crawler-detect

Detect bots/crawlers/spiders in PHP by matching User-Agent and HTTP_FROM headers. CrawlerDetect recognizes thousands of known crawlers, lets you check the current request or a provided UA string, and returns the matched bot name.


Getting Started

Minimal Steps

  1. Installation:

    composer require jaybizzle/crawler-detect
    

    For Laravel, consider the dedicated package: jaybizzle/laravel-crawler-detect for seamless integration.

  2. Basic Usage:

    use Jaybizzle\CrawlerDetect\CrawlerDetect;
    
    $detector = new CrawlerDetect();
    if ($detector->isCrawler()) {
        // Handle crawler logic (e.g., serve lightweight content, block access)
    }
    
  3. First Use Case:

    • Middleware Integration: Create a middleware to block or throttle crawlers globally.
      // app/Http/Middleware/DetectCrawler.php
      use Closure;
      use Jaybizzle\CrawlerDetect\CrawlerDetect;

      public function handle($request, Closure $next) {
          $detector = new CrawlerDetect();
          if ($detector->isCrawler()) {
              return response('Access denied to crawlers.', 403);
          }
          return $next($request);
      }
      
      Register it in app/Http/Kernel.php:
      protected $middleware = [
          \App\Http\Middleware\DetectCrawler::class,
      ];
      
  4. Where to Look First:

    • Documentation: GitHub README for core functionality.
    • Laravel-Specific: Laravel-Crawler-Detect for framework integration.
    • Tests: tests/crawlers.txt to verify detection of known bots.
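
The Kernel-based registration in step 3 applies to Laravel 10 and earlier. On Laravel 11+, where app/Http/Kernel.php no longer exists, global middleware is appended in bootstrap/app.php instead — a sketch, assuming the same DetectCrawler class:

```php
// bootstrap/app.php (Laravel 11+)
// uses Illuminate\Foundation\Application and
// Illuminate\Foundation\Configuration\Middleware
return Application::configure(basePath: dirname(__DIR__))
    ->withMiddleware(function (Middleware $middleware) {
        $middleware->append(\App\Http\Middleware\DetectCrawler::class);
    })
    ->create();
```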

Implementation Patterns

Core Workflows

  1. Middleware Pipeline:

    • Use CrawlerDetect in Laravel middleware to enforce bot-specific rules:
      public function handle($request, Closure $next) {
          $detector = new CrawlerDetect();
          // getMatches() returns the matched bot name as a string (or null)
          $matched = (string) $detector->getMatches();
      
          if ($detector->isCrawler()) {
              if (str_contains($matched, 'Googlebot')) {
                  return $next($request); // Allow Googlebot
              }
              return response('Blocked.', 403); // Block others
          }
          return $next($request);
      }
      
      
  2. Dynamic Content Serving:

    • Serve lightweight content to bots (e.g., static HTML for SEO crawlers):
      public function serveContent($request) {
          $detector = new CrawlerDetect();
          if ($detector->isCrawler()) {
              return response()->view('bot-optimized', [], 200);
          }
          return response()->view('app', [], 200); // regular content for real users
      }
      
      
  3. Analytics Filtering:

    • Exclude bot traffic from logs or analytics:
      public function logRequest($request) {
          $detector = new CrawlerDetect();
          if (!$detector->isCrawler()) {
              // Proceed with logging
              Log::info('Request from: ' . $request->ip());
          }
      }
      
  4. Rate Limiting:

    • Throttle crawlers using Laravel’s throttle middleware:
      Route::middleware(['throttle:100,1'])->group(function () {
          Route::get('/api/data', [Controller::class, 'fetchData']);
      });
      
      Combine with CrawlerDetect via a named limiter, defined in a service provider's boot() method:
      // uses Illuminate\Support\Facades\RateLimiter and Illuminate\Cache\RateLimiting\Limit
      RateLimiter::for('crawler-aware', fn ($request) =>
          Limit::perMinute(app(CrawlerDetect::class)->isCrawler() ? 10 : 100)->by($request->ip()));
      
      Then apply it with Route::middleware(['throttle:crawler-aware']).
      

Integration Tips

  1. Laravel Service Provider: Bind CrawlerDetect as a singleton for dependency injection:

    // app/Providers/AppServiceProvider.php
    public function register() {
        $this->app->singleton(CrawlerDetect::class, function () {
            return new CrawlerDetect();
        });
    }
    

    Use in controllers:

    public function __construct(private CrawlerDetect $detector) {}
    
  2. Blade Directives: Create custom Blade directives for bot detection:

    // app/Providers/BladeServiceProvider.php
    Blade::if('crawler', function () {
        return app(CrawlerDetect::class)->isCrawler();
    });
    

    Usage in views (Blade::if also registers @unlesscrawler and @endcrawler):

    @unlesscrawler
        <script defer>...</script>
    @endcrawler
    
  3. Event Listeners: Trigger events when crawlers are detected (e.g., log or notify admins):

    // app/Listeners/LogCrawler.php
    public function handle($event) {
        Log::warning('Crawler detected: ' . $event->botName);
    }
    

    Dispatch in middleware:

    event(new CrawlerDetected($detector->getMatches()));
    
  4. API Gateways: Run CrawlerDetect in middleware ahead of Laravel Sanctum or Passport authentication so bot traffic is rejected before it reaches your API:

    public function handle($request, Closure $next) {
        $detector = new CrawlerDetect();
        if ($detector->isCrawler()) {
            return response()->json(['error' => 'Crawlers not allowed'], 403);
        }
        return $next($request);
    }
    

Gotchas and Tips

Pitfalls

  1. False Positives:

    • Some legitimate user agents (e.g., YaBrowser) may be misclassified. Verify with getMatches(), which returns the matched bot name as a string (or null):
      $matched = (string) $detector->getMatches();
      if (str_contains($matched, 'YaBrowser')) {
          // Handle false positive (e.g., allow access)
      }
      
    • Fix: Propose an update to the package’s regex patterns (or its exclusions) upstream, or whitelist known false positives in your middleware.
  2. Header Overrides:

    • Crawlers can spoof the User-Agent header. Pass the full header set so CrawlerDetect can also inspect the other UA-bearing headers it knows about (e.g., HTTP_FROM, HTTP_X_DEVICE_USER_AGENT):
      // Constructor signature: CrawlerDetect(array $headers = null, string $userAgent = null)
      $detector = new CrawlerDetect($request->server->all(), $request->userAgent());
      
      
  3. Performance Overhead:

    • Regex matching can slow down requests if overused. Cache results per User-Agent (keying by IP would lump different clients behind one address together):
      $ua = (string) $request->userAgent();
      $isCrawler = Cache::remember('crawler_detect_' . md5($ua), 60, function () use ($detector) {
          return $detector->isCrawler();
      });
      
  4. Missing Crawlers:

    • The package may not detect niche or new crawlers. Contribute by:
      • Adding regex patterns to Fixtures/Crawlers.php.
      • Submitting a PR or issue with the missing User-Agent.
  5. Laravel Caching:

    • If using CrawlerDetect in middleware, ensure the instance isn’t recreated per request (bind as singleton).
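
The contribution workflow in pitfall 4 is easier if you validate a candidate pattern locally first. A minimal sketch in plain PHP, where ExampleBot and both User-Agent strings are hypothetical stand-ins:

```php
<?php
// Check a candidate regex (as it would appear in Fixtures/Crawlers.php)
// against sample User-Agent strings before opening a PR.
$candidate = 'ExampleBot(?:\/\d+\.\d+)?'; // hypothetical pattern

$samples = [
    'Mozilla/5.0 (compatible; ExampleBot/1.2; +https://example.com/bot)' => true,
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'       => false,
];

foreach ($samples as $ua => $shouldMatch) {
    $hit = preg_match('/' . $candidate . '/i', $ua) === 1;
    echo $ua, ' => ', $hit ? 'match' : 'no match', PHP_EOL;
}
```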

Debugging

  1. Inspect Matches: Use getMatches() to debug why a crawler was detected or missed:

    dd($detector->getMatches());
    

    Example output (the matched bot name, or null when nothing matched):

    "Googlebot"
    
  2. Test User Agents: Manually test detection with known crawlers:

    $detector->isCrawler('Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)');
    
  3. Log Undetected Crawlers: Log User-Agent strings of undetected crawlers for community contributions:

    if (!$detector->isCrawler() && str_contains($request->userAgent(), 'UnknownBot')) {
        Log::warning('Undetected crawler: ' . $request->userAgent());
    }
    

Tips

  1. Whitelist SEO Crawlers: Allow known SEO crawlers (e.g., Googlebot, Bingbot) while blocking others:

    $allowedCrawlers = ['Googlebot', 'Bingbot', 'DuckDuckBot'];
    $matched = (string) $detector->getMatches(); // matched bot name, or '' if none
    if ($detector->isCrawler() && !in_array($matched, $allowedCrawlers, true)) {
        return response('Blocked.', 403);
    }
    
  2. Combine with IP Blocking: Block repeat offenders by IP:

    $blockedKey = 'blocked_ip_' . $request->ip();
    if (Cache::get($blockedKey) || ($detector->isCrawler() && $request->ip() === '123.45.67.89')) {
        Cache::put($blockedKey, true, 3600); // remember the offender for an hour
        return response('Blocked.', 403);
    }
    
  3. Leverage Sec-CH-UA: Modern Chromium-based clients send Sec-CH-UA client-hint headers (visible server-side as HTTP_SEC_CH_UA) alongside the User-Agent. CrawlerDetect does not inspect these by default, so treat them as a complementary signal in your own bot checks.
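
Since CrawlerDetect does not parse client hints itself, a small helper can extract the brand names from a Sec-CH-UA value for your own heuristics — a sketch; the header format follows the UA Client Hints draft, and the sample value is illustrative:

```php
<?php
// Extract brand names from a Sec-CH-UA client-hint header value,
// e.g. '"Chromium";v="124", "Google Chrome";v="124"'.
function secChUaBrands(string $header): array
{
    preg_match_all('/"([^"]+)";v="[^"]*"/', $header, $matches);
    return $matches[1];
}

$brands = secChUaBrands('"Chromium";v="124", "Google Chrome";v="124", "Not-A.Brand";v="99"');
// $brands now holds ['Chromium', 'Google Chrome', 'Not-A.Brand']
```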
