Crawler Detect Bundle Laravel Package

druidvav/crawler-detect-bundle


Getting Started

Minimal Setup

  1. Installation: Add the bundle via Composer. The package targets Symfony 2/3:

    composer require druidvav/crawler-detect-bundle
    

    Enable the bundle. On Symfony 4+ this is done in config/bundles.php (Symfony 2/3 projects register bundles in app/AppKernel.php instead):

    return [
        // ...
        DruidVav\CrawlerDetectBundle\DruidVavCrawlerDetectBundle::class => ['all' => true],
    ];
    
  2. Basic Configuration: Override default settings in config/packages/druidvav_crawler_detect.yaml (app/config/config.yml on Symfony 2/3):

    druidvav_crawler_detect:
        user_agent_header: 'HTTP_USER_AGENT' # Default, can be customized
        whitelist: ['Googlebot', 'Bingbot']  # Customize allowed crawlers
    
  3. First Use Case: Detect crawlers in a controller or event subscriber:

    use DruidVav\CrawlerDetectBundle\CrawlerDetect;
    use Symfony\Component\HttpFoundation\Request;
    use Symfony\Component\HttpFoundation\Response;
    
    public function someAction(Request $request, CrawlerDetect $detect)
    {
        // Reject requests whose User-Agent is classified as a crawler
        if ($detect->isCrawler($request)) {
            return new Response('Crawler detected', 403);
        }
        // Normal logic for humans
    }
    

Implementation Patterns

Common Workflows

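  1. Request Filtering: Use in a Laravel-style middleware to block or modify responses for crawlers. The detector is injected through the constructor, because a middleware class has no $this->get() container helper:

    private $detect;

    public function __construct(CrawlerDetect $detect)
    {
        $this->detect = $detect;
    }

    public function handle(Request $request, Closure $next)
    {
        if ($this->detect->isCrawler($request)) {
            return new Response('Forbidden', 403);
        }
        return $next($request);
    }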
    
  2. Event-Driven Logic: Subscribe to kernel events (e.g., kernel.request) to dynamically adjust behavior:

    public static function getSubscribedEvents()
    {
        return [
            'kernel.request' => ['onKernelRequest', 10],
        ];
    }
    
    // kernel.request dispatches GetResponseEvent (RequestEvent from Symfony 4.3 onward)
    public function onKernelRequest(GetResponseEvent $event)
    {
        $request = $event->getRequest();
        $detect = $this->container->get('druidvav_crawler_detect');
        if ($detect->isCrawler($request)) {
            $event->setResponse(new Response('Crawler content', 200));
        }
    }
    
  3. Twig Integration: Pass crawler detection to templates:

    {% if app.request.attributes.get('_crawler_detected') %}
        <div class="crawler-warning">This content is for crawlers.</div>
    {% endif %}
    

    In a controller or request listener, set the attribute before the template is rendered:

    $request->attributes->set('_crawler_detected', $detect->isCrawler($request));
    
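  4. API Rate Limiting: Combine with Symfony’s RateLimiter component (Symfony 5.2+) to throttle crawlers. Here $crawlerLimiter is an injected RateLimiterFactory:

    if ($detect->isCrawler($request)) {
        // One limiter per client IP; consume a single token for this request
        $limit = $crawlerLimiter->create($request->getClientIp())->consume(1);
        if (!$limit->isAccepted()) {
            return new Response('Too many requests', 429);
        }
    }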
    

Integration Tips

  • Cache Results: Store detection results in cache for performance (e.g., CacheInterface).
  • Custom Rules: Extend CrawlerDetect by overriding its isCrawler() method or creating a decorator.
  • Logging: Log crawler activity for analytics:
    $this->logger->info('Crawler detected', ['bot' => $detect->getBotName($request)]);
    
  • Symfony Flex: If using Symfony 4+, confirm the bundle is listed in config/bundles.php and register it manually if it is not.

Gotchas and Tips

Pitfalls

  1. Outdated Dependencies

    • The package targets Symfony 2/3 (last release in 2016). Test thoroughly in Symfony 4/5 if used.
    • Workaround: Use a compatibility layer or fork the package for modern Symfony.
  2. User-Agent Spoofing

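    • Crawlers can spoof User-Agent headers, so a User-Agent check alone misses bots that pose as browsers. Combine it with IP-based checks or behavioral analysis for robustness (isSuspiciousIp() is a helper you would write yourself):
      $isCrawler = $detect->isCrawler($request) || $this->isSuspiciousIp($request->getClientIp());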
      
  3. Performance Overhead

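    • Parsing User-Agent strings can be slow for high-traffic sites. Cache results aggressively, keyed on the User-Agent (the value detection depends on) rather than the client IP:
      $cache = $this->container->get('cache.app');
      // Hashing keeps the key free of characters that cache pools reserve
      $key = 'crawler_' . md5((string) $request->headers->get('User-Agent'));
      $isCrawler = $cache->get($key, function() use ($detect, $request) {
          return $detect->isCrawler($request);
      });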
      
  4. False Positives/Negatives

    • Default bot lists may be incomplete. Tip: Extend the whitelist or blacklist in config:
      druidvav_crawler_detect:
          blacklist: ['BadBot/1.0'] # Explicitly block known scrapers
      
  5. No Modern Symfony Support

    • The bundle lacks ContainerAware traits or modern dependency injection. Tip: Use a service decorator:
      // src/Service/CrawlerDetectDecorator.php
      use Symfony\Component\HttpFoundation\Request;

      // Assumes the bundle exposes a CrawlerDetectInterface; if it does not,
      // type-hint the concrete CrawlerDetect class instead.
      class CrawlerDetectDecorator implements CrawlerDetectInterface
      {
          private $decorated;
      
          public function __construct(CrawlerDetectInterface $decorated)
          {
              $this->decorated = $decorated;
          }
      
          public function isCrawler(Request $request)
          {
              // Add custom logic here
              return $this->decorated->isCrawler($request);
          }
      }
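
For the User-Agent spoofing pitfall above, one common IP-based check is forward-confirmed reverse DNS: resolve the client IP to a hostname, check that the hostname belongs to the crawler's published domain, then resolve the hostname back and confirm it returns the same IP. The sketch below is plain PHP and independent of the bundle. The function name isVerifiedBotIp() and its injectable resolver parameters are assumptions made for illustration, and the default resolvers only handle IPv4.

```php
<?php
declare(strict_types=1);

/**
 * Forward-confirmed reverse DNS check for a client that claims to be a search bot.
 *
 * $reverse maps an IP to a hostname and $forward maps a hostname to a list of IPs.
 * Both are injectable so the logic can be exercised without network access.
 */
function isVerifiedBotIp(
    string $ip,
    array $allowedDomains,
    ?callable $reverse = null,
    ?callable $forward = null
): bool {
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? static function (string $host): array {
        return gethostbynamel($host) ?: [];
    };

    $host = $reverse($ip);
    // gethostbyaddr() returns the unmodified IP when no PTR record exists
    // and false on malformed input.
    if (!is_string($host) || $host === '' || $host === $ip) {
        return false;
    }

    $host = strtolower(rtrim($host, '.'));
    $domainMatches = false;
    foreach ($allowedDomains as $domain) {
        $suffix = '.' . strtolower($domain);
        // Require a dot boundary so "evilgooglebot.com" does not match "googlebot.com".
        if (substr($host, -strlen($suffix)) === $suffix) {
            $domainMatches = true;
            break;
        }
    }
    if (!$domainMatches) {
        return false;
    }

    // The hostname must resolve back to the original IP.
    return in_array($ip, $forward($host), true);
}
```

A caller would typically run this only when the User-Agent claims to be a well-known crawler, for example isVerifiedBotIp($request->getClientIp(), ['googlebot.com', 'google.com']), and cache the answer, since each call can trigger two DNS lookups.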
      

Debugging

  • Verify User-Agent: Log the raw User-Agent to debug misclassifications:
    $ua = $request->headers->get('User-Agent');
    $this->logger->debug('User-Agent', ['ua' => $ua]);
    
  • Check Config: Ensure user_agent_header matches your environment (e.g., HTTP_X_USER_AGENT in some proxies).
  • Test with Known Bots: Use tools like User-Agent Switcher to simulate crawlers.
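
The "Test with Known Bots" step can also be scripted. The sketch below runs a table of sample User-Agent strings through a detector and lists the ones it classifies wrongly. The detector here is a hypothetical stand-in (a single regex), not the bundle's real rule set, and findMisclassified() is an illustrative name; in practice the callable would wrap whatever detection service the project uses.

```php
<?php
declare(strict_types=1);

/**
 * Runs sample User-Agent strings through a detector and returns the ones it got wrong.
 *
 * @param callable(string): bool $detector stand-in for the real crawler check
 * @param array<string, bool>    $samples  User-Agent => expected "is crawler" result
 * @return array<string, bool>   misclassified User-Agent => actual result
 */
function findMisclassified(callable $detector, array $samples): array
{
    $wrong = [];
    foreach ($samples as $userAgent => $expected) {
        $actual = (bool) $detector($userAgent);
        if ($actual !== $expected) {
            $wrong[$userAgent] = $actual;
        }
    }
    return $wrong;
}

// Hypothetical detector: one case-insensitive regex, not the bundle's real rules.
$detector = static function (string $userAgent): bool {
    return preg_match('/bot|crawl|spider/i', $userAgent) === 1;
};

$samples = [
    'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)' => true,
    'Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)' => true,
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0' => false,
    'curl/8.4.0' => true, // a scripted client that this naive regex misses
];

foreach (findMisclassified($detector, $samples) as $userAgent => $actual) {
    printf("Misclassified: %s (got %s)\n", $userAgent, $actual ? 'crawler' : 'human');
}
```

Keeping the sample table in version control makes it easy to add every User-Agent that was misclassified in production and rerun the check after changing the configuration.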

Extension Points

  1. Custom Bot Detection: Override the CrawlerDetect service to add regex-based detection:

    # config/services.yaml
    services:
        DruidVav\CrawlerDetectBundle\CrawlerDetect:
            arguments:
                $botPatterns: ['%kernel.project_dir%/config/bots.yml']
    

    Then define patterns in bots.yml:

    custom_bots:
        - pattern: '/Scraper\/\d+\.\d+/'
          name: 'CustomScraper'
    
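  2. Event Dispatching: Trigger events when crawlers are detected (e.g., crawler.detected). CrawlerDetectedEvent is a custom event class you define yourself:

    $event = new CrawlerDetectedEvent($request, $detect->getBotName($request));
    // Symfony 4.3+ argument order; on Symfony 2/3 use dispatch('crawler.detected', $event)
    $this->dispatcher->dispatch($event, 'crawler.detected');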
    
  3. IP-Based Rules: Combine with Symfony\Component\HttpFoundation\RequestStack to add IP-based crawler logic:

    if ($detect->isCrawler($request) && $this->isDatacenterIp($request->getClientIp())) {
        // Handle datacenter crawlers
    }
    
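  4. Performance Profiling: Use Symfony’s Stopwatch component, whose events appear in the profiler timeline, to measure detection time:

    // $stopwatch is Symfony\Component\Stopwatch\Stopwatch (service id: debug.stopwatch)
    $stopwatch->start('crawler_detection');
    $isCrawler = $detect->isCrawler($request);
    $event = $stopwatch->stop('crawler_detection');
    // $event->getDuration() returns the elapsed time in milliseconds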
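
The isDatacenterIp() call in the IP-based rules item is not something the bundle is known to provide, so it has to be written by the application. A minimal sketch, assuming IPv4 addresses, 64-bit PHP, and a hand-maintained list of CIDR ranges, could look like this. The function names and the default ranges are illustrative; the ranges are RFC 5737 documentation blocks, not real provider ranges.

```php
<?php
declare(strict_types=1);

/**
 * Returns true when an IPv4 address falls inside a CIDR range such as "198.51.100.0/24".
 */
function ipv4InCidr(string $ip, string $cidr): bool
{
    $parts = explode('/', $cidr, 2);
    if (isset($parts[1]) && !ctype_digit($parts[1])) {
        return false; // prefix length must be a plain number
    }
    $bits = isset($parts[1]) ? (int) $parts[1] : 32;

    $ipLong = ip2long($ip);
    $subnetLong = ip2long($parts[0]);
    if ($ipLong === false || $subnetLong === false || $bits > 32) {
        return false;
    }

    // Build the netmask; a /0 prefix matches every address.
    $mask = $bits === 0 ? 0 : (~0 << (32 - $bits)) & 0xFFFFFFFF;

    return ($ipLong & $mask) === ($subnetLong & $mask);
}

/**
 * Hypothetical helper matching the isDatacenterIp() call used above.
 * The default ranges are placeholders and must be replaced with real data.
 */
function isDatacenterIp(?string $ip, array $ranges = ['198.51.100.0/24', '203.0.113.0/24']): bool
{
    if ($ip === null) {
        return false; // getClientIp() can return null
    }
    foreach ($ranges as $cidr) {
        if (ipv4InCidr($ip, $cidr)) {
            return true;
        }
    }
    return false;
}
```

In a Symfony project, IpUtils::checkIp() from the HttpFoundation component performs the same range test and also handles IPv6, so the hand-written ipv4InCidr() is mainly useful where that component is not available.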
    