druidvav/crawler-detect-bundle
Installation
Add the bundle via Composer in a Symfony 2/3 project:
composer require druidvav/crawler-detect-bundle
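Enable the bundle in config/bundles.php (this file exists only in the Symfony 4+ directory layout; on Symfony 2/3, register the bundle in registerBundles() in app/AppKernel.php instead):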
return [
    // ...
    DruidVav\CrawlerDetectBundle\DruidVavCrawlerDetectBundle::class => ['all' => true],
];
Basic Configuration
Override default settings in config/packages/druidvav_crawler_detect.yaml:
druidvav_crawler_detect:
    user_agent_header: 'HTTP_USER_AGENT' # Default, can be customized
    whitelist: ['Googlebot', 'Bingbot'] # Customize allowed crawlers
First Use Case
Detect crawlers in a controller or event subscriber:
use DruidVav\CrawlerDetectBundle\CrawlerDetect;
use Symfony\Component\HttpFoundation\Request;
use Symfony\Component\HttpFoundation\Response;

public function someAction(Request $request, CrawlerDetect $detect)
{
    // Classify the request by its User-Agent header
    $isCrawler = $detect->isCrawler($request);
    if ($isCrawler) {
        return new Response('Crawler detected', 403);
    }
    // Normal logic for humans
}
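Request Filtering
Block or modify responses for crawlers in a middleware layer. Symfony itself has no middleware pipeline, so the handle() signature below assumes a Laravel-style middleware; in a plain Symfony application, a kernel.request listener serves the same purpose:
public function handle(Request $request, Closure $next)
{
    $detect = $this->get('druidvav_crawler_detect');
    if ($detect->isCrawler($request)) {
        return new Response('Forbidden', 403);
    }
    return $next($request);
}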
Event-Driven Logic
Subscribe to kernel events (e.g., kernel.request) to dynamically adjust behavior:
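use Symfony\Component\HttpFoundation\Response;
use Symfony\Component\HttpKernel\Event\GetResponseEvent;

public static function getSubscribedEvents()
{
    return [
        'kernel.request' => ['onKernelRequest', 10],
    ];
}

// On Symfony 2/3, kernel.request listeners receive a GetResponseEvent
public function onKernelRequest(GetResponseEvent $event)
{
    $request = $event->getRequest();
    $detect = $this->container->get('druidvav_crawler_detect');
    if ($detect->isCrawler($request)) {
        // Setting a response here stops further request handling
        $event->setResponse(new Response('Crawler content', 200));
    }
}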
Twig Integration
Pass crawler detection to templates:
{% if app.request.attributes.get('_crawler_detected') %}
<div class="crawler-warning">This content is for crawlers.</div>
{% endif %}
In a Twig extension or controller:
$request->attributes->set('_crawler_detected', $detect->isCrawler($request));
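API Rate Limiting
Throttle crawlers with a rate limiter. The rate_limiter service and its hit() and isOverLimit() methods below are placeholders for whatever limiter the project provides; Symfony's own RateLimiter component is only available from Symfony 5.2 and exposes a different API:
if ($detect->isCrawler($request)) {
    $limiter = $this->get('rate_limiter');
    $limiter->hit($request->getClientIp());
    if ($limiter->isOverLimit()) {
        return new Response('Too many requests', 429);
    }
}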
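To make the throttling step concrete, here is a minimal fixed-window limiter in plain PHP (7.1+). It is only a sketch: the FixedWindowLimiter class and its hit() signature are hypothetical and not part of the bundle or of Symfony, and the counters live in memory, so a real deployment would need shared storage (for example APCu or Redis) to count across requests.

```php
<?php
// Hypothetical helper, not part of the bundle: allows at most $limit hits
// per key within each fixed window of $windowSeconds.
final class FixedWindowLimiter
{
    /** @var int */
    private $limit;
    /** @var int */
    private $windowSeconds;
    /** @var array<string, array{start: int, count: int}> */
    private $hits = [];

    public function __construct(int $limit, int $windowSeconds)
    {
        $this->limit = $limit;
        $this->windowSeconds = $windowSeconds;
    }

    /**
     * Record one hit for $key at timestamp $now.
     * Returns true while the key is within its limit, false once it is over.
     */
    public function hit(string $key, int $now): bool
    {
        // Align the window to a multiple of the window length
        $windowStart = $now - ($now % $this->windowSeconds);

        // Start a fresh counter when the key is new or the window has rolled over
        if (!isset($this->hits[$key]) || $this->hits[$key]['start'] !== $windowStart) {
            $this->hits[$key] = ['start' => $windowStart, 'count' => 0];
        }

        $this->hits[$key]['count']++;

        return $this->hits[$key]['count'] <= $this->limit;
    }
}
```

In a controller, a false return value from hit($request->getClientIp(), time()) would map to the 429 response shown above. Passing the timestamp in explicitly keeps the class easy to test.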
CacheInterface).
Extend CrawlerDetect by overriding its isCrawler() method or creating a decorator.
$this->logger->info('Crawler detected', ['bot' => $detect->getBotName($request)]);
config/bundles.php (no autoconfiguration).
Outdated Dependencies
User-Agent Spoofing
Crawlers can spoof User-Agent headers. Combine with IP-based checks or behavioral analysis for robustness:
$isCrawler = $detect->isCrawler($request) && $this->isSuspiciousIp($request->getClientIp());
Performance Overhead
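Matching User-Agent strings against crawler patterns can be slow for high-traffic sites. Cache results aggressively, keyed by the User-Agent rather than the client IP, since one IP can send many different agents:
// get() with a callback requires the Symfony Cache contracts (Symfony 4.2+)
$cache = $this->container->get('cache.app');
// Key on the header that drives detection; md5 avoids reserved cache-key characters
$key = 'crawler_' . md5((string) $request->headers->get('User-Agent'));
$isCrawler = $cache->get($key, function () use ($detect, $request) {
    return $detect->isCrawler($request);
});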
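Because the verdict depends only on the User-Agent string, the same idea can be sketched without any cache service as a per-process memoizer. The MemoizedDetector class below is hypothetical (it is not shipped by the bundle) and wraps any callable that maps a User-Agent string to a boolean. PHP 7.1+ is assumed.

```php
<?php
// Hypothetical wrapper: remembers the verdict for each distinct User-Agent
// so the underlying (possibly slow) detector runs once per string.
final class MemoizedDetector
{
    /** @var callable(string): bool */
    private $detector;
    /** @var array<string, bool> */
    private $results = [];

    public function __construct(callable $detector)
    {
        $this->detector = $detector;
    }

    public function isCrawler(string $userAgent): bool
    {
        // Hash the header so the key has a fixed length and a safe character set
        $key = 'crawler_' . md5($userAgent);

        if (!array_key_exists($key, $this->results)) {
            $this->results[$key] = (bool) ($this->detector)($userAgent);
        }

        return $this->results[$key];
    }
}
```

The array lives only as long as the PHP process, so this mainly helps long-running workers or requests that check the same agent several times. A shared cache pool is still needed to reuse results across requests.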
False Positives/Negatives
Adjust the whitelist or blacklist in config:
druidvav_crawler_detect:
    blacklist: ['BadBot/1.0'] # Explicitly block known scrapers
No Modern Symfony Support
ContainerAware traits or modern dependency injection.
Tip: Use a service decorator:
// src/Service/CrawlerDetectDecorator.php
use Symfony\Component\HttpFoundation\Request;

class CrawlerDetectDecorator implements CrawlerDetectInterface
{
    private $decorated;

    public function __construct(CrawlerDetectInterface $decorated)
    {
        $this->decorated = $decorated;
    }

    public function isCrawler(Request $request)
    {
        // Add custom logic here, then fall back to the wrapped detector
        return $this->decorated->isCrawler($request);
    }
}
Log the User-Agent to debug misclassifications:
$ua = $request->headers->get('User-Agent');
$this->logger->debug('User-Agent', ['ua' => $ua]);
Check that user_agent_header matches your environment (e.g., HTTP_X_USER_AGENT in some proxies).
Custom Bot Detection
Override the CrawlerDetect service to add regex-based detection:
# config/services.yaml
services:
    DruidVav\CrawlerDetectBundle\CrawlerDetect:
        arguments:
            $botPatterns: ['%kernel.project_dir%/config/bots.yml']
Then define patterns in bots.yml:
custom_bots:
    - pattern: '/Scraper\/\d+\.\d+/'
      name: 'CustomScraper'
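How such a pattern list might be evaluated can be sketched in plain PHP (7.1+). The matchCustomBot() function and the $customBots array are hypothetical: they mirror the bots.yml layout above, but they are not the bundle's own loader or matching code.

```php
<?php
// Hypothetical in-memory copy of the custom_bots entries from bots.yml
$customBots = [
    ['pattern' => '/Scraper\/\d+\.\d+/', 'name' => 'CustomScraper'],
];

/**
 * Return the name of the first pattern that matches the User-Agent,
 * or null when no pattern matches.
 *
 * @param array<int, array{pattern: string, name: string}> $bots
 */
function matchCustomBot(string $userAgent, array $bots): ?string
{
    foreach ($bots as $bot) {
        // preg_match returns 1 on a match, 0 on no match, false on a bad pattern
        if (preg_match($bot['pattern'], $userAgent) === 1) {
            return $bot['name'];
        }
    }

    return null;
}
```

For example, matchCustomBot('Scraper/2.1 (+http://example.com)', $customBots) returns 'CustomScraper', while a regular browser User-Agent returns null. Entries are checked in order, so more specific patterns belong earlier in the list.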
Event Dispatching
Trigger events when crawlers are detected (e.g., crawler.detected):
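// CrawlerDetectedEvent is assumed to be an application-defined event class
$event = new CrawlerDetectedEvent($request, $detect->getBotName($request));
// Symfony 2/3 signature: event name first, then the event object
$this->dispatcher->dispatch('crawler.detected', $event);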
IP-Based Rules
Combine with Symfony\Component\HttpFoundation\RequestStack to add IP-based crawler logic:
if ($detect->isCrawler($request) && $this->isDatacenterIp($request->getClientIp())) {
    // Handle datacenter crawlers
}
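The isDatacenterIp() helper is left undefined above. One possible shape for it is an IPv4 CIDR lookup, sketched here in plain PHP (7.1+, 64-bit). Both function names are hypothetical, and the ranges used in the usage note are documentation placeholders from RFC 5737, not real datacenter blocks.

```php
<?php
/**
 * Check whether an IPv4 address lies inside a CIDR range such as "192.0.2.0/24".
 * A range without a prefix length is treated as a single address (/32).
 */
function ipInCidr(string $ip, string $cidr): bool
{
    [$subnet, $bits] = explode('/', $cidr, 2) + [1 => '32'];

    $ipLong = ip2long($ip);
    $subnetLong = ip2long($subnet);
    if ($ipLong === false || $subnetLong === false || !ctype_digit($bits)) {
        return false; // not valid IPv4 input
    }

    $bits = (int) $bits;
    if ($bits > 32) {
        return false;
    }

    // Network mask with the top $bits bits set; /0 matches every address
    $mask = $bits === 0 ? 0 : (~0 << (32 - $bits)) & 0xFFFFFFFF;

    return ($ipLong & $mask) === ($subnetLong & $mask);
}

/**
 * Return true when $ip falls inside any of the given CIDR ranges.
 *
 * @param string[] $ranges
 */
function isDatacenterIp(string $ip, array $ranges): bool
{
    foreach ($ranges as $cidr) {
        if (ipInCidr($ip, $cidr)) {
            return true;
        }
    }

    return false;
}
```

With $ranges = ['192.0.2.0/24', '198.51.100.0/24'], the call isDatacenterIp('192.0.2.44', $ranges) returns true. Inside a Symfony project, Symfony\Component\HttpFoundation\IpUtils::checkIp() performs the same kind of check and also handles IPv6.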
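Performance Profiling
Measure detection time with the Stopwatch component (the debug.stopwatch service); its timings also appear in the Symfony profiler's timeline:
$stopwatch = $this->container->get('debug.stopwatch');
$stopwatch->start('crawler_detection');
$isCrawler = $detect->isCrawler($request);
$event = $stopwatch->stop('crawler_detection');
$this->logger->debug('Crawler detection time', ['ms' => $event->getDuration()]);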