jaybizzle/crawler-detect
Detect bots/crawlers/spiders in PHP by matching User-Agent and HTTP_FROM headers. CrawlerDetect recognizes thousands of known crawlers, lets you check the current request or a provided UA string, and returns the matched bot name.
Installation:
composer require jaybizzle/crawler-detect
For Laravel, the dedicated wrapper package jaybizzle/laravel-crawler-detect offers seamless integration.
Basic Usage:
use Jaybizzle\CrawlerDetect\CrawlerDetect;
$detector = new CrawlerDetect();
if ($detector->isCrawler()) {
    // Handle crawler logic (e.g., serve lightweight content, block access)
}
First Use Case: Block crawlers site-wide with a Laravel middleware:
// app/Http/Middleware/DetectCrawler.php
public function handle($request, Closure $next) {
    $detector = new CrawlerDetect();
    if ($detector->isCrawler()) {
        return response('Access denied to crawlers.', 403);
    }
    return $next($request);
}
Register it in app/Http/Kernel.php:
protected $middleware = [
    \App\Http\Middleware\DetectCrawler::class,
];
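To apply it selectively rather than globally, register a route middleware alias instead (the alias name is illustrative; the property is $routeMiddleware on Laravel versions before 10):
// app/Http/Kernel.php
protected $middlewareAliases = [
    'no-crawlers' => \App\Http\Middleware\DetectCrawler::class,
];

// routes/web.php
Route::middleware('no-crawlers')->group(function () {
    Route::get('/checkout', [CheckoutController::class, 'show']);
});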
Where to Look First:
Check tests/crawlers.txt in the repository for the user-agent strings the package is tested against; it is a quick way to verify detection of known bots.
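A minimal sketch for spot-checking a few of those user agents yourself (assuming a dev install where the package's test files are present; the path is illustrative, and tests/ may be excluded from dist installs):
// Spot-check detection against the package's test fixtures (standalone script).
require 'vendor/autoload.php';

use Jaybizzle\CrawlerDetect\CrawlerDetect;

$detector = new CrawlerDetect();

// Each line of crawlers.txt is a user-agent string that should be detected.
$lines = file(
    'vendor/jaybizzle/crawler-detect/tests/crawlers.txt',
    FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES
) ?: [];

foreach (array_slice($lines, 0, 10) as $ua) {
    // isCrawler() accepts an explicit user-agent string.
    printf("%s => %s\n", $ua, $detector->isCrawler($ua) ? 'detected' : 'MISSED');
}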
Middleware Pipeline:
Use CrawlerDetect in Laravel middleware to enforce bot-specific rules:
public function handle($request, Closure $next) {
    $detector = new CrawlerDetect();
    if ($detector->isCrawler()) {
        // getMatches() returns the matched string (not an array) and is
        // only populated after isCrawler() has run.
        if (str_contains((string) $detector->getMatches(), 'Googlebot')) {
            return $next($request); // Allow Googlebot
        }
        return response('Blocked.', 403); // Block other crawlers
    }
    return $next($request);
}
Dynamic Content Serving:
public function serveContent($request) {
    $detector = new CrawlerDetect();
    if ($detector->isCrawler()) {
        // Serve a lightweight, pre-rendered page to bots
        return response()->view('bot-optimized', [], 200)->header('Content-Type', 'text/html');
    }
    return view('default'); // Full experience for regular visitors
}
Analytics Filtering:
public function logRequest($request) {
    $detector = new CrawlerDetect();
    if (!$detector->isCrawler()) {
        // Proceed with logging
        Log::info('Request from: ' . $request->ip());
    }
}
Rate Limiting:
Laravel's throttle middleware caps request rates:
Route::middleware(['throttle:100,1'])->group(function () {
    Route::get('/api/data', [Controller::class, 'fetchData']);
});
Combine with CrawlerDetect to apply stricter limits:
$detector = new CrawlerDetect();
$limit = $detector->isCrawler() ? '10,1' : '100,1'; // maxAttempts,decayMinutes
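One way to wire that decision into the framework is a named rate limiter (a sketch using Laravel's RateLimiter facade; the limiter name and limits are illustrative):
// app/Providers/RouteServiceProvider.php (boot method)
use Illuminate\Cache\RateLimiting\Limit;
use Illuminate\Http\Request;
use Illuminate\Support\Facades\RateLimiter;
use Jaybizzle\CrawlerDetect\CrawlerDetect;

RateLimiter::for('api', function (Request $request) {
    $detector = new CrawlerDetect();

    // Crawlers get 10 requests/minute, everyone else 100.
    return $detector->isCrawler()
        ? Limit::perMinute(10)->by($request->ip())
        : Limit::perMinute(100)->by($request->ip());
});
Routes then opt in with Route::middleware('throttle:api').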
Laravel Service Provider:
Bind CrawlerDetect as a singleton for dependency injection:
// app/Providers/AppServiceProvider.php
public function register() {
    $this->app->singleton(CrawlerDetect::class, function () {
        return new CrawlerDetect();
    });
}
Use in controllers:
public function __construct(private CrawlerDetect $detector) {}
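With the binding in place, the injected singleton can be used in any action (a minimal sketch; the action and view names are illustrative):
public function index(Request $request)
{
    // isCrawler() also accepts an explicit user-agent string
    if ($this->detector->isCrawler($request->userAgent())) {
        return view('bot-optimized');
    }
    return view('home');
}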
Blade Directives: Create custom Blade directives for bot detection:
// app/Providers/BladeServiceProvider.php
Blade::if('crawler', function () {
    return app(CrawlerDetect::class)->isCrawler();
});
Usage in views (note: Blade::if registers @crawler/@unlesscrawler directives, not a crawler() helper function):
@unlesscrawler
    <script defer>...</script>
@endcrawler
Event Listeners: Trigger events when crawlers are detected (e.g., log or notify admins):
// app/Listeners/LogCrawler.php
public function handle($event) {
    Log::warning('Crawler detected: ' . $event->botName);
}
Dispatch it from middleware (CrawlerDetected is your own event class, not part of the package):
event(new CrawlerDetected($detector->getMatches()));
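A minimal sketch of such an event class (the class and property names are assumptions for illustration):
// app/Events/CrawlerDetected.php
namespace App\Events;

use Illuminate\Foundation\Events\Dispatchable;

class CrawlerDetected
{
    use Dispatchable;

    // getMatches() yields the matched string, or null
    public function __construct(public ?string $botName) {}
}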
API Gateways:
Run CrawlerDetect as middleware in front of Sanctum- or Passport-protected API routes to filter bot traffic before it reaches your services:
public function handle($request, Closure $next) {
    $detector = new CrawlerDetect();
    if ($detector->isCrawler()) {
        return response()->json(['error' => 'Crawlers not allowed'], 403);
    }
    return $next($request);
}
False Positives:
Some legitimate browsers (e.g., Yandex's YaBrowser) may occasionally be misclassified. Verify with getMatches():
$matches = $detector->getMatches(); // the matched string, or null
if ($matches !== null && str_contains($matches, 'YaBrowser')) {
    // Handle false positive (e.g., allow access)
}
Header Overrides:
Bots can spoof the User-Agent header, and CrawlerDetect only inspects a fixed set of user-agent-related headers. Cross-check proxy headers such as HTTP_X_FORWARDED_FOR or HTTP_VIA in your own logic. Note the constructor signature: an array of $_SERVER-style headers, then an optional user-agent string:
$detector = new CrawlerDetect(
    $request->server(),      // only HTTP_* keys from this array are used
    $request->userAgent()
);
Performance Overhead:
Detection runs a large regular expression against the request headers; cache the verdict per user agent (not per IP, since one IP can send many different user agents):
$cacheKey = 'crawler_detect_' . md5($request->userAgent() ?? '');
$isCrawler = Cache::remember($cacheKey, 60, function () use ($detector) {
    return $detector->isCrawler();
});
Missing Crawlers:
The regex patterns live in src/Fixtures/Crawlers.php; if a bot slips through, capture its User-Agent string and contribute it upstream.
Laravel Caching:
When using CrawlerDetect in middleware, ensure the instance isn't recreated on every request (bind it as a singleton, as shown above).
Inspect Matches:
Use getMatches() to debug why a crawler was detected or missed:
dd($detector->getMatches());
Example output (getMatches() returns the matched substring, or null when nothing matched):
"Googlebot"
Test User Agents: Manually test detection with known crawlers:
$detector->isCrawler('Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)');
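A sketch for locking this behavior into your own test suite (the test class is illustrative; both sample user agents are real bots the package detects):
use Jaybizzle\CrawlerDetect\CrawlerDetect;
use PHPUnit\Framework\TestCase;

class CrawlerDetectionTest extends TestCase
{
    public function testKnownCrawlersAreDetected(): void
    {
        $detector = new CrawlerDetect();

        $crawlers = [
            'Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)',
            'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
        ];

        foreach ($crawlers as $ua) {
            $this->assertTrue($detector->isCrawler($ua), "Failed to detect: {$ua}");
        }
    }
}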
Log Undetected Crawlers:
Log User-Agent strings that look bot-like but go undetected, so they can be contributed upstream (the 'bot' substring check is a crude illustrative heuristic):
$ua = (string) $request->userAgent();
if (!$detector->isCrawler() && str_contains(strtolower($ua), 'bot')) {
    Log::warning('Possible undetected crawler: ' . $ua);
}
Whitelist SEO Crawlers:
Allow known SEO crawlers (e.g., Googlebot, Bingbot) while blocking others:
$allowedCrawlers = ['googlebot', 'bingbot', 'duckduckbot'];
if ($detector->isCrawler()) {
    // getMatches() returns the matched substring; compare case-insensitively
    $match = strtolower((string) $detector->getMatches());
    if (!in_array($match, $allowedCrawlers, true)) {
        return response('Blocked.', 403);
    }
}
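Since user agents are trivially spoofed, an allowlist is stronger when paired with reverse-DNS verification, the technique Google documents for confirming Googlebot (a sketch; error handling kept minimal):
// Verify a claimed Googlebot: reverse-resolve the IP, check the domain,
// then forward-confirm that the hostname resolves back to the same IP.
function isVerifiedGooglebot(string $ip): bool
{
    $host = gethostbyaddr($ip); // e.g. "crawl-66-249-66-1.googlebot.com"
    if ($host === false || $host === $ip || !preg_match('/\.(googlebot|google)\.com$/', $host)) {
        return false;
    }
    return gethostbyname($host) === $ip;
}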
Combine with IP Blocking: Flag repeat offenders by IP and block them (the hard-coded IP is illustrative; in practice you would read it from a blocklist):
if ($detector->isCrawler() && $request->ip() === '123.45.67.89') {
    Cache::put('blocked_ips_' . $request->ip(), true, 3600); // remember for an hour
    return response('Blocked.', 403);
}
Leverage HTTP_SEC_CH_UA:
Modern Chromium-based clients also send client-hint headers such as HTTP_SEC_CH_UA; consider cross-checking them alongside the classic User-Agent in your own heuristics.
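A sketch of such a cross-check (custom logic layered on top of the package, not a CrawlerDetect feature):
$detector = new CrawlerDetect();
$secChUa = $request->header('Sec-CH-UA'); // HTTP_SEC_CH_UA in $_SERVER terms

// A client claiming to be Chrome but sending no client hints is suspicious;
// combine that signal with the package's verdict.
$claimsChrome = str_contains((string) $request->userAgent(), 'Chrome');
$suspicious = $claimsChrome && $secChUa === null;

if ($detector->isCrawler() || $suspicious) {
    // Treat as a probable bot
}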