composer require anassrojea/laracrawler
php artisan vendor:publish --provider="AnassRojea\Laracrawler\LaracrawlerServiceProvider"
Edit the published configuration file (config/laracrawler.php) and update:
'crawler' => [
    'base_url' => 'https://yourdomain.com',
    'depth' => 3, // Default crawl depth
    'excludes' => [
        'regex' => ['/admin/', '/login/'],
        'extensions' => ['pdf', 'docx'],
    ],
],
php artisan laracrawler:crawl
This generates sitemap.xml in storage/app/sitemaps/.
To crawl from code, use the facade:
use AnassRojea\Laracrawler\Facades\Laracrawler;
// Trigger a crawl and generate sitemap
Laracrawler::crawl()->generate();
This produces a sitemap.xml containing all crawlable URLs.
Crawling & Indexing:
Laracrawler::crawl()->depth(5)->exclude('regex', '/private/')->run();
Laracrawler::crawl()->exclude('extensions', ['zip', 'exe'])->run();
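A minimal sketch of how regex and extension exclusions could be evaluated for a single URL, assuming both are matched against the URL path only. The `shouldExclude()` helper is hypothetical and not part of the Laracrawler API; it only illustrates the filtering idea in plain PHP.

```php
<?php
declare(strict_types=1);

// Hypothetical helper (not Laracrawler API): decides whether a URL should be
// skipped, using regex patterns and file extensions like the 'excludes' config.
function shouldExclude(string $url, array $regexes, array $extensions): bool
{
    // Only the path is checked, so a query string cannot hide an extension.
    $path = (string) parse_url($url, PHP_URL_PATH);

    foreach ($regexes as $pattern) {
        if (preg_match($pattern, $path) === 1) {
            return true;
        }
    }

    // Compare extensions case-insensitively (".PDF" and ".pdf" are treated alike).
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return $ext !== '' && in_array($ext, array_map('strtolower', $extensions), true);
}

$regexes = ['/admin/', '/login/'];
$extensions = ['pdf', 'docx', 'zip', 'exe'];

var_dump(shouldExclude('https://yourdomain.com/admin/users', $regexes, $extensions));      // bool(true)
var_dump(shouldExclude('https://yourdomain.com/files/report.PDF', $regexes, $extensions)); // bool(true)
var_dump(shouldExclude('https://yourdomain.com/blog/post', $regexes, $extensions));        // bool(false)
```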
Multilingual Support:
Define hreflang alternates in the config:
'hreflang' => [
    'en' => 'https://en.yourdomain.com',
    'fr' => 'https://fr.yourdomain.com',
],
Laracrawler::crawl()->withHreflang()->run();
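For reference, the sitemaps hreflang extension represents each language version as an `<xhtml:link rel="alternate">` element inside a page's `<url>` entry, and the enclosing `<urlset>` must declare `xmlns:xhtml="http://www.w3.org/1999/xhtml"`. The sketch below builds those elements from a config map shaped like the one above. `hreflangLinks()` is a hypothetical helper; it is not necessarily what `withHreflang()` emits.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch (not Laracrawler's implementation): builds the
// <xhtml:link rel="alternate"> elements for one page path, given a
// locale => base URL map like the 'hreflang' config.
function hreflangLinks(string $path, array $locales): array
{
    $links = [];
    foreach ($locales as $lang => $baseUrl) {
        // Join base URL and path with exactly one slash between them.
        $href = rtrim($baseUrl, '/') . '/' . ltrim($path, '/');
        $links[] = sprintf(
            '<xhtml:link rel="alternate" hreflang="%s" href="%s"/>',
            htmlspecialchars((string) $lang, ENT_QUOTES | ENT_XML1),
            htmlspecialchars($href, ENT_QUOTES | ENT_XML1)
        );
    }
    return $links;
}

$locales = [
    'en' => 'https://en.yourdomain.com',
    'fr' => 'https://fr.yourdomain.com',
];

echo implode(PHP_EOL, hreflangLinks('/about', $locales)), PHP_EOL;
```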
Priority & Lastmod:
Route::get('/important', function () {
    return Laracrawler::priority(0.9)->response(...);
});
Set lastmod from a database column:
Laracrawler::crawl()->lastmodStrategy('db', 'posts.updated_at')->run();
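Priority and lastmod end up as child elements of each `<url>` entry. A minimal sketch of one entry under the sitemaps.org protocol, assuming dates are written as `YYYY-MM-DD` and priority is clamped to the protocol's 0.0 to 1.0 range. `sitemapEntry()` is a hypothetical helper, not Laracrawler's renderer.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch (not Laracrawler's code): renders one <url> entry per the
// sitemaps.org protocol. lastmod uses the W3C Datetime date form (YYYY-MM-DD).
function sitemapEntry(string $loc, ?DateTimeInterface $lastmod = null, ?float $priority = null): string
{
    // Escape characters such as "&" that are not allowed raw in XML.
    $xml = "  <url>\n    <loc>" . htmlspecialchars($loc, ENT_QUOTES | ENT_XML1) . "</loc>\n";
    if ($lastmod !== null) {
        $xml .= '    <lastmod>' . $lastmod->format('Y-m-d') . "</lastmod>\n";
    }
    if ($priority !== null) {
        // Clamp to the valid range and print with one decimal place.
        $priority = max(0.0, min(1.0, $priority));
        $xml .= '    <priority>' . number_format($priority, 1, '.', '') . "</priority>\n";
    }
    return $xml . "  </url>\n";
}

echo sitemapEntry('https://yourdomain.com/important', new DateTimeImmutable('2024-05-01'), 0.9);
```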
Image/Video Sitemaps:
<!-- Auto-parsed for image sitemap -->
<img src="/image.jpg" alt="Example" title="Example Title">
Set video sitemap defaults in the config:
'video' => [
    'default_title' => 'Video Content',
    'default_description' => 'Default description',
],
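The image parsing shown above amounts to collecting `<img>` tags from each crawled page. A sketch using PHP's DOM extension, assuming only root-relative `src` values need resolving against the site root. `extractImages()` is hypothetical and not Laracrawler's parser.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch (not Laracrawler's parser): collects src/alt/title from
// <img> tags. Requires the dom extension, bundled with most PHP builds.
function extractImages(string $html, string $baseUrl): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Suppress warnings about real-world HTML that is not perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $images = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if ($src === '') {
            continue;
        }
        // Resolve root-relative paths ("/image.jpg"), but not protocol-relative ("//cdn...").
        if (str_starts_with($src, '/') && !str_starts_with($src, '//')) {
            $src = rtrim($baseUrl, '/') . $src;
        }
        $images[] = [
            'loc'   => $src,
            'alt'   => $img->getAttribute('alt'),
            'title' => $img->getAttribute('title'),
        ];
    }
    return $images;
}
```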
Laracrawler::middleware(function ($request) {
    // Exclude requests coming from this IP address
    if ($request->ip() === '123.123.123.123') {
        return Laracrawler::exclude();
    }
});
use Illuminate\Support\Facades\Log;
// Listen for crawl completion
Laracrawler::listen('crawl.completed', function ($urls) {
    Log::info('Crawled URLs: ' . count($urls));
});
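Hooks like `listen('crawl.completed', ...)` are commonly backed by a small registry that maps event names to callbacks. The `EventBus` class below is a hypothetical illustration of that pattern, not Laracrawler's event system.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch of a listen/dispatch registry; not Laracrawler's
// actual event system.
final class EventBus
{
    /** @var array<string, list<callable>> */
    private array $listeners = [];

    public function listen(string $event, callable $listener): void
    {
        $this->listeners[$event][] = $listener;
    }

    public function dispatch(string $event, mixed ...$payload): void
    {
        // Listeners run in registration order; unknown events are a no-op.
        foreach ($this->listeners[$event] ?? [] as $listener) {
            $listener(...$payload);
        }
    }
}

$bus = new EventBus();
$bus->listen('crawl.completed', function (array $urls): void {
    echo 'Crawled URLs: ' . count($urls) . PHP_EOL;
});
$bus->dispatch('crawl.completed', ['https://yourdomain.com/', 'https://yourdomain.com/about']); // Crawled URLs: 2
```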
// app/Console/Kernel.php
protected function schedule(Schedule $schedule): void
{
    $schedule->command('laracrawler:crawl')->daily();
}
Crawl Depth Limits:
Deep crawls (depth > 3) may hit memory limits. Use chunking:
Laracrawler::crawl()->chunk(100)->run();
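Chunking bounds memory only if URLs are processed batch by batch instead of being collected into one array first. A sketch of that idea with a generator, assuming the URL source can itself be streamed. `chunked()` is a hypothetical helper, not how `chunk(100)` is implemented internally.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch (not Laracrawler's internals): yields URLs in fixed-size
// batches so only one batch is held in memory while it is processed.
function chunked(iterable $urls, int $size): Generator
{
    if ($size < 1) {
        throw new InvalidArgumentException('Chunk size must be at least 1.');
    }
    $batch = [];
    foreach ($urls as $url) {
        $batch[] = $url;
        if (count($batch) === $size) {
            yield $batch;
            $batch = [];
        }
    }
    if ($batch !== []) {
        yield $batch; // final, partially filled batch
    }
}
```

Each batch can then be written out (for example, appended to the sitemap file) before the next one is read, e.g. `foreach (chunked($urlSource, 100) as $batch) { ... }`.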
Monitor crawl statistics with:
php artisan laracrawler:stats
URL Normalization Conflicts:
Ensure base_url in the config matches production; a mismatch causes duplicate entries.
Laracrawler::normalizeUrl('https://example.com/Page?query=1');
// Output: "https://example.com/page"
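One normalizer that reproduces the output above lowercases the scheme, host, and path, and drops the query string, fragment, and trailing slash. This is an assumption about what `normalizeUrl()` does, written as a standalone hypothetical function. Lowercasing paths merges URLs that differ only in case, which is safe only when your routes are case-insensitive.

```php
<?php
declare(strict_types=1);

// Hypothetical normalizer (not Laracrawler's normalizeUrl()): lowercases
// scheme, host and path, keeps an explicit port, and drops query, fragment
// and trailing slash.
function normalizeUrl(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        throw new InvalidArgumentException("Not an absolute URL: {$url}");
    }

    $scheme = strtolower($parts['scheme']);
    $host   = strtolower($parts['host']);
    $port   = isset($parts['port']) ? ':' . $parts['port'] : '';
    $path   = strtolower(rtrim($parts['path'] ?? '', '/'));

    return "{$scheme}://{$host}{$port}{$path}";
}

echo normalizeUrl('https://example.com/Page?query=1'), PHP_EOL; // https://example.com/page
```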
Dynamic Content Exclusions:
Regex exclusions are case-sensitive by default. For case-insensitive matching, add the i modifier:
Laracrawler::crawl()->exclude('regex', '/private/i')->run();
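The effect of the `i` modifier can be checked with PHP's own `preg_match()`, independent of Laracrawler:

```php
<?php
declare(strict_types=1);

// PCRE patterns in PHP are case-sensitive unless the "i" modifier is added,
// so '/private/' misses "/Private/" while '/private/i' matches it.
$path = '/Private/report';

var_dump(preg_match('/private/', $path));  // int(0)
var_dump(preg_match('/private/i', $path)); // int(1)
```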
Database lastmod Strategies:
Fall back to the file strategy if the database strategy fails:
Laracrawler::crawl()->lastmodStrategy('file')->run();
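A minimal sketch of a "database first, file modification time second" fallback, assuming the database lookup either returns a `DateTimeInterface` or throws. `resolveLastmod()` and the `$dbLookup` callable are hypothetical stand-ins, not Laracrawler's strategy classes.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch (not Laracrawler's strategies): $dbLookup stands in for a
// query such as reading posts.updated_at. An exception or a non-date result
// triggers the file-based fallback.
function resolveLastmod(callable $dbLookup, string $file): ?string
{
    try {
        $value = $dbLookup();
        if ($value instanceof DateTimeInterface) {
            return $value->format(DATE_ATOM);
        }
    } catch (Throwable) {
        // Fall through to the file strategy below.
    }

    // File strategy: use the file's modification time, if the file exists.
    $mtime = is_file($file) ? filemtime($file) : false;
    return $mtime === false ? null : date(DATE_ATOM, $mtime);
}
```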
Laracrawler::crawl()->log()->run();
// Logs to `storage/logs/laracrawler.log`
php artisan laracrawler:validate
# Checks for broken links and noindex tags
Laracrawler::getExcludedUrls();
// Returns array of filtered URLs
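One check a validation pass like `laracrawler:validate` can perform is detecting `noindex` directives, which may appear in a `<meta name="robots">` tag or an `X-Robots-Tag` response header. The `isNoindex()` function below is a hypothetical sketch of that check, not the command's implementation.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch (not laracrawler:validate): detects a robots "noindex"
// directive in the X-Robots-Tag header or a <meta name="robots"> tag.
/** @param array<string, string> $headers */
function isNoindex(string $html, array $headers = []): bool
{
    foreach ($headers as $name => $value) {
        if (strcasecmp((string) $name, 'X-Robots-Tag') === 0 && stripos((string) $value, 'noindex') !== false) {
            return true;
        }
    }
    if (trim($html) === '') {
        return false; // DOMDocument::loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $name = strtolower($meta->getAttribute('name'));
        if (in_array($name, ['robots', 'googlebot'], true)
            && stripos($meta->getAttribute('content'), 'noindex') !== false) {
            return true;
        }
    }
    return false;
}
```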
Custom Crawlers:
Extend AnassRojea\Laracrawler\Crawler:
namespace App;
use AnassRojea\Laracrawler\Crawler;
class CustomCrawler extends Crawler {
    public function customRule($url) {
        // Exclude any URL containing "special"
        if (str_contains($url, 'special')) {
            return $this->exclude();
        }
    }
}
Register it in config/laracrawler.php:
'crawler' => [
    'class' => \App\CustomCrawler::class,
],
Sitemap Transformers:
Laracrawler::transformer(function ($urls) {
    return new CustomSitemapTransformer($urls);
});
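The interface a transformer must satisfy is not specified above, so the following only illustrates the idea with an assumed `render()` method. This hypothetical `CustomSitemapTransformer` emits the plain-text sitemap format that the sitemaps.org protocol also accepts: one absolute URL per line, at most 50,000 URLs per file.

```php
<?php
declare(strict_types=1);

// Hypothetical CustomSitemapTransformer; the render() method is an assumption,
// not a documented Laracrawler interface. Output: plain-text sitemap format.
final class CustomSitemapTransformer
{
    private const MAX_URLS = 50000;

    /** @param list<string> $urls */
    public function __construct(private array $urls)
    {
    }

    public function render(): string
    {
        // Drop duplicates while keeping first-seen order.
        $unique = array_values(array_unique($this->urls));
        if (count($unique) > self::MAX_URLS) {
            throw new LengthException('A single sitemap may list at most 50,000 URLs.');
        }
        return $unique === [] ? '' : implode("\n", $unique) . "\n";
    }
}
```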
Priority Algorithms:
Laracrawler::priorityStrategy(function ($url, $depth, $links) {
    return $depth === 1 ? 1.0 : 0.5;
});
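The closure above assigns fixed values by depth. A hypothetical smoother alternative lets priority decay with depth and rewards heavily linked pages. The weights are arbitrary, and since the type of `$links` is not documented, the sketch assumes an inbound-link count.

```php
<?php
declare(strict_types=1);

// Hypothetical priority function (not a Laracrawler built-in): priority drops
// by 0.2 per level below depth 1, gains up to 0.2 from inbound links, and is
// clamped to [0.1, 1.0] and rounded to one decimal place.
function depthPriority(int $depth, int $inboundLinks = 0): float
{
    $score = 1.0 - 0.2 * max(0, $depth - 1);
    $score += min(0.2, 0.01 * $inboundLinks);
    return round(max(0.1, min(1.0, $score)), 1);
}
```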
Asset Indexing:
Laracrawler::assetParser('pdf', function ($content) {
    return ['title' => 'PDF Title', 'caption' => 'PDF Description'];
});
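A parser keyed by extension can be selected from the asset's URL. The `AssetParserRegistry` class below is a hypothetical sketch of that lookup, not Laracrawler's implementation. Real PDF metadata extraction would need a dedicated library, so the example parser returns fixed values like the snippet above.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch (not Laracrawler's implementation): maps file extensions
// to parser callbacks that turn raw content into sitemap metadata.
final class AssetParserRegistry
{
    /** @var array<string, callable(string): array> */
    private array $parsers = [];

    public function register(string $extension, callable $parser): void
    {
        $this->parsers[strtolower($extension)] = $parser;
    }

    /** Returns parsed metadata, or null when no parser handles the extension. */
    public function parse(string $url, string $content): ?array
    {
        $ext = strtolower(pathinfo((string) parse_url($url, PHP_URL_PATH), PATHINFO_EXTENSION));
        $parser = $this->parsers[$ext] ?? null;
        return $parser === null ? null : $parser($content);
    }
}

$registry = new AssetParserRegistry();
$registry->register('pdf', fn (string $content): array => [
    'title'   => 'PDF Title',
    'caption' => 'PDF Description',
]);
```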