Laracrawler Laravel Package

anassrojea/laracrawler

🚀 Laracrawler Sitemap Generator

A powerful Laravel sitemap generator with crawling, validation, multilingual support, priority auto-scoring, indexability audit, and more.
Optimized for Google SEO best practices.


✨ Features

  • Recursive crawling with depth control
  • URL normalization (HTTPS, trailing slashes, lowercase, strip queries/anchors)
  • Exclusion rules for URLs and assets (regex, extensions, substrings)
  • Multilingual alternates (hreflang) with validation
  • Image sitemap enhancements
    • Extract <img> + <picture> sources
    • Add <image:title> and <image:caption> from alt/title
  • Video sitemap enhancements
    • Extract <video>, <source>, and <iframe> (YouTube, Vimeo)
    • Add <video:title> and <video:description> (defaults configurable)
  • Priority auto-scoring
    • Based on crawl depth, internal link popularity, freshness
    • Supports per-page priority_boost
  • Flexible lastmod strategies
    • now → always current time
    • file → file modification time
    • db → fetch from database column
    • callback → resolve dynamically via Closure/service
  • Indexability audit
    • Detects noindex in headers (X-Robots-Tag) or meta tags
    • Excludes such pages and logs them into sitemap-errors.xml
  • Link validation
    • Detects broken or soft-404 links
    • Excludes them and logs into sitemap-errors.xml
  • Split & index
    • Auto-splits large sitemaps (50k URLs or 50MB limit)
    • Generates sitemap-index.xml
  • Queue support for async crawling in Laravel jobs
  • Auto-ping search engines (Google, Bing, Yandex, Baidu)
  • Configurable HTTP client (timeouts, SSL verify, User-Agent)

โš™๏ธ Installation

composer require anassrojea/laracrawler

Publish config:

php artisan vendor:publish --tag=laracrawler-config

๐Ÿ› ๏ธ Usage

Generate sitemap:

php artisan laracrawler:generate

Options:

  • --summary → Print summary of exclusions.
  • --debug → Extra debug output.
  • --validate → Force validation of links even if disabled in config.

📂 Configuration (config/sitemap.php)

🔗 Base settings

'base_url'     => env('APP_URL', 'https://example.com'),
'xdefault'     => 'https://example.com', // <xhtml:link hreflang="x-default">
'validate_links' => false,
'max_errors'   => 5000,

🚫 Exclusions

'exclude_urls' => [
    '/admin',
    '#\?page=\d+#', // regex pagination
    '#/search#',
    '#\.(css|js)$#',
],
'exclude_assets' => [
    '#\.(css|js|json|xml|txt|md)$#',
    '#\.(zip|rar|tar|gz|7z)$#',
],
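
The list above mixes plain substrings (such as /admin) with regular expressions wrapped in # delimiters. The following is a minimal sketch of how such a list could be evaluated. The helper name isExcludedUrl and the rule "an entry wrapped in # is a regex, anything else is a substring" are assumptions made for illustration, not Laracrawler's actual code.

```php
<?php
// Hypothetical helper, not part of Laracrawler: decides whether a URL matches
// any entry of an exclusion list shaped like 'exclude_urls'.
function isExcludedUrl(string $url, array $patterns): bool
{
    foreach ($patterns as $pattern) {
        // Assumption: entries wrapped in '#' delimiters are regular expressions.
        $isRegex = strlen($pattern) > 2
            && $pattern[0] === '#'
            && substr($pattern, -1) === '#';

        if ($isRegex) {
            if (preg_match($pattern, $url) === 1) {
                return true;
            }
            continue;
        }

        // Everything else is treated as a plain substring.
        if (str_contains($url, $pattern)) {
            return true;
        }
    }

    return false;
}

$patterns = ['/admin', '#\?page=\d+#', '#/search#', '#\.(css|js)$#'];

var_dump(isExcludedUrl('https://example.com/admin/users', $patterns)); // bool(true)
var_dump(isExcludedUrl('https://example.com/blog?page=2', $patterns)); // bool(true)
var_dump(isExcludedUrl('https://example.com/about', $patterns));       // bool(false)
```

Under this reading, a substring entry such as /admin would also match /administrators, so an anchored regex is the safer choice when that matters.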

๐ŸŒ Normalization

'normalize' => [
    'strip_queries'        => true,
    'strip_anchors'        => true,
    'strip_trailing_slash' => true,
    'canonicalize'         => true,   // lowercase
    'enforce_https'        => true,
    'enforce_www'          => null,   // true = add, false = strip
    'force_trailing_slash' => false,
],
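
To make the effect of these switches concrete, here is a rough sketch of a normalizer driven by the same option names. The function normalizeUrl is hypothetical, the order of the steps is a guess, and treating enforce_www = null as "leave the host unchanged" is an assumption. The package may behave differently.

```php
<?php
// Hypothetical sketch of URL normalization. Option names mirror the config
// keys above; the logic is an assumption, not Laracrawler's implementation.
// Ports and credentials are ignored to keep the example short.
function normalizeUrl(string $url, array $opts): string
{
    $parts  = parse_url($url) ?: [];
    $scheme = $parts['scheme'] ?? 'https';
    $host   = $parts['host'] ?? '';
    $path   = $parts['path'] ?? '/';

    if ($opts['enforce_https'] ?? false) {
        $scheme = 'https';
    }
    if ($opts['canonicalize'] ?? false) {
        $host = strtolower($host);
        $path = strtolower($path);
    }

    // Assumed semantics: true adds "www.", false strips it, null leaves it alone.
    $www = $opts['enforce_www'] ?? null;
    if ($www === true && !str_starts_with($host, 'www.')) {
        $host = 'www.' . $host;
    } elseif ($www === false && str_starts_with($host, 'www.')) {
        $host = substr($host, 4);
    }

    if (($opts['strip_trailing_slash'] ?? false) && $path !== '/') {
        $path = rtrim($path, '/');
    }
    if (($opts['force_trailing_slash'] ?? false) && !str_ends_with($path, '/')) {
        $path .= '/';
    }

    $result = $scheme . '://' . $host . $path;
    if (!($opts['strip_queries'] ?? false) && isset($parts['query'])) {
        $result .= '?' . $parts['query'];
    }
    if (!($opts['strip_anchors'] ?? false) && isset($parts['fragment'])) {
        $result .= '#' . $parts['fragment'];
    }

    return $result;
}

$opts = [
    'strip_queries'        => true,
    'strip_anchors'        => true,
    'strip_trailing_slash' => true,
    'canonicalize'         => true,
    'enforce_https'        => true,
    'enforce_www'          => null,
    'force_trailing_slash' => false,
];

echo normalizeUrl('http://Example.com/Blog/Post/?utm=1#top', $opts), "\n";
// https://example.com/blog/post
```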

๐ŸŒ Multilingual

'default_lang' => 'en',
'lang_mode'    => 'path', // "path", "subdomain", or "query"
'alternates'   => [
    'en' => 'https://example.com/en',
    'ar' => 'https://example.com/ar',
    'tr' => 'https://example.com/tr',
],
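
For a sense of what the alternates map produces in "path" mode, the sketch below builds the <xhtml:link> entries for a single page. The helper buildAlternates is hypothetical, and appending the page path to each language base URL (and to the x-default URL) is an assumption about how the package combines them.

```php
<?php
// Hypothetical sketch: build hreflang entries for one page in "path" mode.
// Not Laracrawler's code; the URL composition rule is assumed.
function buildAlternates(string $pagePath, array $alternates, string $xDefault): array
{
    $join = static fn (string $base): string =>
        rtrim($base, '/') . '/' . ltrim($pagePath, '/');

    $links = [];
    foreach ($alternates as $lang => $baseUrl) {
        $links[] = sprintf(
            '<xhtml:link rel="alternate" hreflang="%s" href="%s"/>',
            htmlspecialchars((string) $lang, ENT_XML1 | ENT_QUOTES),
            htmlspecialchars($join($baseUrl), ENT_XML1 | ENT_QUOTES)
        );
    }

    // x-default points search engines at the fallback version of the page.
    $links[] = sprintf(
        '<xhtml:link rel="alternate" hreflang="x-default" href="%s"/>',
        htmlspecialchars($join($xDefault), ENT_XML1 | ENT_QUOTES)
    );

    return $links;
}

$links = buildAlternates('/services/seo', [
    'en' => 'https://example.com/en',
    'ar' => 'https://example.com/ar',
    'tr' => 'https://example.com/tr',
], 'https://example.com');

echo implode("\n", $links), "\n";
```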

🖼 Include Rules

'include' => [
    'urls'      => true,
    'images'    => true,
    'videos'    => true,
    'languages' => true,

    'rules' => [
        '#/blog#' => [
            'images' => true,
            'videos' => false,
        ],
    ],
],

🖼 Image Settings

'image_whitelist' => [
    // '/storage/uploads/services/',
],
'image_defaults' => [
    'title'       => 'Image Title',
    'description' => 'Image Description',
],

🎥 Video Settings

'video_whitelist' => [
    // '/storage/uploads/services/',
],
'video_defaults' => [
    'title'       => 'Video Title',
    'description' => 'Video Description',
],

📊 Rules (SEO Overrides)

Rules let you override defaults per URL pattern:

'rules' => [
    '/$' => [ // homepage
        'changefreq' => 'daily',
        'priority'   => '1.0',
        'lastmod'    => 'now',
    ],

    '/blog' => [
        'changefreq'    => 'daily',
        'priority'      => '0.9',
        'priority_boost' => 0.3, // 🚀 boost blogs slightly
        'lastmod'       => [
            'strategy' => 'db',
            'table'    => 'posts',
            'lookup'   => 'slug',
            'column'   => 'updated_at',
        ],
    ],

    '#^/(en|ar|tr)?/service#' => [
        'changefreq'    => 'weekly',
        'priority'      => null, // auto-score
        'priority_boost' => 0.3,  // 🚀 boost services
        'lastmod'       => [
            'strategy' => 'db',
            'table'    => 'services',
            'lookup'   => 'slug',
            'column'   => 'updated_at',
        ],
    ],
],

  • priority → fixed value (0.1–1.0) or null for auto-score.
  • priority_boost → bump score (applied only if auto-score).
  • lastmod strategies:
    • "now" → always current timestamp
    • "file" → filesystem mtime
    • "db" → fetch updated_at from DB
    • "callback" → custom closure or service
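
The lastmod strategies can be pictured as a small resolver that always ends in a W3C datetime string, the format used in a sitemap's <lastmod> element. The sketch below covers "now", "file", and a Closure. The function resolveLastmod and its $file parameter are hypothetical, and the "db" strategy is left out because it needs a database connection.

```php
<?php
// Hypothetical resolver, not Laracrawler's code: turns a lastmod setting into
// a W3C datetime string suitable for <lastmod>, or null when unresolved.
function resolveLastmod(string|Closure $strategy, ?string $file = null): ?string
{
    // "callback": any Closure returning a DateTimeInterface (assumed contract).
    if ($strategy instanceof Closure) {
        $value = $strategy();

        return $value instanceof DateTimeInterface
            ? $value->format(DATE_W3C)
            : null;
    }

    return match ($strategy) {
        'now'   => date(DATE_W3C),
        'file'  => ($file !== null && is_file($file))
            ? date(DATE_W3C, filemtime($file))
            : null,
        default => null, // "db" and unknown strategies are not handled here
    };
}

echo resolveLastmod('now'), "\n";
echo resolveLastmod(
    static fn () => new DateTimeImmutable('2024-01-02T03:04:05+00:00')
), "\n"; // 2024-01-02T03:04:05+00:00
```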

📈 Priority Scoring

'priority_scoring' => [
    'enabled'   => true,
    'weights'   => [
        'depth'     => 0.4,
        'links'     => 0.4,
        'freshness' => 0.2,
    ],
    'min' => 0.1,
    'max' => 1.0,
],
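
The config exposes weights and bounds but does not spell out the formula. One plausible reading is a weighted sum of three signals, each scaled to the 0 to 1 range, plus any priority_boost, clamped to min/max. The sketch below follows that reading. The function autoPriority and all three scaling curves are assumptions, not the package's actual scoring.

```php
<?php
// Hypothetical scoring formula; Laracrawler's real formula may differ.
function autoPriority(
    int $depth,
    int $inboundLinks,
    int $ageDays,
    array $cfg,
    float $boost = 0.0
): float {
    // Assumed curves mapping each signal to the range 0..1.
    $depthScore     = 1 / (1 + $depth);            // depth 0 (homepage) => 1.0
    $linkScore      = min($inboundLinks, 20) / 20; // saturates at 20 internal links
    $freshnessScore = max(0, 1 - $ageDays / 365);  // linear decay over one year

    $w = $cfg['weights'];
    $score = $w['depth'] * $depthScore
           + $w['links'] * $linkScore
           + $w['freshness'] * $freshnessScore
           + $boost;

    // Clamp to the configured bounds and round to one decimal place.
    $score = max($cfg['min'], min($cfg['max'], $score));

    return round($score, 1);
}

$cfg = [
    'weights' => ['depth' => 0.4, 'links' => 0.4, 'freshness' => 0.2],
    'min'     => 0.1,
    'max'     => 1.0,
];

echo autoPriority(0, 20, 0, $cfg), "\n";      // 1
echo autoPriority(1, 10, 0, $cfg, 0.3), "\n"; // 0.9
echo autoPriority(3, 0, 400, $cfg), "\n";     // 0.1
```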

📡 Pinging Search Engines

'ping' => true,
'ping_targets' => [
    'Google' => 'http://www.google.com/ping?sitemap=',
    'Bing'   => 'http://www.bing.com/ping?sitemap=',
    'Yandex' => 'https://webmaster.yandex.com/ping?sitemap=',
    'Baidu'  => 'http://ping.baidu.com/ping?sitemap=',
],

🧵 Queue Support

'queue' => [
    'enabled'    => false,
    'connection' => 'default',
    'batch_size' => 100,
],

๐ŸŒ HTTP Client Settings

'http' => [
    'validate_links' => [
        'timeout' => 10,
        'connect_timeout' => 5,
        'verify' => false,
        'http_errors' => false,
        'headers' => [
            'User-Agent' => 'LaracrawlerBot/1.0 (https://example.com)',
        ],
    ],
    'validate_alternates' => [
        'timeout' => 5,
        'connect_timeout' => 1,
        'verify' => false,
        'http_errors' => false,
        'headers' => [
            'User-Agent' => 'LaracrawlerBot/1.0 (https://example.com)',
        ],
    ],
],

🕵 Indexability Audit

'indexability_audit' => true,

Flags URLs with:

  • X-Robots-Tag: noindex
  • <meta name="robots" content="noindex">
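
A check for these two signals could look roughly like the sketch below. The function isNoindex is hypothetical and not taken from the package. It accepts response headers as a name => value array (values may be strings or arrays of strings) plus the HTML body, and uses PHP's DOM extension to read the meta tags.

```php
<?php
// Hypothetical check, not Laracrawler's implementation: reports whether a
// response asks not to be indexed via X-Robots-Tag or a robots meta tag.
function isNoindex(array $headers, string $html): bool
{
    foreach ($headers as $name => $value) {
        $value = is_array($value) ? implode(', ', $value) : (string) $value;
        if (strtolower((string) $name) === 'x-robots-tag'
            && stripos($value, 'noindex') !== false) {
            return true;
        }
    }

    if (trim($html) === '') {
        return false; // DOMDocument::loadHTML() rejects empty input
    }

    // Collect libxml warnings silently; real-world HTML is rarely valid.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        if (strtolower($meta->getAttribute('name')) === 'robots'
            && stripos($meta->getAttribute('content'), 'noindex') !== false) {
            return true;
        }
    }

    return false;
}

$html = '<html><head><meta name="robots" content="noindex, nofollow"></head><body></body></html>';

var_dump(isNoindex([], $html));                                 // bool(true)
var_dump(isNoindex(['X-Robots-Tag' => 'noindex'], ''));         // bool(true)
var_dump(isNoindex([], '<html><head></head><body></body></html>')); // bool(false)
```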

🛠 Artisan Command

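All available flags are shown together for reference. Pairs such as --split/--single and --no-ping/--ping-only are alternatives, so pick one of each:

php artisan laracrawler:generate \
    --max-depth=2 \
    --output=public \
    --split \
    --single \
    --no-ping \
    --ping-only \
    --sitemap=sitemap.xml \
    --debug \
    --summary \
    --fresh \
    --queue \
    --validate \
    --audit-indexability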

Flags

  • --max-depth → set crawl depth
  • --output → custom output dir
  • --split → force multiple sitemap files
  • --single → force one sitemap.xml
  • --no-ping → skip pinging search engines
  • --ping-only → only ping, no crawl
  • --sitemap → custom sitemap name (with ping-only)
  • --debug → show exclusions in detail
  • --summary → summary of exclusions
  • --fresh → clear cache and recrawl
  • --queue → run crawl in background via jobs
  • --validate → enable link validation
  • --audit-indexability → enable noindex audit

📦 Outputs

  • sitemap.xml or sitemap-index.xml
  • sitemap-errors.xml (broken links, invalid alternates, noindex pages)
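
As an illustration of how a split run relates the two output shapes, the sketch below chunks a URL list by the 50k URL limit and builds a matching sitemap index. The function buildSitemapIndex and the sitemap-N.xml file names are assumptions made for the example. The 50MB size limit is not handled here.

```php
<?php
// Hypothetical sketch of splitting and indexing; file names are assumed and
// only the URL-count limit is enforced (not the 50MB size limit).
function buildSitemapIndex(array $urls, string $baseUrl, int $limit = 50000): array
{
    // Group URLs into files of at most $limit entries each.
    $files = [];
    foreach (array_chunk($urls, $limit) as $i => $chunk) {
        $files[sprintf('sitemap-%d.xml', $i + 1)] = $chunk;
    }

    // The index lists one <sitemap> entry per generated file.
    $index = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
           . '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    foreach (array_keys($files) as $name) {
        $loc = htmlspecialchars(rtrim($baseUrl, '/') . '/' . $name, ENT_XML1);
        $index .= '  <sitemap><loc>' . $loc . '</loc></sitemap>' . "\n";
    }
    $index .= '</sitemapindex>' . "\n";

    return ['files' => $files, 'index' => $index];
}

// A tiny limit makes the splitting visible: 5 URLs, 2 per file => 3 files.
$result = buildSitemapIndex(['/a', '/b', '/c', '/d', '/e'], 'https://example.com/', 2);

echo count($result['files']), "\n"; // 3
echo $result['index'];
```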

✅ SEO Benefits

  • Clean, canonicalized URLs only
  • Correct handling of alternates (hreflang + x-default)
  • Image metadata (title, caption)
  • Video metadata (title, description)
  • Excludes noindex & broken pages automatically
  • Auto-prioritization based on crawl depth, freshness, and internal link popularity

🔧 Best Practices

  • Always run with --validate in production
  • Configure ping_targets so search engines are notified when the sitemap changes
  • Use priority_boost in rules for critical pages
  • Whitelist only important image/video directories to keep sitemap lean
  • Enable indexability_audit so pages marked noindex are kept out of the sitemap
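
To run the generator regularly in production, the command can be registered with Laravel's scheduler. The snippet below is a sketch for routes/console.php in a Laravel 11 style application. The chosen flags and the 03:00 run time are arbitrary examples, not recommendations from the package.

```php
<?php
// routes/console.php (Laravel 11+). Flags and schedule time are illustrative.
use Illuminate\Support\Facades\Schedule;

Schedule::command('laracrawler:generate --validate --audit-indexability --summary')
    ->dailyAt('03:00')       // regenerate once per night
    ->withoutOverlapping();  // skip a run if the previous crawl is still going
```

The scheduler only fires if the server's cron invokes php artisan schedule:run every minute, as described in the Laravel scheduling documentation.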

📜 License

This package is open-sourced software licensed under the MIT license.
