Crawler Laravel Package

spatie/crawler

Fast, concurrent web crawler for PHP. Crawl sites, collect internal URLs with depth limits, and hook into crawl events. Can execute JavaScript via Chrome/Puppeteer for rendered pages. Includes fakes for testing crawl logic without real HTTP requests.

9.2.1

Full Changelog: https://github.com/spatie/crawler/compare/9.0.1...9.2.1

9.0.1

Fixed

  • Fixed allow_redirects default: changed from false to ['track_redirects' => true] so redirects are followed and the redirect history header is populated correctly
  • Non-parseable responses (e.g. binary files filtered by allowedMimeTypes) now notify observers via crawled() with an empty body instead of being silently skipped
  • URLs containing control characters are now detected and reported as malformed
  • Dot segments (/../, /./) in extracted URLs are now normalized per RFC 3986
  • Custom client options passed to Crawler::create() now merge with defaults instead of replacing them (pass null to remove a default)
  • CrawlRequestFailed now wraps non-RequestException errors so observers always receive a RequestException
  • Removed unused classes: Url, ResponseWithCachedBody, InvalidUrl
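
The dot-segment fix above refers to the removal algorithm in RFC 3986, section 5.2.4. The following is a minimal standalone sketch of that algorithm, assuming PHP 8. The function names removeDotSegments() and dropLastSegment() are hypothetical and are not taken from the package, whose own implementation may differ.

```php
<?php

// Hypothetical helper: drop the last path segment and its preceding "/"
// from the output buffer, as RFC 3986 section 5.2.4 step 2C requires.
function dropLastSegment(string $output): string
{
    $pos = strrpos($output, '/');

    return $pos === false ? '' : substr($output, 0, $pos);
}

// Hypothetical helper: remove "." and ".." segments from a URL path.
function removeDotSegments(string $path): string
{
    $input = $path;
    $output = '';

    while ($input !== '') {
        if (str_starts_with($input, '../')) {
            $input = substr($input, 3);            // 2A: drop leading "../"
        } elseif (str_starts_with($input, './')) {
            $input = substr($input, 2);            // 2A: drop leading "./"
        } elseif (str_starts_with($input, '/./')) {
            $input = substr($input, 2);            // 2B: "/./" becomes "/"
        } elseif ($input === '/.') {
            $input = '/';                          // 2B: trailing "/." becomes "/"
        } elseif (str_starts_with($input, '/../')) {
            $input = substr($input, 3);            // 2C: "/../" becomes "/"
            $output = dropLastSegment($output);
        } elseif ($input === '/..') {
            $input = '/';                          // 2C: trailing "/.." becomes "/"
            $output = dropLastSegment($output);
        } elseif ($input === '.' || $input === '..') {
            $input = '';                           // 2D: lone dot segment
        } else {
            // 2E: move the first segment (with its leading "/", if any)
            // from the input buffer to the output buffer.
            $next = strpos($input, '/', 1);
            if ($next === false) {
                $output .= $input;
                $input = '';
            } else {
                $output .= substr($input, 0, $next);
                $input = substr($input, $next);
            }
        }
    }

    return $output;
}

echo removeDotSegments('/a/b/c/./../../g'), PHP_EOL; // /a/g
```

The sketch works on the path component only; a caller would split the URL first (for example with parse_url()) and reassemble it afterwards.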

Added

  • stream() method to opt-in to streaming HTTP responses for reduced memory usage
  • matchWww() method to treat www.example.com and example.com as equivalent when using internalOnly()
  • includeSubdomains() now works as a flag on internalOnly() and composes with matchWww()
  • CrawlResponse::redirectHistory() and CrawlResponse::wasRedirected() for inspecting redirect chains
  • CrawlObserver::crawlFailed() now receives a ?TransferStatistics parameter for detecting timeouts
  • addObserver() now accepts variadic arguments: addObserver($obs1, $obs2)
  • CrawlRequestFailed now preserves the original request from ConnectException (retaining custom headers like X-Started-At)
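
To illustrate the idea behind matchWww(), here is a small standalone sketch of comparing two URLs while ignoring a leading "www." label. It assumes PHP 8. The helper names stripWww() and sameHostIgnoringWww() are hypothetical, and the code does not show how the package implements the feature.

```php
<?php

// Hypothetical helper: lowercase a host and strip one leading "www." label.
function stripWww(string $host): string
{
    $host = strtolower($host);

    return str_starts_with($host, 'www.') ? substr($host, 4) : $host;
}

// Hypothetical helper: true when both URLs point at the same host,
// treating www.example.com and example.com as equivalent.
function sameHostIgnoringWww(string $urlA, string $urlB): bool
{
    $hostA = parse_url($urlA, PHP_URL_HOST);
    $hostB = parse_url($urlB, PHP_URL_HOST);

    // parse_url() returns null or false when no host can be extracted.
    if (! is_string($hostA) || ! is_string($hostB)) {
        return false;
    }

    return stripWww($hostA) === stripWww($hostB);
}

var_dump(sameHostIgnoringWww('https://www.example.com/a', 'https://example.com/b')); // bool(true)
var_dump(sameHostIgnoringWww('https://blog.example.com/', 'https://example.com/'));  // bool(false)
```

Other subdomains stay distinct in this sketch, which is why the changelog lists includeSubdomains() as a separate flag.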
9.0.0

Major rewrite. See UPGRADING.md for a full list of breaking changes.

Changed

  • Replace UriInterface with plain string URLs throughout the API
  • Replace ResponseInterface with CrawlResponse in observer callbacks
  • CrawlProfile is now an interface instead of an abstract class
  • CrawlObserverCollection no longer implements ArrayAccess or Iterator
  • Default scheme changed from http to https
  • JavaScript rendering is now driver-based (Browsershot moved to suggest)
  • UrlParser interface redesigned to return ExtractedUrl[] instead of adding to queue directly
  • CrawlQueue::has() now accepts string instead of CrawlUrl|UriInterface
  • start() now returns a FinishReason enum
  • URL is now required in Crawler::create()

Added

  • CrawlResponse object with status(), body(), dom(), header(), transferStats(), and more
  • CrawlProgress tracking with urlsCrawled, urlsFailed, urlsFound, urlsPending
  • FinishReason enum: Completed, CrawlLimitReached, TimeLimitReached, Interrupted
  • Closure callbacks: onCrawled(), onFailed(), onFinished(), onWillCrawl()
  • foundUrls() to collect all URLs as CrawledUrl objects
  • fake() for testing without HTTP requests
  • Scope helpers: internalOnly(), includeSubdomains(), shouldCrawl()
  • Shorter method names: depth(), concurrency(), delay(), limit(), userAgent()
  • Throttling: FixedDelayThrottle and AdaptiveThrottle
  • Resource type extraction: alsoExtract(), extractAll(), ResourceType enum
  • URL normalization in ArrayCrawlQueue
  • Graceful shutdown via SIGINT/SIGTERM
  • alwaysCrawl() and neverCrawl() pattern overrides
  • retry() for automatic retries on connection errors and 5xx responses
  • TransferStatistics with typed timing accessors
  • CloudflareRenderer for JavaScript rendering
  • JavaScriptRenderer interface for custom renderers
  • Request configuration: basicAuth(), token(), withoutVerifying(), proxy(), cookies(), queryParameters(), middleware()

Removed

  • CrawlUrl::create() static factory (use new CrawlUrl(...) instead)
  • Spatie\Crawler\Url class
  • ResponseWithCachedBody (replaced by CrawlResponse)
  • nicmart/tree dependency
  • spatie/browsershot as a required dependency (moved to suggest)
  • setBrowsershot() and getBrowsershot() methods
  • startCrawling() method (use start())
  • setUrlParserClass() (use parseSitemaps() or pass a UrlParser directly)
8.5.0

Add Laravel 13 support

8.4.7

Full Changelog: https://github.com/spatie/crawler/compare/8.4.6...8.4.7

8.4.5

Full Changelog: https://github.com/spatie/crawler/compare/8.4.4...8.4.5

8.4.4

Full Changelog: https://github.com/spatie/crawler/compare/8.4.3...8.4.4

8.4.3

Full Changelog: https://github.com/spatie/crawler/compare/8.4.2...8.4.3

8.4.2

Full Changelog: https://github.com/spatie/crawler/compare/8.4.1...8.4.2

8.4.0

Full Changelog: https://github.com/spatie/crawler/compare/8.3.1...8.4.0

8.3.1

Full Changelog: https://github.com/spatie/crawler/compare/8.3.0...8.3.1

8.2.1

Full Changelog: https://github.com/spatie/crawler/compare/8.2.0...8.2.1

8.2.0

Full Changelog: https://github.com/spatie/crawler/compare/8.1.0...8.2.0

8.1.0

Full Changelog: https://github.com/spatie/crawler/compare/8.0.4...8.1.0

8.0.4
  • allow Browsershot v4
8.0.3

Full Changelog: https://github.com/spatie/crawler/compare/8.0.2...8.0.3

8.0.2

Full Changelog: https://github.com/spatie/crawler/compare/8.0.1...8.0.2

8.0.1

Full Changelog: https://github.com/spatie/crawler/compare/8.0.0...8.0.1

8.0.0
  • add linkText to crawl observer methods
  • upgrade dependencies
7.1.3
  • support Laravel 10
7.1.2

Full Changelog: https://github.com/spatie/crawler/compare/7.1.1...7.1.2

7.1.1
7.1.0
  • allow Laravel 9 collections
7.0.5
7.0.2
  • allow psr7 v2
7.0.1
  • change response type hint (#371)
7.0.0
  • require PHP 8+
  • drop support for PHP 7.x
  • convert syntax to PHP 8
  • no API changes have been made
6.0.1
  • bugfix: infinite loops when a CrawlProfile prevents crawling (#358)
6.0.0
  • add setCurrentCrawlLimit and setTotalCrawlLimit
  • internal refactors
5.0.2
  • add support for PHP 8.0
5.0.1
  • tweak variable naming in ArrayCrawlQueue (#326)
5.0.0
  • improve chunked reading of response
  • move observer / profiles / queues to separate namespaces
  • typehint all the things
  • use laravel/collections instead of tightenco package
  • remove support for anything below PHP 7.4
  • remove all deprecated functions and classes
4.7.5
  • treat connection exceptions as request exceptions
4.7.4
  • fix: method and property name error (#311)
4.7.3
  • add crawler option to allow crawling links with rel="nofollow" (#310)
4.7.2
  • only crawl links that are completely parsed
4.7.1
  • fix curl streaming responses (#295)
4.7.0
  • add setParseableMimeTypes() (#293)
4.6.9
  • fix LinkAdder not receiving the updated DOM (#292)
4.6.8
  • allow tightenco/collect 7 (#282)
4.6.7
  • respect maximum response size when checking Robots Meta tags (#281)
4.6.6
  • allow Guzzle 7
4.6.5
  • allow symfony 5 components
4.6.4
  • allow tightenco/collect 6.0 and up (#261)
4.6.3
  • fix crash when CrawlRequestFailed receives an exception other than RequestException
4.6.2
  • case-insensitive user agent bugfix (#249)
4.6.1
  • fix bugs in hasAlreadyBeenProcessed
4.6.0

THIS VERSION CONTAINS A CRITICAL BUG, DO NOT USE

  • added ArrayCrawlQueue; this is now the default queue
  • deprecated CollectionCrawlQueue
4.5.0
  • Make user agent configurable (#246)
4.4.3
  • delayBetweenRequests now uses int instead of float everywhere
4.4.2
  • remove incorrect docblock
4.4.1
  • handle relative paths after redirects correctly
4.4.0
  • add getUrls and getPendingUrls
4.3.2
  • Respect maximumDepth in combination with robots (#181)
4.3.1
  • Properly handle noindex,follow urls.
4.3.0
  • added capability of crawling links with rel="next" or rel="prev"
4.2.0
  • add setDelayBetweenRequests
4.1.7
  • fix an issue where the node in the depth tree could be null
4.1.6
  • improve performance by only building the depth tree when needed
  • handlers will get html after JavaScript has been processed
4.1.5
  • refactor to improve extendability
4.1.4
  • always add links to pool if robots shouldn't be respected
4.1.3
  • refactor of internals
4.1.2
  • make it possible to override $defaultClientOptions
4.1.1
  • Bump minimum required version of spatie/robots-txt to 1.0.1.
4.1.0
  • Respect robots.txt
4.0.5
  • improved extensibility by removing php native type hinting of url, queue and crawler pool Closures
4.0.4
  • do not follow links that have attribute rel set to nofollow
4.0.3
  • Support both Illuminate's and Tighten's Collection.
4.0.2
  • fix bugs when installing into a Laravel app
4.0.0
  • the CrawlObserver and CrawlProfile are upgraded from interfaces to abstract classes
  • don't crawl tel: links
3.2.1
  • fix endless loop
3.2.0
  • add setCrawlObservers, addCrawlObserver
3.1.3
  • fix setMaximumResponseSize (someday we'll get this right)
3.1.2

CONTAINS BUGS, DO NOT USE THIS VERSION

  • fix setMaximumResponseSize
3.1.1

CONTAINS BUGS, DO NOT USE THIS VERSION

  • fix setMaximumResponseSize
3.1.0

CONTAINS BUGS, DO NOT USE THIS VERSION

  • add setMaximumResponseSize
3.0.1
  • fix for exception being thrown when encountering a malformed URL
3.0.0
  • use \Psr\Http\Message\UriInterface for all urls
  • use Puppeteer
  • drop support for PHP 7.0
2.7.1
  • allow symfony 4 crawler
2.7.0
  • added the ability to change the crawl queue
2.6.2
  • more performance improvements
2.6.1
  • performance improvements
2.6.0
  • add CrawlSubdomains profile
2.5.0
  • add crawl count limit
2.4.0
  • add depth limit
2.3.0
  • add JavaScript execution
2.2.1
  • fix deps for PHP 7.2
2.2.0
  • add EmptyCrawlObserver
2.1.2
  • refactor to make use of Symfony Crawler's link function
2.1.1
  • fix bugs around relative urls
2.1.0
  • add CrawlInternalUrls
2.0.7
  • make sure the passed client options are being used
2.0.6
  • second attempt to fix detection of redirects
2.0.5
  • fix detection of redirects
2.0.4
  • fix the default timeout of 5 seconds
2.0.3
  • set a default timeout of 5 seconds
2.0.2
  • fix for non responding hosts
2.0.1
  • fix for the accidental crawling of mailto-links
2.0.0
  • improve performance by concurrent crawling
  • make it possible to determine on which url a url was found
1.2.0
  • Add support for DomCrawler 3.x
1.1.1
  • Fix for normalizing relative links when using non-80 ports
1.1.0
  • Add support for custom ports
1.0.2
  • Lower required php version to 5.5
1.0.1
  • Make URLs case sensitive
1.0.0
  • First release
1.3.1
  • Ignore tel: links when crawling
1.3.0
  • Added path, segment and segments functions to Url
1.2.3
  • Updated the required version of Guzzle to a secure version
1.2.2
  • Fixed a bug where the crawler would not take query strings into account
1.2.1
  • Fixed a bug where the crawler tried to follow JavaScript links