symfony/dom-crawler
Symfony DomCrawler makes it easy to parse and navigate HTML/XML documents. It provides a fluent API to filter elements, extract text/attributes, follow links and forms, and integrates well with HttpClient and BrowserKit for web scraping and testing.
Architecture Fit
- Container-friendly: the crawler can be registered and resolved through Laravel's service container (e.g., via an Illuminate\Support\ServiceProvider or the app() helper).
- The fluent, chainable API (e.g., $crawler->filter('selector')->each()) mirrors Laravel's Eloquent query builder, reducing cognitive load for developers. This aligns with Laravel's emphasis on expressive, chainable syntax.
- Pairs with Symfony's HttpFoundation for HTTP message handling (e.g., parsing responses) and BrowserKit for testing, enabling cross-component workflows (e.g., scraping + testing in the same pipeline).
- More forgiving than raw DOMDocument for malformed markup. Laravel 10+ users can leverage this natively; older versions may require polyfills or manual upgrades.
- Form handling (e.g., $crawler->selectButton()->form()) aligns with Laravel's HTTP tests and Dusk for end-to-end testing of HTML-rendered content.
Integration Feasibility
Register the crawler in AppServiceProvider, either behind a facade or bound to the container for dependency injection:
$this->app->singleton(Crawler::class, fn () => new Crawler());
Pair it with the Http facade or a Guzzle client to parse responses:
use Illuminate\Support\Facades\Http;
use Symfony\Component\DomCrawler\Crawler;
$html = Http::get('https://example.com')->body();
$crawler = new Crawler($html);
- Wrap scraping in a ShouldQueue job for background processing (e.g., large-scale scraping) using Laravel's queue system.
- Expose scraping through Artisan commands (e.g., php artisan scrape:competitor).
Technical Risk
- Version constraints: pin a release that matches the PHP runtime (e.g., symfony/dom-crawler:7.x for PHP 8.1–8.3).
- Memory pressure: loading large documents via file_get_contents() or Guzzle streams can be costly; consider DomCrawler::createFromFile() for disk-based parsing.
- Untrusted markup: sanitize scraped HTML with HTMLPurifier or Tidy.
- Deeply nested selectors (e.g., div > ul > li:nth-child(3) > a) can degrade performance. Profile with microtime(true) and optimize queries.
Key Questions
- Is the team comfortable pinning to 7.x with potential parsing gaps?
- Should scraped data be returned as Crawler objects, arrays, or custom DTOs? (Example: $crawler->filter('.product')->extract(['title', 'price']).)
- How will safe request rates be ensured (e.g., via custom middleware)?
- Should scraping logic live behind a wrapper (e.g., a ScraperService) or use the component directly?
- Are there scraping policies (robots.txt) to respect? (Use Guzzle middleware for headers/delays.)
- How should DomCrawler integrate into Laravel's test suite (e.g., assertSelectorTextContains() helpers)?
Stack Fit
- Http facade, Guzzle clients, or Illuminate\Http\Request for parsing incoming HTML.
- Illuminate\Bus\Queueable jobs for async processing.
- Custom Artisan commands (e.g., scrape:prices).
- phpunit/html-entity-parser with DomCrawler for assertions in Feature tests.
- HttpFoundation for request/response handling.
- BrowserKit for testing (e.g., Client::request() + Crawler).
- Panther for hybrid JS/static scraping (if needed).
- Guzzle for advanced HTTP features (e.g., retries, proxies).
- Spatie/ArrayToXml for XML output transformation.
- Laravel Excel to export scraped data to CSV/XLSX.
Migration Path
Phase 1: Proof of Concept (1–2 weeks)
Replace ad-hoc parsing (e.g., raw DOMDocument or regex) with DomCrawler. For example, convert:
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
to:
$crawler = new Crawler($html);
$prices = $crawler->filter('.price')->each(fn(Crawler $node) => $node->text());
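For reference, here is the legacy DOMDocument/DOMXPath route completed into a runnable form, so the before/after comparison is concrete. The sample markup and the class-matching XPath expression are illustrative assumptions, not taken from any specific codebase:

```php
<?php
// Runnable sketch of the legacy approach that Phase 1 replaces:
// extract all elements with class "price" using only PHP's stdlib.

$html = '<ul><li class="price">$5</li><li class="price">$7</li></ul>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// XPath has no native class selector, hence the verbose contains() idiom
// that DomCrawler's CSS-selector support hides from you.
$prices = [];
foreach ($xpath->query('//*[contains(concat(" ", normalize-space(@class), " "), " price ")]') as $node) {
    $prices[] = $node->textContent;
}

print_r($prices); // $prices is ['$5', '$7']
```

The verbosity of the XPath class-matching idiom is exactly what the one-line `filter('.price')->each(...)` version removes.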
Phase 2: Standardize Usage (2–3 weeks)
Introduce a ScraperService facade/class to encapsulate DomCrawler logic:
use Symfony\Component\DomCrawler\Crawler;

class ScraperService
{
    public function scrapeProducts(string $html): array
    {
        return (new Crawler($html))
            ->filter('.product')
            ->each(fn (Crawler $node) => [
                'title' => $node->filter('.title')->text(),
                'price' => $node->filter('.price')->text(),
            ]);
    }
}
Register it in AppServiceProvider:
$this->app->singleton(ScraperService::class, fn() => new ScraperService());
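If the open question about return shape is resolved in favor of DTOs rather than the raw arrays scrapeProducts() returns, the rows can be mapped into a small value object. A minimal sketch in plain PHP (8.1+) — the Product class, its fields, and the sample rows are hypothetical, not part of DomCrawler:

```php
<?php
// Hypothetical DTO for scraped rows: a readonly value object that the
// associative arrays produced by each()/extract() can be mapped into.

final class Product
{
    public function __construct(
        public readonly string $title,
        public readonly string $price,
    ) {}

    /** Build from one associative row, e.g. a row returned by scrapeProducts(). */
    public static function fromRow(array $row): self
    {
        return new self($row['title'], $row['price']);
    }
}

// Stand-in for scrapeProducts() output.
$rows = [
    ['title' => 'Widget', 'price' => '$9.99'],
    ['title' => 'Gadget', 'price' => '$19.99'],
];

$products = array_map([Product::class, 'fromRow'], $rows);

echo $products[0]->title . PHP_EOL; // Widget
echo count($products) . PHP_EOL;    // 2
```

The DTO buys type safety and IDE support at the cost of one extra mapping step; plain arrays stay lighter but push key-name knowledge into every consumer.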
Phase 3: Scale & Optimize (Ongoing)
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Support\Facades\Http;

class ScrapeJob implements ShouldQueue
{
    use Dispatchable;

    public function handle(): void
    {
        $html = Http::get('https://competitor.com')->body();
        $data = app(ScraperService::class)->scrapeProducts($html);
        // Store in DB/queue next steps...
    }
}
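For the optimization pass, the profiling advice from the risk section (microtime(true) around deep selectors) can be exercised with nothing but the stdlib, since DomCrawler compiles CSS selectors to XPath internally. The generated document and the query below are illustrative assumptions:

```php
<?php
// Time an XPath query roughly equivalent to the deep CSS selector
// div > ul > li:nth-child(3) > a from the risk notes.

$html = '<div><ul>' . str_repeat('<li><a href="#">item</a></li>', 1000) . '</ul></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$start   = microtime(true);
$nodes   = $xpath->query('//div/ul/li[3]/a'); // third <li>'s link
$elapsed = microtime(true) - $start;

printf("matched %d node(s) in %.4f s\n", $nodes->length, $elapsed);
```

Wrapping the real Crawler calls the same way makes it easy to compare selector variants (e.g., anchoring the path versus a descendant search) on production-sized documents.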
Compatibility
- Prefer symfony/dom-crawler:8.x for PHP 8.4+ features (native HTML5 parser).
- Fall back to symfony/dom-crawler:7.x (PHP 8.1–8.3) with potential parsing trade-offs.
- Requires the dom, libxml, and mbstring extensions (enabled by default in Laravel).
- In composer.json:
"require": {
"symfony/dom-crawler": "^8.0 || ^7.4"
}
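The extension requirements above can be sanity-checked before installation with a short preflight script; it uses only the stdlib, and the extension names come from the compatibility notes, not from DomCrawler itself:

```php
<?php
// Preflight check: verify the extensions the compatibility notes call for.

$required = ['dom', 'libxml', 'mbstring'];
$missing  = array_values(array_filter($required, fn ($ext) => !extension_loaded($ext)));

if ($missing === []) {
    echo 'All required extensions present' . PHP_EOL;
} else {
    echo 'Missing: ' . implode(', ', $missing) . PHP_EOL;
}
```

Running this in CI before `composer install` gives a clearer failure message than a mid-scrape fatal error would.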
Sequencing
Add symfony/dom-crawler to composer.json and run composer update.