symfony/dom-crawler
Symfony DomCrawler makes it easy to parse and navigate HTML/XML documents. It provides a fluent API to filter elements, extract text/attributes, follow links and forms, and integrates well with HttpClient and BrowserKit for web scraping and testing.
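A minimal sketch of that fluent API, assuming `symfony/dom-crawler` and its companion `symfony/css-selector` are installed via Composer (the HTML snippet is invented for illustration):

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Parse an HTML fragment directly from a string.
$html = '<html><body><p class="msg">Hello</p><a href="/next">Next</a></body></html>';
$crawler = new Crawler($html);

// filter() accepts a CSS selector (this requires symfony/css-selector).
echo $crawler->filter('p.msg')->text();   // Hello
echo $crawler->filter('a')->attr('href'); // /next
```

The same traversal with raw `DOMDocument`/`DOMXPath` would take several lines of boilerplate per lookup, which is the main ergonomic win here.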
Recommended usage:

- Replace ad-hoc DOMDocument/SimpleHTMLDomParser usage with DomCrawler for consistency and maintainability.
- Build a ScraperService facade to encapsulate common scraping patterns (e.g., pagination, rate limiting, or data transformation) using DomCrawler as the core parser.
- It fits naturally if you already use other Symfony components (HttpClient, BrowserKit, or Panther) and want consistency across tools, as well as Laravel's testing tools (e.g., HttpTests or Dusk).

A pitch for a non-technical stakeholder:

*"Symfony DomCrawler is a powerful, lightweight tool that lets us extract structured data from HTML and XML sources (competitor pricing, internal reports, or user-uploaded content) without building and maintaining custom parsers from scratch. This can reduce development time by an estimated 30–50% and keeps our tools resilient to malformed data, which is common in legacy systems or third-party sources.

Key Benefits:

Example Use Cases:

By adopting DomCrawler, we can focus on delivering business value rather than debugging fragile parsing logic."*
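The resilience claim can be checked directly: DomCrawler feeds markup through PHP's libxml-based parser with error collection suppressed, so broken markup is repaired into a tree rather than causing a fatal error. A small sketch, with a deliberately invented broken snippet:

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Deliberately malformed: unclosed tags, no <html>/<body> wrapper.
$broken = '<div class="price">$9.99<span>per unit';

$crawler = new Crawler($broken);

// libxml repairs the tree internally; no exception is thrown
// and the data is still reachable by selector.
echo $crawler->filter('.price')->text();
```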
A more technical pitch:

*"Symfony DomCrawler is a mature, dependency-light component that provides a fluent, jQuery-like API for navigating and extracting data from HTML and XML documents. It integrates cleanly with Laravel's ecosystem and is widely used in Symfony projects for testing and scraping. Here's why it's the right choice for our needs:
Why Use DomCrawler?
- A fluent, expressive API: `$crawler->filter('.product')->each(fn ($node) => $node->text())` instead of verbose DOM traversal or regex.
- CSS selector support for precise targeting (e.g., `$crawler->filter('table tr td:nth-child(2)')`).
- Form handling via `$crawler->selectButton('Submit')->form()`, which is useful for legacy system integrations or automated testing.

Integration with Laravel:
- Works alongside Laravel's HTTP client (`Http::get()`), service container, and testing tools (e.g., HttpTests or Dusk).
- Easily wrapped in a ScraperService to encapsulate common patterns (e.g., rate limiting, pagination, or data transformation).

Example:

```php
use Symfony\Component\DomCrawler\Crawler;

$html = Http::get('https://example.com/products')->body();
$crawler = new Crawler($html);

$products = $crawler->filter('.product')->each(function (Crawler $node) {
    return [
        'name' => $node->filter('.name')->text(),
        'price' => $node->filter('.price')->text(),
    ];
});
```
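The ScraperService idea mentioned above can be sketched as a plain class that owns the parsing details. The class and selectors below are illustrative assumptions, not an existing Laravel API; in the application the HTML would come from `Http::get(...)->body()` rather than an inline string:

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical wrapper: centralizes selectors so callers never touch the DOM.
class ScraperService
{
    /**
     * Extract product data from an HTML listing page.
     *
     * @return array<int, array{name: string, price: string}>
     */
    public function products(string $html): array
    {
        $crawler = new Crawler($html);

        return $crawler->filter('.product')->each(function (Crawler $node) {
            return [
                'name'  => trim($node->filter('.name')->text()),
                'price' => trim($node->filter('.price')->text()),
            ];
        });
    }
}

// Usage with inline sample HTML:
$service  = new ScraperService();
$products = $service->products(
    '<div class="product"><span class="name">Widget</span><span class="price">$5</span></div>'
);
// $products is [['name' => 'Widget', 'price' => '$5']]
```

Keeping the selectors in one place means a markup change on the scraped site touches a single class, which is the maintainability argument in the pitch above.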
Trade-offs:

- DomCrawler parses static HTML only; JavaScript-rendered pages need a browser-driven tool such as Panther or Dusk.
- CSS selector filtering requires the companion symfony/css-selector package (otherwise you fall back to XPath).
Proposal:
How can I help you explore Laravel packages today?