oscarotero/html-parser
Fast, lightweight HTML parser for PHP by Oscar Otero. Parse HTML into a DOM-like structure, query and traverse nodes, extract text/attributes, and handle real-world, imperfect markup. Useful for scraping, content cleanup, and transformations.
Install via Composer: composer require oscarotero/html-parser. Start by parsing an HTML string:
use HtmlParser\Parser; // namespace as declared in the package's Composer autoload; verify against your installed version
$html = '<div class="card"><h2>Title</h2><p>Content</p></div>';
$parser = new Parser($html);
Get the first matching element using a CSS-like selector and extract data:
$card = $parser->query('.card')->first();
$title = $card->query('h2')->text(); // 'Title'
$content = $card->query('p')->text(); // 'Content'
First use case: extract all links from a snippet of HTML:
$links = $parser->query('a')->map(fn($el) => [
'href' => $el->attr('href'),
'text' => $el->text()
]);
Key entry points: Parser, Element (returned by queries), and Collection. Check the package documentation for exact method signatures ($parser->query(), $el->find(), $el->siblings(), etc.) before relying on them.
Parser, then use chained queries:
$items = $parser->query('.product')->map(fn($p) => [
'name' => $p->query('.title')->text(),
'price' => $p->query('.price')->attr('data-value'),
]);
$html = $this->renderView('email/template.html.twig', $data);
$parser = new Parser($html);
$this->assertEquals(3, $parser->query('img')->count());
$this->assertGreaterThan(0, $parser->query('button[aria-label="Subscribe"]')->count()); // a query result is never null, so assert on the match count instead
$parser->query('script, style, iframe')->remove(); // destructive!
$sanitized = $parser->save(); // returns cleaned HTML
parent(), next(), prev()) for dense DOMs:
$table = $parser->query('table.data');
$headers = $table->query('thead th')->map('text');
$rows = $table->query('tbody tr')->map(fn($tr) => $tr->query('td')->map('text')->toArray());
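For comparison, the same header and row extraction can be sketched with PHP's built-in DOM extension, which may help when cross-checking the parser's output. This is a minimal sketch, not the package's API: tableToArray() is a hypothetical helper name, and it assumes the table carries the class "data" and has explicit thead and tbody elements.

```php
<?php
// Sketch with ext-dom (DOMDocument + DOMXPath).
// tableToArray() is a hypothetical helper, not part of oscarotero/html-parser.

function tableToArray(string $html): array
{
    // Collect libxml parse warnings internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // XPath equivalent of the CSS selector "table.data".
    $table = '//table[contains(concat(" ", normalize-space(@class), " "), " data ")]';

    // Header cells: "table.data thead th".
    $headers = [];
    foreach ($xpath->query($table . '/thead//th') as $th) {
        $headers[] = trim($th->textContent);
    }

    // Body rows: "table.data tbody tr", each reduced to its td texts.
    $rows = [];
    foreach ($xpath->query($table . '/tbody/tr') as $tr) {
        $cells = [];
        foreach ($xpath->query('./td', $tr) as $td) {
            $cells[] = trim($td->textContent);
        }
        $rows[] = $cells;
    }

    return ['headers' => $headers, 'rows' => $rows];
}

$result = tableToArray(
    '<table class="data"><thead><tr><th>Name</th><th>Price</th></tr></thead>'
    . '<tbody><tr><td>A</td><td>1</td></tr><tr><td>B</td><td>2</td></tr></tbody></table>'
);
print_r($result);
```

Scoping the row query to the table and the cell query to each row keeps nested tables elsewhere in the document from leaking into the result.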
Each new Parser($html) instance is independent; don't reuse instances across unrelated documents.
Malformed markup (e.g., <script> with no closing tag in real-world data) may misparse. If critical, prefer DOMDocument + libxml_use_internal_errors(true).
query() is aliased as find() and select() for familiarity; use whichever reads best.
text() returns concatenated text across children (including nested tags); use html() for inner HTML or outerHtml() for full element + content.
Element is a final class, so it cannot be extended; wrap results in your own abstractions instead (e.g., a CardElement class that holds an Element and delegates to it), created via factory patterns.
map() over thousands of elements may be slow; pre-filter with more specific selectors where possible.
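The DOMDocument + libxml_use_internal_errors(true) fallback mentioned in the notes can be sketched roughly as follows. It uses only PHP's built-in DOM extension; extractLinks() is a hypothetical helper name chosen for illustration, and the sample input is made up.

```php
<?php
// Sketch: tolerant parsing with PHP's built-in DOM extension (ext-dom).
// extractLinks() is a hypothetical helper, not part of any library.

function extractLinks(string $html): array
{
    // Buffer libxml errors so malformed markup does not raise PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    // Discard the buffered errors and restore the caller's setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $links[] = [
            'href' => $a->getAttribute('href'),
            'text' => trim($a->textContent),
        ];
    }

    return $links;
}

// Input with an unclosed <script> at the end, as described in the notes.
$links = extractLinks(
    '<div><a href="/docs">Docs</a><a href="/blog">Blog</a></div><script>var x = 1;'
);
print_r($links);
```

Restoring the previous libxml error mode keeps the helper from silently changing error handling for the rest of the application.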