Html Parser Laravel Package

oscarotero/html-parser

Fast, lightweight HTML parser for PHP by Oscar Otero. Parse HTML into a DOM-like structure, query and traverse nodes, extract text/attributes, and handle real-world, imperfect markup. Useful for scraping, content cleanup, and transformations.


Getting Started

Install via Composer:

composer require oscarotero/html-parser

Start by parsing an HTML string:

use oscarotero\HtmlParser\Parser;

$html = '<div class="card"><h2>Title</h2><p>Content</p></div>';
$parser = new Parser($html);

Get the first matching element using a CSS-like selector and extract data:

$card = $parser->query('.card')->first();
$title = $card->query('h2')->text(); // 'Title'
$content = $card->query('p')->text(); // 'Content'

First use case: extract all links from a snippet of HTML:

$links = $parser->query('a')->map(fn($el) => [
    'href' => $el->attr('href'),
    'text' => $el->text()
]);

Key entry points: Parser, Element (returned by queries), and Collection. Check the package documentation for exact class names and method signatures ($parser->query(), $el->find(), $el->siblings(), etc.) before relying on them.

Implementation Patterns

  • Scraping fragments: Wrap external HTML (e.g., from cURL responses) in Parser, then use chained queries:
    $items = $parser->query('.product')->map(fn($p) => [
        'name' => $p->query('.title')->text(),
        'price' => $p->query('.price')->attr('data-value'),
    ]);
    
  • Validation/assertions in tests: Parse known fixture HTML and assert presence/structure:
    $html = $this->renderView('email/template.html.twig', $data);
    $parser = new Parser($html);
    $this->assertEquals(3, $parser->query('img')->count());
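    // Assert on the match count: a query result that is always an object would make assertNotNull() pass vacuously.
    $this->assertGreaterThan(0, $parser->query('button[aria-label="Subscribe"]')->count());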
    
  • HTML sanitization pipelines: Filter by removing unwanted elements before rendering:
    $parser->query('script, style, iframe')->remove(); // destructive!
    $sanitized = $parser->save(); // returns cleaned HTML
    
  • Scoped traversal: On dense DOMs, select a container once and run further queries against it; parent(), next(), and prev() move relative to a node you already hold:
    $table = $parser->query('table.data')->first();
    $headers = $table->query('thead th')->map(fn($th) => $th->text());
    $rows = $table->query('tbody tr')->map(fn($tr) => $tr->query('td')->map(fn($td) => $td->text())->toArray());
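
For comparison, the same table extraction can be sketched with PHP's bundled DOM extension (DOMDocument and DOMXPath). This does not use the package's API; the sample markup and variable names are assumptions made for illustration.

```php
<?php
// Sketch using PHP's bundled DOM extension, not the package API.
// The sample table below is hypothetical.
$html = '<table class="data"><thead><tr><th>Name</th><th>Price</th></tr></thead>'
      . '<tbody><tr><td>Pen</td><td>2</td></tr><tr><td>Ink</td><td>5</td></tr></tbody></table>';

$doc = new DOMDocument();
// Suppress libxml notices about the missing doctype/html wrapper.
$doc->loadHTML($html, LIBXML_NOERROR | LIBXML_NOWARNING);
$xpath = new DOMXPath($doc);

// Header labels, in document order.
$headers = [];
foreach ($xpath->query('//table[@class="data"]/thead//th') as $th) {
    $headers[] = trim($th->textContent);
}

// One array of cell texts per body row; the second argument scopes the query to $tr.
$rows = [];
foreach ($xpath->query('//table[@class="data"]/tbody/tr') as $tr) {
    $cells = [];
    foreach ($xpath->query('./td', $tr) as $td) {
        $cells[] = trim($td->textContent);
    }
    $rows[] = $cells;
}
```

Scoping the inner query to each row mirrors the pattern above: select the container once, then query relative to it.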
    

Gotchas and Tips

  • One document per Parser instance: Each new Parser($html) holds its own document and is independent of other instances. Create a new instance for each unrelated document rather than reusing one.
  • Tolerance ≠ perfection: It handles common malformed HTML (missing closing tags, unquoted attributes), but deeply broken markup, such as a <script> element that is never closed, may misparse. If correctness is critical, prefer DOMDocument with libxml_use_internal_errors(true).
  • Method naming: query() is aliased as find() and select() for familiarity; use whichever reads best. text() returns concatenated text across children (including nested tags); use html() for inner HTML or outerHtml() for full element + content.
  • No built-in async or HTTP: Designed for small, focused tasks. Pair with an HTTP client such as Guzzle or Symfony HttpClient for full scraping workflows.
  • Extensibility: Element is a final class, so it cannot be subclassed. Wrap results in your own abstractions by composition instead (e.g., a CardElement class that holds an Element and delegates to it), created via a factory.
  • Memory note: Documents over 100 KB generally will not strain memory, but map() over thousands of elements may be slow. Pre-filter with more specific selectors where possible.
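
The DOMDocument fallback mentioned above can be sketched as follows. This is a minimal example under stated assumptions: the malformed sample markup is invented for illustration, and only PHP's bundled DOM and libxml functions are used, not the package's API.

```php
<?php
// Hypothetical malformed input: an unquoted attribute and an unclosed <p>.
$html = '<div class=card><p>First<p>Second</div><script>alert(1)</script>';

// Collect parse errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$errors = libxml_get_errors();          // inspect or log these if the input matters
libxml_clear_errors();
libxml_use_internal_errors($previous);  // restore the caller's setting

// Remove <script> nodes. The node list is live, so copy it to an array
// first to avoid skipping entries while removing.
foreach (iterator_to_array($doc->getElementsByTagName('script')) as $script) {
    $script->parentNode->removeChild($script);
}

// Gather paragraph text from the repaired tree.
$paragraphs = [];
foreach ($doc->getElementsByTagName('p') as $p) {
    $paragraphs[] = trim($p->textContent);
}
```

Restoring the previous libxml error setting keeps the change local, which matters when this runs inside a larger application or test suite.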