Graby Laravel Package

j0k3r/graby

Graby extracts clean article content from web pages. Built on php-readability and FiveFilters site_config patterns, it is a Composer-friendly, decoupled, fully tested fork of Full-Text RSS. Requires PHP 8.2+, Tidy, and cURL.


Product Decisions This Supports

  • Content Aggregation Platforms: Enables building scalable systems to scrape, clean, and extract article content from diverse websites (e.g., news aggregators, RSS feed generators, or AI training datasets).
  • SEO/Content Analysis Tools: Powers tools that analyze article structure, readability, or metadata (e.g., title, authors, dates) for SEO audits or competitive analysis.
  • Build vs. Buy: Justifies adopting this open-source package over custom development for teams lacking expertise in web scraping/parsing, reducing time-to-market for content extraction features.
  • Multi-Platform Content Curation: Supports roadmap items like:
    • Adding "Read Later" functionality to apps (similar to Pocket or Instapaper).
    • Building a "Save Article" feature for browser extensions.
    • Enabling dynamic content display in news apps (e.g., filtering by language, date, or domain).
  • Compliance & Data Integrity: Addresses needs for:
    • Legal compliance: Configurable allowed_urls/blocked_urls to avoid scraping restricted sites.
    • Data quality: Built-in XSS filtering (xss_filter) and error handling to ensure clean, usable output.
  • Extensibility: Aligns with roadmaps requiring customization (e.g., integrating with existing logging systems via GrabyHandler or tweaking extraction rules via site_config).
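
The option names mentioned above (allowed_urls, blocked_urls, xss_filter) could be grouped into a config array along these lines. This is a minimal sketch: every domain value is a placeholder, and the constructor call appears only as a comment, on the assumption that the installed Graby version accepts a config array there.

```php
<?php

// Option names come from the list above; every value is a placeholder.
$grabyConfig = [
    // Assumption: entries are matched against the requested URL.
    'allowed_urls' => ['example.com', 'example.org'],
    'blocked_urls' => ['ads.example.net'],
    // Keep the built-in XSS filtering switched on.
    'xss_filter' => true,
];

// Assumes j0k3r/graby is installed through Composer:
// $graby = new \Graby\Graby($grabyConfig);
```

How the two lists interact when both are set is defined by the library, so check its README before relying on a particular precedence.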

When to Consider This Package

  • Adopt if:

    • Your product requires reliable article content extraction from unstructured HTML (e.g., news sites, blogs, or forums).
    • You need structured metadata (titles, authors, dates, images) alongside raw content.
    • Your team lacks resources to maintain a custom scraping solution or wants to avoid Full-Text RSS’s clunky integration.
    • You prioritize maintainability (a tested, documented, actively maintained fork) over cutting-edge features.
    • Your use case fits PHP/Laravel ecosystems (e.g., backend services, cron jobs, or API endpoints).
  • Look elsewhere if:

    • You need low-latency, real-time scraping (Graby fetches and parses each page synchronously, which suits batch jobs better).
    • Your target sites rely heavily on JavaScript rendering (Graby parses static HTML; consider a headless browser such as Puppeteer for dynamic content).
    • You want per-site extraction rules managed through a no-code tool such as ParseHub or Octoparse rather than through site_config files.
    • Your stack is non-PHP (e.g., Python, Node.js). Alternatives: readability-lxml (Python) or @mozilla/readability (Node.js).
    • You need large-scale distributed scraping (consider Scrapy or Scrapy Cloud).
    • Legal/compliance risks are high (Graby doesn’t handle CAPTCHAs or rate limiting; add proxies/rotating user agents manually).

How to Pitch It (Stakeholders)

For Executives:

*"Graby is a battle-tested, MIT-licensed PHP package that solves a critical pain point for our [content aggregation/SEO analysis/read-later] product: extracting clean, structured article content from the web at scale. Instead of building a custom scraper (which would require months of dev effort and ongoing maintenance), we can leverage this open-source, well-documented tool to:

  • Accelerate feature delivery: Add ‘Save Article’ or ‘Read Later’ in weeks, not months.
  • Reduce costs: Avoid hiring specialized scraping engineers or licensing proprietary tools.
  • Ensure reliability: Graby handles edge cases (broken links, ads, multi-page articles) and provides structured metadata (titles, authors, dates) out-of-the-box.
  • Future-proof: It’s actively maintained (last release: March 2026) and integrates seamlessly with our Laravel stack.

Risk: Minimal—we can start with a pilot (e.g., extracting 10K articles/month) and scale. Competitors like [Product X] use similar tools, so we’re not at a disadvantage."*


For Engineering:

*"Graby is a drop-in PHP library that replaces manual HTML parsing or fragile regex-based extraction. Here’s why it’s a win:

  • No reinventing the wheel: Built on FiveFilters’ Full-Text RSS but decoupled into a framework-agnostic library (HTTPlug support, Composer-friendly) that slots into a Laravel app.
  • Configurable: Tweak extraction rules via site_config (e.g., handle WordPress/Blogger sites differently) or override defaults (timeouts, allowed domains).
  • Robust error handling: Returns structured errors (e.g., 404s, blocked URLs) instead of crashing.
  • Performance: Optimized for batch processing (e.g., cron jobs to pre-fetch articles for our API).
  • Extensible: Hook into logging (Monolog support), customize output, or pre-process HTML before extraction.

Trade-offs:

  • Not for JS-heavy sites: If a site relies on client-side rendering, we’ll need to pair it with a headless browser (e.g., Puppeteer, or Symfony Panther if we want to stay in PHP).
  • PHP-only: If we later adopt Python/Node, we’d need a separate solution.

Proposal:

  1. Spike: Test Graby on 50 target sites (e.g., BBC, NYT, Medium) to validate extraction quality.
  2. Integrate: Wrap it in a Laravel service class to handle retries, rate limiting, and caching.
  3. Monitor: Use the built-in logging to track failures and refine site_config rules.

Alternatives considered:

  • Custom solution: Too risky (no tests, maintenance burden).
  • Commercial APIs: Expensive (e.g., $10K/year for high-volume scraping). Graby gives us 80% of the value at 20% of the cost."*
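
The "Integrate" step in the proposal above (a service class that handles retries and caching) might look roughly like the sketch below. Everything here is hypothetical: ArticleFetcher, its constructor arguments, and the array returned by the extractor are illustrative names, and the extractor is an injected callable rather than a real Graby call, so the example runs without the package installed. Rate limiting is left out to keep it short.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical wrapper: caches results per URL and retries failed extractions.
 * $extract is any callable that takes a URL and returns an array; in a real
 * app it would delegate to Graby (an assumption, not shown here).
 */
final class ArticleFetcher
{
    /** @var array<string, array> in-memory cache keyed by URL */
    private array $cache = [];

    /** @var callable(string): array */
    private $extract;

    public function __construct(callable $extract, private int $maxAttempts = 3)
    {
        $this->extract = $extract;
    }

    public function fetch(string $url): array
    {
        // Serve repeated requests from the cache without calling the extractor.
        if (isset($this->cache[$url])) {
            return $this->cache[$url];
        }

        $lastError = null;
        for ($attempt = 1; $attempt <= $this->maxAttempts; $attempt++) {
            try {
                return $this->cache[$url] = ($this->extract)($url);
            } catch (\Throwable $e) {
                $lastError = $e; // remember the failure and try again
            }
        }

        throw new \RuntimeException(
            "Extraction failed for {$url} after {$this->maxAttempts} attempts",
            0,
            $lastError
        );
    }
}

// Usage with a fake extractor that fails once and then succeeds.
$calls = 0;
$fetcher = new ArticleFetcher(function (string $url) use (&$calls): array {
    $calls++;
    if ($calls < 2) {
        throw new \RuntimeException('temporary failure');
    }
    return ['url' => $url, 'title' => 'Example'];
});

$first = $fetcher->fetch('https://example.com/a');  // retried once, then cached
$second = $fetcher->fetch('https://example.com/a'); // served from the cache
```

In a Laravel app the in-memory array would more likely be replaced by the framework cache (for example Cache::remember), and a delay between attempts would usually be added; both are omitted here for brevity.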