j0k3r/graby
Graby extracts clean article content from web pages. Built on php-readability and FiveFilters site_config patterns, it’s a composer-friendly, decoupled, fully tested fork of Full-Text RSS. Requires PHP 8.2+, Tidy and cURL.
- `GuzzleHttp`).
- `site_config`), enabling fine-grained control over extraction logic for known domains (e.g., WordPress, Blogger).
- `Illuminate\Cache`) to reduce redundant fetches.
- `allowed_urls`, `blocked_urls`) via bindings.
- `php-http/guzzle7-adapter` could clash with Laravel’s Guzzle version. Solution: Isolate via Composer’s `replace` or use `php-http/curl-client`.
- `robots.txt`, copyright)?
- `parallel:batch` or `spatie/async`.)
- `spatie/array-to-xml` or `spatie/pdf-to-text`.
- `laravel-monitor` or `sentry`.
- `php-http/guzzle7-adapter`.

$app->singleton(Graby::class, function ($app) {
    return new Graby([
        'allowed_urls' => ['example.com', 'trusted-site.org'],
        'debug' => env('GRABY_DEBUG', false),
    ]);
});
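
The caching idea mentioned above (using Laravel's cache to reduce redundant fetches) could be sketched roughly as follows. This is only an illustration: `grabyCacheKey()` and `cachedExtract()` are hypothetical helper names, the one-hour TTL is an arbitrary assumption, and `getTitle()` / `getHtml()` are taken from the service example in this document rather than verified against a specific Graby release.

```php
<?php

use Graby\Graby;
use Illuminate\Support\Facades\Cache;

/**
 * Hypothetical helper: build a short, store-safe cache key for a URL.
 */
function grabyCacheKey(string $url): string
{
    // Hashing keeps the key short and free of characters some stores reject.
    return 'graby:' . sha1($url);
}

/**
 * Hypothetical helper: extract an article through Graby, caching the result
 * so repeated requests for the same URL skip the network round trip.
 */
function cachedExtract(Graby $graby, string $url, int $ttlSeconds = 3600): array
{
    return Cache::remember(grabyCacheKey($url), $ttlSeconds, function () use ($graby, $url) {
        $content = $graby->fetchContent($url);

        // Cache plain strings only; they serialize cleanly in any cache store.
        return [
            'title' => $content->getTitle(),
            'html' => $content->getHtml(),
        ];
    });
}
```

Caching a plain array instead of the full result object is a deliberate choice here, since it avoids relying on the result being serializable.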
- **Queue Jobs**: Wrap extraction in a **job** (e.g., `ExtractContentJob`) for async processing.
- `Article` model) or Laravel Scout for search.
- `GET /articles/{id}/content`).
- `symfony/dom-crawler`).
- `file_get_contents` hacks).
- `Accept-Language`, `User-Agent`):

public function handle(Request $request, Closure $next) {
    $request->headers->set('User-Agent', config('graby.http_client.ua_browser'));
    return $next($request);
}
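
The middleware reads `graby.http_client.ua_browser`, so a `config/graby.php` along the following lines would need to exist. This is a sketch under assumptions: the option names are the ones mentioned in this document, while the environment variable names `GRABY_XSS_FILTER` and `GRABY_USER_AGENT` and the example user-agent string are hypothetical.

```php
<?php

// config/graby.php (hypothetical): options mirror the Graby settings
// referenced elsewhere in this document.
return [
    'debug' => env('GRABY_DEBUG', false),
    'log_level' => 'info',
    'xss_filter' => env('GRABY_XSS_FILTER', true), // assumed env name
    'rewrite_relative_urls' => true,
    'allowed_urls' => ['example.com', 'trusted-site.org'],
    'blocked_urls' => [],
    'http_client' => [
        // Read via config('graby.http_client.ua_browser') in the middleware.
        'ua_browser' => env('GRABY_USER_AGENT', 'Mozilla/5.0 (compatible; ExampleBot/1.0)'), // assumed env name
    ],
];
```

With such a file in place, the container binding could pass `config('graby')` to the `Graby` constructor instead of an inline array.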
composer require j0k3r/graby php-http/guzzle7-adapter
- `config/graby.php` (merge defaults with app-specific settings).
- `app/Services/ContentExtractor.php`):

public function extract(string $url): Article {
    $result = app(Graby::class)->fetchContent($url);
    return Article::create([
        'title' => $result->getTitle(),
        'content' => $result->getHtml(),
        // ...
    ]);
}
ExtractContentJob::dispatch($url)->onQueue('scraping');
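
For the dispatch call to work, an `ExtractContentJob` class has to exist. A minimal sketch is shown here, assuming the `extract()` method above lives in a class named `App\Services\ContentExtractor` (inferred from the file path, not confirmed by the source) and that three attempts is an acceptable retry budget.

```php
<?php

namespace App\Jobs;

use App\Services\ContentExtractor;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

class ExtractContentJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    // Assumed retry budget: remote sites often fail transiently.
    public int $tries = 3;

    public function __construct(private string $url)
    {
    }

    // Laravel injects ContentExtractor from the container when the job runs.
    public function handle(ContentExtractor $extractor): void
    {
        $extractor->extract($this->url);
    }
}
```

Keeping only the URL on the job keeps the queued payload small; the extraction itself happens when a worker picks the job up.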
- \>5% error rate).
- `site_config` files in Laravel’s `storage/app/site_configs/` or a database table for dynamic updates.
- `.env`) to toggle features like `debug` or `xss_filter`.
- `single` or `daily` handlers).

'graby' => [
    'driver' => 'single',
    'path' => storage_path('logs/graby.log'),
    'level' => 'debug',
],
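
Defining the channel does not by itself route Graby's output to it; the logger still has to be handed to the Graby instance. A rough sketch follows, assuming Graby accepts a PSR-3 logger through `setLogger()` (as its README describes for Monolog). `makeLoggingGraby()` is a hypothetical helper name.

```php
<?php

use Graby\Graby;
use Psr\Log\LoggerInterface;

/**
 * Hypothetical factory: build a Graby instance that writes its debug output
 * to the given PSR-3 logger.
 */
function makeLoggingGraby(LoggerInterface $logger): Graby
{
    // Graby only logs when debug is enabled; log_level controls verbosity.
    $graby = new Graby([
        'debug' => true,
        'log_level' => 'debug',
    ]);
    $graby->setLogger($logger);

    return $graby;
}

// Possible usage inside a service provider's register() method:
// $this->app->singleton(Graby::class, fn () => makeLoggingGraby(Log::channel('graby')));
```

`Log::channel('graby')` returns a PSR-3 compatible logger, so it can be passed in directly.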
- `log_level: debug` to inspect HTML at each step.
- `rewrite_relative_urls: true` is set.