artryazanov/laravel-wikipedia-games-db
Laravel package that builds a normalized video games database by scraping Wikipedia. Queue-driven and resumable, traverses categories, parses infoboxes via MediaWiki API + HTML, stores many-to-many relations with wikipedia_* tables, configurable via .env.
- jobs, horizon).
- games, platforms, genres tables).
- simplehtmldom (lightweight but may need updates for evolving Wikipedia HTML).
- platforms table).
- games table initially).
- redis) to avoid redundant calls.

| Risk Area | Mitigation Strategy |
|---|---|
| Wikipedia API Changes | Monitor MediaWiki API docs for breaking changes; wrap API calls in a service layer. |
| Data Quality | Validate scraped data against known game databases (e.g., IGDB) or manual spot-checks. |
| Queue Bottlenecks | Use batch processing (e.g., 100 games/job) and monitor horizon for failures. |
| Schema Conflicts | Start with a separate database or schema prefix (e.g., wikipedia_games_*) for testing. |
| License Compliance | Verify Unlicense terms for redistribution; ensure no conflicts with other licensed data. |
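The batching suggestion in the table above (roughly 100 games per job) can be sketched in plain PHP. This is only an illustration: `batchTitles()` and the `Game_N` titles are hypothetical names, not part of the package, and the dispatch step is left as a comment because the package's job signatures are not shown here.

```php
<?php
declare(strict_types=1);

/**
 * Split page titles into fixed-size batches, one batch per queued job.
 * Hypothetical helper for illustration; not part of the package.
 *
 * @param string[] $titles
 * @return string[][]
 */
function batchTitles(array $titles, int $batchSize = 100): array
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1.');
    }

    // array_values() reindexes the input so gaps or string keys
    // do not leak into the batches.
    return array_chunk(array_values($titles), $batchSize);
}

// Placeholder titles standing in for scraped Wikipedia page names.
$titles = array_map(fn (int $i): string => "Game_{$i}", range(1, 250));
$batches = batchTitles($titles);

foreach ($batches as $index => $batch) {
    // In a Laravel app, each batch would be handed to a queued job here.
    printf("batch %d: %d titles\n", $index, count($batch));
}
```

Smaller batches make a single failed job cheaper to retry, at the cost of more queue overhead, so the batch size is worth tuning against observed failure rates.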
- ScrapeGameJob).
- Cache::remember.
- Queue::fake() to test job failures.
- str() helper deprecations if using older Laravel.
- guzzlehttp/guzzle (for API calls) – version conflicts possible; pin in composer.json.
- simplehtmldom/simple_html_dom – may need updates for modern HTML parsing.
- games table via Eloquent).
- utf8mb4 collation for international game names.
- game_title, release_year, and platform_id for query performance.

// Example: Delay between requests
sleep(rand(1, 3)); // Random delay to avoid patterns
- composer require artryazanov/laravel-wikipedia-games-db.
- php artisan vendor:publish --tag=wikipedia-games-migrations.
- WIKIPEDIA_API_URL in .env (customize if using a mirror like https://en.wikipedia.org/w/api.php).
- QUEUE_CONNECTION=redis).
- ScrapeCategoryJob::dispatch('Video_game_consoles');
- horizon to track job failures.
- archived_at in DB).
- CHANGELOG.md for the team.
- failed_jobs table to retry jobs.
- QUEUE_CONNECTION=database with multiple workers).
- queue_worker memory limit if parsing large HTML pages.
- release_year, genre).

| Failure Scenario | Impact | Mitigation |
|---|---|---|
| Wikipedia API Downtime | Scraping halts | Fall back to cached data or manual entry. |
| Queue Worker Crash | Unprocessed games | Set retry_after on the queue connection so jobs reserved by a crashed worker are released and retried. |
| Corrupt Scraped Data | Bad records in DB | Validate data before insertion. |
| Rate-Limited API | Slow/blocked requests | Implement exponential backoff. |
| Schema Migration Issues | Deployment |
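The exponential backoff mitigation listed for rate-limited requests could look roughly like the following. This is a minimal sketch in plain PHP, not the package's implementation: `backoffDelayMs()`, `retryWithBackoff()`, the 500 ms base, and the 30 s cap are all assumptions chosen for illustration, and a thrown `RuntimeException` stands in for a rate-limit response.

```php
<?php
declare(strict_types=1);

/**
 * Delay before retry number $attempt (0-based): base * 2^attempt, capped.
 * Hypothetical helper; base and cap values are illustrative assumptions.
 */
function backoffDelayMs(int $attempt, int $baseMs = 500, int $capMs = 30000): int
{
    // Clamp the exponent so the multiplication cannot overflow an int.
    return (int) min($capMs, $baseMs * (2 ** min($attempt, 20)));
}

/**
 * Run $operation, retrying on RuntimeException with exponential backoff.
 * $sleeper receives the delay in milliseconds; injectable for testing.
 *
 * @return mixed whatever $operation returns
 */
function retryWithBackoff(callable $operation, int $maxAttempts = 5, ?callable $sleeper = null)
{
    $sleeper ??= static function (int $ms): void {
        usleep($ms * 1000);
    };

    for ($attempt = 0; ; $attempt++) {
        try {
            return $operation();
        } catch (RuntimeException $e) {
            // Give up once the attempt budget is spent.
            if ($attempt + 1 >= $maxAttempts) {
                throw $e;
            }
            $sleeper(backoffDelayMs($attempt));
        }
    }
}

// Usage: a simulated request that is rate-limited twice, then succeeds.
$calls = 0;
$waits = [];
$result = retryWithBackoff(
    function () use (&$calls): string {
        if (++$calls < 3) {
            throw new RuntimeException('HTTP 429 (simulated)');
        }
        return 'ok';
    },
    5,
    function (int $ms) use (&$waits): void {
        $waits[] = $ms; // Record the delay instead of sleeping.
    }
);
```

In a queued job, the same schedule is usually better expressed through the queue's own retry settings than by sleeping inside the worker; adding a small random jitter to each delay also helps keep parallel workers from retrying in lockstep.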