artryazanov/laravel-wikipedia-games-db
Laravel package that builds a normalized video games database by scraping Wikipedia. It is queue-driven and resumable: it traverses categories, parses infoboxes via the MediaWiki API and HTML, stores many-to-many relations in wikipedia_* tables, and is configurable via .env.
A Laravel package to build a normalized database of video games by scraping Wikipedia. It uses a queue-driven architecture to traverse categories and parse game pages via the Wikipedia (MediaWiki) API and an HTML infobox parser.
By default, the package targets English Wikipedia and allows full configuration via environment variables.
Tables use the wikipedia_* prefix, and settings are configurable via .env (endpoint, user agent, throttling, queues).
If this package is included as a path repository in your monorepo (as in this project), ensure your root composer.json has a repository entry pointing to packages/artryazanov/laravel-wikipedia-games-db, then require it:
composer require artryazanov/laravel-wikipedia-games-db:dev-main
If installing from a VCS/Packagist in another project, require it the same way and ensure Composer discovers the service provider (auto-discovery enabled). If needed, register the provider manually in config/app.php:
'providers' => [
    // ...
    Artryazanov\WikipediaGamesDb\WikipediaGamesDbServiceProvider::class,
],
Publish the config file and the migrations:
php artisan vendor:publish --provider="Artryazanov\\WikipediaGamesDb\\WikipediaGamesDbServiceProvider" --tag=config
php artisan vendor:publish --provider="Artryazanov\\WikipediaGamesDb\\WikipediaGamesDbServiceProvider" --tag=migrations
Then migrate:
php artisan migrate
All settings can be overridden via environment variables.
- WIKIPEDIA_GAMES_DB_API_ENDPOINT (default: https://en.wikipedia.org/w/api.php)
- WIKIPEDIA_GAMES_DB_USER_AGENT (default example: LaravelWikipediaGamesDb/1.0 (+https://example.com; contact@example.com))
- WIKIPEDIA_GAMES_DB_ROOT_CATEGORY (default: Category:Video games)
- WIKIPEDIA_GAMES_DB_THROTTLE_MS (default: 1000)
- WIKIPEDIA_GAMES_DB_QUEUE_CONNECTION (default: null, which uses the Laravel default)
- WIKIPEDIA_GAMES_DB_QUEUE_NAME (default: default)
Example snippet for your .env:
WIKIPEDIA_GAMES_DB_API_ENDPOINT=https://en.wikipedia.org/w/api.php
WIKIPEDIA_GAMES_DB_USER_AGENT="YourApp/1.0 (+https://your-site; you@example.com)"
WIKIPEDIA_GAMES_DB_ROOT_CATEGORY="Category:Video games"
WIKIPEDIA_GAMES_DB_THROTTLE_MS=1000
WIKIPEDIA_GAMES_DB_QUEUE_CONNECTION=
WIKIPEDIA_GAMES_DB_QUEUE_NAME=default
Please set a meaningful User-Agent per MediaWiki API etiquette.
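To make the defaults above concrete, here is a minimal sketch in plain PHP of how these variables could be resolved. The function name wikipediaGamesDbSettings and the array keys are hypothetical, chosen for illustration; the package reads these values through its own Laravel config file, whose layout is not shown here. Treating an empty value like an unset one is also an assumption of this sketch.

```php
<?php

/**
 * Resolve the documented environment variables with their documented defaults.
 * Hypothetical helper for illustration; not part of the package.
 *
 * @param array<string, string> $env Environment map, e.g. the result of getenv().
 * @return array<string, mixed>
 */
function wikipediaGamesDbSettings(array $env): array
{
    // Assumption of this sketch: an empty value falls back to the default,
    // just like an unset one.
    $get = static function (string $key, ?string $default) use ($env): ?string {
        $value = $env[$key] ?? '';
        return $value !== '' ? $value : $default;
    };

    return [
        'api_endpoint' => $get('WIKIPEDIA_GAMES_DB_API_ENDPOINT', 'https://en.wikipedia.org/w/api.php'),
        'user_agent' => $get(
            'WIKIPEDIA_GAMES_DB_USER_AGENT',
            'LaravelWikipediaGamesDb/1.0 (+https://example.com; contact@example.com)'
        ),
        'root_category' => $get('WIKIPEDIA_GAMES_DB_ROOT_CATEGORY', 'Category:Video games'),
        'throttle_milliseconds' => (int) $get('WIKIPEDIA_GAMES_DB_THROTTLE_MS', '1000'),
        // null means "use Laravel's default queue connection".
        'queue_connection' => $get('WIKIPEDIA_GAMES_DB_QUEUE_CONNECTION', null),
        'queue_name' => $get('WIKIPEDIA_GAMES_DB_QUEUE_NAME', 'default'),
    ];
}

// Resolve settings from the current process environment.
$settings = wikipediaGamesDbSettings(getenv());
```

With the example .env shown earlier, the empty WIKIPEDIA_GAMES_DB_QUEUE_CONNECTION would resolve to null in this sketch, matching the documented "uses Laravel default" behaviour.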
This package ships migrations that create the following tables (with comments):
- wikipedia_game_wikipages: central storage for Wikipedia page meta reused by multiple entities. Columns: title, wikipedia_url, description, wikitext, timestamps.
- wikipedia_games (core games): now has wikipage_id pointing to wikipedia_game_wikipages; still stores clean_title, cover_image_url, release_date, release_year.
- wikipedia_game_genres: has wikipage_id.
- wikipedia_game_platforms: has wikipage_id and keeps platform-specific fields like cover_image_url, release_date, website_url.
- wikipedia_game_companies: has wikipage_id and keeps cover_image_url, founded, website_url.
- wikipedia_game_modes: has wikipage_id.
- wikipedia_game_series: has wikipage_id.
- wikipedia_game_engines: has wikipage_id and keeps cover_image_url, release_date, website_url.
- wikipedia_game_game_genre (pivot)
- wikipedia_game_game_platform (pivot)
- wikipedia_game_game_mode (pivot)
- wikipedia_game_game_series (pivot)
- wikipedia_game_game_engine (pivot)
- wikipedia_game_game_company (pivot, with role column: developer|publisher)
The migrations check for existence prior to creation, making it safer for incremental adoption. A data migration backfills wikipage_id and moves title, wikipedia_url, description, wikitext into wikipedia_game_wikipages.
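As an illustration of how the role column on the company pivot could be used, the following sketch builds heavily simplified versions of four of these tables in an in-memory SQLite database and lists a game's developers. The pivot column names game_id and company_id, the helper companiesForGame, and all sample rows are assumptions made for this example; check the shipped migrations for the real column definitions.

```php
<?php

// In-memory SQLite database, so the sketch leaves nothing behind.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Simplified tables; game_id / company_id on the pivot are assumed names.
$pdo->exec('CREATE TABLE wikipedia_game_wikipages (id INTEGER PRIMARY KEY, title TEXT, wikipedia_url TEXT)');
$pdo->exec('CREATE TABLE wikipedia_games (id INTEGER PRIMARY KEY, wikipage_id INTEGER, clean_title TEXT)');
$pdo->exec('CREATE TABLE wikipedia_game_companies (id INTEGER PRIMARY KEY, wikipage_id INTEGER)');
$pdo->exec('CREATE TABLE wikipedia_game_game_company (game_id INTEGER, company_id INTEGER, role TEXT)');

// Invented sample rows: one game, one developer, one publisher.
$pdo->exec("INSERT INTO wikipedia_game_wikipages (id, title, wikipedia_url) VALUES
    (1, 'Example Game', 'https://en.wikipedia.org/wiki/Example_Game'),
    (2, 'Example Studio', NULL),
    (3, 'Example Publisher', NULL)");
$pdo->exec("INSERT INTO wikipedia_games (id, wikipage_id, clean_title) VALUES (1, 1, 'Example Game')");
$pdo->exec('INSERT INTO wikipedia_game_companies (id, wikipage_id) VALUES (1, 2), (2, 3)');
$pdo->exec("INSERT INTO wikipedia_game_game_company (game_id, company_id, role) VALUES
    (1, 1, 'developer'),
    (1, 2, 'publisher')");

/**
 * Return the titles of companies linked to a game in the given role.
 * Titles live on the shared wikipages table, so two joins are needed.
 *
 * @return string[]
 */
function companiesForGame(PDO $pdo, int $gameId, string $role): array
{
    $stmt = $pdo->prepare(
        'SELECT w.title
           FROM wikipedia_game_game_company gc
           JOIN wikipedia_game_companies c ON c.id = gc.company_id
           JOIN wikipedia_game_wikipages w ON w.id = c.wikipage_id
          WHERE gc.game_id = :game_id AND gc.role = :role
          ORDER BY w.title'
    );
    $stmt->execute(['game_id' => $gameId, 'role' => $role]);

    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

$developers = companiesForGame($pdo, 1, 'developer');
```

Keeping the role on the pivot lets the same company appear as developer of one game and publisher of another without duplicating the company row.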
You can kick off discovery in multiple ways. The fastest, high-precision path is via template transclusions.
php artisan games:scan-all
php artisan games:discover-by-template
This enumerates all pages that include Template:Infobox video game (main namespace) and enqueues parsing jobs. To also include series/franchises:
php artisan games:discover-by-template --series
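For readers unfamiliar with template transclusion lookups, the sketch below builds the kind of MediaWiki API request that lists pages embedding a template. It uses the standard list=embeddedin query parameters; the helper name embeddedInUrl is hypothetical, and the package's own HTTP client may construct its requests differently.

```php
<?php

/**
 * Build a MediaWiki API URL listing pages that transclude a template.
 * Hypothetical helper for illustration; not part of the package.
 *
 * @param string|null $continue Value of continue.eicontinue from a previous response, if any.
 */
function embeddedInUrl(string $endpoint, string $template, ?string $continue = null): string
{
    $params = [
        'action' => 'query',
        'format' => 'json',
        'list' => 'embeddedin',
        'eititle' => $template,
        'einamespace' => 0, // main (article) namespace only
        'eilimit' => 500,   // upper limit for non-bot clients
    ];

    if ($continue !== null) {
        // Resume where the previous page of results ended.
        $params['eicontinue'] = $continue;
    }

    return $endpoint . '?' . http_build_query($params);
}

$url = embeddedInUrl('https://en.wikipedia.org/w/api.php', 'Template:Infobox video game');
```

Each response page carries a continuation token while more results remain, so a caller would repeat the request with that token until none is returned.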
Alternatively, start from a category:
php artisan games:scrape-wikipedia --category="Category:Video games"
Or seed multiple high-value roots (platforms and genres):
php artisan games:scrape-wikipedia --seed-high-value
If --category is omitted, the command uses game-scraper.root_category from config (by default, English Category:Video games).
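Category traversal can be pictured as a breadth-first walk over subcategories. The sketch below collapses that walk into a single loop for readability, whereas the package spreads the work across queued jobs. The function traverseCategories, the $fetchMembers callback (a stand-in for an API call such as list=categorymembers), the depth limit, and the sample data are all assumptions of this example.

```php
<?php

/**
 * Breadth-first traversal of a category tree, collecting page titles.
 * Hypothetical helper for illustration; not part of the package.
 *
 * @param callable $fetchMembers fn(string $category): array{pages: string[], subcats: string[]}
 * @return string[] Unique page titles in discovery order.
 */
function traverseCategories(string $root, callable $fetchMembers, int $maxDepth = 3): array
{
    $queue = [[$root, 0]];
    $seen = [$root => true];
    $pages = [];

    while ($queue !== []) {
        [$category, $depth] = array_shift($queue);
        $members = $fetchMembers($category);

        foreach ($members['pages'] as $title) {
            // Keyed by title, so a page listed in several categories is kept once.
            $pages[$title] = true;
        }

        if ($depth >= $maxDepth) {
            continue; // do not descend further
        }

        foreach ($members['subcats'] as $subcategory) {
            if (!isset($seen[$subcategory])) { // guards against category cycles
                $seen[$subcategory] = true;
                $queue[] = [$subcategory, $depth + 1];
            }
        }
    }

    // Numeric-looking titles become integer keys; cast them back to strings.
    return array_map('strval', array_keys($pages));
}

// Invented sample tree with a cycle back to the root.
$tree = [
    'Category:Video games' => ['pages' => ['Example Game A'], 'subcats' => ['Category:Example genre']],
    'Category:Example genre' => ['pages' => ['Example Game B', 'Example Game A'], 'subcats' => ['Category:Video games']],
];
$fetch = static fn (string $category): array => $tree[$category] ?? ['pages' => [], 'subcats' => []];

$titles = traverseCategories('Category:Video games', $fetch);
```

The seen set matters in practice because Wikipedia categories can reference each other, and the depth limit keeps a broad root from pulling in loosely related branches.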
Run a queue worker to process the enqueued jobs:
php artisan queue:work --queue="${WIKIPEDIA_GAMES_DB_QUEUE_NAME:-default}"
Tips:
- Set WIKIPEDIA_GAMES_DB_THROTTLE_MS to respect API limits (start with 1000 ms).
- Set a meaningful WIKIPEDIA_GAMES_DB_USER_AGENT.
- Configure the queue connection (QUEUE_CONNECTION and, optionally, WIKIPEDIA_GAMES_DB_QUEUE_CONNECTION).
You can schedule periodic updates (e.g., weekly) in app/Console/Kernel.php:
protected function schedule(\Illuminate\Console\Scheduling\Schedule $schedule): void
{
    $schedule->command('games:scrape-wikipedia')->weekly()->sundays()->at('03:00');
}
If API requests are rejected or rate-limited, increase WIKIPEDIA_GAMES_DB_THROTTLE_MS and verify your User-Agent.
This package processes pages via queued jobs. The main entry point parses a game page and conditionally enqueues per-taxonomy jobs for additional details.
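The conditional enqueueing can be sketched as a pure function. It assumes, following the conditional dispatch note in this section, that a taxonomy job is only needed while the related wikipage has no wikipedia_url stored. The function name taxonomyJobsToDispatch, the taxonomy type labels, and the sample data are hypothetical; the package's real job classes are not reproduced here.

```php
<?php

/**
 * Decide which taxonomy jobs to enqueue for one parsed game page.
 * Hypothetical helper for illustration; not part of the package.
 *
 * @param array<string, string[]> $infoboxLinks Taxonomy type => linked page titles from the infobox.
 * @param array<string, ?string>  $knownUrls    Page title => stored wikipedia_url (null or '' if not fetched yet).
 * @return array<int, array{type: string, title: string}>
 */
function taxonomyJobsToDispatch(array $infoboxLinks, array $knownUrls): array
{
    $jobs = [];

    foreach ($infoboxLinks as $type => $titles) {
        foreach ($titles as $title) {
            $url = $knownUrls[$title] ?? null;

            // Enqueue only when nothing has been stored for this page yet.
            if ($url === null || $url === '') {
                $jobs[] = ['type' => $type, 'title' => $title];
            }
        }
    }

    return $jobs;
}

// Invented sample: the platform is already known, the company and genre are not.
$jobs = taxonomyJobsToDispatch(
    [
        'company' => ['Example Studio'],
        'platform' => ['Example Console'],
        'genre' => ['Example genre'],
    ],
    [
        'Example Console' => 'https://en.wikipedia.org/wiki/Example_Console',
        'Example genre' => '',
    ]
);
```

Skipping already-populated entries keeps repeat scans cheap, since popular platforms and genres are linked from thousands of game pages.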
- Game page job: stores the page's Wikipage, persists game-specific fields, and dispatches taxonomy jobs for linked items found in the infobox (developers, publishers, platforms, engines, genres, modes, series).
- Company job: stores the Wikipage and persists company-specific fields (cover_image_url, founded, website_url).
- Platform job: stores the Wikipage and persists platform-specific fields (cover_image_url, release_date, website_url).
- Engine job: stores the Wikipage and persists engine-specific fields (cover_image_url, release_date, website_url).
- Genre job: stores the Wikipage and links the genre.
- Mode job: stores the Wikipage and links the mode.
- Series job: stores the Wikipage and links the series.
Conditional dispatch
- Taxonomy jobs are dispatched only when the related wikipage.wikipedia_url is empty.
Throttling and deduping
- Jobs respect game-scraper.throttle_milliseconds to avoid exceeding API limits.
- For stricter deduplication, consider implementing ShouldBeUnique on specific jobs in your app fork.
This repository includes a full test suite based on Orchestra Testbench with an in-memory SQLite database.
You can also run via Composer script: composer test.
If phpunit cannot be found, ensure Composer finished installing dependencies successfully.
The Unlicense. This is free and unencumbered software released into the public domain. See the LICENSE file or https://unlicense.org for details.