Laravel Wikipedia Games Db Laravel Package

artryazanov/laravel-wikipedia-games-db

Laravel package that builds a normalized video games database by scraping Wikipedia. Queue-driven and resumable, traverses categories, parses infoboxes via MediaWiki API + HTML, stores many-to-many relations with wikipedia_* tables, configurable via .env.

Getting Started

Minimal Setup

  1. Installation

    composer require artryazanov/laravel-wikipedia-games-db
    php artisan vendor:publish --provider="ArtRyazanov\WikipediaGamesDb\WikipediaGamesDbServiceProvider" --tag="config"
    
  2. Configure

    Edit config/wikipedia-games-db.php to set:

    • api_endpoint (default: https://en.wikipedia.org/w/api.php)
    • queue_connection (e.g., database, redis)
    • game_model (default: App\Models\Game)
  3. Run Migrations

    php artisan migrate
    
  4. First Use Case: Scrape a Category

    Dispatch a job to scrape a Wikipedia category (e.g., "Action video games"):

    use ArtRyazanov\WikipediaGamesDb\Jobs\ScrapeCategory;
    
    ScrapeCategory::dispatch('Action video games');
    
  5. Process Queue

    php artisan queue:work
    

Implementation Patterns

Core Workflow

  1. Category Traversal

    Use ScrapeCategory to recursively scrape all games in a Wikipedia category tree.

    ScrapeCategory::dispatch('Video game genres');
    
  2. Game Parsing

    The package auto-parses game pages via:

    • MediaWiki API (structured data)
    • HTML Infobox Parser (fallback for missing API data)

    // Manually trigger parsing for a specific game title
    use ArtRyazanov\WikipediaGamesDb\Jobs\ParseGame;

    ParseGame::dispatch('The Legend of Zelda');
    
  3. Data Normalization

    Extend App\Models\Game to map Wikipedia fields to your schema:

    // Example: Cast infobox fields to model attributes
    protected $casts = [
        'release_year' => 'integer',
        'developer' => 'array',
    ];
    
  4. Queue Management

    • Batch Processing: Use ScrapeCategory::dispatch('Category', ['batch_size' => 50]) to limit API calls.
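    • Retry Logic: Failed jobs are retried automatically. The retry count is described as configurable in .env:
      QUEUE_WORKER_RETRIES=3
      QUEUE_WORKER_RETRIES is not a stock Laravel setting, so it only has an effect if the package or your application reads it. Laravel's own worker accepts a --tries option (php artisan queue:work --tries=3).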
      
  5. Integration with Existing Data

    // Sync parsed games with your DB
    $game = Game::firstOrCreate(
        ['title' => 'Super Mario Bros.'],
        [
            'developer' => ['Nintendo'],
            'release_year' => 1985,
        ]
    );
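
The casts in step 3 assume the stored values are already clean. As a rough illustration of what normalization can involve before saving, the sketch below turns raw infobox strings into a developer list and a release year. It is plain PHP and not part of the package: the function name normalizeInfobox, the input keys developer and released, and the sample studio names are all assumptions made for the example.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the package): converts raw infobox strings
 * into the shapes the casts above expect.
 *
 * @param array<string, string> $raw Raw infobox values keyed by field name.
 * @return array{developer: list<string>, release_year: int|null}
 */
function normalizeInfobox(array $raw): array
{
    // Split a comma- or semicolon-separated developer string into trimmed names.
    $parts = preg_split('/\s*[,;]\s*/', trim($raw['developer'] ?? ''), -1, PREG_SPLIT_NO_EMPTY);
    $developers = $parts === false ? [] : array_values($parts);

    // Take the first four-digit year (1900-2099) found in the release text.
    $year = null;
    if (preg_match('/\b(19|20)\d{2}\b/', $raw['released'] ?? '', $m) === 1) {
        $year = (int) $m[0];
    }

    return ['developer' => $developers, 'release_year' => $year];
}

// Fictional studio names, used only to show the splitting behavior.
$normalized = normalizeInfobox([
    'developer' => 'Alpha Studio, Beta Works',
    'released'  => 'September 13, 1985',
]);
```

The returned array could then be passed as the second argument to Game::firstOrCreate, as in step 5.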
    

Advanced Patterns

  1. Custom Field Mapping

    Override the parser’s default mappings in config/wikipedia-games-db.php:

    'field_mappings' => [
        'infobox_developer' => 'developers',
        'infobox_publisher' => 'publishers',
    ],
    
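  2. API Rate Limiting

    Throttle outgoing requests with a named rate limiter, defined in a service provider's boot() method. Laravel's HTTP throttle middleware only limits incoming requests, so it does not slow down calls to Wikipedia. Whether the package's jobs attach this limiter is not confirmed; your own jobs can apply it through the Illuminate\Queue\Middleware\RateLimited job middleware. The value of 30 per minute is illustrative.

    use Illuminate\Cache\RateLimiting\Limit;
    use Illuminate\Support\Facades\RateLimiter;

    RateLimiter::for('wikipedia-api', function ($job) {
        return Limit::perMinute(30);
    });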
    
  3. Webhooks for New Games

    Listen for the GameParsed event in EventServiceProvider:

    protected $listen = [
        'ArtRyazanov\WikipediaGamesDb\Events\GameParsed' => [
            'App\Listeners\NotifySlack',
        ],
    ];
    
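  4. Hybrid Scraping

    Combine with manual data entry for high-value games:

    // Flag a manually curated game (e.g., "Half-Life") so your own code can skip re-parsing it.
    // This assumes a manually_verified column on your games table.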
    Game::updateOrCreate(
        ['title' => 'Half-Life'],
        ['manually_verified' => true]
    );
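
How a field_mappings array like the one in step 1 might be applied is sketched below in plain PHP. This illustrates the idea only and is not the package's parser: applyFieldMappings is a hypothetical name, and passing unmapped keys through unchanged is an assumption.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: renames infobox keys according to a mapping array.
 * Keys without a mapping are kept as-is (an assumption for this sketch).
 *
 * @param array<string, mixed>  $infobox  Parsed infobox values.
 * @param array<string, string> $mappings Source key => target attribute.
 * @return array<string, mixed>
 */
function applyFieldMappings(array $infobox, array $mappings): array
{
    $mapped = [];
    foreach ($infobox as $key => $value) {
        // Use the mapped name when one exists, otherwise keep the original key.
        $mapped[$mappings[$key] ?? $key] = $value;
    }

    return $mapped;
}

$attributes = applyFieldMappings(
    ['infobox_developer' => ['Valve'], 'infobox_publisher' => ['Sierra Studios'], 'title' => 'Half-Life'],
    ['infobox_developer' => 'developers', 'infobox_publisher' => 'publishers']
);
```

Keeping the mapping in configuration rather than in code means a schema rename only touches config/wikipedia-games-db.php.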
    

Gotchas and Tips

Pitfalls

  1. API Quotas

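    • Wikipedia’s API throttles clients that send requests too aggressively. MediaWiki's API etiquette guidelines set no fixed limit for read requests but ask clients to make requests serially; throttled requests may fail with HTTP 429 or 503 responses.
    • Fix: Send a descriptive User-Agent header, keep queue concurrency low so requests go out serially, and back off before retrying.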
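  2. Circular Categories

    Wikipedia's category graph can contain cycles, so a traversal that starts at a category like "Video game genres" may revisit the same categories indefinitely.

    • Fix: Record each visited category (for example in a visited_categories table) and skip any category already seen. Run php artisan queue:failed to inspect jobs that have failed.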
  3. Infobox Inconsistency

    Not all games have infoboxes, leading to partial data.

    • Fix: Log missing fields and manually curate:
      // In a listener for GameParsed
      if (empty($game->developer)) {
          logger()->warning("Missing developer for {$game->title}");
      }
      
  4. Title Ambiguity

    Games that share a title (e.g., the 1996 "Resident Evil" and its 2002 remake) may merge incorrectly.

    • Fix: Use title + release_year as a composite key.
  5. Queue Stalling

    Large categories (e.g., "List of video games") may stall the queue.

    • Fix: Process in chunks:
      ScrapeCategory::dispatch('List of video games', ['chunk_size' => 100]);
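
The visited-set fix for the Circular Categories pitfall can be sketched in plain PHP. This is a minimal in-memory illustration, not the package's traversal: the function name traverseCategories and the toy graph are assumptions, and a queue-driven scraper would keep the visited set in a table or cache rather than in a local array.

```php
<?php

declare(strict_types=1);

/**
 * Breadth-first walk over a category graph that visits each category once.
 *
 * @param array<string, list<string>> $subcategories Category => child categories.
 * @return list<string> Categories in the order they were visited.
 */
function traverseCategories(array $subcategories, string $root): array
{
    $visited = [$root => true]; // array keys act as a set
    $queue = [$root];
    $order = [];

    while ($queue !== []) {
        $current = array_shift($queue);
        $order[] = $current;

        foreach ($subcategories[$current] ?? [] as $child) {
            // Only enqueue categories that have not been seen yet.
            if (!isset($visited[$child])) {
                $visited[$child] = true;
                $queue[] = $child;
            }
        }
    }

    return $order;
}

// Toy graph with a cycle: C links back to A.
$graph = ['A' => ['B', 'C'], 'B' => ['C'], 'C' => ['A']];
$order = traverseCategories($graph, 'A');
```

Marking a category as visited when it is enqueued, rather than when it is processed, also prevents the same category from being queued twice by two different parents.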
      

Debugging Tips

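  1. Job Logging

    Enable queue logging:

    QUEUE_LOG=true
    

    Check storage/logs/laravel.log for failed jobs. QUEUE_LOG is not a stock Laravel setting, so it only has an effect if the package or your application reads it. Exceptions thrown by jobs are written to the log either way, and php artisan queue:failed lists jobs that have exhausted their attempts.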

  2. API Debugging

    Inspect raw API responses by temporarily overriding the API client:

    // In a service provider
    $this->app->singleton(\ArtRyazanov\WikipediaGamesDb\Services\WikipediaApi::class, function () {
        return new \ArtRyazanov\WikipediaGamesDb\Services\WikipediaApi(
            new \ArtRyazanov\WikipediaGamesDb\Services\DebugApiClient
        );
    });
    
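  3. Database Indexes

    Add indexes to speed up lookups, for example in a migration:

    use Illuminate\Database\Schema\Blueprint;
    use Illuminate\Support\Facades\Schema;

    Schema::table('games', function (Blueprint $table) {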
        $table->index('title');
        $table->index('release_year');
    });
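
For API debugging outside the package, it can help to build a request URL by hand and open it in a browser or with curl. The sketch below builds a category-members query against the default api_endpoint. The function name categoryMembersUrl is hypothetical; the query parameters (action, list, cmtitle, cmlimit, format) are standard MediaWiki API parameters.

```php
<?php

declare(strict_types=1);

/**
 * Builds a MediaWiki API URL that lists the members of a category.
 * Hypothetical helper for manual debugging, not part of the package.
 */
function categoryMembersUrl(string $endpoint, string $category, int $limit = 50): string
{
    // Pass '&' explicitly so the output does not depend on php.ini settings.
    $query = http_build_query([
        'action'  => 'query',
        'list'    => 'categorymembers',
        'cmtitle' => 'Category:' . $category,
        'cmlimit' => $limit,
        'format'  => 'json',
    ], '', '&');

    return $endpoint . '?' . $query;
}

$url = categoryMembersUrl('https://en.wikipedia.org/w/api.php', 'Action video games');
```

Comparing the JSON returned for this URL with what ends up in the database helps narrow down whether a problem lies in the API response or in the parsing step.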
    

Extension Points

  1. Custom Parsers

    Extend ArtRyazanov\WikipediaGamesDb\Parsers\GameParser to handle niche fields:

    namespace App\Parsers;
    
    use ArtRyazanov\WikipediaGamesDb\Parsers\GameParser as BaseParser;
    
    class CustomGameParser extends BaseParser {
        protected function parsePlatforms($html) {
            // Custom logic for parsing platform data
        }
    }
    

    Register in config/wikipedia-games-db.php:

    'parser' => \App\Parsers\CustomGameParser::class,
    
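  2. Web Scraping Fallback

    For games missing API data, implement a fallback parser using Symfony's DomCrawler component (CSS selectors also require symfony/css-selector):

    use Symfony\Component\DomCrawler\Crawler;
    
    $crawler = new Crawler($html);
    // text('') returns the default instead of throwing when nothing matches
    $platforms = $crawler->filter('.infobox-platform')->text('');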
    
  3. Data Validation

    Add Laravel Validation rules to sanitize scraped data:

    use Illuminate\Support\Facades\Validator;
    
    $validator = Validator::make($data, [
        'release_year' => 'nullable|integer|min:1950|max:' . (date('Y') + 1),
    ]);
    
  4. Export/Import

    Export parsed games to a JSON file that can later seed another database:

    // Export parsed games to JSON
    $games = Game::all()->toJson();
    file_put_contents('games_export.json', $games);
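
The export step can be paired with an import. The following is a minimal round-trip sketch in plain PHP, assuming the rows are simple arrays; exportGames and importGames are hypothetical names, and the temp-directory path is only for the example.

```php
<?php

declare(strict_types=1);

/** Writes game rows to a JSON file; json_encode throws JsonException on failure. */
function exportGames(array $games, string $path): void
{
    $json = json_encode($games, JSON_THROW_ON_ERROR | JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE);
    if (file_put_contents($path, $json) === false) {
        throw new RuntimeException("Could not write {$path}");
    }
}

/** Reads the rows back as associative arrays. */
function importGames(string $path): array
{
    $json = file_get_contents($path);
    if ($json === false) {
        throw new RuntimeException("Could not read {$path}");
    }

    return json_decode($json, true, 512, JSON_THROW_ON_ERROR);
}

$path = sys_get_temp_dir() . '/games_export.json';
$games = [['title' => 'Super Mario Bros.', 'release_year' => 1985, 'developer' => ['Nintendo']]];

exportGames($games, $path);
$restored = importGames($path);
```

In a Laravel application the rows could come from Game::all()->toArray(), and each restored row could be passed to Game::updateOrCreate keyed on title.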
    