Stemmer Laravel Package

dompat/stemmer

Strictly typed PHP 8.3+ stemming library for better full-text search, indexing, and text analysis. Supports LIGHT and AGGRESSIVE modes, with extensible language drivers. Includes Czech and English out of the box; add custom locales easily.

Technical Evaluation

Architecture Fit

  • Search Relevance: Performs morphological normalization for full-text search, improving recall by reducing inflected forms to a common stem (e.g., "running" → "run"). Useful for Elasticsearch, Algolia, or Laravel Scout integrations where queries and documents must match across inflections.
  • Multilingual Support: Tailored for Czech/English markets, aligning with products targeting Europe or bilingual audiences. Extensible for custom languages (e.g., Slovak via CzechDriver hack) without core changes.
  • Text Processing Pipeline: Fits seamlessly into NLP workflows (e.g., sentiment analysis, topic modeling) by standardizing vocabulary. Can be chained with other Laravel packages like Laravel NLP or Symfony Text.
  • Lightweight Alternative: Avoids heavy dependencies (e.g., Lucene) while providing strict typing and modern PHP features such as enums (PHP has no native generics, so any generic typing is limited to docblock annotations), reducing technical debt.
  • Laravel-Specific Use Cases:
    • Scout/Algolia: Pre-process toSearchableArray() for indexed fields.
    • PostgreSQL Full-Text: Enhance tsvector queries with stemmed input.
    • API Responses: Normalize search results for consistency.
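To make the recall argument above concrete, here is a minimal sketch of how normalizing both query and document terms lets inflected forms match. The `toyStem()` and `matches()` helpers are hypothetical and use naive ASCII suffix stripping; this is not the algorithm used by dompat/stemmer, only an illustration of the idea.

```php
<?php
declare(strict_types=1);

/**
 * Toy suffix stripper for illustration only (ASCII, English-ish).
 * NOT the algorithm used by dompat/stemmer.
 */
function toyStem(string $word): string
{
    $word = strtolower($word);
    foreach (['ing', 'es', 's'] as $suffix) {
        $stemLength = strlen($word) - strlen($suffix);
        // Keep at least three characters so short words are left alone.
        if ($stemLength >= 3 && str_ends_with($word, $suffix)) {
            $word = substr($word, 0, $stemLength);
            break;
        }
    }
    // Collapse a doubled final consonant ("runn" becomes "run").
    $len = strlen($word);
    if ($len > 3 && $word[$len - 1] === $word[$len - 2] && !str_contains('aeiou', $word[$len - 1])) {
        $word = substr($word, 0, -1);
    }
    return $word;
}

/** True when the stemmed query equals the stem of any word in the document. */
function matches(string $query, string $document): bool
{
    $tokens = preg_split('/\W+/', $document, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    return in_array(toyStem($query), array_map('toyStem', $tokens), true);
}

// "runs" never appears literally in the document, but both sides stem to "run".
var_dump(matches('runs', 'She was running late')); // bool(true)
```

A real stemmer handles far more cases (and Czech morphology in particular), but the matching principle is the same: index and query must pass through the same normalization.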

Integration Feasibility

  • Composer Dependency: Installs via Composer with no PHP extensions or system libraries required (pure PHP).
  • PHP 8.3+ Constraint: High risk on older Laravel versions (e.g., 8.x), which do not officially support PHP 8.3. Mitigation:
    • Upgrade PHP to 8.3+ (recommended for new projects).
    • Isolate via Docker for legacy environments.
    • Polyfill (not ideal; may introduce edge-case bugs).
  • Service Provider Pattern: Encapsulates initialization logic cleanly:
    // app/Providers/StemmerServiceProvider.php
    public function register()
    {
        $this->app->bind(Stemmer::class, fn() => new Stemmer([
            new EnglishDriver('en'),
            new CzechDriver('cs'),
        ]));
    }
    
  • Facade for Convenience:
    // app/Facades/Stemmer.php
    public static function stem(string $word, string $locale = 'en', StemmerMode $mode = StemmerMode::LIGHT): string
    
  • Middleware for API Requests:
    public function handle($request, Closure $next)
    {
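        // $request->query is a parameter bag, not a string; read the "query" input explicitly.
        $request->merge(['stemmed_query' => stem((string) $request->input('query', ''), 'en')]);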
        return $next($request);
    }
    

Technical Risk

| Risk | Severity | Mitigation |
| --- | --- | --- |
| PHP 8.3 Dependency | Critical | Upgrade PHP or use Docker isolation. |
| Unproven at Scale | High | Benchmark with 10K+ documents in staging before production. |
| Limited Language Support | Medium | Plan for custom driver development or fork if needed (e.g., German). |
| Mode Selection Complexity | Medium | Document LIGHT vs. AGGRESSIVE trade-offs in internal guidelines. |
| No Production Dependents | Medium | Monitor GitHub issues and add unit tests for critical use cases. |
| Performance Overhead | Low | Compare against regex-based stemming for cost-sensitive pipelines. |
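The benchmarking mitigations in the table can be approached with a small timing harness. The sketch below is an assumption-laden starting point: `benchmarkStemmer()` is a hypothetical helper, and the `rtrim()` callable is only a stand-in for whatever stemmer call is being measured.

```php
<?php
declare(strict_types=1);

/**
 * Time a stemming callable over a word list.
 * Returns the average cost in microseconds per stemmed word.
 *
 * @param callable(string): string $stem  the stemmer under test (stand-in below)
 * @param list<string>             $words sample vocabulary
 */
function benchmarkStemmer(callable $stem, array $words, int $rounds = 5): float
{
    if ($words === [] || $rounds < 1) {
        throw new InvalidArgumentException('Need at least one word and one round.');
    }
    $start = hrtime(true); // monotonic clock, nanoseconds
    for ($i = 0; $i < $rounds; $i++) {
        foreach ($words as $word) {
            $stem($word);
        }
    }
    $elapsedNs = hrtime(true) - $start;
    return $elapsedNs / 1_000 / (count($words) * $rounds);
}

// Stand-in workload: 10,000 words and a placeholder "stemmer".
$words = array_fill(0, 10_000, 'running');
$avg = benchmarkStemmer(fn (string $w): string => rtrim($w, 's'), $words);
printf("%.3f microseconds per word\n", $avg);
```

Running the same harness against a regex-based baseline and the real stemmer, on a vocabulary sampled from production data, gives a like-for-like comparison.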

Key Questions

  1. Use Case Clarity:
    • Is stemming for search indexing (AGGRESSIVE) or user-facing text (LIGHT)?
    • Example: Autocomplete (LIGHT) vs. Elasticsearch relevance (AGGRESSIVE).
  2. Language Requirements:
    • Are Czech and English sufficient, or are other languages (German, French, etc.) needed?
    • If other languages are needed, budget for custom driver development (1–2 weeks).
  3. Integration Points:
    • Where in the pipeline should stemming occur?
      • Pre-indexing (Scout callbacks) vs. runtime (query normalization).
  4. Fallback Strategy:
    • What if stemming fails (e.g., unsupported locale)?
      • Graceful degradation: Return original word + log error.
      • Fallback regex: Simple plural/suffix removal as backup.
  5. Testing Scope:
    • How to validate accuracy?
      • Unit tests for edge cases (e.g., stem('happiness', 'en') → happi).
      • Integration tests with search backend (e.g., Elasticsearch).
  6. Scaling Assumptions:
    • Will stemming be applied to millions of documents?
      • If yes, cache frequent stems (e.g., Redis) or batch-process.
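The graceful-degradation option from the fallback question could look roughly like the wrapper below. It assumes the stemmer signals an unsupported locale by throwing, which the source does not confirm; `withFallback()`, the fake stemmer, and the logger hook are all hypothetical names for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Wrap a stemming callable so that any failure returns the original word.
 *
 * @param callable(string, string): string $stem hypothetical (word, locale) stemmer
 * @param callable(string): void           $log  hypothetical logging hook
 * @return callable(string, string): string
 */
function withFallback(callable $stem, callable $log): callable
{
    return static function (string $word, string $locale) use ($stem, $log): string {
        try {
            return $stem($word, $locale);
        } catch (Throwable $e) {
            // Degrade gracefully: record the problem, keep the unstemmed word.
            $log(sprintf('Stemming failed for locale "%s": %s', $locale, $e->getMessage()));
            return $word;
        }
    };
}

// Fake stemmer that only knows English (assumption: real failures also throw).
$fake = static function (string $word, string $locale): string {
    if ($locale !== 'en') {
        throw new InvalidArgumentException("Unsupported locale: {$locale}");
    }
    return rtrim($word, 's');
};

$messages = [];
$safe = withFallback($fake, static function (string $m) use (&$messages): void {
    $messages[] = $m;
});

echo $safe('cats', 'en'), "\n";   // stemmed by the fake stemmer
echo $safe('Häuser', 'de'), "\n"; // unsupported locale: original word is returned
```

In a Laravel application the logging hook would typically forward to the `Log` facade, and the wrapper would sit in the service layer so every caller gets the same behavior.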

Integration Approach

Stack Fit

| Laravel Component | Integration Strategy | Example Implementation |
| --- | --- | --- |
| Scout (Algolia/Meilisearch) | Pre-process toSearchableArray() with stemming. | Override model method to stem fields before indexing. |
| PostgreSQL Full-Text | Normalize tsvector input via raw queries. | DB::select("SELECT to_tsvector('english', ?)", [stem($text)]) |
| Elasticsearch | Use in custom analyzer or runtime query. | POST /index/_update_by_query with stemmed terms. |
| Custom Search | Middleware/Service Layer. | app/Services/StemmerService.php for reuse. |
| NLP Pipelines | Laravel Jobs/Queues. | Stem text before sending to an ML model. |
| API Responses | Response Macros. | Response::macro('stemmed', fn ($text) => stem($text)); |
| Form Requests | Validate/normalize input. | stem($request->input('query')) in prepareForValidation(). |
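Several rows above pass whole fields (titles, bodies, queries) to a stemmer. Assuming the stemmer operates on single words, which is typical but not confirmed by the source, the text has to be tokenized first. The sketch below uses a hypothetical `stemText()` helper that accepts any single-word stemming callable.

```php
<?php
declare(strict_types=1);

/**
 * Stem every word in a text and join the stems with single spaces.
 *
 * @param callable(string): string $stemWord hypothetical single-word stemmer
 */
function stemText(string $text, callable $stemWord): string
{
    // Split on anything that is not a Unicode letter or digit.
    $tokens = preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($tokens === false) {
        return $text; // e.g. invalid UTF-8: leave the input untouched
    }
    return implode(' ', array_map($stemWord, $tokens));
}

// Placeholder stemmer: lowercases and trims a trailing "s".
$standIn = static fn (string $w): string => rtrim(strtolower($w), 's');

echo stemText('Cats, dogs & birds!', $standIn), "\n"; // cat dog bird
```

Punctuation is discarded here, which suits index and query fields; user-facing text would need a variant that preserves the original separators.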

Migration Path

  1. Phase 1: Proof of Concept (1 day)

    • Add package to composer.json.
    • Test basic stemming in a console command:
      php artisan make:command TestStemmer
      
      public function handle()
      {
          $stemmer = new Stemmer([new EnglishDriver('en')]);
          $this->info($stemmer->stem('running', 'en')); // "run"
      }
      
    • Validate against expected outputs (e.g., stem('městě', 'cs') → město).
  2. Phase 2: Scout Integration (2 days)

    • Override toSearchableArray() on the model (the method is provided by Scout's Searchable trait) to stem fields:
      public function toSearchableArray()
      {
          return [
              'title' => stem($this->title, 'en', StemmerMode::AGGRESSIVE),
              'body'  => stem($this->body, 'en', StemmerMode::LIGHT),
          ];
      }
      
    • Re-index with php artisan scout:import "App\Models\Post" and verify search results.
  3. Phase 3: Full-Text Search (1 day)

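    • Replace raw LIKE queries with a stemmed MATCH ... AGAINST query (MySQL syntax; requires a FULLTEXT index on title):
      // $request->query is a parameter bag, not a string; read the "query" input explicitly.
      $stemmedQuery = stem((string) $request->input('query', ''), 'en');
      return Post::whereRaw("MATCH(title) AGAINST(? IN NATURAL LANGUAGE MODE)", [$stemmedQuery])->get();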
      
    • Benchmark query performance against non-stemmed baseline.
  4. Phase 4: Custom Drivers (1–2 weeks, if needed)

    • Implement DriverInterface for unsupported languages:
      class GermanDriver implements DriverInterface { ... }
      
    • Register via service provider:
      $stemmer->addDriver(new GermanDriver('de'));
      
  5. Phase 5: Optimization (1 day)

    • Cache frequent stems (e.g., Redis):
      // Assuming StemmerMode is an enum: enums cannot be interpolated directly, so use ->name.
      $cacheKey = "stem:{$word}:{$locale}:{$mode->name}";
      return Cache::remember($cacheKey, now()->addHours(1), fn() => $stemmer->stem($word, $locale, $mode));
      
    • Benchmark against alternatives (e.g., Snowball Stemmer).
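Phase 4 leaves the driver body elided. As a rough idea of the shape such a driver might take, the sketch below defines its own hypothetical contract (`SketchDriver`) and registry (`SketchStemmer`); the package's real DriverInterface and Stemmer may expose different methods. The German rules are a toy suffix list, not a usable stemmer.

```php
<?php
declare(strict_types=1);

/** Hypothetical contract; the package's real DriverInterface may differ. */
interface SketchDriver
{
    public function locale(): string;

    public function stem(string $word): string;
}

/** Toy German driver: strips a few common ASCII endings, nothing more. */
final class ToyGermanDriver implements SketchDriver
{
    // Longer suffixes come first so "ern" wins over "n".
    private const SUFFIXES = ['ern', 'en', 'er', 'es', 'e', 'n', 's'];

    public function locale(): string
    {
        return 'de';
    }

    public function stem(string $word): string
    {
        // strtolower() only touches ASCII letters; umlauts pass through unchanged.
        $word = strtolower($word);
        foreach (self::SUFFIXES as $suffix) {
            $stemLength = strlen($word) - strlen($suffix);
            if ($stemLength >= 3 && str_ends_with($word, $suffix)) {
                return substr($word, 0, $stemLength);
            }
        }
        return $word;
    }
}

/** Minimal registry that dispatches to a driver by locale. */
final class SketchStemmer
{
    /** @var array<string, SketchDriver> */
    private array $drivers = [];

    public function addDriver(SketchDriver $driver): void
    {
        $this->drivers[$driver->locale()] = $driver;
    }

    public function stem(string $word, string $locale): string
    {
        $driver = $this->drivers[$locale]
            ?? throw new InvalidArgumentException("No driver for locale: {$locale}");
        return $driver->stem($word);
    }
}

$stemmer = new SketchStemmer();
$stemmer->addDriver(new ToyGermanDriver());
echo $stemmer->stem('Kindern', 'de'), "\n"; // kind
```

Keeping each language behind one small interface is what makes the 1–2 week estimate plausible: the effort goes into the linguistic rules and test vocabulary, not the plumbing.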

Compatibility

  • PHP 8.3+: Hard requirement. Mitigation:
    • Upgrade PHP (recommended for new projects).
    • Docker isolation for legacy environments:
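      FROM php:8.3-cli
      # The base image ships without Composer; copy the binary from the official image.
      COPY --from=composer:2 /usr/bin/composer /usr/bin/composer
      # Composer needs unzip to extract package archives.
      RUN apt-get update && apt-get install -y --no-install-recommends unzip && rm -rf /var/lib/apt/lists/*
      WORKDIR /app
      COPY composer.json .
      RUN composer install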
      
  • Laravel Versions:
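    • Current (10.x, 11.x): Both run on PHP 8.3, so the package installs without platform changes.
    • Legacy (8.x, 9.x): These releases do not officially support PHP 8.3; upgrade Laravel along with PHP, or run the stemmer in an isolated PHP 8.3 environment.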
  • Database Agnostic: Works with any storage backend (Scout, raw SQL, etc.).
  • No External Dependencies: Pure PHP; no extensions or system libraries.

Sequencing

  1. **Upgrade