dompat/stemmer
Strictly typed PHP 8.3+ stemming library for better full-text search, indexing, and text analysis. Supports LIGHT and AGGRESSIVE modes, with extensible language drivers. Includes Czech and English out of the box; add custom locales easily.
toSearchableArray() for indexed fields.
tsvector queries with stemmed input.
// app/Providers/StemmerServiceProvider.php
public function register()
{
    $this->app->bind(Stemmer::class, fn() => new Stemmer([
        new EnglishDriver('en'),
        new CzechDriver('cs'),
    ]));
}
// app/Facades/Stemmer.php
public static function stem(string $word, string $locale = 'en', string $mode = StemmerMode::LIGHT): string
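public function handle($request, Closure $next)
{
    // $request->query is the query-string parameter bag, not a string, so read the 'query' input instead.
    $request->merge(['stemmed_query' => stem((string) $request->input('query', ''), 'en')]);
    return $next($request);
}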
| Risk | Severity | Mitigation |
|---|---|---|
| PHP 8.3 Dependency | Critical | Upgrade PHP or use Docker isolation. |
| Unproven at Scale | High | Benchmark with 10K+ documents in staging before production. |
| Limited Language Support | Medium | Plan for custom driver development or fork if needed (e.g., German). |
| Mode Selection Complexity | Medium | Document LIGHT vs. AGGRESSIVE trade-offs in internal guidelines. |
| No Production Dependents | Medium | Monitor GitHub issues and add unit tests for critical use cases. |
| Performance Overhead | Low | Compare against regex-based stemming for cost-sensitive pipelines. |
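
The "Unproven at Scale" mitigation calls for a benchmark before production. A minimal sketch of such a harness follows, assuming the stemmer can be wrapped in a callable that takes one word. The function name `benchmarkStemmer`, the synthetic corpus, and the toy stemmer are all illustrative stand-ins, not part of the package.

```php
<?php
declare(strict_types=1);

/**
 * Time a stemming callable over a word list and return words per second.
 */
function benchmarkStemmer(callable $stemWord, array $words, int $rounds = 5): float
{
    $start = hrtime(true); // monotonic clock, in nanoseconds
    for ($i = 0; $i < $rounds; $i++) {
        foreach ($words as $word) {
            $stemWord($word);
        }
    }
    // Guard against a zero interval on very small inputs.
    $seconds = max((hrtime(true) - $start) / 1e9, 1e-9);

    return ($rounds * count($words)) / $seconds;
}

// Synthetic corpus of 10,000 tokens; swap in real documents for a meaningful number.
$words = array_map(fn (int $i): string => 'token' . $i . 's', range(1, 10_000));

// Toy stand-in stemmer (assumption): strips trailing "s". Not the library's algorithm.
$toyStem = fn (string $w): string => rtrim($w, 's');

printf("%.0f words/sec\n", benchmarkStemmer($toyStem, $words));
```

To measure the real library, the toy callable could be replaced with something like `fn (string $w) => $stemmer->stem($w, 'en')`, using the `stem()` call shown elsewhere in this document.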
stem('happiness', 'en') → happi).

| Laravel Component | Integration Strategy | Example Implementation |
|---|---|---|
| Scout (Algolia/Meilisearch) | Pre-process toSearchableArray() with stemming. | Override model method to stem fields before indexing. |
| PostgreSQL Full-Text | Normalize tsvector input via raw queries. | DB::select("SELECT to_tsvector('english', ?)", [stem($text)]) |
| Elasticsearch | Use in custom analyzer or runtime query. | POST /index/_update_by_query with stemmed terms. |
| Custom Search | Middleware/Service Layer. | app/Services/StemmerService.php for reuse. |
| NLP Pipelines | Laravel Jobs/Queues. | Stem text before sending to a ML model. |
| API Responses | Response Macros. | $response->macro('stemmed', fn($text) => stem($text)); |
| Form Requests | Validate/normalize input. | stem($this->input('query')) in prepareForValidation(). |
Phase 1: Proof of Concept (1 day)
composer.json.
php artisan make:command TestStemmer
public function handle()
{
    $stemmer = new Stemmer([new EnglishDriver('en')]);
    $this->info($stemmer->stem('running', 'en')); // "run"
}
stem('městě', 'cs') → město).
Phase 2: Scout Integration (2 days)
Searchable trait to stem fields:
public function toSearchableArray()
{
    return [
        'title' => stem($this->title, 'en', StemmerMode::AGGRESSIVE),
        'body' => stem($this->body, 'en', StemmerMode::LIGHT),
    ];
}
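
The `stem()` examples in this document operate on single words such as 'running', while a title or body usually holds many words. If the stemmer works per word, the field likely needs to be tokenized first. The sketch below shows one way to do that; `stemText` and the toy stemmer are hypothetical names introduced for illustration, and whether the package already tokenizes input is not confirmed here.

```php
<?php
declare(strict_types=1);

/**
 * Lowercase a text, split it into word tokens, and stem each token.
 * $stemWord is any callable that maps one word to its stem.
 */
function stemText(string $text, callable $stemWord): string
{
    // \p{L} and \p{N} match Unicode letters and digits (requires the /u flag).
    $tokens = preg_split('/[^\p{L}\p{N}]+/u', mb_strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    // preg_split returns false on failure; fall back to an empty token list.
    return implode(' ', array_map($stemWord, $tokens ?: []));
}

// Toy stand-in stemmer (assumption): strips one trailing "s". Not the library's algorithm.
$toyStem = fn (string $word): string => strlen($word) > 3 && str_ends_with($word, 's')
    ? substr($word, 0, -1)
    : $word;

echo stemText('Fast cars, slow boats!', $toyStem), PHP_EOL; // fast car slow boat
```

Inside `toSearchableArray()`, the toy callable could be swapped for a closure around the real stemmer, for example `fn (string $w) => stem($w, 'en', StemmerMode::LIGHT)`.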
Phase 3: Full-Text Search (1 day)
LIKE queries with stemming-aware MATCH:
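// Read the 'query' input; $request->query is a parameter bag, not a string.
$stemmedQuery = stem((string) $request->input('query', ''), 'en');
return Post::whereRaw("MATCH(title) AGAINST(? IN NATURAL LANGUAGE MODE)", [$stemmedQuery])->get();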
Phase 4: Custom Drivers (1–2 weeks, if needed)
DriverInterface for unsupported languages:
class GermanDriver implements DriverInterface { ... }
$stemmer->addDriver(new GermanDriver('de'));
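
To give a feel for what a custom driver involves, here is a self-contained sketch of a naive German suffix stripper. The `SketchDriver` interface and its two methods are assumptions made for this example; the package's actual `DriverInterface` may declare different methods, and a production German stemmer needs far more rules than this.

```php
<?php
declare(strict_types=1);

// Hypothetical contract for illustration; the package's real DriverInterface may differ.
interface SketchDriver
{
    public function locale(): string;

    public function stem(string $word): string;
}

final class NaiveGermanDriver implements SketchDriver
{
    // Longest suffixes first so that "ern" wins over "n".
    private const SUFFIXES = ['ern', 'em', 'en', 'er', 'es', 'e', 'n', 's'];

    public function __construct(private string $locale = 'de') {}

    public function locale(): string
    {
        return $this->locale;
    }

    public function stem(string $word): string
    {
        $word = mb_strtolower($word);
        foreach (self::SUFFIXES as $suffix) {
            // Keep at least three characters so short words stay intact.
            if (str_ends_with($word, $suffix) && mb_strlen($word) - mb_strlen($suffix) >= 3) {
                return mb_substr($word, 0, -mb_strlen($suffix));
            }
        }

        return $word;
    }
}

$driver = new NaiveGermanDriver();
echo $driver->stem('Kindern'), PHP_EOL; // kind
echo $driver->stem('Häuser'), PHP_EOL;  // häus
```

Stripping only the first matching suffix keeps the behaviour predictable, which roughly corresponds to what a LIGHT mode might do; an AGGRESSIVE variant would typically apply further passes.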
Phase 5: Optimization (1 day)
$cacheKey = "stem:{$word}:{$locale}:{$mode}";
return Cache::remember($cacheKey, now()->addHours(1), fn() => $stemmer->stem($word, $locale, $mode));
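
Alongside `Cache::remember`, an in-process memo table can avoid repeated cache round trips when the same word appears many times within one request or job. The sketch below is one possible shape for that; `MemoizedStemmer` is a hypothetical class, the string mode value 'light' is an assumption, and the inner closure is a stand-in for the real stemmer call.

```php
<?php
declare(strict_types=1);

/**
 * Wrap a stemming closure with a per-process memo table.
 */
final class MemoizedStemmer
{
    /** @var array<string, string> */
    private array $memo = [];

    /** Number of calls that actually reached the inner stemmer. */
    public int $misses = 0;

    public function __construct(private \Closure $inner) {}

    public function stem(string $word, string $locale = 'en', string $mode = 'light'): string
    {
        // The word goes last so a ":" inside it cannot collide with locale or mode.
        $key = "{$locale}:{$mode}:{$word}";
        if (!isset($this->memo[$key])) {
            $this->misses++;
            $this->memo[$key] = ($this->inner)($word, $locale, $mode);
        }

        return $this->memo[$key];
    }
}

// Stand-in for the real stemmer call (assumption): strips trailing "s".
$stemmer = new MemoizedStemmer(fn (string $w, string $l, string $m): string => rtrim($w, 's'));
$stemmer->stem('cats');
$stemmer->stem('cats');
echo $stemmer->misses, PHP_EOL; // 1
```

The memo table lives only for the lifetime of the PHP process, so it complements the shared cache rather than replacing it.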
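FROM php:8.3-cli
# The official PHP image ships without Composer; git and unzip let Composer fetch packages.
RUN apt-get update && apt-get install -y --no-install-recommends git unzip && rm -rf /var/lib/apt/lists/*
COPY --from=composer:2 /usr/bin/composer /usr/bin/composer
WORKDIR /app
COPY composer.json .
RUN composer install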