yooper/php-text-analysis
PHP Text Analysis provides Information Retrieval and NLP tools in PHP: tokenization, normalization, stemming, frequency and n-gram analysis, document comparison, sentiment and classification, collocations (PMI), lexical diversity, corpus analysis, and summarization.
Begin by installing the package via Composer:
composer require yooper/php-text-analysis
The most approachable entry point is the set of helper functions (e.g., tokenize(), freq_dist(), stem()). Start with basic text preprocessing and analysis:
use TextAnalysis\Functions\AnalysisFunctions as text;
$text = "Laravel makes PHP development joyful!";
$tokens = text::tokenize($text);
$freqDist = text::freq_dist($tokens);
$keywords = array_keys($freqDist->top(3));
These functions require no setup and work out of the box, which makes them a good fit for quick analysis (e.g., keyword frequency, n-gram trends) in Artisan commands, jobs, or controllers.
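To make the idea of a frequency distribution concrete, here is a minimal plain-PHP sketch of what "tokenize, count, take the top N" amounts to. It does not use the package and is not its implementation; the function names sketch_tokenize() and sketch_top_keywords() are hypothetical, and the tokenizer is a simple regex split assumed for illustration.

```php
<?php
// Plain-PHP illustration only; this is NOT the package's implementation.

/**
 * Split text into lowercase word tokens (hypothetical helper).
 *
 * @return string[]
 */
function sketch_tokenize(string $text): array
{
    // Split on any run of characters that is not a letter, digit, or apostrophe.
    // strtolower() only lowercases ASCII, which is enough for this sketch.
    $parts = preg_split("/[^\\p{L}\\p{N}']+/u", strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return $parts === false ? [] : $parts;
}

/**
 * Count occurrences per token and keep the $limit most frequent ones.
 *
 * @param string[] $tokens
 * @return array<string,int> token => count, highest count first
 */
function sketch_top_keywords(array $tokens, int $limit = 3): array
{
    $counts = array_count_values($tokens); // token => count
    arsort($counts);                        // sort by count, descending, keeping keys
    return array_slice($counts, 0, $limit, true);
}

$text = "PHP makes text analysis simple: PHP text tools, PHP helpers.";
$top = sketch_top_keywords(sketch_tokenize($text), 2);
print_r($top); // 'php' appears 3 times, 'text' twice
```

The same shape (tokens in, ranked counts out) is what you would expect to wire into a command or job; the package's own helpers would replace both sketch functions.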
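A typical preprocessing pipeline chains the helpers, with each step feeding the next:
// $review holds the raw text to analyze
$tokens = text::tokenize($review);
// lowercase every token (multibyte-safe)
$tokens = text::normalize_tokens($tokens, 'mb_strtolower');
// reduce each token to its stem
$tokens = text::stem($tokens);
// build two-token sequences from the stemmed tokens
$bigrams = text::ngrams($tokens, 2);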
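The n-gram step in the pipeline above can be pictured as a sliding window over the token array. The sketch below is a hand-rolled version in plain PHP, written only to show the mechanics and the effect of the separator argument; sketch_ngrams() is a hypothetical name and this is not the package's code.

```php
<?php
// Illustration only: a hand-rolled n-gram builder, not the package's code.

/**
 * Join every window of $n consecutive tokens with $separator.
 *
 * @param string[] $tokens
 * @return string[]
 */
function sketch_ngrams(array $tokens, int $n = 2, string $separator = ' '): array
{
    if ($n < 1) {
        throw new InvalidArgumentException('n must be at least 1');
    }
    $tokens = array_values($tokens); // reindex so offsets are contiguous
    $total = count($tokens);
    $ngrams = [];
    // The last window starts at $total - $n; shorter inputs yield no n-grams.
    for ($i = 0; $i + $n <= $total; $i++) {
        $ngrams[] = implode($separator, array_slice($tokens, $i, $n));
    }
    return $ngrams;
}

$tokens = ['great', 'battery', 'life', 'great', 'battery'];
$bigrams = sketch_ngrams($tokens, 2, '_');

// Counting repeated bigrams surfaces recurring phrases.
$counts = array_count_values($bigrams);
print_r($counts);
```

Whatever separator you choose becomes part of every n-gram string, so anything that later matches or searches on these values has to use the same separator.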
Pass the tokens to vader() for sentiment or naive_bayes() for classification; both accept token arrays directly after normalization.
Wrap helpers in dedicated services (e.g., text::rake() in a KeywordExtractor service).
Wrap text::naive_bayes() in a classifier service with persistent training: cache trained models via serialize() for reuse across requests.
Pick components to suit the content (e.g., SentenceTokenizer for legal docs, PorterStemmer for generic content).
Use a Corpus object for TF-IDF or lexical diversity (e.g., to measure uniqueness in product descriptions).
Avoid calling vader() with fewer than ~3 tokens: validate the input or return neutral sentiment early to prevent errors (v1.4.1 fixed some edge cases but not all).
Remove stop words before calling rake(); otherwise, scores become unreliable. Use the built-in StopWords classes or text::normalize_tokens() first.
A naive_bayes() instance learns incrementally. Store the trained classifier (e.g., in cache); don't retrain on every request.
ngrams($tokens, 3, '_') produces token1_token2_token3, so ensure downstream logic (e.g., search) expects underscored forms.
Use print_r($freqDist) or var_dump($tokens) to inspect internal structures; many classes lack __toString() and rely on array access.
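The advice about storing a trained classifier instead of retraining on every request can be sketched as a small "train once, then load" helper. This is a minimal sketch under stated assumptions: remember_model(), the temp-file cache path, and the array used as a stand-in model are all hypothetical, and it assumes the real trained object serializes cleanly. In a Laravel application the framework cache would play the role of the file used here.

```php
<?php
// Sketch only: a file-backed cache for any serializable trained model.
// The helper name, the cache path, and the stand-in "model" are hypothetical.

/**
 * Return the cached model if present; otherwise train once and store it.
 *
 * @param callable $train Callback that builds and trains the model.
 * @return mixed The trained (or restored) model.
 */
function remember_model(string $cacheFile, callable $train)
{
    if (is_file($cacheFile)) {
        // Only unserialize files your own application wrote.
        $model = unserialize(file_get_contents($cacheFile));
        if ($model !== false) {
            return $model; // cache hit: skip retraining
        }
    }
    $model = $train();
    // LOCK_EX avoids interleaved writes from concurrent requests.
    file_put_contents($cacheFile, serialize($model), LOCK_EX);
    return $model;
}

$cacheFile = sys_get_temp_dir() . '/sketch_model_' . uniqid() . '.cache';
$trainRuns = 0;
$train = function () use (&$trainRuns) {
    $trainRuns++;
    // Stand-in for a trained classifier: label => token counts.
    return ['positive' => ['great' => 2], 'negative' => ['poor' => 1]];
};

$first = remember_model($cacheFile, $train);  // trains and writes the cache
$second = remember_model($cacheFile, $train); // restores from the cache
unlink($cacheFile);                           // clean up the demo file
```

The training callback runs only on a cache miss, so the expensive step happens once; deleting the cache file is the way to force retraining after the training data changes.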