PHP Text Analysis Laravel Package

yooper/php-text-analysis

PHP Text Analysis provides Information Retrieval and NLP tools in PHP: tokenization, normalization, stemming, frequency and n-gram analysis, document comparison, sentiment and classification, collocations (PMI), lexical diversity, corpus analysis, and summarization.

Getting Started

Begin by installing the package via Composer:

composer require yooper/php-text-analysis

The most approachable entry point is the package's set of helper functions (e.g., tokenize(), freq_dist(), stem()). Start with basic text preprocessing and analysis:

use TextAnalysis\Functions\AnalysisFunctions as text;

$text = "Laravel makes PHP development joyful!";
$tokens = text::tokenize($text);
$freqDist = text::freq_dist($tokens);
$keywords = array_keys($freqDist->top(3));

These functions require no setup and work out of the box, which makes them convenient for adding quick analysis (e.g., keyword frequency, n-gram trends) to Artisan commands, jobs, or controllers.
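
To make concrete what a frequency distribution and a "top keywords" pick compute, here is a minimal, dependency-free sketch in plain PHP. It does not call the package; tokenizeWords() and topKeywords() are hypothetical helpers written only for illustration, and the tokenizer is a simple regex that assumes the mbstring extension (which Laravel already requires).

```php
<?php
// Dependency-free sketch of a frequency distribution plus a top-k keyword pick.
// tokenizeWords() and topKeywords() are hypothetical illustration helpers,
// not functions provided by yooper/php-text-analysis.

/** Split text into lowercase word tokens made of Unicode letters and digits. */
function tokenizeWords(string $text): array
{
    // \p{L} = any letter, \p{N} = any digit; the /u flag enables UTF-8 mode.
    preg_match_all('/[\p{L}\p{N}]+/u', mb_strtolower($text), $matches);
    return $matches[0];
}

/** Count each token and return the $k most frequent as token => count. */
function topKeywords(array $tokens, int $k): array
{
    $counts = array_count_values($tokens); // token => frequency
    arsort($counts);                       // highest frequency first
    return array_slice($counts, 0, $k, true);
}

$tokens = tokenizeWords('Laravel makes PHP development joyful! PHP and Laravel, together.');
$top = topKeywords($tokens, 2);
print_r($top);
```

Ties are ordered by first appearance only on PHP 8+, where sorting is stable, so code that consumes the result should not depend on the order of equally frequent keywords.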

Implementation Patterns

  • Preprocessing pipelines: Compose the helpers into reusable analysis steps. For example, to preprocess user reviews:
    $tokens = text::tokenize($review);
    $tokens = text::normalize_tokens($tokens, 'mb_strtolower');
    $tokens = text::stem($tokens);
    $bigrams = text::ngrams($tokens, 2);
    
  • Sentiment & classification in batch: Process data asynchronously (e.g., in queued jobs), using vader() for sentiment or naive_bayes() for classification. Both work on token arrays, so feed them tokens that have already been normalized.
  • Laravel integration:
    • Wrap the helpers in your own service classes so they can be resolved through Laravel's service container and injected where needed (e.g., a KeywordExtractor service around text::rake()).
    • Back a classifier service with text::naive_bayes() and persist the trained model (e.g., serialize() it into the cache) so it is reused across requests rather than retrained.
  • Swappable stemmers and tokenizers: Choose the tokenizer or stemmer based on content type (e.g., SentenceTokenizer for legal documents, PorterStemmer for general content).
  • Corpus analysis: Aggregate documents into a Corpus object for TF-IDF or lexical diversity (e.g., measure uniqueness in product descriptions).
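
The pipeline, service-wrapping, and swappable-stemmer patterns above can be combined in one injectable class. The sketch below is a minimal illustration under stated assumptions: TextPreprocessor is a hypothetical class, and the tokenizer and stemmer are passed in as callables so you could plug in closures that call the package's helpers. The stemmer used in the example only strips a trailing "s"; it is a placeholder, not Porter stemming.

```php
<?php
// Hypothetical injectable preprocessing service. The tokenizer and stemmer are
// callables so they can be swapped per content type (e.g. closures around the
// package's helpers); nothing here is part of yooper/php-text-analysis.

final class TextPreprocessor
{
    /** @var callable(string): array */
    private $tokenizer;
    /** @var callable(string): string */
    private $stemmer;

    public function __construct(callable $tokenizer, callable $stemmer)
    {
        $this->tokenizer = $tokenizer;
        $this->stemmer = $stemmer;
    }

    /** Tokenize, lowercase, then stem (case is normalized before stemming). */
    public function tokens(string $text): array
    {
        $tokens = ($this->tokenizer)($text);
        $tokens = array_map('mb_strtolower', $tokens);
        return array_map($this->stemmer, $tokens);
    }

    /** Build contiguous n-grams of size $n, joined with a literal delimiter. */
    public function ngrams(array $tokens, int $n = 2, string $delimiter = ' '): array
    {
        $grams = [];
        for ($i = 0; $i + $n <= count($tokens); $i++) {
            $grams[] = implode($delimiter, array_slice($tokens, $i, $n));
        }
        return $grams;
    }
}

// Example wiring: a whitespace tokenizer and a deliberately naive stemmer
// that strips one trailing "s" (a placeholder, NOT a real stemming algorithm).
$preprocessor = new TextPreprocessor(
    fn (string $text): array => preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY),
    fn (string $token): string => preg_replace('/s$/', '', $token),
);

$tokens = $preprocessor->tokens('Fast Builds ship Fast');
print_r($preprocessor->ngrams($tokens, 3, '_'));
```

In a Laravel app you might bind such a class as a singleton in a service provider and type-hint it in jobs or controllers; that wiring is an assumption about your application, not something the package provides. Note that the '_' delimiter appears verbatim in every trigram, which is why downstream code must expect the joined form.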

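Persisting a trained classifier works because its learned state is just data that survives serialize()/unserialize(). To show this without depending on the package's NaiveBayes internals, here is a minimal sketch: ToyNaiveBayes is a hypothetical multinomial Naive Bayes with add-one (Laplace) smoothing, not the package's classifier. In Laravel you might store the serialized string with Cache::forever(); that storage choice is an assumption about your setup.

```php
<?php
// Toy multinomial Naive Bayes with add-one (Laplace) smoothing.
// ToyNaiveBayes is a hypothetical illustration, not the package's classifier.

final class ToyNaiveBayes
{
    private array $docCounts = [];   // label => number of training documents
    private array $tokenCounts = []; // label => [token => count]
    private array $totals = [];      // label => total tokens seen for that label
    private array $vocab = [];       // token => true (shared vocabulary)

    /** Incrementally add one labelled token array; state accumulates. */
    public function train(string $label, array $tokens): void
    {
        $this->docCounts[$label] = ($this->docCounts[$label] ?? 0) + 1;
        foreach ($tokens as $token) {
            $this->tokenCounts[$label][$token] = ($this->tokenCounts[$label][$token] ?? 0) + 1;
            $this->totals[$label] = ($this->totals[$label] ?? 0) + 1;
            $this->vocab[$token] = true;
        }
    }

    /** Return the label with the highest log-posterior, or null if untrained. */
    public function predict(array $tokens): ?string
    {
        $totalDocs = array_sum($this->docCounts);
        if ($totalDocs === 0) {
            return null;
        }
        $vocabSize = count($this->vocab);
        $best = null;
        $bestScore = -INF;
        foreach ($this->docCounts as $label => $docs) {
            $score = log($docs / $totalDocs); // log prior
            $denominator = ($this->totals[$label] ?? 0) + $vocabSize;
            foreach ($tokens as $token) {
                $count = $this->tokenCounts[$label][$token] ?? 0;
                $score += log(($count + 1) / $denominator); // smoothed log likelihood
            }
            if ($score > $bestScore) {
                $best = (string) $label;
                $bestScore = $score;
            }
        }
        return $best;
    }
}

$model = new ToyNaiveBayes();
$model->train('positive', ['great', 'fast', 'love']);
$model->train('negative', ['slow', 'broken', 'hate']);

// Persist the trained state once (e.g. into a cache store) instead of retraining.
$stored = serialize($model);
$restored = unserialize($stored, ['allowed_classes' => [ToyNaiveBayes::class]]);

echo $restored->predict(['love', 'fast']), PHP_EOL; // prints "positive"
```

Restricting allowed_classes when unserializing limits which objects can be instantiated from stored data, which is a sensible default when the payload comes from a shared cache.
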
Gotchas and Tips

  • VADER breaks on short inputs: Avoid calling vader() with fewer than ~3 tokens; validate the input or return a neutral sentiment early to prevent errors (v1.4.1 fixed some edge cases but not all).
  • Stemming assumes lowercase: Always normalize case before stemming; otherwise "RUNNING" and "run" may not be reduced to the same stem.
  • RAKE expects clean tokens: Remove stopwords, punctuation, and numbers before passing tokens to rake(); otherwise, scores become unreliable. Use the built-in StopWords classes for filtering and text::normalize_tokens() for case normalization first.
  • Naive Bayes training is stateful: Each naive_bayes() instance learns incrementally. Store the trained classifier (e.g., in the cache) instead of retraining on every request.
  • N-gram delimiters are literal: ngrams($tokens, 3, '_') produces token1_token2_token3, so make sure downstream logic (e.g., search) expects the underscored form.
  • Debug tip: Use print_r($freqDist) or var_dump($tokens) to inspect internal structures; many classes lack __toString() and expose their data as arrays, so dump them rather than echo them.
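
The short-input tip can be enforced with a small guard in front of the sentiment call. This is a minimal sketch under assumptions: scoreSentimentSafely() is hypothetical, $scorer stands in for a call such as the package's vader(), and the neg/neu/pos/compound keys follow the usual VADER output shape, which you should confirm against the version you install.

```php
<?php
// Guard that skips sentiment scoring for very short inputs.
// scoreSentimentSafely() is hypothetical; $scorer stands in for a real scorer
// such as vader(), and the returned key names are an assumption.

const MIN_SENTIMENT_TOKENS = 3;

/**
 * @param string[] $tokens
 * @param callable(array): array $scorer
 */
function scoreSentimentSafely(array $tokens, callable $scorer): array
{
    // Ignore empty or whitespace-only tokens when checking the minimum length.
    $tokens = array_values(array_filter($tokens, fn ($t) => trim((string) $t) !== ''));

    if (count($tokens) < MIN_SENTIMENT_TOKENS) {
        // Neutral fallback instead of calling the scorer on too little text.
        return ['neg' => 0.0, 'neu' => 1.0, 'pos' => 0.0, 'compound' => 0.0];
    }

    return $scorer($tokens);
}

// Stand-in scorer for illustration only; it counts calls and returns fixed scores.
$calls = 0;
$fakeScorer = function (array $tokens) use (&$calls): array {
    $calls++;
    return ['neg' => 0.0, 'neu' => 0.4, 'pos' => 0.6, 'compound' => 0.7];
};

$short = scoreSentimentSafely(['ok', ' '], $fakeScorer);               // neutral, scorer skipped
$long = scoreSentimentSafely(['really', 'good', 'service'], $fakeScorer); // scorer called
```

The threshold of 3 mirrors the rough figure in the tip above; treat it as a tunable constant rather than a documented limit of the library.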