Laravel Text Chunker Laravel Package

droath/laravel-text-chunker

Flexible Laravel text chunking for AI/LLM apps. Split content into smaller chunks by characters, tokens, sentences, or markdown-aware rules. Fluent, strategy-based API ideal for fitting token limits, RAG pipelines, and custom domain splitting.

Getting Started

Install the package via Composer, then start chunking text immediately with the provided facade. No service provider registration is needed because Laravel auto-discovers it. Your first use case will likely be splitting text for an LLM API (e.g., OpenAI) where token limits matter: use TextChunker::strategy('token')->size(500)->chunk($longText). For simpler cases, strategy('character') offers predictable, fixed-size chunks. After publishing the config, check config/text-chunker.php to set defaults (e.g., default strategy, sentence abbreviations, token model). Read the Basic Usage section in the README first; it covers the common patterns in under 5 minutes.
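
To make "predictable, fixed-size chunks" concrete, here is a minimal plain-PHP sketch of character-based splitting. It is not the package's implementation: chunkByCharacters is a hypothetical helper written for illustration, and it only assumes the mbstring extension is available.

```php
<?php

/**
 * Split text into fixed-size chunks measured in characters, not bytes.
 * Hypothetical helper for illustration; not part of droath/laravel-text-chunker.
 *
 * @return list<string>
 */
function chunkByCharacters(string $text, int $size): array
{
    if ($size < 1) {
        throw new InvalidArgumentException('Chunk size must be at least 1.');
    }

    if ($text === '') {
        return [];
    }

    // mb_str_split treats each multibyte character as a single unit.
    return mb_str_split($text, $size, 'UTF-8');
}

$chunks = chunkByCharacters('Héllo wörld, this is a test.', 10);
// $chunks is ['Héllo wörl', 'd, this is', ' a test.']
```

Every chunk except possibly the last has exactly the requested length, which is what makes this style easy to reason about. The trade-off is that it can cut words and sentences in half.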

Implementation Patterns

  • LLM Preprocessing Pipeline: Wrap chunking in a service class or job, inject TextChunkerManager, and chain it with validation and normalization steps. Use the token strategy with gpt-4 or gpt-3.5-turbo, matching the model you actually call so chunk sizes reflect its context window and pricing.
  • RAG Context Preservation: Combine overlap(20) with sentence or markdown strategies to retain continuity across embeddings. This is critical when chunk boundaries split key semantic units.
  • Document Processing Workflows: For technical docs or markdown knowledge bases, prefer the markdown strategy—it avoids breaking code blocks, lists, or headers mid-element. Combine with size(100) and overlap(15) for balanced retrieval granularity.
  • Custom Business Logic: Build domain-specific strategies (e.g., WordStrategy, paragraph-based, or section-aware) by implementing ChunkerStrategyInterface, then register them via extend() in a service provider or config.
  • Stateless Chunking in Controllers: Keep your controller clean by delegating chunking to a dedicated transformer or DTO—reuse the fluent interface (->size(250)->overlap(10)) across requests without side effects.
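
As a rough illustration of the custom-strategy idea above, the following plain-PHP sketch groups text by word count. SketchChunkerStrategy and WordStrategySketch are hypothetical names defined here so the example runs on its own. The package's real ChunkerStrategyInterface may declare different methods and arguments, so check the README before adapting this.

```php
<?php

/**
 * Stand-in contract for this sketch only. The package's real
 * ChunkerStrategyInterface may declare different methods and arguments.
 */
interface SketchChunkerStrategy
{
    /** @return list<string> */
    public function chunk(string $text, int $size): array;
}

/** Groups whitespace-separated words into chunks of at most $size words. */
final class WordStrategySketch implements SketchChunkerStrategy
{
    public function chunk(string $text, int $size): array
    {
        if ($size < 1) {
            throw new InvalidArgumentException('Size must be at least 1 word.');
        }

        // Split on runs of Unicode whitespace and drop empty fragments.
        // preg_split returns false on invalid UTF-8, so fall back to [].
        $words = preg_split('/\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY) ?: [];

        // array_chunk groups the words; implode rebuilds each chunk as text.
        return array_map(
            static fn (array $group): string => implode(' ', $group),
            array_chunk($words, $size)
        );
    }
}

$chunks = (new WordStrategySketch())->chunk('one two three four five', 2);
// $chunks is ['one two', 'three four', 'five']
```

In a real application, a class like this would implement the package's own interface and be registered through extend(), as the list above describes.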

Gotchas and Tips

  • Token Count ≠ Character Count: The token strategy relies on yethee/tiktoken, which caches encodings per model. Memory usage can spike when chunking many long texts concurrently, so consider processing in batches and releasing large arrays between batches.
  • Overlap Is Percentage-Based, Not Fixed: overlap(20) carries over 20% of the previous chunk's content, not 20% of the current chunk's size. The repeated text increases total output size, most noticeably with short, dense chunks.
  • Markdown Strategy Skips Mid-Element Splits: If a code block is 300 chars and size(100) is set, the block stays intact—even if it violates size constraints. This preserves integrity but may produce uneven chunk sizes.
  • Validation Happens at chunk() Time: Misconfigurations (e.g., invalid overlap, missing size) throw exceptions only when chunk() runs—not when building the chain. Wrap calls in try/catch for ChunkerException.
  • Abbreviation Sensitivity: The sentence strategy’s abbreviation list is case-sensitive. 'Dr.' won’t match 'dr.' unless you manually add both. Consider normalizing input text first.
  • Custom Strategy Overlap Support: The HasOverlap trait is provided but not applied automatically. It supplies the overlap logic, and a custom strategy that needs overlap must invoke that logic explicitly.
  • Position Accuracy: start_position and end_position are UTF-8 safe and 0-indexed. Use them to map chunks back to source locations (e.g., for highlighting or source attribution in RAG). Avoid string functions that assume single-byte characters (for example, prefer mb_substr over substr).
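
The overlap and position points above can be seen together in a small plain-PHP sketch. It rests on several assumptions the package may not share: the overlap is taken from the tail of the previous chunk's own slice, the overlap length is rounded down, positions count characters rather than bytes, and end_position is exclusive. chunkWithOverlap is a hypothetical helper, not the package's code.

```php
<?php

/**
 * Character chunking with percentage overlap and character-based positions.
 * Assumptions for this sketch: overlap comes from the tail of the previous
 * slice, its length is rounded down, and end_position is exclusive.
 *
 * @return list<array{text: string, start_position: int, end_position: int}>
 */
function chunkWithOverlap(string $text, int $size, int $overlapPercent): array
{
    if ($size < 1 || $overlapPercent < 0 || $overlapPercent >= 100) {
        throw new InvalidArgumentException('Invalid size or overlap percentage.');
    }

    $length = mb_strlen($text, 'UTF-8');
    $chunks = [];
    $previousLength = 0;

    for ($start = 0; $start < $length; $start += $size) {
        $end = min($start + $size, $length);

        // Borrow N% of the previous slice's length from its tail.
        $overlap = intdiv($previousLength * $overlapPercent, 100);
        $from = $start - $overlap;

        $chunks[] = [
            'text' => mb_substr($text, $from, $end - $from, 'UTF-8'),
            'start_position' => $from,
            'end_position' => $end,
        ];

        $previousLength = $end - $start;
    }

    return $chunks;
}

$source = 'äbcdefghijklmnopqrstuvwxy'; // 25 characters, 26 bytes
$chunks = chunkWithOverlap($source, 10, 20);

// Map the second chunk back to the source with a multibyte-aware function.
$second = $chunks[1];
$viaMb = mb_substr(
    $source,
    $second['start_position'],
    $second['end_position'] - $second['start_position'],
    'UTF-8'
);

// The byte-based equivalent drifts by one byte because 'ä' takes two bytes.
$viaBytes = substr(
    $source,
    $second['start_position'],
    $second['end_position'] - $second['start_position']
);
```

Here the three chunks are 10, 12, and 7 characters long, so 25 characters of input become 29 characters of output. $viaMb matches the chunk text exactly, while $viaBytes starts one character early.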