Pdf To Text Laravel Package

spatie/pdf-to-text

Extract text from PDF files in PHP using Spatie’s pdf-to-text wrapper around the pdftotext binary (Poppler/Xpdf). Simple API (Pdf::getText), supports custom binary paths and options, ideal for Laravel apps needing fast PDF text extraction.

Getting Started

  1. Install via Composer:
    composer require spatie/pdf-to-text
    
  2. Install poppler system dependency (required!):
    • Ubuntu/Debian: sudo apt install poppler-utils
    • macOS: brew install poppler
    • Windows: Use Poppler for Windows and ensure pdftotext.exe is in PATH.
  3. Basic usage:
    use Spatie\PdfToText\Pdf;
    
    $text = Pdf::getText('path/to/document.pdf');
    
  4. First use case: Extract text from uploaded invoices to populate a searchable database.
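
Before wiring extraction into an application, it can help to confirm that the pdftotext binary from step 2 is actually reachable by PHP. The following is a minimal sketch in plain PHP; the function name findPdfToTextBinary and the candidate paths are assumptions made for illustration and are not part of the package.

```php
<?php
// Sketch: locate a usable pdftotext binary before running any extraction.
// findPdfToTextBinary() and the candidate paths below are illustrative
// assumptions; adjust the list to match your own servers.

/**
 * Return the first path in $candidates that is an executable file,
 * or null when none of them is usable.
 *
 * @param string[] $candidates
 */
function findPdfToTextBinary(array $candidates): ?string
{
    foreach ($candidates as $path) {
        if (is_file($path) && is_executable($path)) {
            return $path;
        }
    }

    return null;
}

$binary = findPdfToTextBinary([
    '/usr/bin/pdftotext',
    '/usr/local/bin/pdftotext',
    '/opt/homebrew/bin/pdftotext',
]);

if ($binary === null) {
    echo "pdftotext not found; install Poppler first.\n";
} else {
    echo "Using {$binary}\n";
}
```

The resulting path can then be handed to the package, for example as the second argument of Pdf::getText().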

Implementation Patterns

  • Fluent extraction: Chain setters to configure a run:
    $text = (new Pdf())
        ->setPdf($request->file('pdf')->getRealPath())
        ->setOptions(['layout']) // preserve layout
        ->text();
    
  • Batch processing: Dispatch one queued job per PDF (ExtractPdfJob is a job class you define in your app, not part of the package):
    foreach ($pdfPaths as $path) {
        ExtractPdfJob::dispatch($path)->onQueue('ocr');
    }
    
  • Integration with Laravel files: The package expects a filesystem path, so pass the path of the UploadedFile rather than the object itself:
    $pdf = $request->file('upload');
    $text = Pdf::getText($pdf->getRealPath());
    
  • Custom binary path: Pass the pdftotext path explicitly if needed (e.g., non-standard deployment):
    $text = Pdf::getText($path, '/opt/poppler/bin/pdftotext');
    
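  • Error handling: Wrap extraction in try/catch; the PDF may be missing or corrupted, or the binary may be absent:
    try {
        $text = Pdf::getText($path);
    } catch (\Spatie\PdfToText\Exceptions\PdfNotFound $e) {
        // The file is missing or unreadable
    } catch (\Spatie\PdfToText\Exceptions\CouldNotExtractText $e) {
        // pdftotext exited with an error (for example, a corrupted PDF)
    } catch (\Throwable $e) {
        // Anything else, such as a missing pdftotext binary
    }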
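
The batch and error-handling patterns above can be combined so that one bad file does not abort the whole run. This is a rough sketch, not the package's own API: extractAll and the stand-in extractor are hypothetical names, and the extractor is injected as a callable so the example runs without the package installed.

```php
<?php
// Sketch: extract text from many files while recording failures per file.
// extractAll() is a hypothetical helper written for this example.

/**
 * Run $extract over each path, keeping successes and failures apart.
 *
 * @param string[] $paths
 * @param callable(string): string $extract
 * @return array{texts: array<string, string>, errors: array<string, string>}
 */
function extractAll(array $paths, callable $extract): array
{
    $texts = [];
    $errors = [];

    foreach ($paths as $path) {
        try {
            $texts[$path] = $extract($path);
        } catch (\Throwable $e) {
            // Remember why this file failed and continue with the next one.
            $errors[$path] = $e->getMessage();
        }
    }

    return ['texts' => $texts, 'errors' => $errors];
}

// Stand-in extractor (an assumption for the demo). In an application this
// would typically be: fn (string $p) => \Spatie\PdfToText\Pdf::getText($p)
$fakeExtract = function (string $path): string {
    if (!str_ends_with($path, '.pdf')) {
        throw new \RuntimeException("Not a PDF: {$path}");
    }

    return "text of {$path}";
};

$result = extractAll(['a.pdf', 'notes.txt'], $fakeExtract);
print_r($result);
```

Injecting the extractor also makes the loop easy to exercise in tests, since no real binary is needed. The sketch assumes PHP 8 for str_ends_with().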
    

Gotchas and Tips

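  • Process execution must be allowed: The package runs pdftotext through symfony/process, which needs proc_open. If proc_open is listed in disable_functions, or open_basedir excludes the binary or the PDF path, extraction fails. Pass the binary path explicitly when it is not in a standard location.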
  • Layout preservation isn’t perfect: Use ->setOptions(['layout']) for tables and columns, but test thoroughly; complex PDFs may still misalign text.
  • OCR not built-in: This package only extracts embedded text. Scanned or image-only PDFs return empty or garbage output; consider rendering pages with spatie/pdf-to-image and running an OCR engine such as Tesseract on the images.
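  • Encoding quirks: Poppler's pdftotext writes UTF-8 by default (its -enc option selects another output encoding), yet PDFs with unusual or missing font mappings can still yield garbled characters. If you receive non-UTF-8 bytes, convert them with mb_convert_encoding() after extraction.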
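  • Long-running jobs: For large PDFs (>100 pages), offload extraction to a queue instead of blocking HTTP requests. The package applies a process timeout (60 seconds by default), which setTimeout() can raise.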
  • Testing: Hide extraction behind your own interface and stub it in tests, or pass the path of a fake executable that prints fixed text as the binary path.
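  • Configuration: The package is framework-agnostic and ships no Laravel config file or service provider, so there is nothing to publish. Keep defaults such as the binary path and options in your own app config and pass them in when extracting.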
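
The OCR and encoding tips above can be turned into a small post-processing step. The sketch below is one possible approach under stated assumptions: the helper names, the ISO-8859-1 fallback, and the 20-character threshold are all illustrative choices, not part of the package, and the mbstring extension is assumed to be available.

```php
<?php
// Sketch: post-process text returned by the extractor.
// Both helpers and their default values are assumptions for illustration.

/**
 * Return $text unchanged when it is valid UTF-8; otherwise convert it,
 * treating the bytes as $fallbackEncoding.
 */
function normalizeExtractedText(string $text, string $fallbackEncoding = 'ISO-8859-1'): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }

    return mb_convert_encoding($text, 'UTF-8', $fallbackEncoding);
}

/**
 * Heuristic: a PDF that yields fewer than $minChars letters and digits
 * is probably scanned or image-only and may need OCR instead.
 */
function looksLikeScannedPdf(string $text, int $minChars = 20): bool
{
    // Strip everything except Unicode letters and digits.
    // preg_replace returns null on invalid UTF-8, so fall back to ''.
    $visible = preg_replace('/[^\p{L}\p{N}]+/u', '', $text) ?? '';

    return mb_strlen($visible) < $minChars;
}

$raw = "caf\xE9"; // Latin-1 bytes, not valid UTF-8
$clean = normalizeExtractedText($raw);

echo $clean, "\n";
var_dump(looksLikeScannedPdf($clean));
```

Call normalizeExtractedText() before looksLikeScannedPdf(), since the heuristic relies on valid UTF-8 input; the threshold likely needs tuning against real documents.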