Pdfparser Laravel Package

smalot/pdfparser

Standalone PHP PDF parsing library to extract text, pages, and metadata from PDFs. Supports compressed PDFs and various encodings, with configurable parsing options. Note: secured PDFs and form data extraction are not supported.


smalot/pdfparser is a standalone PHP library for extracting content and structure from PDF files. It parses PDF headers and objects to give easy access to text, metadata, and page-ordered content, and its behavior can be adjusted through a custom configuration object.

Ideal for document indexing, analysis, or content pipelines where you need reliable PDF text extraction in PHP (PHP 7.1+).

  • Extract text from pages in order
  • Read metadata (author, description, etc.)
  • Parse objects and headers for deeper inspection
  • Supports compressed PDFs
  • Handles Mac OS Roman, hex, and octal text encodings
Frequently asked questions about Pdfparser
How do I install smalot/pdfparser in a Laravel project?
Use Composer to install it with `composer require smalot/pdfparser`. No Laravel-specific setup is required, though you may want to wrap it in a service class for better integration. The package supports PHP 7.1+ and works standalone.
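A minimal sketch of the service-class idea, assuming the package has been installed through Composer. The class name `PdfParserService` and its `extractText` method are hypothetical; the library calls it relies on are `Parser::parseFile()` and `Document::getText()`.

```php
<?php

namespace App\Services;

use Smalot\PdfParser\Parser;

/**
 * Hypothetical wrapper around smalot/pdfparser (class and method names are illustrative).
 */
class PdfParserService
{
    /** @var Parser */
    private $parser;

    public function __construct(?Parser $parser = null)
    {
        // Fall back to a default parser when none is injected.
        $this->parser = $parser ?? new Parser();
    }

    /**
     * Return the text of the whole document as a single string.
     */
    public function extractText(string $path): string
    {
        return $this->parser->parseFile($path)->getText();
    }
}
```

In a controller or command this could be called as `(new PdfParserService())->extractText($path)`, where `$path` points to a readable PDF file.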
Can smalot/pdfparser handle encrypted or password-protected PDFs?
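No, the library does not support secured (encrypted or password-protected) PDFs. You’ll need to pre-process such files, for example by decrypting them with an external tool before parsing. The `setIgnoreEncryption` config option is a workaround for files that carry the encryption flag but are otherwise readable; it does not decrypt protected content. Scanned PDFs without a text layer are a separate case and need OCR, for example Tesseract.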
Is smalot/pdfparser compatible with Laravel’s service container?
Yes, you can register it as a singleton in Laravel’s DI container. For example, bind it in `AppServiceProvider` or use a service class like `PdfParserService` to encapsulate parsing logic and configurations.
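As an illustration of the singleton binding, the sketch below registers the parser in `AppServiceProvider`. It assumes a standard Laravel application layout; the empty `Parser` constructor call uses the library's default configuration.

```php
<?php

namespace App\Providers;

use Illuminate\Support\ServiceProvider;
use Smalot\PdfParser\Parser;

class AppServiceProvider extends ServiceProvider
{
    public function register()
    {
        // Share one Parser instance across the application.
        $this->app->singleton(Parser::class, function () {
            return new Parser();
        });
    }
}
```

With this binding in place, type-hinting `Parser $parser` in a controller method or a job's `handle()` method lets the container inject the shared instance.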
What Laravel versions does smalot/pdfparser support?
The package itself is compatible with PHP 7.1+ and works with any Laravel version that runs on a supported PHP version. It has no Laravel-specific dependencies, so it can be used with Laravel 5.8 and newer.
How do I optimize memory usage for large PDFs in Laravel?
Use the `setDecodeMemoryLimit` config option to cap the memory used while decoding. The limit is given as an integer number of bytes (e.g., `50 * 1024 * 1024` for 50 MB). For very large files, consider chunked processing or offloading parsing to a queue job with `ParsePdfJob::dispatch()`. The library has no built-in streaming support, so plan for the whole file being loaded during parsing.
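The memory cap can be sketched as follows. The helper names `megabytesToBytes` and `makeMemoryCappedParser` are hypothetical; the sketch assumes `Config::setDecodeMemoryLimit()` takes the limit as an integer number of bytes and that the configuration object is passed as the second constructor argument of `Parser`.

```php
<?php

use Smalot\PdfParser\Config;
use Smalot\PdfParser\Parser;

/**
 * Convert megabytes to bytes (hypothetical helper, for readability only).
 */
function megabytesToBytes(int $megabytes): int
{
    return $megabytes * 1024 * 1024;
}

/**
 * Build a parser whose decoding step is capped at the given number of megabytes.
 */
function makeMemoryCappedParser(int $limitMegabytes): Parser
{
    $config = new Config();
    // Assumption: the limit is expressed in bytes.
    $config->setDecodeMemoryLimit(megabytesToBytes($limitMegabytes));

    return new Parser([], $config);
}
```

For example, `makeMemoryCappedParser(50)->parseFile($path)` would parse with a 50 MB decoding limit.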
Can I use smalot/pdfparser in Laravel queues for background processing?
Yes, dispatch parsing jobs to queues (e.g., `ParsePdfJob::dispatch($filePath)->onQueue('pdf-parsing')`). This is ideal for handling large batches or user-uploaded PDFs without blocking HTTP requests. Implement retries for failed parses.
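A possible shape for such a job, as a sketch only: `ParsePdfJob` is the name used above, but its body here is an assumption, including the retry count and the logging call. It relies on Laravel's standard job traits and on the container injecting the parser into `handle()`.

```php
<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Log;
use Smalot\PdfParser\Parser;

class ParsePdfJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    /** Number of attempts before the job is marked as failed (illustrative value). */
    public $tries = 3;

    /** @var string */
    private $filePath;

    public function __construct(string $filePath)
    {
        $this->filePath = $filePath;
    }

    public function handle(Parser $parser)
    {
        $text = $parser->parseFile($this->filePath)->getText();

        // Placeholder: store or index $text here; persistence is application-specific.
        Log::info('PDF parsed', [
            'path' => $this->filePath,
            'characters' => strlen($text),
        ]);
    }
}
```

An exception thrown by the parser inside `handle()` causes Laravel to retry the job until `$tries` is exhausted.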
What are the alternatives to smalot/pdfparser for Laravel?
Alternatives include `spatie/pdf-to-text` (simpler text extraction through the `pdftotext` binary) or `symfony/dom-crawler` combined with external tools like `pdftotext` for complex cases. `setasign/fpdf` is sometimes mentioned alongside these, but it generates PDFs rather than parsing them. Choose based on whether you need metadata, encryption support, or OCR.
How do I extract metadata (e.g., author, title) from a PDF in Laravel?
Parse the PDF with `$pdf = $parser->parseFile($path)` and access metadata via `$pdf->getDetails()`, which returns an array of key-value pairs such as `Author`, `Title`, or `Subject`. Keys that the PDF does not define are absent, so guard the lookup. Example: `$author = $pdf->getDetails()['Author'] ?? null;`
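A short sketch of metadata extraction. It uses `Document::getDetails()`, the accessor the library's README documents for metadata. The helper names `pickDetails` and `readPdfMetadata` are hypothetical, and the capitalized key names are an assumption based on the PDF Info dictionary; a given file may not define all of them.

```php
<?php

use Smalot\PdfParser\Parser;

/**
 * Pick selected keys from a details array, using null for missing entries.
 */
function pickDetails(array $details, array $keys): array
{
    $picked = [];
    foreach ($keys as $key) {
        $picked[$key] = $details[$key] ?? null;
    }

    return $picked;
}

/**
 * Read a few common metadata fields from a PDF file.
 */
function readPdfMetadata(string $path): array
{
    $details = (new Parser())->parseFile($path)->getDetails();

    // Assumption: keys follow the PDF Info dictionary spelling ("Author", "Title", ...).
    return pickDetails($details, ['Author', 'Title', 'Subject']);
}
```

Returning null for missing keys keeps callers from hitting undefined-index notices on PDFs with sparse metadata.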
Does smalot/pdfparser support page-ordered text extraction?
Yes, the library extracts text in page order by default. Use `$pdf->getText()` to get all text, or iterate over `$pdf->getPages()` and call `$page->getText()` on the pages you need. This is useful for structured document processing like invoices or contracts.
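Per-page extraction can be sketched like this. The function name `extractTextByPage` is hypothetical; the sketch relies on `Document::getPages()` returning the pages in document order and on `Page::getText()`.

```php
<?php

use Smalot\PdfParser\Parser;

/**
 * Return the text of each page, keyed by 1-based page number.
 */
function extractTextByPage(string $path): array
{
    $pdf = (new Parser())->parseFile($path);

    $textByPage = [];
    // array_values() normalizes the keys so numbering always starts at page 1.
    foreach (array_values($pdf->getPages()) as $index => $page) {
        $textByPage[$index + 1] = $page->getText();
    }

    return $textByPage;
}
```

The text of, say, the second page is then available as `extractTextByPage($path)[2]`, provided the document has at least two pages.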
How do I handle parsing errors or malformed PDFs in Laravel?
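Wrap parsing logic in try-catch blocks. The library reports failures such as secured files with generic `\Exception` instances, so catch `\Exception` rather than relying on a library-specific exception class. Log errors with Laravel’s logging system and implement fallback strategies, such as recording failed parses so they are not retried endlessly or notifying users via email or queues.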
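One way to apply this, as a sketch: the function name `safeExtractText` is hypothetical, and catching `\Throwable` is a deliberately broad choice made on the assumption that malformed files may surface as different error types.

```php
<?php

use Illuminate\Support\Facades\Log;
use Smalot\PdfParser\Parser;

/**
 * Parse a PDF and return its text, or null when parsing fails.
 */
function safeExtractText(Parser $parser, string $path): ?string
{
    try {
        return $parser->parseFile($path)->getText();
    } catch (\Throwable $e) {
        // Broad catch (assumption): treat any failure as "could not parse".
        Log::warning('PDF parsing failed', [
            'path' => $path,
            'error' => $e->getMessage(),
        ]);

        return null;
    }
}
```

A null return lets the caller decide on the fallback, for example marking the upload as failed or notifying the user.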