- How do I install smalot/pdfparser in a Laravel project?
- Use Composer to install it with `composer require smalot/pdfparser`. No Laravel-specific setup is required, though you may want to wrap it in a service class for better integration. The package supports PHP 7.1+ and works standalone.
- Can smalot/pdfparser handle encrypted or password-protected PDFs?
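- No, the library cannot decrypt encrypted or password-protected PDFs and throws an exception when it encounters one. Decrypt such files with an external tool before parsing. The `setIgnoreEncryption` config option only helps with files that are flagged as encrypted but whose content is actually readable; it does not decrypt anything. OCR tools such as Tesseract solve a different problem: scanned PDFs that contain images instead of a text layer.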
- Is smalot/pdfparser compatible with Laravel’s service container?
- Yes, you can register it as a singleton in Laravel’s DI container. For example, bind it in `AppServiceProvider` or use a service class like `PdfParserService` to encapsulate parsing logic and configurations.
- What Laravel versions does smalot/pdfparser support?
- The package has no Laravel-specific dependencies, so it works with any Laravel version that runs on a PHP version the package supports. It requires PHP 7.1+, which covers Laravel 5.8 and newer.
- How do I optimize memory usage for large PDFs in Laravel?
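- Use the `setDecodeMemoryLimit` config option to cap the memory used when decoding PDF streams. The value is an integer number of bytes (e.g., `52428800` for about 50 MB), not a string like `50MB`. For very large files, consider chunked processing (for example, splitting the PDF into smaller files beforehand) or offloading parsing to a queue job with `ParsePdfJob::dispatch()`. Do not rely on streaming: the library parses the whole document in memory and has no built-in streaming support.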
- Can I use smalot/pdfparser in Laravel queues for background processing?
- Yes, dispatch parsing jobs to queues (e.g., `ParsePdfJob::dispatch($filePath)->onQueue('pdf-parsing')`, where `ParsePdfJob` is a job class you define yourself). This is ideal for handling large batches or user-uploaded PDFs without blocking HTTP requests. Set the job's `$tries` property to retry failed parses.
- What are the alternatives to smalot/pdfparser for Laravel?
- For text extraction, alternatives include `spatie/pdf-to-text` (a simpler wrapper around the `pdftotext` binary) or calling external tools like `pdftotext` yourself via `symfony/process` for complex cases. `setasign/fpdf` is sometimes mentioned alongside these, but it generates PDFs rather than parsing them. Choose based on whether you need metadata, encryption support, or OCR.
- How do I extract metadata (e.g., author, title) from a PDF in Laravel?
- Parse the PDF with `$pdf = $parser->parseFile($path)` and access metadata via `$pdf->getDetails()`, which returns an array of key-value pairs such as `Author`, `Title`, or `Subject`. Keys are only present if the PDF defines them, so guard the lookup. Example: `$author = $pdf->getDetails()['Author'] ?? null;`
- Does smalot/pdfparser support page-ordered text extraction?
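- Yes, the library extracts text in page order by default. Use `$pdf->getText()` to get all text, or `$pdf->getPages()` to get an array of page objects and call `getText()` on each one (e.g., `$pdf->getPages()[0]->getText()` for the first page; the array is zero-indexed). This is useful for structured document processing like invoices or contracts.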
- How do I handle parsing errors or malformed PDFs in Laravel?
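- Wrap parsing logic in try-catch blocks. The library's errors derive from PHP's base `\Exception` (there is no dedicated `PdfParserException` class), so catch `\Exception` unless you target a more specific subclass shipped with your installed version. Log errors with Laravel’s logging system and implement fallback strategies, such as recording failed parses so they are not retried endlessly or notifying users via email or queues.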
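
Several of the answers above (service class, memory limit, metadata, per-page text, error handling) can be combined into one small wrapper. The following is a minimal sketch, not the library's prescribed integration: the class name `PdfParserService`, its `extract()` method, the `App\Services` namespace, and the 50 MB limit are all assumptions made for illustration. It relies only on `Parser::parseFile()`, `Document::getDetails()`, `Document::getPages()`, `Page::getText()`, and the `Config` setters `setDecodeMemoryLimit()` and `setRetainImageContent()`.

```php
<?php

namespace App\Services;

use Smalot\PdfParser\Config;
use Smalot\PdfParser\Parser;

/**
 * Hypothetical wrapper around smalot/pdfparser.
 * Class, method, and namespace names are illustrative assumptions.
 */
class PdfParserService
{
    /** @var Parser */
    private $parser;

    public function __construct(?Parser $parser = null)
    {
        if ($parser === null) {
            $config = new Config();
            // The limit is an integer number of bytes (here about 50 MB).
            $config->setDecodeMemoryLimit(50 * 1024 * 1024);
            // Image data is not needed for text extraction, so drop it.
            $config->setRetainImageContent(false);
            $parser = new Parser([], $config);
        }

        $this->parser = $parser;
    }

    /**
     * Returns the document metadata and the text of each page, in page order.
     *
     * @return array{details: array, pages: string[]}
     */
    public function extract(string $path): array
    {
        // Fail early, before the parser is touched, if the file is missing.
        if (!is_readable($path)) {
            throw new \InvalidArgumentException("PDF not readable: {$path}");
        }

        try {
            $pdf = $this->parser->parseFile($path);

            $pages = [];
            foreach ($pdf->getPages() as $page) {
                $pages[] = $page->getText();
            }

            return ['details' => $pdf->getDetails(), 'pages' => $pages];
        } catch (\Exception $e) {
            // Wrap library errors so callers handle a single exception type.
            throw new \RuntimeException("Could not parse PDF: {$path}", 0, $e);
        }
    }
}
```

To share one instance across the application, the class could be registered in `AppServiceProvider::register()` with `$this->app->singleton(PdfParserService::class);` and then type-hinted in a controller or in a queue job's `handle()` method. Accepting an optional `Parser` in the constructor keeps the default configuration in one place while still allowing a differently configured parser to be injected.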