- How do I install smalot/pdfparser in a Laravel project?
- Run `composer require smalot/pdfparser` in your project directory. The package requires PHP 7.1+ and is a standalone library, so it works in Laravel without a framework-specific wrapper or any additional Laravel dependencies.
- Does smalot/pdfparser support encrypted or password-protected PDFs?
- No, this package does not support encrypted or password-protected PDFs. To handle secured documents, call an external tool such as `pdftotext` (for example through `symfony/process`), or use `spatie/pdf-to-text`, which wraps `pdftotext`.
- Can I extract structured data like tables or images from PDFs with this package?
- The package extracts raw text and metadata but does not preserve complex layouts like tables or images. For structured data, you may need to post-process the extracted text using regex, NLP, or additional libraries.
- How do I configure memory limits for large PDF files?
- Create a `Config` object, call `setDecodeMemoryLimit()` on it to cap the memory (in bytes) used for decoding operations, and pass it to the parser. If you don’t need image data, also call `setRetainImageContent(false)` to reduce memory consumption. Both settings matter most for large files, such as those over 100MB.
- Is smalot/pdfparser compatible with Laravel 9 or 10?
- Yes. The package requires PHP 7.1 or later, so it runs on the PHP versions that Laravel 9 (PHP 8.0+) and Laravel 10 (PHP 8.1+) require. However, always check the latest release notes for any breaking changes, as the library is community-maintained.
- How can I integrate this package into Laravel’s service container?
- Create a service provider (for example, `PdfParserServiceProvider`) and bind the parser as a singleton in its `register()` method. You can then type-hint the parser in your controllers or services and let Laravel inject it.
- What are the alternatives to smalot/pdfparser for PDF parsing in Laravel?
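- For text extraction, the main alternative is `spatie/pdf-to-text`, a wrapper around the `pdftotext` command-line tool. `setasign/fpdf` and `mikehaertl/phpwkhtmltopdf` generate PDFs rather than parse them, so they are only relevant if you also need to create documents. If you need encrypted PDF support, consider an external tool like `pdftotext`.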
- How do I test the accuracy of extracted text from PDFs?
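- Validate extraction accuracy by comparing output against known PDFs with expected text. Use unit tests to verify metadata extraction (e.g., author, creation date) and text order. Test with real-world documents, including multi-column layouts, where text order is most likely to differ from the visual order. Include scanned documents as well: the package does not perform OCR, so image-only pages are expected to yield little or no text.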
- Can I use this package in Laravel background jobs or queues?
- Yes, the package is stateless and can be used in Laravel queues for long-running tasks. Define a job class of your own (for example, `ParsePdfJob`) and dispatch it with `ParsePdfJob::dispatch($filePath)->onQueue('pdf-parsing')` to offload parsing from HTTP requests.
- Does smalot/pdfparser support custom configurations for parsing?
- Yes, you can create a custom configuration with the `Config` class and pass it as the second argument to the `Parser` constructor. This allows you to adjust settings such as the font space limit, the decode memory limit, and whether image content is retained. Refer to the [CustomConfig.md](https://github.com/smalot/pdfparser/blob/master/doc/CustomConfig.md) documentation for details.
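
The configuration and memory-limit answers above can be sketched in a few lines of PHP. This is a minimal illustration, not a recommended setup: the helper names `makePdfParser()` and `extractPdf()`, the 64 MB limit, and the sample path are all assumptions made for the example, and it presumes the package was installed with Composer.

```php
<?php

declare(strict_types=1);

// Assumes Composer's autoloader lives next to this script.
require __DIR__ . '/vendor/autoload.php';

use Smalot\PdfParser\Config;
use Smalot\PdfParser\Parser;

/**
 * Hypothetical helper: build a parser configured for large files.
 * The default limit (64 MB, in bytes) is an illustrative value only.
 */
function makePdfParser(int $decodeLimitBytes = 64 * 1024 * 1024): Parser
{
    $config = new Config();
    $config->setRetainImageContent(false);            // skip image data we do not need
    $config->setDecodeMemoryLimit($decodeLimitBytes); // cap memory used while decoding

    // The Config object is the second constructor argument.
    return new Parser([], $config);
}

/**
 * Hypothetical helper: return text, metadata, and page count for one file.
 *
 * @return array{text: string, details: array, pages: int}
 */
function extractPdf(Parser $parser, string $path): array
{
    if (!is_readable($path)) {
        throw new RuntimeException("Cannot read PDF at {$path}");
    }

    $document = $parser->parseFile($path);

    return [
        'text'    => $document->getText(),
        'details' => $document->getDetails(),
        'pages'   => count($document->getPages()),
    ];
}

try {
    // Hypothetical sample file; replace with a real path.
    $result = extractPdf(makePdfParser(), __DIR__ . '/storage/app/sample.pdf');
    echo $result['pages'], " page(s)\n";
    echo substr($result['text'], 0, 200), "\n";
} catch (Throwable $e) {
    // Unreadable files and documents the library cannot handle end up here.
    fwrite(STDERR, 'Could not parse PDF: ' . $e->getMessage() . "\n");
}
```

In a Laravel application, the body of `makePdfParser()` is the kind of code that could sit inside a singleton binding in a service provider, and `extractPdf()` is roughly what a queued job's `handle()` method might do.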