smalot/pdfparser
Standalone PHP PDF parsing library to extract text, pages, and metadata from PDFs. Supports compressed PDFs and various encodings, with configurable parsing options. Note: secured PDFs and form data extraction are not supported.
smalot/pdfparser is a lightweight, self-contained library designed for PDF parsing, making it a strong fit for Laravel applications where PDF processing is required (e.g., document ingestion, metadata extraction, or text analysis).
[…] setIgnoreEncryption) for non-secured but flagged PDFs. Encrypted PDFs are explicitly unsupported.
[…] setDecodeMemoryLimit, setRetainImageContent) to avoid crashes.
[…] PdfParser interface).
[…] spatie/laravel-pdf).
[…] Storage facade to read PDFs from local/disk, S3, or other adapters.
[…] Illuminate\Support\Facades\Cache) to avoid reprocessing identical files.
[…] Allowed memory exhausted errors for large PDFs (>100MB). Mitigation: Use setDecodeMemoryLimit and setRetainImageContent(false).
[…] setasign/fpdf with encryption support).
[…] mikehaertl/phpwkhtmltopdf for HTML conversion).
[…] Mac OS Roman support is noted but may not cover all cases).
[…] spatie/laravel-pdf (wrapper for dompdf/wkhtmltopdf) if HTML conversion is a goal.
[…] phenx/php-pdf or barryvdh/laravel-dompdf for generation-focused use cases.
$this->app->bind(\Smalot\PdfParser\Parser::class, function ($app) {
    $config = new \Smalot\PdfParser\Config();
    $config->setDecodeMemoryLimit(50 * 1024 * 1024); // 50MB
    return new \Smalot\PdfParser\Parser([], $config);
});
[…] PdfParser::extractText($path)).
[…] php artisan pdf:parse /path/to/files).
[…] Storage facade to read PDFs from any supported adapter (local, S3, etc.):
$pdfContent = Storage::disk('s3')->get('invoices/invoice.pdf');
$pdf = $parser->parseContent($pdfContent);
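Because the bytes are already in memory after reading from storage, identical files can be detected by hashing their content before parsing. The following is a minimal sketch of that idea, not part of the library: `cachedText` is a hypothetical helper, the array stands in for a real cache store, and `$extract` stands in for a closure such as `fn ($bytes) => $parser->parseContent($bytes)->getText()`.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: memoize extracted text under a key derived from
 * the SHA-256 hash of the PDF bytes, so identical files are parsed once.
 */
function cachedText(string $pdfBytes, callable $extract, array &$cache): string
{
    $key = 'pdf-text:' . hash('sha256', $pdfBytes);

    // Only invoke the (expensive) extractor on a cache miss.
    if (!array_key_exists($key, $cache)) {
        $cache[$key] = $extract($pdfBytes);
    }

    return $cache[$key];
}

// Usage with a stand-in extractor that counts how often it runs.
$cache = [];
$calls = 0;
$extract = function (string $bytes) use (&$calls): string {
    $calls++;
    return strtoupper($bytes);
};

cachedText('same bytes', $extract, $cache);
cachedText('same bytes', $extract, $cache);
echo $calls, PHP_EOL; // prints 1
```

In a Laravel application the same content-hash key could be passed to `Cache::remember` in place of the array, which keeps the result available across requests and queue workers.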
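use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Smalot\PdfParser\Parser;

class ParsePdfJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable;
    // The container injects the Parser bound in the service provider.
    public function handle(Parser $parser): void {
        $pdf = $parser->parseFile(storage_path('app/pdf.pdf'));
        // Process results...
    }
}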
[…] getText()).
[…] getDetails()) for indexing.
[…] getDataTm() for text positioning) if needed.
[…] Config settings).
[…] setDecodeMemoryLimit.
[…] memory: 512Mi).
[…] composer require smalot/pdfparser.
[…] Config options based on pilot testing (e.g., memory limits, whitespace handling).
[…] app/Services/PdfParserService.php).
class PdfParserService {
    public function __construct(private Parser $parser) {}

    // Returns the text of the whole document.
    public function extractTextFromPath(string $path): string {
        $pdf = $this->parser->parseFile($path);
        return $pdf->getText();
    }

    // Returns the document metadata as an array.
    public function getPdfMetadata(string $path): array {
        $pdf = $this->parser->parseFile($path);
        return $pdf->getDetails();
    }
}
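The service methods above let any parser exception propagate to the caller. Since encrypted PDFs are unsupported and cause the parser to throw, one option is to wrap the call and turn failures into a null result. This is a minimal sketch under stated assumptions: `tryExtractText` is a hypothetical helper, and `$parse` stands in for a closure such as `fn ($path) => $parser->parseFile($path)->getText()`.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: run a parse callback and return null on failure
 * instead of letting the exception escape.
 */
function tryExtractText(callable $parse, string $path): ?string
{
    try {
        return $parse($path);
    } catch (\Exception $e) {
        // Record the failure so unsupported files can be reviewed later.
        error_log(sprintf('PDF parsing failed for %s: %s', $path, $e->getMessage()));
        return null;
    }
}

// Usage with stand-in callbacks: one succeeds, one simulates a rejected file.
$ok = tryExtractText(fn (string $p): string => 'extracted text', 'a.pdf');
$failed = tryExtractText(function (string $p): string {
    throw new \Exception('unsupported PDF');
}, 'b.pdf');

var_dump($ok, $failed);
```

Note that PHP's "Allowed memory size exhausted" condition is a fatal error, so a try/catch like this does not recover from it; limiting memory through the parser configuration remains the relevant mitigation there.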
[…] Exception for unsupported PDFs (e.g., encrypted).
[…] RuntimeException for memory issues.
[…] Cache::remember).
[…] Config settings in a config file (e.g., config/pdfparser.php) for easy adjustments:
return [
'memory_limit' => env('PDF_PARSER_MEMORY_LIMIT', 50 * 1024 * 1024),
'