Install the package:
```bash
composer require 1tomany/pdf-pack
```
Install Poppler (required dependency):
```bash
# macOS (Homebrew)
brew install poppler

# Debian/Ubuntu
apt-get install poppler-utils
```
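A web server or queue worker may run with a different PATH than your interactive shell. Checking from PHP itself can therefore catch a missing binary earlier. The following is a minimal POSIX-oriented sketch. findBinary() is a hypothetical helper, not part of pdf-pack, and it ignores Windows `.exe` suffixes.

```php
<?php

/**
 * Hypothetical helper: return the first executable named $name found in the
 * directories listed in PATH, or null. POSIX-style; ignores Windows ".exe".
 */
function findBinary(string $name): ?string
{
    $path = getenv('PATH');
    if ($path === false || $path === '') {
        return null;
    }

    foreach (explode(PATH_SEPARATOR, $path) as $dir) {
        if ($dir === '') {
            continue; // skip empty PATH entries
        }
        $candidate = rtrim($dir, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR . $name;
        // is_file() rules out directories, which can also be "executable".
        if (is_file($candidate) && is_executable($candidate)) {
            return $candidate;
        }
    }

    return null;
}

foreach (['pdfinfo', 'pdftoppm', 'pdftotext'] as $binary) {
    echo $binary, ': ', findBinary($binary) ?? 'NOT FOUND', PHP_EOL;
}
```

Running this under the same user and environment as your workers shows what they actually see.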
Verify that pdfinfo, pdftoppm, and pdftotext are in your $PATH.

First use case: Extract text from a PDF:
```php
use OneToMany\PdfPack\Client\Poppler\PopplerClient;

$client = new PopplerClient();
$textGenerator = $client->extractText('path/to/file.pdf');
foreach ($textGenerator as $page => $text) {
    echo "Page {$page} text: {$text}\n";
}
```
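Poppler's command-line tools do the real work, and a client like this shells out to them. The sketch below shows that step using plain proc_open(). runCommand() is a hypothetical helper, not PopplerClient's implementation. The only Poppler detail it relies on is pdftotext's real `-f`/`-l` page flags and its `-` argument for writing to stdout.

```php
<?php

/**
 * Hypothetical helper: run a command given as an argv array (no shell) and
 * return its stdout, throwing RuntimeException on a non-zero exit code.
 */
function runCommand(array $command): string
{
    $descriptors = [
        1 => ['pipe', 'w'], // child's stdout
        2 => ['pipe', 'w'], // child's stderr
    ];

    // An array command (PHP >= 7.4) is executed directly, so paths need no shell escaping.
    $process = proc_open($command, $descriptors, $pipes);
    if (!is_resource($process)) {
        throw new RuntimeException('Could not start: ' . implode(' ', $command));
    }

    // Simplification: stdout is drained before stderr. A child that writes a
    // lot to stderr could block; Poppler's error messages are normally short.
    $stdout = stream_get_contents($pipes[1]);
    $stderr = stream_get_contents($pipes[2]);
    fclose($pipes[1]);
    fclose($pipes[2]);

    $exitCode = proc_close($process);
    if ($exitCode !== 0) {
        throw new RuntimeException(sprintf('Command failed (exit %d): %s', $exitCode, trim($stderr)));
    }

    return $stdout;
}

// Example: text of page 1 only; "-" tells pdftotext to write to stdout.
// $text = runCommand(['pdftotext', '-f', '1', '-l', '1', 'path/to/file.pdf', '-']);
```

Passing an argv array instead of a shell string avoids quoting bugs with file names that contain spaces.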
- Instantiate PopplerClient and call its methods directly (e.g., extractText(), rasterizePage()).
- Use ClientFactory for dependency injection (e.g., in Laravel services).

Text Extraction for LLM Processing
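```php
use Illuminate\Support\Facades\Storage;
use OneToMany\PdfPack\Client\Poppler\PopplerClient;

$client = app(PopplerClient::class);
$textGenerator = $client->extractText($pdfPath);
// Stream text to storage page by page instead of loading the whole document
foreach ($textGenerator as $page => $text) {
    Storage::disk('llm')->put("page_{$page}.txt", $text);
}
```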
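You may want one text file per document rather than one per page. In that case the same generator can be written out incrementally with plain PHP streams, so only one page is held in memory at a time. This is a minimal sketch. It assumes, as in the example above, that the generator yields page number => text. streamPagesToFile() and its page-marker format are illustrative choices, not part of pdf-pack.

```php
<?php

/**
 * Hypothetical helper: append each page of text to one file as it arrives.
 * $pages is any iterable of page => text, e.g. the generator from extractText().
 * Returns the number of pages written.
 */
function streamPagesToFile(iterable $pages, string $destination): int
{
    $handle = fopen($destination, 'wb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open {$destination} for writing");
    }

    $count = 0;
    try {
        foreach ($pages as $page => $text) {
            // A page marker lets downstream chunking keep page references.
            fwrite($handle, "--- page {$page} ---\n{$text}\n");
            $count++;
        }
    } finally {
        fclose($handle); // close even if the generator throws mid-document
    }

    return $count;
}
```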
Rasterizing Pages for OCR or Storage
```php
use Illuminate\Support\Facades\Storage;

// Rasterize a single page as PNG
$imageGenerator = $client->rasterizePage($pdfPath, 1, 'png');
foreach ($imageGenerator as $page => $imageData) {
    Storage::disk('images')->put("page_{$page}.png", $imageData);
}

// Rasterize a range of pages (e.g., pages 2-5); iterate the result as above
$imageGenerator = $client->rasterizePages($pdfPath, 2, 5, 'jpeg');
```
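pdftoppm can render a whole page range in a single invocation. The sketch below only builds that command line, using real pdftoppm flags: `-png`/`-jpeg` for the format, `-r` for resolution in DPI, and `-f`/`-l` for the first and last page. pdftoppmCommand() is hypothetical. It does not reflect how PopplerClient builds its commands.

```php
<?php

/**
 * Hypothetical helper: build a pdftoppm argv that renders pages
 * $first..$last (1-based) of $pdfPath to files named after $outputPrefix.
 */
function pdftoppmCommand(
    string $pdfPath,
    string $outputPrefix,
    int $first,
    int $last,
    string $format = 'png',
    int $dpi = 150
): array {
    $flags = ['png' => '-png', 'jpeg' => '-jpeg'];

    if (!isset($flags[$format])) {
        throw new InvalidArgumentException("Unsupported format: {$format}");
    }
    if ($first < 1 || $last < $first) {
        throw new InvalidArgumentException("Invalid page range {$first}-{$last}");
    }

    return [
        'pdftoppm',
        $flags[$format],
        '-r', (string) $dpi,
        '-f', (string) $first,
        '-l', (string) $last,
        $pdfPath,
        $outputPrefix,
    ];
}
```

Validating the range before spawning a process turns a confusing Poppler error into a clear exception.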
Metadata Inspection
```php
$metadata = $client->getMetadata($pdfPath);
$pageCount = $metadata->getPageCount();
```
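Poppler's pdfinfo prints one "Key: value" pair per line, including a `Pages:` line. If you need to read that output without the client, a minimal parser could look like this. parsePdfInfo() and pageCountFromPdfInfo() are hypothetical helpers, not how getMetadata() is implemented.

```php
<?php

/**
 * Hypothetical helper: parse pdfinfo output ("Key:   value" per line)
 * into an associative array.
 */
function parsePdfInfo(string $output): array
{
    $info = [];
    foreach (preg_split('/\R/', $output) as $line) {
        // Split on the first colon only; values such as dates contain colons too.
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            $info[trim($parts[0])] = trim($parts[1]);
        }
    }
    return $info;
}

/** Hypothetical helper: extract the page count or fail loudly. */
function pageCountFromPdfInfo(string $output): int
{
    $info = parsePdfInfo($output);
    if (!isset($info['Pages']) || !ctype_digit($info['Pages'])) {
        throw new UnexpectedValueException('pdfinfo output has no valid "Pages" field');
    }
    return (int) $info['Pages'];
}
```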
Register the client as a singleton in AppServiceProvider:
```php
// In AppServiceProvider::register()
$this->app->singleton(PopplerClient::class, function ($app) {
    return new PopplerClient();
});
```
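Dispatch long-running extraction to a queue (ExtractPdfJob is assumed to be your own job class):
```php
ExtractPdfJob::dispatch($pdfPath, $outputPath)->onQueue('pdf-processing');
```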
Use 1tomany/data-uri (if installed) to generate embeddable images:
```php
// current() returns the first value yielded by the generator
$imageData = $client->rasterizePage($pdfPath, 1, 'png')->current();
$dataUri = $imageData->toDataUri();
```
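A data URI is the MIME type followed by the base64-encoded bytes (RFC 2397). If 1tomany/data-uri is not installed and you have the raw image bytes, a minimal stand-in is shown below. toDataUri() here is a hypothetical helper, unrelated to the package's method of the same name.

```php
<?php

/**
 * Hypothetical helper: encode raw bytes as an RFC 2397 data URI,
 * e.g. for use in an <img src="..."> attribute.
 */
function toDataUri(string $bytes, string $mimeType): string
{
    return 'data:' . $mimeType . ';base64,' . base64_encode($bytes);
}

// Usage with raw PNG bytes held in $png:
// echo '<img src="' . htmlspecialchars(toDataUri($png, 'image/png')) . '">';
```

Base64 inflates the payload by roughly a third, so data URIs suit small thumbnails better than full-resolution pages.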
| Use Case | Implementation Pattern |
|---|---|
| Preprocess PDFs for LLM | Extract text + metadata → Store in DB |
| Dynamic PDF thumbnails | Rasterize pages → Cache in Redis |
| Batch PDF conversion | Queue jobs for parallel processing |
| PDF metadata validation | Check pageCount before processing |
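For the batch-conversion pattern, one approach is to split a document into fixed-size page ranges and queue one job per range. The sketch below covers the splitting step. pageBatches() and the job class in the comment are hypothetical.

```php
<?php

/**
 * Hypothetical helper: split pages 1..$pageCount into contiguous
 * [first, last] ranges of at most $batchSize pages each.
 */
function pageBatches(int $pageCount, int $batchSize): array
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1');
    }

    $batches = [];
    for ($first = 1; $first <= $pageCount; $first += $batchSize) {
        $batches[] = [$first, min($first + $batchSize - 1, $pageCount)];
    }
    return $batches;
}

// Dispatch loop; ExtractPdfPagesJob is a hypothetical application-defined job.
// foreach (pageBatches($pageCount, 25) as [$first, $last]) {
//     ExtractPdfPagesJob::dispatch($pdfPath, $first, $last)->onQueue('pdf-processing');
// }
```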
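Poppler Dependencies:
- Command not found → Ensure the Poppler binaries (pdfinfo, pdftoppm, pdftotext) are installed and in $PATH. Run `which pdftoppm` to verify the installation.

Memory Limits:
- For large PDFs, consume results through generators (yield) or stream them to disk incrementally instead of buffering every page in memory.

Page Indexing:
- Page numbers start at 1: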
```php
// Extracts page 1 (not page 0)
$client->extractText($pdfPath, 1);
```
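Because pages are 1-based, off-by-one errors usually come from 0-based loop counters. A small guard checked against the page count from getMetadata() catches them before Poppler runs. assertValidPage() is a hypothetical helper.

```php
<?php

/**
 * Hypothetical guard: throw unless 1 <= $page <= $pageCount.
 */
function assertValidPage(int $page, int $pageCount): void
{
    if ($page < 1 || $page > $pageCount) {
        throw new OutOfRangeException(sprintf(
            'Page %d is outside 1..%d (page numbers are 1-based)',
            $page,
            $pageCount
        ));
    }
}

// Converting a 0-based loop counter before calling the client:
// for ($i = 0; $i < $pageCount; $i++) {
//     $page = $i + 1;
//     assertValidPage($page, $pageCount);
//     $text = $client->extractText($pdfPath, $page)->current();
// }
```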
Image Formats:
- Pass the output format (e.g., 'png' or 'jpeg') as the third argument:
```php
$client->rasterizePage($pdfPath, 1, 'jpeg');
```
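Error Handling:
- Enable verbose output:
```php
$client = new PopplerClient(['verbose' => true]);
```
- Failed Poppler commands throw Symfony\Component\Process\Exception\ProcessFailedException; catch it to log or recover.

Custom Output Handling: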
Add custom logic (e.g., OCR post-processing) by extending PopplerClient, or by writing your own implementation of OneToMany\PdfPack\Contract\Client\ClientInterface:
```php
use Generator;
use OneToMany\PdfPack\Client\Poppler\PopplerClient;

class CustomPdfClient extends PopplerClient {
    public function extractText(string $filePath, ?int $page = null): Generator {
        // Add custom logic (e.g., call Tesseract OCR); parent:: requires a parent class
        yield from parent::extractText($filePath, $page);
    }
}
```
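If PopplerClient cannot be extended (for example, if it is declared final), decoration is an alternative. You wrap any client and post-process its output as it streams. The sketch below normalizes whitespace on each page. It uses a hypothetical TextExtractor interface as a stand-in for ClientInterface, whose full method list is not shown here.

```php
<?php

/** Hypothetical minimal contract standing in for the package's ClientInterface. */
interface TextExtractor
{
    /** @return Generator<int, string> page number => text */
    public function extractText(string $filePath, ?int $page = null): Generator;
}

/** Wraps any TextExtractor and post-processes each page as it streams through. */
final class NormalizingTextExtractor implements TextExtractor
{
    public function __construct(private TextExtractor $inner)
    {
    }

    public function extractText(string $filePath, ?int $page = null): Generator
    {
        foreach ($this->inner->extractText($filePath, $page) as $pageNumber => $text) {
            // Collapse runs of spaces/tabs and trim each page before yielding it.
            yield $pageNumber => trim(preg_replace('/[ \t]+/', ' ', $text));
        }
    }
}
```

Because the wrapper re-yields page by page, it keeps the streaming behavior of the inner client.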
Configuration:
Override the default Poppler commands in the client constructor:
```php
$client = new PopplerClient([
    'commands' => [
        'info' => '/custom/path/pdfinfo',
        'rasterize' => '/custom/path/pdftoppm',
        'extract' => '/custom/path/pdftotext',
    ],
]);
```
Testing:
Mock Symfony\Component\Process\Process for unit tests:
```php
use Symfony\Component\Process\Process;

// Inside a PHPUnit TestCase method
$process = $this->createMock(Process::class);
$process->method('run')->willReturn(0);
$client = new PopplerClient(['process' => $process]);
```
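An alternative to mocking Process is to inject the command runner. Tests then pass a closure and never spawn Poppler. This is a minimal sketch. PdfTextReader is hypothetical and not part of pdf-pack.

```php
<?php

/**
 * Hypothetical text reader whose command runner is injected, so tests can
 * substitute a closure instead of spawning pdftotext.
 */
final class PdfTextReader
{
    /** @var callable(list<string>): string */
    private $run;

    public function __construct(callable $run)
    {
        $this->run = $run;
    }

    public function page(string $pdfPath, int $page): string
    {
        // pdftotext -f/-l select the page range; "-" writes the text to stdout.
        return ($this->run)(['pdftotext', '-f', (string) $page, '-l', (string) $page, $pdfPath, '-']);
    }
}
```

In production the runner could be `fn (array $cmd) => (new Process($cmd))->mustRun()->getOutput()` using Symfony's Process. In tests, use a closure that records the command and returns canned text.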
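Caching:
Cache extracted text so repeated requests don't re-run Poppler:
```php
// Hash the path so the key stays short and free of special characters
$cacheKey = 'pdf_' . md5($pdfPath) . "_page_{$page}";
$text = cache()->remember($cacheKey, now()->addHour(), fn () => $client->extractText($pdfPath, $page)->current());
```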
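Keying only on the path serves stale text if a file is replaced in place. The sketch below builds a key that also covers the file's modification time and size, and could replace $cacheKey when files can change. pdfCacheKey() is hypothetical.

```php
<?php

/**
 * Hypothetical helper: a cache key that changes whenever the PDF at
 * $pdfPath is modified (realpath + mtime + size), hashed to stay short.
 */
function pdfCacheKey(string $pdfPath, int $page): string
{
    $real = realpath($pdfPath);
    if ($real === false) {
        throw new InvalidArgumentException("No such file: {$pdfPath}");
    }

    clearstatcache(true, $real); // avoid PHP's stat cache returning old values
    $fingerprint = $real . '|' . filemtime($real) . '|' . filesize($real);

    return 'pdf_' . hash('sha256', $fingerprint) . "_page_{$page}";
}
```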
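Store rasterized pages on the public disk (e.g., as thumbnails):
```php
use Illuminate\Support\Facades\Storage;

// Use the file name (no directories or extension) as the folder name
$name = pathinfo($pdfPath, PATHINFO_FILENAME);
$imageGenerator = $client->rasterizePage($pdfPath, 1, 'png');
foreach ($imageGenerator as $page => $imageData) {
    Storage::disk('public')->put("pdfs/{$name}/page_{$page}.png", $imageData);
}
```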
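Wrap batch processing in an Artisan command:
```php
use Illuminate\Console\Command;
use OneToMany\PdfPack\Client\Poppler\PopplerClient;

class ProcessPdfsCommand extends Command {
    protected $signature = 'pdfs:process {path}';
    public function handle(): int {
        $client = app(PopplerClient::class);
        $path = $this->argument('path');
        // Process the PDF(s) at $path...
        return self::SUCCESS;
    }
}
```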
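To let the {path} argument point at a directory, the command can walk it lazily with SPL iterators. This is a minimal sketch. findPdfs() is hypothetical.

```php
<?php

/**
 * Hypothetical helper: yield the path of every *.pdf file under $directory,
 * recursively, in the order the filesystem returns them.
 *
 * @return Generator<int, string>
 */
function findPdfs(string $directory): Generator
{
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($directory, FilesystemIterator::SKIP_DOTS)
    );

    foreach ($iterator as $file) {
        /** @var SplFileInfo $file */
        // Case-insensitive extension check so "REPORT.PDF" is included too.
        if ($file->isFile() && strtolower($file->getExtension()) === 'pdf') {
            yield $file->getPathname();
        }
    }
}
```

Inside handle(), `foreach (findPdfs($this->argument('path')) as $pdfPath)` would then process one file at a time.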