Pdf Pack Laravel Package

1tomany/pdf-pack


Getting Started

Minimal Setup

  1. Install the package:

    composer require 1tomany/pdf-pack
    
  2. Install Poppler (required dependency):

    • macOS: brew install poppler
    • Ubuntu/Debian: apt-get install poppler-utils
    Then verify that pdfinfo, pdftoppm, and pdftotext are on your $PATH.
  3. First use case: Extract text from a PDF:

    use OneToMany\PdfPack\Client\Poppler\PopplerClient;
    
    $client = new PopplerClient();
    $textGenerator = $client->extractText('path/to/file.pdf');
    
    foreach ($textGenerator as $page => $text) {
        echo "Page {$page} text: {$text}\n";
    }
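The PATH that PHP sees (especially under PHP-FPM or a queue worker) can differ from your interactive shell's, so it can help to check the Poppler binaries from inside PHP. Below is a minimal sketch in plain PHP. findOnPath() and missingPopplerBinaries() are hypothetical helpers, not part of 1tomany/pdf-pack, and the lookup assumes POSIX-style binary names with no .exe suffix.

```php
<?php
// Hypothetical helper (not part of pdf-pack): return the full path of an
// executable found in a PATH-style string, or null if it is missing.
// Assumes POSIX binary names (no ".exe" suffix).
function findOnPath(string $binary, string $path): ?string
{
    foreach (explode(PATH_SEPARATOR, $path) as $dir) {
        if ($dir === '') {
            continue; // skip empty PATH segments
        }
        $candidate = rtrim($dir, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR . $binary;
        if (is_file($candidate) && is_executable($candidate)) {
            return $candidate;
        }
    }
    return null;
}

// Hypothetical helper: list the Poppler binaries that cannot be found.
function missingPopplerBinaries(string $path, array $binaries = ['pdfinfo', 'pdftoppm', 'pdftotext']): array
{
    return array_values(array_filter(
        $binaries,
        fn (string $binary): bool => findOnPath($binary, $path) === null
    ));
}
```

In a health check or an Artisan command, a non-empty result from missingPopplerBinaries((string) getenv('PATH')) means the client's calls to Poppler will fail with "command not found".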
    

Key Entry Points

  • Direct usage: Instantiate PopplerClient and call methods directly (e.g., extractText(), rasterizePage()).
  • Actions: Use ClientFactory for dependency injection (e.g., in Laravel services).

Implementation Patterns

Core Workflows

  1. Text Extraction for LLM Processing

    $client = app(PopplerClient::class);
    $textGenerator = $client->extractText($pdfPath);
    
    // Stream text to a file or process page-by-page
    foreach ($textGenerator as $page => $text) {
        Storage::disk('llm')->put("page_{$page}.txt", $text);
    }
    
  2. Rasterizing Pages for OCR or Storage

    // Rasterize a single page as PNG
    $imageGenerator = $client->rasterizePage($pdfPath, 1, 'png');
    foreach ($imageGenerator as $page => $imageData) {
        Storage::disk('images')->put("page_{$page}.png", $imageData);
    }
    
    // Rasterize a range of pages (e.g., pages 2-5)
    $imageGenerator = $client->rasterizePages($pdfPath, 2, 5, 'jpeg');
    
  3. Metadata Inspection

    $metadata = $client->getMetadata($pdfPath);
    $pageCount = $metadata->getPageCount();
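rasterizePages() takes a first and a last page. Clamping a user-supplied range against the page count from getMetadata() avoids asking Poppler for pages that do not exist. The following is a plain-PHP sketch. clampPageRange() is a hypothetical helper, and it assumes the 1-indexed page numbers described under Gotchas.

```php
<?php
// Hypothetical helper: fit a requested [first, last] page range inside a
// document with $pageCount pages. Pages are 1-indexed.
// Returns null when the request does not overlap the document at all.
function clampPageRange(int $first, int $last, int $pageCount): ?array
{
    if ($pageCount < 1 || $first > $last) {
        return null;
    }
    $first = max(1, $first);        // there is no page 0
    $last = min($last, $pageCount); // never ask for pages past the end
    return $first <= $last ? [$first, $last] : null;
}
```

For example, clampPageRange(2, 50, 10) on a 10-page document yields [2, 10], which can then be passed as the first and last pages.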
    

Integration Tips

  • Laravel Service Providers: Bind the client as a singleton in the register() method of AppServiceProvider:
    $this->app->singleton(PopplerClient::class, function ($app) {
        return new PopplerClient();
    });
    
  • Queue Jobs for Large PDFs: Use Laravel queues to process PDFs asynchronously. ExtractPdfJob here is your own job class, not part of the package:
    ExtractPdfJob::dispatch($pdfPath, $outputPath)->onQueue('pdf-processing');
    
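  • Data URIs for Frontend: To generate embeddable images, base64-encode the raw bytes that the rasterize generator yields (as in the Storage examples above). 1tomany/data-uri, if installed, offers a richer wrapper:
    $imageData = $client->rasterizePage($pdfPath, 1, 'png')->current();
    $dataUri = 'data:image/png;base64,' . base64_encode($imageData);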
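When the same code handles both 'png' and 'jpeg' output, the MIME type can be read from the image's file signature instead of being hard-coded. Below is a minimal sketch. imageDataUri() is a hypothetical helper, and it assumes it receives the raw image bytes.

```php
<?php
// Hypothetical helper: build a data URI, picking the MIME type from the
// file signature. PNG files start with "\x89PNG\r\n\x1a\n"; JPEG files
// start with the bytes FF D8 FF. Requires PHP 8 (match, throw expression).
function imageDataUri(string $bytes): string
{
    $mime = match (true) {
        str_starts_with($bytes, "\x89PNG\r\n\x1a\n") => 'image/png',
        str_starts_with($bytes, "\xFF\xD8\xFF") => 'image/jpeg',
        default => throw new InvalidArgumentException('Unrecognized image format'),
    };
    return 'data:' . $mime . ';base64,' . base64_encode($bytes);
}
```

Base64 encoding grows the payload by about a third. For that reason, data URIs are better suited to small thumbnails than to full-resolution pages.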
    

Common Use Cases

Use Case | Implementation Pattern
Preprocess PDFs for LLMs | Extract text + metadata → store in DB
Dynamic PDF thumbnails | Rasterize pages → cache in Redis
Batch PDF conversion | Queue jobs for parallel processing
PDF metadata validation | Check the page count before processing

Gotchas and Tips

Pitfalls

  1. Poppler Dependencies:

    • Error: Command not found → Ensure Poppler binaries (pdfinfo, pdftoppm, pdftotext) are installed and in $PATH.
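    • Fix: Run which pdftoppm (and likewise for pdfinfo and pdftotext) to confirm that each binary resolves. PHP-FPM and queue workers may run with a different $PATH than your interactive shell.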
  2. Memory Limits:

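    • Issue: Large PDFs can exhaust memory if every page is collected before processing.
    • Fix: Consume the generators one page at a time and write each result to disk (or a Laravel filesystem disk) before moving on, rather than collecting all pages into an array first.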
  3. Page Indexing:

    • Gotcha: Pages are 1-indexed (not 0-indexed). Example:
      // Extracts page 1 (not page 0)
      $client->extractText($pdfPath, 1);
      
  4. Image Formats:

    • Default: PNG (not JPEG). Explicitly specify format if needed:
      $client->rasterizePage($pdfPath, 1, 'jpeg');
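The memory fix above amounts to writing each page as soon as the generator yields it. Below is a plain-PHP sketch of that pattern. streamPagesToDirectory() is a hypothetical helper that accepts any iterable of page => contents pairs, such as the generators returned by extractText() or rasterizePage().

```php
<?php
// Hypothetical helper: write each yielded page straight to disk so only
// one page's contents is held in memory at a time.
function streamPagesToDirectory(iterable $pages, string $directory, string $extension): int
{
    $written = 0;
    foreach ($pages as $page => $contents) {
        // Zero-padded names (page_0001) keep files in page order when sorted.
        $target = sprintf('%s/page_%04d.%s', rtrim($directory, '/'), $page, $extension);
        if (file_put_contents($target, $contents) === false) {
            throw new RuntimeException("Could not write {$target}");
        }
        $written++;
        // $contents is reassigned on the next iteration, releasing the
        // previous page instead of accumulating every page in an array.
    }
    return $written;
}
```

Because the helper only iterates, it never asks the generator for more than the current page.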
      

Debugging

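  • Verbose Output: If your version of PopplerClient accepts an options array, a verbose flag may expose the underlying Symfony Process calls (check the constructor signature first):
    $client = new PopplerClient(['verbose' => true]);
    
  • Command Failures: Symfony's Process::mustRun() throws Symfony\Component\Process\Exception\ProcessFailedException when a command exits with a nonzero code. If that exception reaches your code, $e->getProcess()->getExitCode() and $e->getProcess()->getErrorOutput() show which Poppler command failed and why.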
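To see what a failing Poppler command reports, you can capture its exit code and stderr directly. The Poppler tools document nonzero exit codes (for example, 1 when the PDF file cannot be opened). The sketch below uses plain proc_open rather than Symfony Process so that it runs on its own. runCommand() is a hypothetical helper.

```php
<?php
// Hypothetical helper: run a command and return its exit code, stdout and
// stderr. These are the same details Symfony's ProcessFailedException
// exposes through getProcess().
function runCommand(array $command): array
{
    // Send stderr to a temp file so a large stderr cannot block the child
    // while stdout is still being read.
    $stderrFile = tempnam(sys_get_temp_dir(), 'cmd_err_');
    $descriptors = [1 => ['pipe', 'w'], 2 => ['file', $stderrFile, 'w']];

    // The array form runs the command without a shell (PHP 7.4+).
    $process = proc_open($command, $descriptors, $pipes);
    if (!is_resource($process)) {
        unlink($stderrFile);
        throw new RuntimeException('Could not start: ' . implode(' ', $command));
    }

    $stdout = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    $exitCode = proc_close($process); // waits for the command to finish

    $stderr = (string) file_get_contents($stderrFile);
    unlink($stderrFile);

    return ['exitCode' => $exitCode, 'stdout' => $stdout, 'stderr' => $stderr];
}
```

For instance, if runCommand(['pdfinfo', $pdfPath]) returns a nonzero exitCode with a message in stderr, the problem is most likely a missing or corrupt file rather than a bug in your code.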

Extension Points

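  1. Custom Output Handling: Implement OneToMany\PdfPack\Contract\Client\ClientInterface with a decorator that wraps an existing client and adds custom logic (e.g., OCR post-processing):

    use OneToMany\PdfPack\Contract\Client\ClientInterface;

    class CustomPdfClient implements ClientInterface {
        public function __construct(private ClientInterface $inner) {}

        public function extractText(string $filePath, ?int $page = null): \Generator {
            // Add custom logic here (e.g., call Tesseract OCR on pages with no text)
            yield from $this->inner->extractText($filePath, $page);
        }

        // ...delegate the remaining ClientInterface methods to $this->inner
    }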
    
  2. Configuration: Override the default Poppler binary paths in the client constructor (verify the option names against your installed version):

    $client = new PopplerClient([
        'commands' => [
            'info' => '/custom/path/pdfinfo',
            'rasterize' => '/custom/path/pdftoppm',
            'extract' => '/custom/path/pdftotext',
        ],
    ]);
    
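  3. Testing: If your version of the client accepts an injected Process, mock Symfony\Component\Process\Process in unit tests (inside a PHPUnit TestCase). Otherwise, mock ClientInterface directly:

    use Symfony\Component\Process\Process;

    $process = $this->createMock(Process::class);
    $process->method('run')->willReturn(0); // simulate a successful command
    $client = new PopplerClient(['process' => $process]);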
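Mocking Process ties tests to how the client calls Poppler. An alternative is to have your own code depend on a small interface and swap in a fake during tests. Below is a sketch under that assumption. PageTextSource, FakePageTextSource, and WordCounter are hypothetical names, and a production adapter (not shown) would forward to PopplerClient::extractText().

```php
<?php
// Hypothetical interface owned by your application, not by pdf-pack.
interface PageTextSource
{
    /** @return iterable<int, string> page number => text */
    public function pages(string $pdfPath): iterable;
}

// Test double: returns canned text instead of running pdftotext.
final class FakePageTextSource implements PageTextSource
{
    /** @param array<int, string> $pages page number => text */
    public function __construct(private array $pages) {}

    public function pages(string $pdfPath): iterable
    {
        yield from $this->pages; // keeps the array's page-number keys
    }
}

// Example consumer that only knows about the interface.
final class WordCounter
{
    public function __construct(private PageTextSource $source) {}

    /** @return array<int, int> page number => word count */
    public function count(string $pdfPath): array
    {
        $counts = [];
        foreach ($this->source->pages($pdfPath) as $page => $text) {
            $counts[$page] = str_word_count($text);
        }
        return $counts;
    }
}
```

In a PHPUnit test, new WordCounter(new FakePageTextSource([1 => 'hello world'])) exercises the consumer without Poppler installed.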
    

Performance Tips

  • Batch Processing: Process pages in chunks (e.g., 10 pages at a time) to avoid memory spikes.
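  • Caching: Cache rasterized images or extracted text (e.g., using Laravel’s cache or Redis). Hashing the path keeps the key short and free of slashes, and including the modification time invalidates the entry when the file changes:
    $cacheKey = 'pdf_' . md5($pdfPath . '|' . filemtime($pdfPath)) . "_page_{$page}";
    // remember() returns the cached text, or extracts and stores it on a miss
    $text = cache()->remember($cacheKey, now()->addHour(),
        fn () => $client->extractText($pdfPath, $page)->current()
    );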
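The chunking advice above can be expressed as a list of [first, last] page ranges. Below is a plain-PHP sketch. pageBatches() is a hypothetical helper that assumes 1-indexed pages.

```php
<?php
// Hypothetical helper: split $pageCount pages into [first, last] ranges of
// at most $batchSize pages each. Pages are 1-indexed.
function pageBatches(int $pageCount, int $batchSize = 10): array
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1');
    }
    $batches = [];
    for ($first = 1; $first <= $pageCount; $first += $batchSize) {
        $batches[] = [$first, min($first + $batchSize - 1, $pageCount)];
    }
    return $batches;
}
```

Each pair can be passed to rasterizePages($pdfPath, $first, $last, 'png') or to a single queued job, so that no single step holds more than $batchSize pages.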
    

Laravel-Specific Quirks

  • Storage Disk: Use Laravel’s filesystem to save outputs:
    $imageGenerator = $client->rasterizePage($pdfPath, 1, 'png');
    $prefix = pathinfo($pdfPath, PATHINFO_FILENAME); // "report" for /data/report.pdf
    foreach ($imageGenerator as $page => $imageData) {
        Storage::disk('public')->put("pdfs/{$prefix}/page_{$page}.png", $imageData);
    }
    
  • Artisan Commands: Create a command for bulk processing:
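    use Illuminate\Console\Command;
    use OneToMany\PdfPack\Client\Poppler\PopplerClient;

    class ProcessPdfsCommand extends Command {
        protected $signature = 'pdfs:process {path}';
        protected $description = 'Extract text from every page of a PDF';

        public function handle(PopplerClient $client): int { // injected by the container
            foreach ($client->extractText($this->argument('path')) as $page => $text) {
                $this->line("Page {$page}: " . strlen($text) . ' bytes');
            }
            return self::SUCCESS;
        }
    }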
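Interpolating a raw filesystem path into a storage key nests the PDF's whole directory tree under pdfs/. In addition, two files named report.pdf in different folders would overwrite each other. Below is a sketch of a safer prefix. pdfOutputPrefix() is a hypothetical helper that combines a slug of the file name with a short hash of the full path.

```php
<?php
// Hypothetical helper: derive a storage directory for a PDF's outputs.
// The slug keeps paths readable; the hash of the full path keeps files
// with the same name in different folders from colliding.
function pdfOutputPrefix(string $pdfPath): string
{
    $name = pathinfo($pdfPath, PATHINFO_FILENAME);
    $slug = trim((string) preg_replace('/[^A-Za-z0-9]+/', '-', $name), '-');
    $slug = strtolower($slug !== '' ? $slug : 'document');
    return sprintf('pdfs/%s-%s', $slug, substr(sha1($pdfPath), 0, 8));
}
```

The result can replace the bare "pdfs/..." prefix in the Storage::disk('public')->put() call.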
    