Tesseract Bridge Bundle Laravel Package

bicycle/tesseract-bridge-bundle

Getting Started

Minimal Setup

  1. Install the Bundle

    composer require bicycle/tesseract-bridge-bundle
    

    Symfony Flex normally registers the bundle in config/bundles.php during installation; check that file and add the bundle by hand if it is missing.

  2. Configure Tesseract

    Add to config/packages/bicycle_tesseract_bridge.yaml:

    bicycle_tesseract_bridge:
        path: /usr/bin/tesseract  # Path to Tesseract executable
        language: eng            # Default language (e.g., 'eng', 'fra')
        options:
            - '--psm 6'           # Page segmentation mode (adjust as needed)
    
  3. First Use Case: OCR on an Uploaded Image

    use Bicycle\TesseractBridge\TesseractBridge;
    
    // In a controller or service:
    $tesseract = new TesseractBridge();
    $text = $tesseract->ocr('path/to/image.png');
    return response()->json(['text' => $text]);
    
  4. Verify Installation

    Run a test OCR on a known image (e.g., a screenshot) to confirm the setup works.
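
The three config keys (path, language, options) map naturally onto a Tesseract command line. The sketch below shows one way that mapping could look, which helps when reproducing a failing OCR call by hand. buildTesseractCommand() is a hypothetical helper written for this guide, not part of the bundle, and the bundle may assemble its command differently.

```php
<?php

/**
 * Build the argument list for a Tesseract CLI call from bundle-style config.
 * Hypothetical helper: the real bundle may assemble its command differently.
 *
 * @param array{path: string, language: string, options?: list<string>} $config
 * @return list<string>
 */
function buildTesseractCommand(array $config, string $imagePath): array
{
    // "stdout" as the output base makes Tesseract print the text instead of writing a file.
    $command = [$config['path'], $imagePath, 'stdout', '-l', $config['language']];

    foreach ($config['options'] ?? [] as $option) {
        // An entry such as '--psm 6' holds a flag and its value; split it into two arguments.
        foreach (preg_split('/\s+/', trim($option), -1, PREG_SPLIT_NO_EMPTY) as $part) {
            $command[] = $part;
        }
    }

    return $command;
}

$config = [
    'path' => '/usr/bin/tesseract',
    'language' => 'eng',
    'options' => ['--psm 6'],
];

// escapeshellarg() is used only to display the command; pass the array itself to proc_open().
echo implode(' ', array_map('escapeshellarg', buildTesseractCommand($config, 'scan.png'))), PHP_EOL;
```

Running the printed command in a terminal is a quick way to tell whether a problem lies in Tesseract itself or in the PHP integration.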


Implementation Patterns

Common Workflows

  1. Image Processing Pipeline

    • Upload → Validate → OCR → Store/Process
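      // Example: Laravel-style controller handling file uploads
      // Requires: use Illuminate\Support\Facades\Storage;
      public function upload(Request $request) {
          // Validate first so non-image uploads never reach Tesseract
          $request->validate(['image' => 'required|image|max:1024']);
      
          // store() returns the path relative to the disk root
          $path = $request->file('image')->store('temp');
      
          $tesseract = new TesseractBridge();
          // Storage::path() resolves the absolute path regardless of the disk root
          $text = $tesseract->ocr(Storage::path($path));
      
          // Process $text (e.g., save to DB, trigger events)
          return redirect()->back()->with('success', 'OCR completed!');
      }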
      
  2. Language-Specific OCR

    Override the language per request:

    $tesseract->setLanguage('fra'); // French
    $text = $tesseract->ocr('image.png');
    
  3. Batch Processing

    Move OCR off the request cycle with a queue (Symfony Messenger, or a queued job as in this Laravel-style sketch):

    // Dispatch a job
    $this->dispatch(new ProcessOcrJob($imagePath));
    
    // Job handler
    public function handle() {
        $tesseract = new TesseractBridge();
        $text = $tesseract->ocr($this->imagePath);
        // Save to DB, etc.
    }
    
  4. Integration with Forms

    Add an unmapped file field to a FormType, then run OCR on the uploaded file in the controller:

    // In a form builder
    $builder->add('image', FileType::class, [
        'mapped' => false,
        'constraints' => [new File(['maxSize' => '1024k'])]
    ]);
    
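  5. Dependency Injection

    Inject the bundle's service into your own services via services.yaml (the service id below is assumed; check it with php bin/console debug:container tesseract):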

    services:
        App\Services\OcrService:
            arguments:
                $tesseract: '@bicycle.tesseract_bridge'
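
On the PHP side, constructor injection could look like the sketch below. The OcrEngine interface and FakeEngine class are hypothetical stand-ins so the example runs without the bundle or Tesseract; in a real project the constructor would type-hint whatever class the bundle's service actually exposes.

```php
<?php

/** Hypothetical contract standing in for the class behind the bundle's service. */
interface OcrEngine
{
    public function ocr(string $imagePath): string;
}

/** Application service that receives the engine through its constructor. */
final class OcrService
{
    private OcrEngine $engine;

    public function __construct(OcrEngine $engine)
    {
        $this->engine = $engine;
    }

    /** Run OCR and collapse runs of whitespace so callers get one clean line of text. */
    public function extractText(string $imagePath): string
    {
        $raw = $this->engine->ocr($imagePath);

        return trim(preg_replace('/\s+/u', ' ', $raw) ?? $raw);
    }
}

/** In-memory stand-in so the example runs without Tesseract installed. */
final class FakeEngine implements OcrEngine
{
    public function ocr(string $imagePath): string
    {
        return "Invoice  #42\n\nTotal:\t19.99\n";
    }
}

$service = new OcrService(new FakeEngine());
echo $service->extractText('invoice.png'), PHP_EOL; // prints "Invoice #42 Total: 19.99"
```

Depending on an interface rather than a concrete class keeps controllers and jobs testable, because a fake engine can be swapped in without touching the container.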
    

Integration Tips

  • File Validation: Use Symfony’s File or Image constraint to reject non-image uploads and oversized files before OCR.
  • Error Handling: Wrap OCR calls in try-catch to handle Tesseract failures gracefully:
    try {
        $text = $tesseract->ocr($path);
    } catch (\Exception $e) {
        // Laravel-style redirect with a flash message (in Symfony, use addFlash() and a redirect response instead)
        return back()->with('error', 'OCR failed: ' . $e->getMessage());
    }
    
  • Configuration: Extend the bundle’s config for project-specific defaults (e.g., custom Tesseract paths per environment).
  • Testing: Mock TesseractBridge in PHPUnit:
    $mock = $this->createMock(TesseractBridge::class);
    $mock->method('ocr')->willReturn('test text');
    // Inside a KernelTestCase (Symfony 5.3+); the service id is assumed
    static::getContainer()->set('bicycle.tesseract_bridge', $mock);
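
Outside of a form or request validator (for example in a queue worker), the same validation idea can be applied with plain PHP before the OCR call. This is a minimal sketch: assertOcrReadyImage() is a hypothetical helper, and the allowed image types and the 5 MB cap are illustrative defaults rather than bundle settings.

```php
<?php

/**
 * Reject files Tesseract is unlikely to handle before spending time on OCR.
 * The allowed types and size cap are illustrative defaults, not bundle settings.
 */
function assertOcrReadyImage(string $path, int $maxBytes = 5242880): void
{
    if (!is_file($path) || !is_readable($path)) {
        throw new InvalidArgumentException("Image not found or unreadable: $path");
    }

    if (filesize($path) > $maxBytes) {
        throw new InvalidArgumentException("Image exceeds $maxBytes bytes: $path");
    }

    // getimagesize() reads the file header and returns false for non-image content.
    $info = @getimagesize($path);
    $allowed = [IMAGETYPE_PNG, IMAGETYPE_JPEG, IMAGETYPE_TIFF_II, IMAGETYPE_TIFF_MM, IMAGETYPE_BMP];

    if ($info === false || !in_array($info[2], $allowed, true)) {
        throw new InvalidArgumentException("Unsupported or corrupt image: $path");
    }
}

// Usage: a plain text file is rejected before any OCR call is made.
$tmp = tempnam(sys_get_temp_dir(), 'ocr');
file_put_contents($tmp, 'not an image');

try {
    assertOcrReadyImage($tmp);
    echo 'Accepted', PHP_EOL;
} catch (InvalidArgumentException $e) {
    echo 'Rejected: ', $e->getMessage(), PHP_EOL;
} finally {
    unlink($tmp);
}
```

Failing fast here produces a clearer error message than whatever Tesseract reports when handed an unreadable file.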
    

Gotchas and Tips

Pitfalls

  1. Tesseract Installation

    • Issue: tesseract CLI tool not found.
    • Fix: Install via package manager (e.g., apt-get install tesseract-ocr on Ubuntu) and update the path in config.
    • Debug: Run which tesseract in your server’s terminal to verify the path.
  2. Language Packs

    • Issue: OCR returns gibberish for non-English text.
    • Fix: Install language packs (e.g., tesseract-ocr-fra for French) and ensure the language config matches.
    • Debug: List available languages with tesseract --list-langs.
  3. Memory Limits

    • Issue: Large images fail with "out of memory" errors.
    • Fix: Resize or crop large images before OCR. Note that --psm (page segmentation mode) changes how Tesseract analyses the page layout; it does not shrink the image being processed.
  4. Symfony Cache

    • Issue: Changes to config aren’t reflected.
    • Fix: Clear cache (php bin/console cache:clear) after updating bicycle_tesseract_bridge.yaml.
  5. Permissions

    • Issue: "Permission denied" when running OCR.
    • Fix: Ensure the web server user (e.g., www-data) has read access to the image files and execute permissions for the tesseract binary.
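
The language-pack pitfall can be caught up front by comparing the configured language against the output of tesseract --list-langs. The sketch below parses that output; both function names are hypothetical, and the sample text is pasted in so the example runs without Tesseract installed.

```php
<?php

/**
 * Extract language codes from the text printed by `tesseract --list-langs`.
 * The first line is a header ("List of available languages ..."); the rest are codes.
 *
 * @return list<string>
 */
function parseInstalledLanguages(string $listLangsOutput): array
{
    $codes = [];
    foreach (preg_split('/\R/', trim($listLangsOutput)) as $line) {
        $line = trim($line);
        if ($line === '' || stripos($line, 'List of available languages') === 0) {
            continue; // skip blank lines and the header
        }
        $codes[] = $line;
    }

    return $codes;
}

/**
 * Return the requested languages that are not installed.
 * Multi-language strings such as 'eng+fra' are split on '+'.
 *
 * @param list<string> $installed
 * @return list<string>
 */
function missingLanguages(string $requested, array $installed): array
{
    return array_values(array_diff(explode('+', $requested), $installed));
}

// Sample output; in practice, capture the real command output instead.
$sample = "List of available languages in \"/usr/share/tesseract-ocr/5/tessdata/\" (3):\neng\nfra\nosd\n";
$installed = parseInstalledLanguages($sample);

print_r(missingLanguages('eng+deu', $installed)); // reports deu as missing
```

A check like this fits well in a deployment health check, where a missing language pack is cheaper to discover than in production OCR results.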

Debugging Tips

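  1. Log Raw Output

    The Tesseract CLI does not document a --debug flag; Tesseract 5 releases accept --loglevel instead. Pass it through the bundle's options and log the command that was run (setOptions() and getLastCommand() appear here as this guide describes them; confirm they exist in your installed version):

    $tesseract->setOptions(['--loglevel DEBUG']);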
    $text = $tesseract->ocr($path);
    // Log $tesseract->getLastCommand() for debugging.
    
  2. Check Tesseract Version

    Confirm which Tesseract release is installed and that it is one the bundle supports (separately, the bundle itself requires PHP 7.4+). Run:

    tesseract --version
    
  3. Environment-Specific Configs

    Set a different Tesseract path per environment (e.g., dev vs. prod), for example with a when@prod block in the config file (Symfony 5.3+) or an environment variable read via %env(...)%.
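
The version check can also be automated. The sketch below parses the text printed by tesseract --version and compares it against a minimum; it assumes the first line looks like "tesseract 5.3.0" (some builds prefix the number with "v"). The function names and the 4.0.0 threshold are illustrative, since this guide does not state which Tesseract versions the bundle supports.

```php
<?php

/**
 * Pull the version number out of `tesseract --version` output.
 * Assumes a line such as "tesseract 5.3.0"; returns null when none is found.
 */
function parseTesseractVersion(string $versionOutput): ?string
{
    if (preg_match('/^tesseract\s+v?(\d+(?:\.\d+)*)/mi', $versionOutput, $matches) === 1) {
        return $matches[1];
    }

    return null;
}

/** True when the detected version is at least $minimum (e.g. "4.0.0"). */
function meetsMinimumVersion(string $versionOutput, string $minimum): bool
{
    $version = parseTesseractVersion($versionOutput);

    return $version !== null && version_compare($version, $minimum, '>=');
}

// Sample output; in practice, capture the real command output instead.
$sample = "tesseract 5.3.0\n leptonica-1.82.0\n";

var_dump(meetsMinimumVersion($sample, '4.0.0')); // bool(true)
```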


Extension Points

  1. Custom OCR Logic

    Extend TesseractBridge to add pre/post-processing:

    class CustomTesseractBridge extends TesseractBridge {
        public function ocr($file) {
            $text = parent::ocr($file);
            return $this->postProcess($text);
        }
    
        protected function postProcess(string $text): string {
            // Add custom logic (e.g., regex cleanup)
            return preg_replace('/[^a-zA-Z0-9]/', ' ', $text);
        }
    }
    

    Register the custom service in services.yaml.

  2. Event Listeners

    Dispatch your own event after OCR and react to it in a listener (e.g., save to DB, send notifications). The app.ocr.completed name below is application-defined:

    # config/services.yaml
    services:
        App\EventListener\OcrListener:
            tags:
                - { name: kernel.event_listener, event: app.ocr.completed, method: onOcrCompleted }
    
  3. Command-Line Integration

    Use TesseractBridge in custom console commands for bulk OCR:

    use Bicycle\TesseractBridge\TesseractBridge;
    use Symfony\Component\Console\Command\Command;
    use Symfony\Component\Console\Input\InputInterface;
    use Symfony\Component\Console\Output\OutputInterface;
    
    class OcrCommand extends Command {
        protected function configure(): void {
            // Name used on the CLI: php bin/console app:ocr
            $this->setName('app:ocr');
        }
    
        protected function execute(InputInterface $input, OutputInterface $output): int {
            $tesseract = new TesseractBridge();
            $output->writeln($tesseract->ocr('image.png'));
    
            // Symfony 5+ requires an int exit code; 0 means success
            return 0;
        }
    }
    
  4. API Wrapper

    Create a DTO or API resource to standardize OCR responses (the constructor property promotion used here requires PHP 8.0+):

    class OcrResult {
        public function __construct(
            public string $text,
            public string $language,
            public int $confidence
        ) {}
    }
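
One possible source for the confidence field is Tesseract's TSV output (produced by adding the tsv config to the command line), which carries a per-word conf column. The sketch below averages it. meanWordConfidence() is a hypothetical helper; whether the bundle exposes TSV output at all is not confirmed by this guide.

```php
<?php

/**
 * Average the word-level confidence from Tesseract TSV output, rounded to an int (0-100).
 * Rows with conf = -1 are layout rows (page, block, line) and are skipped.
 * Returns null when no words were recognised.
 */
function meanWordConfidence(string $tsv): ?int
{
    $lines = preg_split('/\R/', trim($tsv));
    $header = explode("\t", (string) array_shift($lines));

    $confIndex = array_search('conf', $header, true);
    if ($confIndex === false) {
        throw new InvalidArgumentException('TSV has no "conf" column');
    }

    $values = [];
    foreach ($lines as $line) {
        $cells = explode("\t", $line);
        $conf = isset($cells[$confIndex]) ? (float) $cells[$confIndex] : -1.0;
        if ($conf >= 0) {
            $values[] = $conf;
        }
    }

    return $values === [] ? null : (int) round(array_sum($values) / count($values));
}

// Hand-written sample with one layout row and two word rows.
$sample = implode("\n", [
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\tleft\ttop\twidth\theight\tconf\ttext",
    "1\t1\t0\t0\t0\t0\t0\t0\t640\t480\t-1\t",
    "5\t1\t1\t1\t1\t1\t36\t92\t60\t20\t96.5\tHello",
    "5\t1\t1\t1\t1\t2\t102\t92\t70\t20\t91.5\tworld",
]);

echo meanWordConfidence($sample), PHP_EOL; // prints 94
```

A mean is only one way to summarise confidence; a minimum or a count of low-confidence words may suit review workflows better.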
    