bicycle/tesseract-bridge-bundle
Install the Bundle
composer require bicycle/tesseract-bridge-bundle
Symfony Flex registers the bundle in config/bundles.php automatically. If your project does not use Flex, add the entry to config/bundles.php by hand.
Configure Tesseract
Add to config/packages/bicycle_tesseract_bridge.yaml:
bicycle_tesseract_bridge:
    path: /usr/bin/tesseract # Path to Tesseract executable
    language: eng # Default language (e.g., 'eng', 'fra')
    options:
        - '--psm 6' # Page segmentation mode (adjust as needed)
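To make the three settings concrete, here is a minimal sketch of how such values could map onto a Tesseract command line. The helper buildTesseractCommand is hypothetical and is not part of the bundle; it only relies on the standard CLI form (image, output base, then -l and other flags), and it does not claim to show how the bundle builds its command internally.

```php
<?php
declare(strict_types=1);

/**
 * Build a Tesseract command line from config-like values.
 * Hypothetical helper for illustration; not part of the bundle.
 */
function buildTesseractCommand(string $binary, string $image, string $language, array $options = []): string
{
    // "stdout" as the output base makes Tesseract print the text instead of writing a file.
    $parts = [escapeshellarg($binary), escapeshellarg($image), 'stdout', '-l', escapeshellarg($language)];
    foreach ($options as $option) {
        // An entry such as '--psm 6' holds a flag and its value; escape each token separately.
        foreach (preg_split('/\s+/', trim($option), -1, PREG_SPLIT_NO_EMPTY) as $token) {
            $parts[] = escapeshellarg($token);
        }
    }
    return implode(' ', $parts);
}

echo buildTesseractCommand('/usr/bin/tesseract', 'scan.png', 'eng', ['--psm 6']), PHP_EOL;
```

Splitting each option on whitespace matters: passing '--psm 6' as one escaped argument would hand Tesseract a single unknown flag instead of a flag and its value.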
First Use Case: OCR on an Uploaded Image
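use Bicycle\TesseractBridge\TesseractBridge;
use Symfony\Component\HttpFoundation\JsonResponse;
// In a controller or service:
$tesseract = new TesseractBridge();
$text = $tesseract->ocr('path/to/image.png');
// Return the recognized text as JSON (Symfony response object)
return new JsonResponse(['text' => $text]);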
Verify Installation
Run a test OCR on a known image (e.g., a screenshot) to confirm the setup works.
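Before running a test OCR, it can help to confirm that the configured path really points at an executable. The sketch below is one way to do that in plain PHP; isUsableBinary is a hypothetical helper, and the path is assumed to be the one from the configuration above.

```php
<?php
declare(strict_types=1);

/**
 * Report whether a configured binary path points at an executable file.
 * Hypothetical helper for illustration; not part of the bundle.
 */
function isUsableBinary(string $path): bool
{
    return is_file($path) && is_executable($path);
}

$path = '/usr/bin/tesseract'; // assumed: the path from the configuration
if (isUsableBinary($path)) {
    // Ask the binary for its version; exec() fills $lines with the output.
    $lines = [];
    exec(escapeshellarg($path) . ' --version 2>&1', $lines, $exitCode);
    echo $exitCode === 0 ? implode(PHP_EOL, $lines) : 'tesseract exited with code ' . $exitCode, PHP_EOL;
} else {
    echo "No executable found at {$path}", PHP_EOL;
}
```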
Image Processing Pipeline
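// Example: Symfony controller action handling a file upload
public function upload(Request $request): Response {
    /** @var UploadedFile $file */
    $file = $request->files->get('image');
    // Move the upload to a temporary directory under a random name
    $name = bin2hex(random_bytes(8)) . '.' . $file->guessExtension();
    $stored = $file->move(sys_get_temp_dir(), $name);
    $tesseract = new TesseractBridge();
    $text = $tesseract->ocr($stored->getPathname());
    // Process $text (e.g., save to DB, trigger events)
    $this->addFlash('success', 'OCR completed!');
    // Send the user back to the page they came from
    return $this->redirect($request->headers->get('referer') ?? '/');
}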
Language-Specific OCR
Override language per request:
$tesseract->setLanguage('fra'); // French
$text = $tesseract->ocr('image.png');
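Switching language only works if the matching language pack is installed, so a pre-check may be worthwhile. The following sketch parses the output of tesseract --list-langs, assuming it prints one language code per line after a header line. Both helper functions are hypothetical and are not part of the bundle.

```php
<?php
declare(strict_types=1);

/**
 * Extract language codes from the lines printed by `tesseract --list-langs`.
 * Lines that are not a bare code (such as the header line) are skipped.
 * Hypothetical helper for illustration; not part of the bundle.
 */
function parseLanguageList(array $lines): array
{
    $codes = [];
    foreach ($lines as $line) {
        $line = trim($line);
        if (preg_match('/^[A-Za-z0-9_\/-]+$/', $line) === 1) {
            $codes[] = $line;
        }
    }
    return $codes;
}

/** Check whether a language code appears in the parsed list. */
function isLanguageInstalled(string $code, array $lines): bool
{
    return in_array($code, parseLanguageList($lines), true);
}

// Usage: collect the CLI output, then check before switching language.
$lines = [];
exec('tesseract --list-langs 2>&1', $lines);
var_dump(isLanguageInstalled('fra', $lines));
```

If the binary is missing, the shell error message contains spaces and is skipped by the filter, so the check simply reports false.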
Batch Processing
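Use Symfony Messenger for asynchronous OCR:
// Dispatch a message ($bus is an injected MessageBusInterface)
$bus->dispatch(new ProcessOcrJob($imagePath));
// Message handler (assumes ProcessOcrJob exposes a public $imagePath property)
#[AsMessageHandler]
class ProcessOcrJobHandler {
    public function __invoke(ProcessOcrJob $job): void {
        $tesseract = new TesseractBridge();
        $text = $tesseract->ocr($job->imagePath);
        // Save to DB, etc.
    }
}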
Integration with Forms
Add an unmapped file field to a form type so the uploaded image can be validated and then passed to OCR:
// In a form builder
$builder->add('image', FileType::class, [
    'mapped' => false,
    'constraints' => [new File(['maxSize' => '1024k'])]
]);
Dependency Injection
Inject the TesseractBridge service into your own service in services.yaml:
services:
    App\Services\OcrService:
        arguments:
            $tesseract: '@bicycle.tesseract_bridge'
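Use Symfony's File constraint to validate images before OCR, and wrap the OCR call in a try/catch so failures are reported to the user:
try {
    $text = $tesseract->ocr($path);
} catch (\Exception $e) {
    $this->addFlash('error', 'OCR failed: ' . $e->getMessage());
    return $this->redirectToRoute('app_upload'); // hypothetical route name
}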
Mock TesseractBridge in PHPUnit:
// Inside a KernelTestCase (Symfony 5.3+ provides static::getContainer())
$mock = $this->createMock(TesseractBridge::class);
$mock->method('ocr')->willReturn('test text');
static::getContainer()->set('bicycle.tesseract_bridge', $mock);
Tesseract Installation
If the tesseract CLI tool is not found, install it (apt-get install tesseract-ocr on Ubuntu) and update the path in config. Run which tesseract in your server's terminal to verify the path.
Language Packs
Install the required language pack (e.g., tesseract-ocr-fra for French) and ensure the language config matches. List the installed languages with tesseract --list-langs.
Memory Limits
For large images, use --psm (page segmentation mode) to limit processing area or resize images before OCR.
Symfony Cache
Clear the cache (php bin/console cache:clear) after updating bicycle_tesseract_bridge.yaml.
Permissions
Ensure the web server user (e.g., www-data) has read access to the image files and execute permissions for the tesseract binary.
Log Raw Output
Use Tesseract’s --debug flag (if supported) or log the raw command output:
$tesseract->setOptions(['--debug']);
$text = $tesseract->ocr($path);
// Log $tesseract->getLastCommand() for debugging.
Check Tesseract Version
Confirm which Tesseract version is installed, and check that your PHP version meets the bundle's requirement (this bundle requires PHP 7.4+). Run:
tesseract --version
Environment-Specific Configs
Use Symfony’s %kernel.environment% to load different Tesseract paths per environment (e.g., dev vs. prod).
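In a Symfony project this choice normally lives in configuration, but the lookup itself is simple. As a rough sketch of the idea in plain PHP: resolveTesseractPath is a hypothetical helper and the paths are placeholders, not bundle defaults.

```php
<?php
declare(strict_types=1);

/**
 * Pick a Tesseract path for the given environment name.
 * Hypothetical helper; the paths below are placeholders, not bundle defaults.
 */
function resolveTesseractPath(string $environment, array $pathsByEnv, string $fallback): string
{
    return $pathsByEnv[$environment] ?? $fallback;
}

$paths = [
    'dev'  => '/usr/local/bin/tesseract',
    'prod' => '/usr/bin/tesseract',
];

// APP_ENV is the variable Symfony reads to choose the environment.
$environment = getenv('APP_ENV') ?: 'dev';
echo resolveTesseractPath($environment, $paths, '/usr/bin/tesseract'), PHP_EOL;
```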
Custom OCR Logic
Extend TesseractBridge to add pre/post-processing:
class CustomTesseractBridge extends TesseractBridge {
    public function ocr($file) {
        $text = parent::ocr($file);
        return $this->postProcess($text);
    }
    protected function postProcess(string $text): string {
        // Add custom logic (e.g., regex cleanup)
        return preg_replace('/[^a-zA-Z0-9]/', ' ', $text);
    }
}
Register the custom service in services.yaml.
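The cleanup step can be tried in isolation before wiring it into a subclass. The sketch below uses a hypothetical standalone function, cleanOcrText, with a slightly different pattern from the one above: it collapses each run of non-alphanumeric characters into a single space and trims the ends.

```php
<?php
declare(strict_types=1);

/**
 * Replace every run of non-alphanumeric characters with one space and trim the ends.
 * Hypothetical helper for illustration; not part of the bundle.
 */
function cleanOcrText(string $text): string
{
    return trim((string) preg_replace('/[^a-zA-Z0-9]+/', ' ', $text));
}

echo cleanOcrText("Invoice #42,\n  Total: 19.99"), PHP_EOL; // prints "Invoice 42 Total 19 99"
```

Note that this kind of pattern also removes decimal points, line breaks, and accented letters, so it is likely too aggressive for invoices or for languages such as French. Adjust the character class to the text you expect.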
Event Listeners
Trigger events after OCR (e.g., save to DB, send notifications):
# config/services.yaml
services:
    App\EventListener\OcrListener:
        tags:
            - { name: kernel.event_listener, event: app.ocr.completed, method: onOcrCompleted }
Command-Line Integration
Use the bundle’s underlying TesseractBridge in custom console commands for bulk OCR:
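use Bicycle\TesseractBridge\TesseractBridge;
use Symfony\Component\Console\Attribute\AsCommand;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;
#[AsCommand(name: 'app:ocr')] // hypothetical command name
class OcrCommand extends Command {
    protected function execute(InputInterface $input, OutputInterface $output): int {
        $tesseract = new TesseractBridge();
        $output->writeln($tesseract->ocr('image.png'));
        return Command::SUCCESS; // execute() must return an exit code
    }
}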
API Wrapper
Create a DTO or API resource to standardize OCR responses:
class OcrResult {
    public function __construct(
        public string $text,
        public string $language,
        public int $confidence
    ) {}
}
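As a short usage sketch, a DTO of this shape serializes directly with json_encode because its properties are public. The values below are placeholders, and constructor property promotion needs PHP 8.0 or later, which is newer than the PHP 7.4 minimum mentioned earlier.

```php
<?php
declare(strict_types=1);

// Same shape as the DTO above; constructor promotion requires PHP 8.0+.
final class OcrResult
{
    public function __construct(
        public string $text,
        public string $language,
        public int $confidence
    ) {}
}

$result = new OcrResult('Hello world', 'eng', 92); // placeholder values
echo json_encode($result), PHP_EOL; // prints {"text":"Hello world","language":"eng","confidence":92}
```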