Pdfparser Laravel Package

smalot/pdfparser

Standalone PHP PDF parsing library to extract text, pages, and metadata from PDFs. Supports compressed PDFs and various encodings, with configurable parsing options. Note: secured PDFs and form data extraction are not supported.

v2.12.4

Bugfix release

Refines a change from the previous release (v2.12.3):

Ignore Form as well as Image XObjects when assembling the text array for a PDFObject. by @rupertj in https://github.com/smalot/pdfparser/pull/783

When assembling the text array for an object, skip Forms that don't contain any text, instead of all Forms. by @rupertj in https://github.com/smalot/pdfparser/pull/789

Full Changelog: https://github.com/smalot/pdfparser/compare/v2.12.3...v2.12.4

v2.12.3

Security fix and refinements

Fix for potential Denial of Service vulnerability

Summary: The fix prevents RawDataParser.php from entering an endless loop under certain circumstances, which would lead to memory exhaustion.

Details: When parsing a specially crafted, malformed PDF file, the low-level RawDataParser enters a state that leads to uncontrolled memory allocation. This continues until the PHP script exhausts its memory_limit and crashes with a fatal error. An attacker can exploit this vulnerability by submitting a small, malicious PDF file to any service using this library, causing the server process to crash and become unavailable.

Thank you to Yang LUO (https://github.com/N0zoM1z0) for reporting this and providing details on the matter. https://github.com/smalot/pdfparser/pull/787 contains further information.

Refinement to improve extracted texts

Ignore Form as well as Image XObjects when assembling the text array for a PDFObject. by @rupertj in https://github.com/smalot/pdfparser/pull/783


Full Changelog: https://github.com/smalot/pdfparser/compare/v2.12.2...v2.12.3

v2.12.2

What's Changed

Full Changelog: https://github.com/smalot/pdfparser/compare/v2.12.1...v2.12.2

v2.12.1

What's Changed

Internal

New Contributors

Full Changelog: https://github.com/smalot/pdfparser/compare/v2.12.0...v2.12.1

v2.12.0

What's Changed

Internal changes

New Contributors

Full Changelog: https://github.com/smalot/pdfparser/compare/v2.11.0...v2.12.0

v2.11.0

What's Changed

Internal changes

New Contributors

Full Changelog: https://github.com/smalot/pdfparser/compare/v.2.10.0...v2.11.0

v2.10.0

What's Changed

Internal changes

Full Changelog: https://github.com/smalot/pdfparser/compare/v2.9.0...v.2.10.0

v.2.10.0

Replaced by v2.10.0

v2.9.0

What's Changed

New Contributors

Full Changelog: https://github.com/smalot/pdfparser/compare/v2.8.0...v2.9.0

v2.8.0

❗ This release contains a lot of changes in comparison to v2.7.0. We decided to have at least one release candidate before the next production-ready release.

Pull request #634 (Major Update to PDFObject.php + Ancillary) by @GreyWyvern fixes almost 20 issues and brings better parsing and more understandable code. If you want to find out exactly what changed, have a look at it.

What's Changed

New Contributors

Full Changelog: https://github.com/smalot/pdfparser/compare/v2.7.0...v2.8.0

v2.8.0-RC2

What's Changed

New Contributors

Full Changelog: https://github.com/smalot/pdfparser/compare/v2.8.0-RC1...v2.8.0-RC2

v2.8.0-RC1

Release Candidate - Please read before using it

❗ This release contains a lot of changes in comparison to v2.7.0. We decided to have at least one release candidate before the next production-ready release.

Pull request #634 (Major Update to PDFObject.php + Ancillary) by @GreyWyvern fixes almost 20 issues and brings better parsing and more understandable code. If you want to find out exactly what changed, have a look at it.

If you find any bugs, please let us know in https://github.com/smalot/pdfparser/issues/650 or open a new issue.

Further changes

New Contributors

Full Changelog: https://github.com/smalot/pdfparser/compare/v2.7.0...v2.8.0-RC1

v2.7.0

What's Changed

New Contributors

Full Changelog: https://github.com/smalot/pdfparser/compare/v.2.6.0...v2.7.0

v.2.6.0

What's Changed

New Contributors

Full Changelog: https://github.com/smalot/pdfparser/compare/v2.5.0...v.2.6.0

v2.5.0

What's Changed

New Contributors

Full Changelog: https://github.com/smalot/pdfparser/compare/v2.4.0...v2.5.0

v2.4.0

What's Changed

New Contributors

Full Changelog: https://github.com/smalot/pdfparser/compare/v2.3.0...v2.4.0

v2.3.0

What's Changed

New Contributors

Full Changelog: https://github.com/smalot/pdfparser/compare/v2.2.2...v2.3.0

v2.2.2

What's Changed

Full Changelog: https://github.com/smalot/pdfparser/compare/v2.2.1...v2.2.2

v2.2.1

What's Changed

New Contributors

Full Changelog: https://github.com/smalot/pdfparser/compare/v2.2.0...v2.2.1

v2.2.0

What's Changed

New Contributors

Full Changelog: https://github.com/smalot/pdfparser/compare/v2.1.0...v2.2.0

v2.1.0

What's Changed

Full Changelog: https://github.com/smalot/pdfparser/compare/v2.0.1...v2.1.0

v2.0.1

Bugfix release

For PHP 7 users: In 2.0.0 we used a function which is PHP 8 only. It was fixed in #486.

Full Changelog: https://github.com/smalot/pdfparser/compare/v2.0.0...v2.0.1

v2.0.0

Breaking Changes

❗ All function parameters and return types are now typed. If you pass values that do not match, you may receive TypeErrors. Most of the typing was done internally, so you should not be affected. If you use internal functions, please check your code before going into production.

We initially planned to release 1.2.0 but ultimately jumped to 2.0.0 so that the BC breaks ship in a major release instead (see https://github.com/smalot/pdfparser/issues/480).

Highlights

  • massive code refactoring (thanks to @jee7, #440)
  • workaround to enable FPDFs (thanks to @izabala, #453)
  • Added cache for Documents object cache dictionary, which also results in better performance in some cases (thanks to @jee7, #434)
  • prevent endless loops during Page->getText() in some cases (thanks to @Nickmanbear, #457)
  • Fixes invalid return type on unknown glyphs (thanks to @PrinsFrank, #459)
  • Fix TypeError on Document::getFirstFont when no fonts are available (thanks to @PrinsFrank, #461)
  • Fix TypeError on default font when no fonts are available (#466, thanks to @PrinsFrank)
  • Fix extractRawData, extractDecodedRawData, getDataTm and getDataXY not working with a PDF file produced by FPDI/FPDF (#454, thanks to @izabala)
  • Test backend was improved by @j0k3r (#460)

v1.2.0-RC2

Not production ready - We reworked our code base and added typed parameters as well as return values. If you find anything, please drop us a comment. Further information can be found at https://github.com/smalot/pdfparser/issues/468. Thank you in advance!❗

Changes since v1.2.0-RC1

  • Fix TypeError on default font when no fonts are available (#466, thanks to @PrinsFrank)
  • Fix extractRawData, extractDecodedRawData, getDataTm and getDataXY not working with a PDF file produced by FPDI/FPDF (#454, thanks to @izabala)

Further information about changes and fixes in 1.2.0 can be found here: https://github.com/smalot/pdfparser/releases/tag/v1.2.0-RC1

v1.2.0-RC1

Bug fix and performance release

Not production ready - We reworked our code base and added typed parameters as well as return values. If you find anything, please drop us a comment. Further information can be found at https://github.com/smalot/pdfparser/issues/468. Thank you in advance!❗

Highlights:

  • massive code refactoring (thanks to @jee7, #440)
  • workaround to enable FPDFs (thanks to @izabala, #453)
  • Added cache for Documents object cache dictionary, which also results in better performance in some cases (thanks to @jee7, #434)
  • prevent endless loops during Page->getText() in some cases (thanks to @Nickmanbear, #457)
  • Fixes invalid return type on unknown glyphs (thanks to @PrinsFrank, #459)
  • Fix TypeError on Document::getFirstFont when no fonts are available (thanks to @PrinsFrank, #461)

@j0k3r improved our test backend.

v1.1.0

Maintenance and small performance boost

PDFs with images can now be parsed with lower resource consumption (such as memory). @Connum added a feature in #441 to ignore image data. It must be enabled manually, though. You can do it easily:

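use Smalot\PdfParser\Config;
use Smalot\PdfParser\Parser;

// Tell the parser not to retain image content, which reduces memory usage.
$config = new Config();
$config->setRetainImageContent(false);

// Pass the config as the second constructor argument.
$parser = new Parser([], $config);
// $parser->parseFile(...)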

Besides that, we fixed a problem with Scrutinizer (part of our test infrastructure).

v1.0.2

Bugfix release

  • Don't throw an exception if there is no base encoding defined (as of PDF 1.5 Reference Table 5.11) - #433, thanks @LucianoHanna

v1.0.1

Bugfix release

  • Fixed decode octal regex (#421, thanks @gdiasb12)
  • Fixed remaining places which use Config class and threw exceptions (#420, #424, thanks @TivoSoho)

v1.0.0

Highlights

  • ❗ Removed support for PHP 5.6 and 7.0; PHP 7.1 or newer is now required
  • Extended Config.php with white space characters: developers can now override the regex used for white space recognition (#411, thanks @LucianoHanna)
  • Fixed some test-infrastructure related issues (#412, #413, #414)

v0.19.0

Bugfix and feature release

Features:

  • Add support for PDF 1.5 Xref stream (#400, thanks @smalot)
  • Add support for Reversed Chars instruction in BMC blocks (#402, thanks @smalot)

Fixes:

  • Encoding::__toString complies with PHP specification from now on (#407, thanks @igor-krein and others from #85)
  • fix Call to a member function getFontSpaceLimit() on null (#406, thanks @xfolder)
  • Consider all PDF white-space characters in object header (#405, thanks @LucianoHanna)

v0.18.2

Maintenance release

  • Bugfix for #391 (Uncaught Error: Call to undefined method Smalot\PdfParser\Header::__toString() in /var/www/vendor/smalot/pdfparser/src/Smalot/PdfParser/Font.php) (thanks @fsmoak)
  • Addition of an alternative autoloader for non-Composer installations (#388). Based on the work of @apmuthu and others from #117.