Lexer Laravel Package

doctrine/lexer

Doctrine Lexer is a lightweight base library for building lexers used in top-down, recursive descent parsers. It powers tokenization in Doctrine projects like Annotations and ORM (DQL), providing a reusable foundation for custom language parsing.

Getting Started

The Doctrine Lexer package provides a base AbstractLexer class to build custom tokenizers for parsing domain-specific languages (DSLs) — especially common in annotation systems (e.g., Doctrine Annotations), query languages (e.g., DQL), or configuration formats.

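First step: Create a subclass of Doctrine\Common\Lexer\AbstractLexer and implement its three abstract methods:

  • protected function getCatchablePatterns(): array returns the regex fragments for tokens the lexer should capture (e.g., '[a-zA-Z_][a-zA-Z0-9_]*' for identifiers)
  • protected function getNonCatchablePatterns(): array returns the regex fragments for input that separates tokens and is discarded (e.g., '\s+' for whitespace)
  • protected function getType(string &$value) returns the token type (e.g., an integer constant, a string, or an enum case)

Then instantiate your lexer, pass the input text to setInput(), and call moveNext() to iterate. After each call the public $token property holds the current token and $lookahead holds the next one. Example (v2.0+, where tokens are objects):

use Doctrine\Common\Lexer\AbstractLexer;

class MyLexer extends AbstractLexer
{
    public const TOKEN_WORD = 1;
    public const TOKEN_NUMBER = 2;

    protected function getCatchablePatterns(): array
    {
        // Identifiers, signed or unsigned integers, and a literal dot
        return ['[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*', '-?[0-9]+', '\.'];
    }

    protected function getNonCatchablePatterns(): array
    {
        // Whitespace separates tokens and is discarded
        return ['\s+'];
    }

    protected function getType(string &$value): int
    {
        // Integer literals are numbers; everything else is treated as a word
        return preg_match('/^-?[0-9]+$/', $value) === 1 ? self::TOKEN_NUMBER : self::TOKEN_WORD;
    }
}

$lexer = new MyLexer();
$lexer->setInput('SELECT 42 FROM users');
$lexer->moveNext(); // load the first token into $lexer->lookahead
while ($lexer->lookahead !== null) {
    $lexer->moveNext(); // $lexer->token is now the current token
    echo $lexer->token->value . ' → ' . $lexer->token->type . "\n";
}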

Check the AbstractLexer source (src/AbstractLexer.php) for the available helper methods: peek(), glimpse(), reset(), resetPeek(), isNextToken(), skipUntil(), and the position recorded on each token.
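
To make the relationship between regex patterns and tokens more concrete, here is a stand-alone sketch of the splitting step. It approximates what the base class does with preg_split() and is not the library's code; the function name sketchTokenize() and the array token shape are assumptions made for illustration.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-alone helper; it approximates the splitting step and is not part of doctrine/lexer.
function sketchTokenize(string $input, array $catchable, array $nonCatchable): array
{
    // Catchable patterns become capture groups, so their matches are kept as delimiters.
    // Non-catchable patterns stay uncaptured, so their matches are dropped.
    $regex = sprintf('/(%s)|%s/iu', implode(')|(', $catchable), implode('|', $nonCatchable));

    $flags = PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_OFFSET_CAPTURE;
    $parts = preg_split($regex, $input, -1, $flags);

    $tokens = [];
    foreach ($parts as [$value, $position]) {
        // $position is a byte offset into $input
        $tokens[] = ['value' => $value, 'position' => $position];
    }

    return $tokens;
}

$tokens = sketchTokenize('SELECT 42 FROM users', ['[a-z_][a-z0-9_]*', '-?[0-9]+'], ['\s+']);
foreach ($tokens as $token) {
    printf("%-6s @ %d\n", $token['value'], $token['position']);
}
```

For this input the sketch yields four tokens (SELECT, 42, FROM, users) at byte offsets 0, 7, 10, and 15; the whitespace between them never appears in the result.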

Implementation Patterns

1. Integration with Recursive Descent Parsing

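Use the lexer as the token producer feeding into your parser methods. The lexer already exposes the current token as $token and the next one as $lookahead; use peek() or glimpse() for deeper lookahead and resetPeek() to rewind it. Typical workflow:

class MyParser
{
    private MyLexer $lexer;

    public function parse(string $input): void
    {
        $this->lexer = new MyLexer();
        $this->lexer->setInput($input);
        $this->lexer->moveNext(); // prime $lookahead with the first token
        $this->statement();
    }

    private function statement(): void
    {
        $next = $this->lexer->lookahead;
        if ($next !== null && strtoupper($next->value) === 'SELECT') {
            $this->match(MyLexer::TOKEN_WORD); // or your const
            // ... consume more tokens
        }
    }

    private function match(int|string $type): void
    {
        // isNextToken() compares the lookahead token's type with $type
        if (! $this->lexer->isNextToken($type)) {
            throw new ParseError('Unexpected token'); // PHP's built-in class; swap in your own exception
        }
        $this->lexer->moveNext();
    }
}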
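
The current-token and lookahead-token pairing is what makes recursive descent work, so it may help to see it in isolation. The sketch below parses sums such as 1 + 2 + 39 using a hypothetical SketchTokenStream class that only mimics a moveNext() method and the token/lookahead pair; it is a stand-in for illustration, not the library's lexer.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a lexer: it only mimics the token/lookahead pair and moveNext().
final class SketchTokenStream
{
    public ?string $token = null;      // current token
    public ?string $lookahead = null;  // next token, not yet consumed
    private int $position = 0;

    /** @param list<string> $tokens */
    public function __construct(private array $tokens)
    {
    }

    public function moveNext(): bool
    {
        $this->token = $this->lookahead;
        $this->lookahead = $this->tokens[$this->position++] ?? null;

        return $this->lookahead !== null;
    }
}

// Grammar: sum := number ('+' number)*
function parseSum(SketchTokenStream $stream): int
{
    $total = parseNumber($stream);
    while ($stream->lookahead === '+') {
        $stream->moveNext(); // consume '+'
        $total += parseNumber($stream);
    }

    return $total;
}

function parseNumber(SketchTokenStream $stream): int
{
    if ($stream->lookahead === null || ! ctype_digit($stream->lookahead)) {
        throw new InvalidArgumentException('Expected a number');
    }
    $stream->moveNext(); // the number is now the current token

    return (int) $stream->token;
}

$stream = new SketchTokenStream(['1', '+', '2', '+', '39']);
$stream->moveNext(); // prime the lookahead
echo parseSum($stream), "\n";
```

Each grammar rule inspects the lookahead to decide what to do and calls moveNext() only once it has committed to consuming a token, which keeps every rule free of backtracking.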

2. Type-Safe Token Enums (v2.0+)

Leverage PHP enums for robust token definitions (requires v2.0+ of the package and PHP 8.1+ for enum support):

enum TokenType: string
{
    case IDENTIFIER = 'identifier';
    case INTEGER = 'integer';
    case WHITESPACE = 'whitespace';
}

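class MyLexer extends AbstractLexer
{
    // ... pattern methods omitted for brevity

    protected function getType(string &$value): TokenType
    {
        // Every token gets a type; whitespace is dropped by the patterns, not here
        return ctype_digit($value) ? TokenType::INTEGER : TokenType::IDENTIFIER;
    }
}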

This enables strict typing and IDE autocomplete, reducing runtime errors.
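
As a rough illustration of what an enum-returning getType() body might delegate to, the following stand-alone sketch classifies raw token text into enum cases. The enum SketchTokenType, its UNKNOWN case, and the function classifyToken() are hypothetical names chosen for this example.

```php
<?php
declare(strict_types=1);

// Hypothetical enum and helper for illustration; requires PHP 8.1+.
enum SketchTokenType: string
{
    case IDENTIFIER = 'identifier';
    case INTEGER = 'integer';
    case UNKNOWN = 'unknown';
}

function classifyToken(string $value): SketchTokenType
{
    // match (true) picks the first arm whose condition evaluates to true
    return match (true) {
        preg_match('/^[0-9]+$/', $value) === 1 => SketchTokenType::INTEGER,
        preg_match('/^[a-zA-Z_][a-zA-Z0-9_]*$/', $value) === 1 => SketchTokenType::IDENTIFIER,
        default => SketchTokenType::UNKNOWN,
    };
}

foreach (['users', '42', '?'] as $value) {
    echo $value, ' => ', classifyToken($value)->name, "\n";
}
```

An explicit fallback case such as UNKNOWN gives the parser something to report on, which tends to be easier to debug than a null type.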

3. Extensibility via Properties & State

Subclass to maintain state (e.g., context-aware lexing):

class AnnotationLexer extends AbstractLexer
{
    private bool $inAnnotation = false;

    protected function getType(string &$value): string
    {
        if ($value === '@') {
            $this->inAnnotation = true;
            return '@';
        }
        if ($this->inAnnotation && $value === ')') {
            $this->inAnnotation = false;
        }
        return $this->inAnnotation ? 'ANNOTATION_TOKEN' : 'DEFAULT';
    }
}

Gotchas and Tips

⚠️ Backward Compatibility Notes

  • v3.0+ drops PHP < 8.1 support and removes legacy BC layers. Ensure runtime compatibility before upgrading from v1.x/v2.x.
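  • In v2.0+, tokens are Doctrine\Common\Lexer\Token objects exposing value, type, and position. Treating a token as an array is deprecated in v2 and removed in v3. Token types should be int, string, or enum cases.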

⚠️ Whitespace & Skipped Tokens

Whitespace is skipped only if you tell the lexer to skip it. Anything matched by getNonCatchablePatterns() acts as a separator and is discarded, while input matched by neither pattern list still becomes a token. Returning null from getType() does not drop a token; it only leaves the token's type as null. To skip whitespace:

protected function getCatchablePatterns(): array
{
    return ['[^\s]+']; // Any run of non-whitespace characters is a token
}

protected function getNonCatchablePatterns(): array
{
    return ['\s+']; // Whitespace separates tokens and is discarded
}

🛠️ Debugging Tips

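  • To dump the full token list for testing, loop with moveNext() and collect $lexer->token after each call.
  • For error context, use the position stored on each token (a byte offset into the input) together with getInputUntilPosition() to recover the text consumed so far. The lexer does not track line or column numbers itself.
  • For multibyte support (v1.2+), current versions apply the iu modifiers by default (see getModifiers()); use \p{L}/\p{N} in your patterns if identifiers may contain non-ASCII letters or digits.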
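
Since parsers usually want to report a line and column rather than a raw offset, a small conversion helper can be useful. The sketch below assumes the offset is a byte position into the original input; describePosition() is a hypothetical helper, not part of the package.

```php
<?php
declare(strict_types=1);

// Hypothetical helper, not part of doctrine/lexer: converts a byte offset into a 1-based line and column.
function describePosition(string $input, int $position): array
{
    $before = substr($input, 0, $position);
    $line = substr_count($before, "\n") + 1;

    // The column is counted in bytes from the start of the current line.
    $lastNewline = strrpos($before, "\n");
    $lineStart = $lastNewline === false ? 0 : $lastNewline + 1;
    $column = strlen($before) - $lineStart + 1;

    return ['line' => $line, 'column' => $column];
}

$input = "SELECT name\nFROM users\nWHERE id = ?";
$where = describePosition($input, strpos($input, '?'));
printf("Unexpected token at line %d, column %d\n", $where['line'], $where['column']);
```

For the sample input the question mark sits at line 3, column 12. With multibyte text the column is a byte count, so convert it to characters if exact columns matter.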

🛠️ Performance Optimization

  • Cache expensive regex patterns in a static property if reused across many instances.
  • getType() receives $value by reference (string &$value) so that it can normalize the token text, for example by stripping surrounding quotes. Modify it only when the parser should see the normalized form.
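
One case where changing the by-reference value is reasonable is normalizing a quoted string literal while classifying it. The sketch below shows the idea with a hypothetical stand-alone function normalizeValue(); the type names 'string' and 'other' are placeholders.

```php
<?php
declare(strict_types=1);

// Hypothetical helper mirroring the shape of a by-reference getType(): it classifies and normalizes in one step.
function normalizeValue(string &$value): string
{
    $isQuoted = strlen($value) >= 2 && $value[0] === "'" && substr($value, -1) === "'";
    if ($isQuoted) {
        // Drop the surrounding quotes and collapse doubled quotes into single ones.
        $value = str_replace("''", "'", substr($value, 1, -1));

        return 'string';
    }

    return 'other';
}

$raw = "'it''s'";
$type = normalizeValue($raw);
echo $type, ': ', $raw, "\n";
```

After the call, $raw holds it's without the delimiters, so later parsing stages never need to know how the literal was quoted.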

🔌 Extension Points

  • Override getModifiers() to change the regex modifiers applied to the combined pattern (for example, to make matching case-sensitive).
  • The Token class is final, so carry extra per-token information by normalizing $value in getType() or by tracking state in your lexer subclass.