yethee/tiktoken
PHP port of OpenAI’s tiktoken tokenizer. Get encoders by model name, encode text to token IDs, and cache vocab files for speed. Optional experimental Rust/FFI “lib mode” for faster encoding of medium/large texts.
EncoderProvider, enabling consistent usage across controllers, jobs, and services.
TokenLimitMiddleware).
composer require with minimal boilerplate (EncoderProvider instantiation).
getForModel().
AppServiceProvider for global access:
$app->singleton(EncoderProvider::class, fn() => new EncoderProvider());
EncoderProvider into controllers/jobs).
setVocabCache().

| Risk Area | Severity | Mitigation Strategy |
|---|---|---|
| Performance Bottleneck | Medium | Benchmark with composer bench; use LibEncoder (Rust FFI) for high-throughput needs (but requires Rust setup). |
| Cache Corruption | Low | Fixed in v1.1.1 (race condition patch); use filesystem or Redis for reliability. |
| Model Support Gaps | Low | No GPT-2 or special tokens; track OpenAI’s model releases for updates. |
| LibEncoder Stability | High | Experimental; avoid unless benchmarked for your workload (e.g., >10k tokens/sec). |
| BC Breaks | Low | Minor (e.g., cache dir can’t be null); versioned releases with changelog. |
| Vendor Lock-in | None | MIT license; no external APIs; can fork if needed. |
Tokenization Volume:
LibEncoder (Rust FFI) or offload to a microservice.
Model Requirements:
<|endofprompt|>)? If yes, consider Python tiktoken or a custom solution.
Caching Strategy:
$encProvider->setVocabCache(storage_path('framework/cache'));
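// RedisCache is assumed to be an application-defined cache Store; it is not provided by this package.
Cache::extend('tiktoken', function () {
    return Cache::repository(new RedisCache());
});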
Error Handling:
Encoder interface or wrap calls in a try-catch:
try {
    $tokens = $encoder->encode($text);
} catch (\Throwable $e) {
    Log::error("Tokenization failed: {$e->getMessage()}");
    // Chain the original exception so the root cause stays in the stack trace.
    throw new \RuntimeException("Invalid input for tokenization.", 0, $e);
}
Testing:
use Illuminate\Support\Facades\Validator;
use Yethee\Tiktoken\EncoderProvider;

// Custom rule: passes when the value encodes to at most $parameters[0] tokens.
Validator::extend('max_tokens', function ($attribute, $value, $parameters, $validator) {
    $encoder = app(EncoderProvider::class)->getForModel('gpt-4');
    $tokens = $encoder->encode($value);
    return count($tokens) <= (int) $parameters[0];
});
Future-Proofing:
composer update.
LibEncoder).
Cost Optimization:
$tokenCount = count($tokens);
Log::channel('ai_metrics')->info('Tokens used', ['model' => 'gpt-4', 'count' => $tokenCount]);
TokenLimitMiddleware).
Evaluation Phase (1–2 days):
gpt-3.5-turbo):
composer require yethee/tiktoken
composer bench and compare against OpenAI’s SDK.
Pilot Integration (3–5 days):
EncoderProvider as a singleton in AppServiceProvider:
public function register()
{
    $this->app->singleton(EncoderProvider::class, fn() => new EncoderProvider());
}
public function sendChatRequest(Request $request)
{
    $encoder = app(EncoderProvider::class)->getForModel('gpt-4');
    $tokens = $encoder->encode($request->input('prompt'));
    if (count($tokens) > 8000) {
        throw new \RuntimeException("Prompt exceeds 8k token limit.");
    }
    // Proceed with API call...
}
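The hard-coded check in the controller above could be pulled into a small reusable guard. The sketch below is a hypothetical helper, not part of yethee/tiktoken: the name `assertWithinTokenLimit` and the whitespace-based stand-in encoder are assumptions made so the example runs without the package installed.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of yethee/tiktoken): counts tokens using the
 * supplied encode callable and rejects text that exceeds the limit.
 *
 * @param callable(string): array $encode Returns the tokens for a string.
 */
function assertWithinTokenLimit(callable $encode, string $text, int $limit): int
{
    $count = count($encode($text));

    if ($count > $limit) {
        throw new LengthException(
            sprintf('Prompt uses %d tokens, limit is %d.', $count, $limit)
        );
    }

    return $count;
}

// Stand-in encoder for demonstration only: one "token" per whitespace-separated word.
$fakeEncode = static fn (string $text): array => preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);

$used = assertWithinTokenLimit($fakeEncode, 'hello token world', 8000);
```

In the application, the callable would presumably wrap the real encoder, for example `fn (string $t) => $encoder->encode($t)` with `$encoder` obtained from `getForModel('gpt-4')`. Returning the count lets the caller reuse it for logging without encoding twice.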
Full Rollout (1 week):
event(new TokensUsed($tokens, 'gpt-4', $request->user()));
Optimization (Ongoing):
LibEncoder (Rust FFI) and build the native library:
git clone https://github.com/yethee/tiktoken-php.git
cd tiktoken-php
cargo build --release
Then configure the path:
Yethee\Tiktoken\Encoder\LibEncoder::init(__DIR__.'/path/to/libtiktoken_php.so');
$encProvider = new EncoderProvider(true); // Force lib mode
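Because lib mode depends on PHP's FFI extension and on the compiled library being present, it may be worth guarding the switch so the application can fall back to the default encoder. The following is a minimal sketch under that assumption; `canUseLibMode` is a hypothetical helper, not part of the package, and the path is the same placeholder used above.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of yethee/tiktoken): reports whether lib mode
 * looks usable, i.e. the FFI extension is loaded and the library file exists.
 */
function canUseLibMode(string $libPath): bool
{
    return extension_loaded('ffi') && is_file($libPath);
}

// Placeholder path copied from the step above; adjust to the real build output.
$libPath = __DIR__ . '/path/to/libtiktoken_php.so';
$useLib  = canUseLibMode($libPath);

// Assumed wiring in the application, mirroring the snippet above:
// if ($useLib) { Yethee\Tiktoken\Encoder\LibEncoder::init($libPath); }
// $encProvider = new EncoderProvider($useLib);
```

This only checks preconditions; it does not prove the library loads correctly, so a smoke test that encodes a short string is still advisable.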
TIKTOKEN_CACHE_DIR or switch to Redis if needed.

| Component | Compatibility Notes |
|---|---|
| Laravel | Works with Laravel 8+ (tested with latest stable). |
| PHP | Requires PHP 8.0+ (check composer.json for exact version). |
| OpenAI SDK | Complements existing SDK usage (e.g., validate tokens before createChatCompletion). |
| Redis | Supports Redis cache backend via Laravel’s cache drivers. |
| Queues | Thread-safe for Laravel queues/jobs (cache is process-local |