- How do I install yethee/tiktoken in a Laravel project?
- Run `composer require yethee/tiktoken` in your project root. No additional setup is needed for basic usage. The package requires PHP 8.1+ and is a plain Composer library, so it installs like any other dependency in a Laravel project.
- Which GPT models are supported out of the box?
- The package supports all OpenAI models using the standard tiktoken vocabularies, including GPT-3.5 (e.g., `gpt-3.5-turbo-0301`), GPT-4, GPT-5, and embeddings. Check the [changelog](https://github.com/yethee/tiktoken-php/blob/master/CHANGELOG.md) for the latest supported models.
- Can I use this package to validate token counts before calling OpenAI’s API?
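- Yes. Count tokens with `count($encoder->encode($text))` and compare the result against your model’s context limit (e.g., 8,192 tokens for the original GPT-4). That limit covers the prompt and the completion together, so leave headroom for the reply. Put the check in Laravel middleware or a service class to reject oversized prompts before any API call is made.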
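
The check described above can be sketched as a small helper. This is a minimal illustration, not part of the package: `fitsTokenLimit()` and the `$reserved` parameter are hypothetical names, and the word-splitting tokenizer is only a stand-in so the snippet runs without the library installed.

```php
<?php

declare(strict_types=1);

/**
 * Reports whether $text fits a token budget.
 *
 * @param callable(string): array $encode   Tokenizer returning one array entry per token.
 * @param int                     $limit    Context limit of the target model, supplied by the caller.
 * @param int                     $reserved Tokens held back for the model's reply (hypothetical knob).
 */
function fitsTokenLimit(callable $encode, string $text, int $limit, int $reserved = 0): bool
{
    return count($encode($text)) <= $limit - $reserved;
}

// Stand-in tokenizer for illustration only: one "token" per whitespace-separated word.
// Real token counts differ; swap in the package's encoder for actual validation.
$wordEncode = static fn (string $text): array => preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];

$prompt = 'Summarize the following paragraph in one sentence';

var_dump(fitsTokenLimit($wordEncode, $prompt, 8192)); // large budget
var_dump(fitsTokenLimit($wordEncode, $prompt, 8, 4)); // small budget with 4 tokens reserved
```

With the package installed, the stand-in can be replaced by the real encoder, for example `fitsTokenLimit($encoder->encode(...), $input, 8192)` where `$encoder` comes from `(new EncoderProvider())->getForModel('gpt-4')`.
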
- How does the optional FFI/LibEncoder mode improve performance?
- The experimental `LibEncoder` calls Rust’s `tiktoken-rs` through FFI and is reported to encode large texts (e.g., >10,000 tokens) roughly 2x faster. It requires building a native library, which adds setup complexity. Benchmark it against your own workload: for small texts, FFI marshalling overhead may cancel out the gain.
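
One way to run such a benchmark is a small timing helper that accepts any encoder as a callable. This is a rough sketch under stated assumptions: `averageEncodeMs()` is a hypothetical helper, and the word-splitting tokenizer only stands in for a real encoder so the snippet runs on its own.

```php
<?php

declare(strict_types=1);

/**
 * Returns the average wall-clock time, in milliseconds, of one $encode($text) call.
 */
function averageEncodeMs(callable $encode, string $text, int $iterations = 20): float
{
    if ($iterations < 1) {
        throw new InvalidArgumentException('iterations must be at least 1');
    }

    $encode($text); // warm-up call so one-time setup cost is not measured

    $start = hrtime(true); // monotonic clock, nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $encode($text);
    }

    return (hrtime(true) - $start) / $iterations / 1_000_000;
}

// Stand-in tokenizer for illustration only: one "token" per whitespace-separated word.
$wordEncode = static fn (string $text): array => preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];

$sample = str_repeat('lorem ipsum dolor sit amet ', 2000);

printf("stand-in encoder: %.3f ms per call\n", averageEncodeMs($wordEncode, $sample));
```

To compare the two modes, pass each encoder's `encode` method as the callable (for example `$encoder->encode(...)`) and run the helper on text sizes that match production traffic, since the relative gain is expected to depend on input length.
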
- Where should I store the vocabulary cache for production?
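- The vocabulary cache is a directory on disk, so point it at a persistent, writable path (for example, a folder under Laravel’s `storage/`) instead of relying on the default temporary directory. Set the path via `EncoderProvider::setVocabCache()` or the `TIKTOKEN_CACHE_DIR` environment variable. Because the setting expects a directory, a cache backend such as Redis cannot be plugged in here. A warm, persistent directory avoids re-fetching vocabularies, which is the main bottleneck to guard against in high-traffic apps.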
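
Whichever directory is chosen has to exist and be writable by the PHP process. The sketch below shows one way to verify that at boot time; `prepareVocabCacheDir()` and the demo path are hypothetical, and whether the package picks up a value set with `putenv()` at runtime is an assumption, so passing the path to `EncoderProvider::setVocabCache()` is the safer route.

```php
<?php

declare(strict_types=1);

/**
 * Creates $dir if needed and returns it once it is confirmed writable.
 */
function prepareVocabCacheDir(string $dir): string
{
    // The second is_dir() check covers a race where another process creates the directory first.
    if (!is_dir($dir) && !mkdir($dir, 0775, true) && !is_dir($dir)) {
        throw new RuntimeException("Cannot create cache directory: {$dir}");
    }

    if (!is_writable($dir)) {
        throw new RuntimeException("Cache directory is not writable: {$dir}");
    }

    return $dir;
}

// Hypothetical location for the demo; in Laravel this could be storage_path('app/tiktoken').
$cacheDir = prepareVocabCacheDir(sys_get_temp_dir() . '/tiktoken-vocab-demo');

// Expose the directory through the environment variable named in the answer above.
putenv('TIKTOKEN_CACHE_DIR=' . $cacheDir);

echo getenv('TIKTOKEN_CACHE_DIR'), PHP_EOL;
```
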
- How do I integrate tokenization into Laravel queues for batch processing?
- Dispatch a job with the text payload, then use `$encoder->encode()` inside the job’s `handle()` method. For example: `TokenizeTextJob::dispatch($userInput)->onQueue('tokenization');`. This offloads heavy tokenization from API requests.
- Are there any known issues with token counts matching OpenAI’s SDK?
- The package replicates OpenAI’s tokenization logic, but edge cases (e.g., rare Unicode characters) may diverge. Validate counts in unit tests against OpenAI’s reference tokenizer, the Python `tiktoken` library. Report discrepancies on the [GitHub issue tracker](https://github.com/yethee/tiktoken-php/issues).
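
A unit test of this kind can compare the encoder's counts against a fixture of reference counts generated once with the reference tokenizer. The helper below is a minimal sketch: `findCountMismatches()` is a hypothetical name, the fixture values are made up for the demo, and the word-splitting tokenizer stands in for the real encoder.

```php
<?php

declare(strict_types=1);

/**
 * Compares token counts produced by $encode against reference counts.
 *
 * @param callable(string): array $encode
 * @param array<string, int>      $expected Map of text => reference token count.
 *
 * @return array<string, array{expected: int, actual: int}> Mismatches keyed by text.
 */
function findCountMismatches(callable $encode, array $expected): array
{
    $mismatches = [];

    foreach ($expected as $text => $count) {
        // Cast because PHP turns numeric-looking string keys into integers.
        $actual = count($encode((string) $text));

        if ($actual !== $count) {
            $mismatches[$text] = ['expected' => $count, 'actual' => $actual];
        }
    }

    return $mismatches;
}

// Stand-in tokenizer for illustration only: one "token" per whitespace-separated word.
$wordEncode = static fn (string $text): array => preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];

// Made-up fixture; the last entry is deliberately wrong to show a reported mismatch.
$fixture = [
    'hello world'   => 2,
    'one two three' => 3,
    'a b'           => 5,
];

print_r(findCountMismatches($wordEncode, $fixture));
```

In a real test suite, the fixture would hold counts exported from the reference tokenizer for the same model, and the assertion would be that the returned array is empty.
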
- Can I use this package with GPT-2 or custom models requiring special tokens?
- No. This package only supports models using standard tiktoken vocabularies (e.g., `p50k_base`). GPT-2 and models with custom tokens (e.g., `<|startoftext|>`) are not compatible. Check the [supported models list](https://github.com/openai/tiktoken#supported-models) for details.
- How do I configure the package to use Redis for vocabulary caching?
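- Not directly. `TIKTOKEN_CACHE_DIR` and `EncoderProvider::setVocabCache()` expect a filesystem directory, so a Redis URL such as `redis://127.0.0.1:6379/0` is not a valid value, and the package does not document a `getVocabCache()` hook to override. If several hosts need the same vocabulary data, one option is a shared or pre-warmed directory populated at deploy time. Redis remains a good fit for caching your own results through Laravel’s `Cache` facade, such as token counts for frequently repeated prompts.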
- What are the tradeoffs of using LibEncoder in production?
- LibEncoder offers speed but requires a Rust toolchain and a native library build. Avoid it unless profiling shows that tokenization is a bottleneck (for example, sustained workloads above roughly 10k tokens/sec). Test thoroughly, because an unstable build or a misconfigured `LD_LIBRARY_PATH` can crash PHP. Stay with the default pure-PHP encoder for simplicity unless performance demands otherwise.