Weave Code
Code Weaver
Helps Laravel developers discover, compare, and choose open-source packages. See popularity, security, maintainers, and scores at a glance to make better decisions.
Feedback
Share your thoughts, report bugs, or suggest improvements.
Subject
Message

Tiktoken Laravel Package

yethee/tiktoken

PHP port of OpenAI’s tiktoken tokenizer. Get encoders by model name, encode text to token IDs, and cache vocab files for speed. Optional experimental Rust/FFI “lib mode” for faster encoding of medium/large texts.

View on GitHub
Deep Wiki
Context7

tiktoken-php

Packagist Version Build status Code Coverage License

This is a port of the tiktoken.

Installation

$ composer require yethee/tiktoken

Usage


use Yethee\Tiktoken\EncoderProvider;

$provider = new EncoderProvider();

$encoder = $provider->getForModel('gpt-3.5-turbo-0301');
$tokens = $encoder->encode('Hello world!');
print_r($tokens);
// OUT: [9906, 1917, 0]

$encoder = $provider->get('p50k_base');
$tokens = $encoder->encode('Hello world!');
print_r($tokens);
// OUT: [15496, 995, 0]

Cache

The encoder uses an external vocabularies, so caching is used by default to avoid performance issues.

By default, the directory for temporary files is used. You can override the directory for cache via environment variable TIKTOKEN_CACHE_DIR or use EncoderProvider::setVocabCache():

use Yethee\Tiktoken\EncoderProvider;

$encProvider = new EncoderProvider();
$encProvider->setVocabCache('/path/to/cache');

// Using the provider

Lib mode

Experimental

You can use tiktoken-rs library via FFI binding. This can improve performance when need to encode medium or large texts. However, the overhead of data marshalling can lead to poor performance for small texts.

use Yethee\Tiktoken\Encoder\LibEncoder;
use Yethee\Tiktoken\EncoderProvider;

// LibEncoder::init('/path/to/lib');

$encProvider = new EncoderProvider(true); // Force using the lib encoder

You need to provide path to the lib before using the provider. There are several ways to do this:

  • Use Yethee\Tiktoken\Encoder\LibEncoder::init() method.
  • Use Yethee\Tiktoken\Encoder\LibEncoder::preload() method, inside opcache preload script.
  • Use environment variable TIKTOKEN_LIB_PATH or LD_LIBRARY_PATH

Build lib

Requirements

git clone git@github.com:yethee/tiktoken-php.git
cd tiktoken-php
cargo build --release

Copy binary from target/release:

  • libtiktoken_php.so for linux
  • libtiktoken_php.dylib for MacOS
  • tiktoken_php.dll for Windows

NOTE: You can see .docker/Dockefile for an example.

Benchmark

You can see benchmark result in #27 or run it locally:

composer bench

TODO

  • Add implementation for Yethee\Tiktoken\Encoder\LibEncoder::encodeInChunks() method

Limitations

  • Encoding for GPT-2 is not supported.
  • Special tokens (like <|endofprompt|>) are not supported.

License

MIT

Weaver

How can I help you explore Laravel packages today?

Conversation history is not saved when not logged in.
Prompt
Add packages to context
No packages found.
hamzi/corewatch
minionfactory/raw-hydrator
hexters/coinpayment
rjcodes/rjcms
act-training/laravel-permissions-manager
alimarchal/laravel-chart-of-accounts
babenkoivan/elastic-scout-driver
mkwebdesign/filament-watchdog-v5
renatomarinho/laravel-page-speed
zedmagdy/filament-business-hours
renatovdemoura/blade-elements-ui
devgeek/beacon-admin
benjamin-rqt/data-watcher-bundle
atriumphp/atrium
sandermuller/package-boost-laravel
sandermuller/boost-skills
redaxo/core
yusufgenc/filament-api-forge
l3aro/rating-star-for-filament
leek/filament-subtenant-scope