symfony/ai-click-house-store
ClickHouse vector store integration for Symfony AI Store. Store and query embeddings in ClickHouse using distance functions and ANN/vector indexes for fast similarity search. Links to ClickHouse docs plus Symfony AI contributing and issue tracker.
Install the Package:
composer require symfony/ai-click-house-store
Configure ClickHouse Connection:
Add to config/clickhouse.php, so that the values resolve as config('clickhouse.dsn') and config('clickhouse.options'):
return [
'dsn' => 'http://clickhouse:8123/ai_vectors?database=default&index=1',
// OR for native driver:
// 'dsn' => 'clickhouse://user:password@clickhouse:9000/ai_vectors',
];
Define a ClickHouse Table:
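Run this SQL in ClickHouse (adjust 768 to the dimensionality of your embedding model):
CREATE TABLE vectors (
id UInt64,
embedding Array(Float32),
metadata String,
-- Array(Float32) has no fixed length, so the vector size is enforced with a constraint
CONSTRAINT embedding_dim CHECK length(embedding) = 768,
-- HNSW vector index; the exact syntax and availability depend on your ClickHouse version
INDEX ann_index embedding TYPE vector_similarity('hnsw', 'L2Distance', 768)
) ENGINE = MergeTree() ORDER BY id;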
First Usage:
use Symfony\Component\AI\Store\ClickHouseStore;
use Symfony\Component\AI\Store\Query;
$store = new ClickHouseStore('http://clickhouse:8123/ai_vectors');
$vector = [0.1, 0.2, /* ... */ 0.768]; // Your 768-dim vector (middle values elided)
$store->add(1, $vector, ['source' => 'user_upload']); // Numeric id, to match the UInt64 id column
// Query with filtering
$query = new Query($vector, 5); // Find 5 nearest neighbors
$query->filter(['source' => 'user_upload']);
$results = $store->find($query);
// In a Laravel controller or service
public function searchDocuments(string $queryText, int $limit = 3) {
// 1. Generate embedding (e.g., using Symfony AI's embedder)
$embedding = $this->embeddingService->embed($queryText);
// 2. Query ClickHouse store
$query = new Query($embedding, $limit);
$query->filter(['type' => 'document']); // Optional metadata filter
$results = $this->store->find($query);
// 3. Return formatted results
return array_map(fn ($item) => [
'id' => $item->id,
'distance' => $item->distance,
'metadata' => $item->metadata,
], $results);
}
// Laravel Artisan Command for bulk import
use Symfony\Component\AI\Store\ClickHouseStore;
class ImportVectorsCommand extends Command {
protected $signature = 'ai:import-vectors {file}';
protected $description = 'Import vectors from JSON to ClickHouse';
public function handle() {
$store = new ClickHouseStore('http://clickhouse:8123/ai_vectors');
$data = json_decode(file_get_contents($this->argument('file')), true);
foreach ($data as $item) {
$store->add($item['id'], $item['embedding'], $item['metadata'] ?? []);
}
$this->info('Import completed!');
}
}
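Before a bulk import like the one above, it can help to reject malformed embeddings on the PHP side rather than wait for ClickHouse to refuse the insert. The following is a minimal sketch, not part of the package: normalizeEmbedding() is a hypothetical helper, and it assumes PHP 8.1 or later (for array_is_list and the mixed type).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns the embedding as a list of floats, or throws
 * if it is not a list of exactly $expectedDim numeric components.
 */
function normalizeEmbedding(mixed $embedding, int $expectedDim): array
{
    if (!is_array($embedding) || !array_is_list($embedding)) {
        throw new InvalidArgumentException('Embedding must be a list of numbers.');
    }
    if (count($embedding) !== $expectedDim) {
        throw new InvalidArgumentException(
            sprintf('Expected %d dimensions, got %d.', $expectedDim, count($embedding))
        );
    }
    foreach ($embedding as $i => $component) {
        if (!is_int($component) && !is_float($component)) {
            throw new InvalidArgumentException("Component {$i} is not numeric.");
        }
    }

    // Cast integers to floats so every component has the same type.
    return array_map('floatval', $embedding);
}
```

Inside the import loop, each $item['embedding'] could be passed through this helper with the table's dimensionality (768 in the examples here) before it is handed to the store.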
// Combine vector similarity with SQL filtering
public function hybridSearch(array $vector, array $filters, int $limit = 5) {
$query = new Query($vector, $limit);
$query->filter($filters); // e.g., ['category' => 'tech', 'date' => '>2023-01-01']
$results = $this->store->find($query);
return $results;
}
Service Provider Binding:
// app/Providers/AppServiceProvider.php
public function register() {
$this->app->bind(\Symfony\Component\AI\Store\StoreInterface::class, function ($app) {
return new ClickHouseStore(
config('clickhouse.dsn'),
config('clickhouse.options', [])
);
});
}
Event-Driven Updates:
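// Listen to model events and update vectors.
// Register the observer on a concrete model: observe() instantiates the class it is
// called on, so it cannot be called on the abstract base Model.
// App\Models\Document is a hypothetical example model.
use App\Models\Document;
use Illuminate\Database\Eloquent\Model;
use Symfony\Component\AI\Store\ClickHouseStore;
Document::observe(VectorObserver::class);
class VectorObserver {
// The store is injected by the container when the observer is resolved
public function __construct(private ClickHouseStore $store) {}
public function saved(Model $model) {
if ($model->isDirty('content')) {
$embedding = $this->generateEmbedding($model->content);
$this->store->add($model->id, $embedding, $model->metadata());
}
}
}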
Caching Layer:
// Cache results for frequent queries
public function findCached(Query $query, int $ttl = 3600) {
$cacheKey = 'ai:query:' . md5(serialize($query));
return cache()->remember($cacheKey, $ttl, function () use ($query) {
return $this->store->find($query);
});
}
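A cache key built from serialize($query) depends on the object's internal layout and on the order in which filters were added. As a hedged alternative, the key can be derived from the query's parts. In this sketch queryCacheKey() is a hypothetical helper, and it assumes the vector, limit, and filters are available as plain values.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: builds a cache key from the parts of a query.
 * Filters are sorted by key so that their order does not change the key.
 */
function queryCacheKey(array $vector, int $limit, array $filters = []): string
{
    ksort($filters);

    $payload = json_encode(
        ['vector' => $vector, 'limit' => $limit, 'filters' => $filters],
        JSON_THROW_ON_ERROR
    );

    return 'ai:query:' . hash('sha256', $payload);
}
```

Two queries with the same vector, limit, and filters then share one cache entry regardless of filter order, while any change to the vector or limit produces a different key.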
Vector Dimensionality Mismatch:
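Error: ClickHouseException: Vector size mismatch (expected 768, got 384).
Fix: Ensure every inserted vector has the dimensionality the table expects. Array(Float32) does not accept a length parameter, so a fixed size has to be enforced separately, for example with a table constraint such as CONSTRAINT embedding_dim CHECK length(embedding) = 768.
ANN Index Not Used: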
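-- Check that the index exists (skipping indexes are listed in system.data_skipping_indices)
SELECT name, type FROM system.data_skipping_indices WHERE table = 'vectors' AND name = 'ann_index';
-- Recreate the index, then build it for data that is already stored
ALTER TABLE vectors DROP INDEX ann_index;
ALTER TABLE vectors ADD INDEX ann_index embedding TYPE vector_similarity('hnsw', 'L2Distance', 768);
ALTER TABLE vectors MATERIALIZE INDEX ann_index;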
Metadata Filtering Issues:
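Error: ClickHouseException: Unknown column 'metadata.category' in 'where clause'.
Fix: Store metadata in a JSON column so that nested keys can be queried, then filter on the nested path:
CREATE TABLE vectors (
id UInt64,
embedding Array(Float32),
metadata JSON, -- Store metadata as JSON for flexible querying
INDEX ann_index embedding TYPE vector_similarity('hnsw', 'L2Distance', 768)
) ENGINE = MergeTree() ORDER BY id;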
$query->filter(['metadata.category' => 'tech']);
Connection Timeouts:
Error: Connection refused or Network timeout.
Fix: Pass timeout and retry options when constructing the store:
$store = new ClickHouseStore('http://clickhouse:8123/ai_vectors', [
'timeout' => 10.0,
'retry' => 3,
]);
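If retries need to be handled in application code instead, a generic wrapper with exponential backoff is one option. This is a minimal sketch under stated assumptions: retryWithBackoff() is a hypothetical helper, not part of the store, and it requires PHP 8.0 or later for the mixed return type.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: runs $operation, retrying on failure with
 * exponentially growing pauses (base, 2x base, 4x base, ...).
 * The last exception is rethrown once $maxAttempts is reached.
 */
function retryWithBackoff(callable $operation, int $maxAttempts = 3, int $baseDelayMs = 100): mixed
{
    if ($maxAttempts < 1) {
        throw new InvalidArgumentException('maxAttempts must be at least 1.');
    }

    for ($attempt = 1; ; $attempt++) {
        try {
            return $operation();
        } catch (Throwable $e) {
            // In real code, narrow this catch to the transport exception you expect.
            if ($attempt >= $maxAttempts) {
                throw $e;
            }
            // usleep() takes microseconds.
            usleep($baseDelayMs * 1000 * (2 ** ($attempt - 1)));
        }
    }
}
```

A query could then be wrapped as retryWithBackoff(fn () => $store->find($query)), using the find() call shown earlier in this guide.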
Enable ClickHouse Query Logging:
Add to config/clickhouse.php:
'options' => [
'logger' => function ($query, $params) {
\Log::debug("ClickHouse Query: {$query}", $params);
},
],
Profile Slow Queries:
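Use ClickHouse’s EXPLAIN to analyze query plans; with indexes = 1 the output lists the indexes the query uses. ClickHouse has no vector_distance function, so use a built-in distance function such as L2Distance or cosineDistance:
EXPLAIN indexes = 1 SELECT * FROM vectors ORDER BY L2Distance(embedding, [0.1, 0.2, ...]) LIMIT 5;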
Monitor ANN Index Performance: Check index usage metrics:
SELECT * FROM system.asynchronous_metrics WHERE metric LIKE '%ann%';
Custom Distance Functions: Override the default distance function in your store class:
use Symfony\Component\AI\Store\ClickHouseStore;
class CustomClickHouseStore extends ClickHouseStore {
protected function getDistanceFunction(): string {
return 'cosineDistance'; // Use cosine instead of L2
}
}
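To make the difference between the two metrics concrete, here is a plain-PHP sketch of what L2 and cosine distance compute. It is for intuition and for testing small cases only, and is not how the store or ClickHouse evaluates distances. The function names mirror ClickHouse's L2Distance and cosineDistance but are defined locally.

```php
<?php
declare(strict_types=1);

/** Euclidean (L2) distance between two equal-length vectors. */
function l2Distance(array $a, array $b): float
{
    if (count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must have the same length.');
    }
    $sum = 0.0;
    foreach ($a as $i => $value) {
        $diff = $value - $b[$i];
        $sum += $diff * $diff;
    }

    return sqrt($sum);
}

/** Cosine distance: 1 minus the cosine of the angle between the vectors. */
function cosineDistance(array $a, array $b): float
{
    if (count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must have the same length.');
    }
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value * $value;
        $normB += $b[$i] * $b[$i];
    }
    if ($normA == 0.0 || $normB == 0.0) {
        throw new InvalidArgumentException('Cosine distance is undefined for a zero vector.');
    }

    return 1.0 - $dot / (sqrt($normA) * sqrt($normB));
}
```

L2 distance is sensitive to vector length, while cosine distance only compares direction: [1.0, 2.0] and [2.0, 4.0] are at cosine distance 0 (up to rounding) but at a non-zero L2 distance.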
Dynamic Indexing: Implement runtime index selection based on query patterns:
public function find(Query $query) {
if ($query->limit > 100) {
// Use a coarser index for large batches
$this->useIndex('coarse_ann_index');
}
return parent::find($query);
}
Batch Operations: Extend for bulk operations (not natively supported):
public function batchAdd(array $items) {
$sql = 'INSERT INTO vectors (id, embedding, metadata) VALUES ';
$values = [];
foreach ($items as $item) {
$values[] = "('{$item['id
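Interpolating values directly into a VALUES string is fragile and open to SQL injection. One alternative is to send rows in ClickHouse's JSONEachRow input format (INSERT INTO vectors FORMAT JSONEachRow), which takes one JSON object per line. The sketch below only builds that payload. toJsonEachRow() is a hypothetical helper, and it assumes the id, embedding, and metadata columns from the table defined earlier, with metadata stored as a JSON-encoded string.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: encodes items as newline-delimited JSON objects,
 * suitable as the body of an "INSERT INTO vectors FORMAT JSONEachRow" request.
 */
function toJsonEachRow(array $items): string
{
    $lines = [];
    foreach ($items as $item) {
        $lines[] = json_encode([
            'id' => (int) $item['id'],
            'embedding' => array_map('floatval', $item['embedding']),
            // The cast to object makes empty metadata encode as {} rather than [].
            'metadata' => json_encode((object) ($item['metadata'] ?? []), JSON_THROW_ON_ERROR),
        ], JSON_THROW_ON_ERROR);
    }

    return implode("\n", $lines);
}
```

Because every value goes through json_encode, quotes and other special characters in the metadata are escaped by the encoder instead of by hand.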