Edge TTS PHP

Overview

Edge TTS is a powerful Text-to-Speech (TTS) package for PHP that leverages Microsoft Edge's speech synthesis capabilities. This package allows you to synthesize speech from text, manage voice options, and process audio streams with real-time callbacks through both programmatic and command-line interfaces.

Features

Text-to-Speech: Convert text into natural-sounding speech using Microsoft Edge's TTS capabilities.
Multiple Voices: Access a wide variety of voices to suit your project's needs.
Real-time Streaming: Support for audio streaming with real-time processing callbacks.
Word Boundaries Metadata: Get word boundary information with precise timestamps.
Flexible Export Options: Export synthesized audio in different formats (raw, base64, file, or PHP stream).
Command-Line Interface: Use a simple CLI for easy access to functionality.
Easy Integration: Modular structure allows for seamless inclusion in existing PHP projects.
Extended Compatibility: Runs on PHP 7.4 and later, including PHP 8.1+.

Requirements

PHP 7.4 or later (including PHP 8.1+)
PHP Extensions: json, curl
Composer for dependency management

Installation

You can install Edge TTS via Composer. Run the following command in your terminal:

composer require afaya/edge-tts

This package is a fork of the original afaya/edge-tts package with improvements and fixes. We've lowered the dependency requirements so that the package runs on PHP 7.4+ as well as PHP 8.1+, while maintaining the original functionality.

Usage

Command-Line Interface

To synthesize speech from text:

php ./vendor/bin/edge-tts edge-tts:synthesize --text "Hello, world!"

To list available voices:

php ./vendor/bin/edge-tts edge-tts:voice-list

Integration into Your Project

<?php

require __DIR__ . '/vendor/autoload.php';

use Afaya\EdgeTTS\Service\EdgeTTS;

// Initialize the EdgeTTS service
$tts = new EdgeTTS();

// Get available voices
$voices = $tts->getVoices();
// var_dump($voices); // array of voices; pass a voice's ShortName as the voice name

// Synthesize text with options for voice, rate, volume, and pitch
$tts->synthesize("Hello, world!", 'en-US-AriaNeural', [
    'rate'   => '0%',   // Speech rate (range: -100% to 100%)
    'volume' => '0%',   // Speech volume (range: -100% to 100%)
    'pitch'  => '0Hz'   // Voice pitch (range: -100Hz to 100Hz)
]);

// Export synthesized audio in different formats
$base64Audio = $tts->toBase64();    // Get audio as base64
$tts->toFile("output");             // Save audio to file
$rawAudio = $tts->toRaw();          // Get raw audio stream
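
The option strings above follow a simple pattern: a signed number plus a unit, within the ranges given in the comments. As a minimal sketch, the helper below builds such an options array from integers and clamps them to those ranges. The names prosodyValue and buildSynthesisOptions are hypothetical and not part of the package.

```php
<?php

/**
 * Clamp a value into [$min, $max] and append a unit suffix.
 * Hypothetical helper, not part of the package.
 */
function prosodyValue(int $value, int $min, int $max, string $unit): string
{
    $clamped = max($min, min($max, $value));
    return $clamped . $unit;
}

/**
 * Build an options array in the shape used by the synthesize() example,
 * using the ranges documented in its comments.
 */
function buildSynthesisOptions(int $rate = 0, int $volume = 0, int $pitch = 0): array
{
    return [
        'rate'   => prosodyValue($rate, -100, 100, '%'),
        'volume' => prosodyValue($volume, -100, 100, '%'),
        'pitch'  => prosodyValue($pitch, -100, 100, 'Hz'),
    ];
}

// The out-of-range pitch is clamped to the documented minimum.
$options = buildSynthesisOptions(25, 0, -150);
print_r($options);

// With the package installed, the array can be passed straight through:
// $tts->synthesize("Hello, world!", 'en-US-AriaNeural', $options);
```

Clamping rather than throwing is a design choice for this sketch; stricter callers may prefer to reject out-of-range input.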

Real-time Streaming Synthesis

For real-time audio processing:

$tts->synthesizeStream(
    "Your text here", 
    'en-US-AriaNeural', 
    ['rate' => '10%'],
    function($chunk) {
        // Process each audio chunk in real-time
        echo "Received chunk of " . strlen($chunk) . " bytes\n";
        // You can stream this directly to output, save incrementally, etc.
    }
);
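
One way to "save incrementally", as the comment above suggests, is to hand synthesizeStream() a callback that appends each chunk to an open stream. The sketch below assumes only that the callback receives the chunk as a string, as in the example above. makeChunkWriter is a hypothetical helper, and the demonstration feeds it fake chunks through an in-memory stream so it runs without the package.

```php
<?php

/**
 * Return a callback that appends each audio chunk to an open stream and
 * keeps a running byte count. Hypothetical helper, not part of the package.
 *
 * @param resource $handle Writable stream (file, php://output, ...)
 * @param int      $total  Updated by reference with the bytes written so far
 */
function makeChunkWriter($handle, int &$total): callable
{
    return function ($chunk) use ($handle, &$total) {
        $written = fwrite($handle, $chunk);
        if ($written === false) {
            throw new RuntimeException('Could not write audio chunk');
        }
        $total += $written;
    };
}

// Demonstration with an in-memory stream and fake chunks.
$handle = fopen('php://memory', 'w+b');
$total = 0;
$writer = makeChunkWriter($handle, $total);
foreach (['abc', 'defgh'] as $fakeChunk) {
    $writer($fakeChunk);
}
echo "Wrote {$total} bytes\n";

// With the package installed, the same callback could be passed as the
// fourth argument:
// $tts->synthesizeStream("Your text here", 'en-US-AriaNeural', ['rate' => '10%'], $writer);
```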

Available Methods

Audio Information

// Get basic audio information
$info = $tts->getAudioInfo();
// Returns: ['size' => bytes, 'format' => 'mp3', 'estimatedDuration' => seconds]

// Get estimated duration
$duration = $tts->getDuration();

// Get size in bytes
$size = $tts->getSizeBytes();
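
For logging or display, the info array can be turned into a one-line summary. The sketch below assumes the array shape shown in the "Returns" comment above (size in bytes, format as a string, estimatedDuration in seconds). describeAudio is a hypothetical helper and the sample values are made up.

```php
<?php

/**
 * Format an info array shaped like the getAudioInfo() result documented above.
 * Hypothetical helper, not part of the package.
 */
function describeAudio(array $info): string
{
    $kib = $info['size'] / 1024;
    $seconds = (int) round($info['estimatedDuration']);

    return sprintf(
        '%s, %.1f KiB, about %d:%02d',
        strtoupper($info['format']),
        $kib,
        intdiv($seconds, 60),   // whole minutes
        $seconds % 60           // remaining seconds
    );
}

// Made-up values standing in for $tts->getAudioInfo().
echo describeAudio(['size' => 49152, 'format' => 'mp3', 'estimatedDuration' => 8.2]), "\n";
```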

Word Boundaries Metadata

// Get word boundaries with timestamps
$boundaries = $tts->getWordBoundaries();

// Save metadata to file
$tts->saveMetadata('metadata.json');
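
Word boundaries are commonly used to build captions or highlight text during playback. The exact structure returned by getWordBoundaries() is not documented here, so the sketch below rests on an assumption: each entry is an array with hypothetical keys 'offset' and 'duration' (in 100-nanosecond ticks) and 'text'. Inspect the real return value and adapt the key names before relying on it.

```php
<?php

/**
 * Convert 100-nanosecond ticks to an "HH:MM:SS,mmm" timestamp.
 * Hypothetical helper, not part of the package.
 */
function ticksToTimestamp(int $ticks): string
{
    $ms = intdiv($ticks, 10000); // 10,000 ticks per millisecond

    return sprintf(
        '%02d:%02d:%02d,%03d',
        intdiv($ms, 3600000),
        intdiv($ms, 60000) % 60,
        intdiv($ms, 1000) % 60,
        $ms % 1000
    );
}

/**
 * ASSUMPTION: each boundary is an array with 'offset' and 'duration'
 * (both in ticks) and 'text'. These key names are illustrative only.
 */
function boundariesToLines(array $boundaries): array
{
    $lines = [];
    foreach ($boundaries as $b) {
        $lines[] = sprintf(
            '%s --> %s  %s',
            ticksToTimestamp($b['offset']),
            ticksToTimestamp($b['offset'] + $b['duration']),
            $b['text']
        );
    }
    return $lines;
}

// Made-up sample data standing in for $tts->getWordBoundaries().
$sample = [
    ['offset' => 1000000, 'duration' => 3500000, 'text' => 'Hello'],
    ['offset' => 5000000, 'duration' => 4000000, 'text' => 'world'],
];
echo implode("\n", boundariesToLines($sample)), "\n";
```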

Export Options

After synthesizing speech, you can export the audio in various formats:

toBase64(): Returns the audio as a Base64 string
toFile($path): Saves the audio to a specified file
toRaw(): Returns the raw audio stream
toStream(): Returns a PHP stream resource
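
The Base64 export is convenient for embedding short clips directly in a page. As a minimal sketch, the helper below wraps a Base64 string in a data URI for an HTML audio element. audioDataUri is a hypothetical helper, the audio/mpeg MIME type assumes MP3 output, and the demonstration uses stand-in bytes instead of a real toBase64() result.

```php
<?php

/**
 * Wrap a Base64 audio string in a data URI.
 * Hypothetical helper; 'audio/mpeg' assumes MP3 output.
 */
function audioDataUri(string $base64, string $mime = 'audio/mpeg'): string
{
    return 'data:' . $mime . ';base64,' . $base64;
}

// Stand-in bytes; with the package this would be $tts->toBase64().
$base64 = base64_encode('fake-audio-bytes');

$html = '<audio controls src="'
    . htmlspecialchars(audioDataUri($base64), ENT_QUOTES)
    . '"></audio>';
echo $html, "\n";
```

Data URIs inflate the payload by roughly a third, so for longer audio a saved file or a stream is usually the better fit.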

Voice Management

// Get all available voices
$voices = $tts->getVoices();

// Each voice contains information like:
// - ShortName: The voice identifier to use in synthesis
// - DisplayName: Human-readable voice name
// - LocalName: Localized voice name
// - Gender: Voice gender
// - Locale: Language/region code
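
Because the voice list is long, it usually helps to filter it by locale and gender before choosing a ShortName. The sketch below assumes each voice is an associative array with the fields listed above; if the package returns objects instead, the field access needs adjusting. filterVoices is a hypothetical helper and the sample entries are hand-written.

```php
<?php

/**
 * Return the ShortName of every voice whose Locale starts with $localePrefix
 * and, optionally, whose Gender matches. Hypothetical helper.
 * ASSUMPTION: voices are associative arrays with Locale, Gender, ShortName.
 */
function filterVoices(array $voices, string $localePrefix, ?string $gender = null): array
{
    $matches = array_filter($voices, function ($voice) use ($localePrefix, $gender) {
        // strpos() keeps this compatible with PHP 7.4 (no str_starts_with()).
        $localeOk = strpos($voice['Locale'], $localePrefix) === 0;
        $genderOk = $gender === null || strcasecmp($voice['Gender'], $gender) === 0;
        return $localeOk && $genderOk;
    });

    return array_column($matches, 'ShortName');
}

// Hand-written sample standing in for $tts->getVoices().
$sampleVoices = [
    ['ShortName' => 'en-US-AriaNeural',   'Gender' => 'Female', 'Locale' => 'en-US'],
    ['ShortName' => 'en-US-GuyNeural',    'Gender' => 'Male',   'Locale' => 'en-US'],
    ['ShortName' => 'es-ES-ElviraNeural', 'Gender' => 'Female', 'Locale' => 'es-ES'],
];

print_r(filterVoices($sampleVoices, 'en-', 'Female'));
```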

Audio Configuration

Default Settings

By default, the package requests MP3 audio with the following settings:

Format: MP3 (audio-24khz-48kbitrate-mono-mp3)
Sample Rate: 24kHz
Bitrate: 48kbps
Channels: Mono
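
The default format name encodes the settings listed above, so the same values can be recovered from any MP3 format string. The sketch below parses the name with a regular expression and adds a rough duration estimate from file size, which holds only for constant-bitrate audio. parseMp3Format and estimateSeconds are hypothetical helpers, not part of the package.

```php
<?php

/**
 * Split an MP3 format name such as "audio-24khz-48kbitrate-mono-mp3"
 * into its parts. Returns null for names that do not match the pattern.
 * Hypothetical helper, not part of the package.
 */
function parseMp3Format(string $format): ?array
{
    if (!preg_match('/^audio-(\d+)khz-(\d+)kbitrate-(mono|stereo)-mp3$/', $format, $m)) {
        return null;
    }

    return [
        'sampleRateHz' => (int) $m[1] * 1000,
        'bitrateKbps'  => (int) $m[2],
        'channels'     => $m[3],
    ];
}

/**
 * Rough duration in seconds: bits divided by bits per second.
 * Only an approximation, and only for constant-bitrate audio.
 */
function estimateSeconds(int $bytes, int $bitrateKbps): float
{
    return ($bytes * 8) / ($bitrateKbps * 1000);
}

$parsed = parseMp3Format('audio-24khz-48kbitrate-mono-mp3');
print_r($parsed);

// 60,000 bytes at 48 kbps: 480,000 bits / 48,000 bits per second.
echo estimateSeconds(60000, $parsed['bitrateKbps']), " seconds\n";
```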

Custom Output Format

You can customize the audio output format with the outputFormat option, which accepts the output format names defined by Microsoft Azure Speech Service:

use Afaya\EdgeTTS\Service\EdgeTTS;
use Afaya\EdgeTTS\Config\Constants;

$tts = new EdgeTTS();

// Use a custom output format
$tts->synthesize("Hello, world!", 'en-US-AriaNeural', [
    'outputFormat' => 'audio-48khz-192kbitrate-mono-mp3',
    'rate' => '0%',
    'pitch' => '0Hz'
]);

$tts->toFile("output"); // Will save as output.mp3

Available Output Formats

The package supports 36 different audio formats through Constants::OUTPUT_FORMAT:

Streaming Formats (MP3)

audio-16khz-32kbitrate-mono-mp3
audio-16khz-64kbitrate-mono-mp3
audio-16khz-128kbitrate-mono-mp3
audio-24khz-48kbitrate-mono-mp3 (default)
audio-24khz-96kbitrate-mono-mp3
audio-24khz-160kbitrate-mono-mp3
audio-48khz-96kbitrate-mono-mp3
audio-48khz-192kbitrate-mono-mp3

Opus Formats

audio-16khz-16bit-32kbps-mono-opus
audio-24khz-16bit-24kbps-mono-opus
audio-24khz-16bit-48kbps-mono-opus
ogg-16khz-16bit-mono-opus
ogg-24khz-16bit-mono-opus
ogg-48khz-16bit-mono-opus
webm-16khz-16bit-mono-opus
webm-24khz-16bit-24kbps-mono-opus
webm-24khz-16bit-mono-opus

Raw PCM Formats

raw-8khz-8bit-mono-alaw
raw-8khz-8bit-mono-mulaw
raw-8khz-16bit-mono-pcm
raw-16khz-16bit-mono-pcm
raw-16khz-16bit-mono-truesilk
raw-22050hz-16bit-mono-pcm
raw-24khz-16bit-mono-pcm
raw-24khz-16bit-mono-truesilk
raw-44100hz-16bit-mono-pcm
raw-48khz-16bit-mono-pcm

WAV/RIFF Formats (Non-Streaming)

riff-8khz-8bit-mono-alaw
riff-8khz-8bit-mono-mulaw
riff-8khz-16bit-mono-pcm
riff-22050hz-16bit-mono-pcm
riff-24khz-16bit-mono-pcm
riff-44100hz-16bit-mono-pcm
riff-48khz-16bit-mono-pcm

Other Formats

amr-wb-16000hz
g722-16khz-64kbps

Testing Output Formats

You can test all available formats using the included test script:

php test-formats.php

This will test all 36 formats and provide a detailed report showing which formats are compatible with the Edge TTS service.

Error Handling

The package includes comprehensive error handling:

try {
    $tts->synthesize("Hello", 'invalid-voice');
} catch (InvalidArgumentException $e) {
    echo "Invalid voice: " . $e->getMessage();
} catch (RuntimeException $e) {
    echo "Runtime error: " . $e->getMessage();
}
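
Since synthesis depends on a remote service, transient failures may be worth retrying, while invalid input is not. The sketch below retries only on RuntimeException and lets InvalidArgumentException propagate immediately, matching the two cases in the example above. withRetries is a hypothetical helper, and the demonstration uses a fake operation in place of a real synthesize() call.

```php
<?php

/**
 * Run $operation, retrying on RuntimeException up to $maxAttempts times.
 * Other exceptions (such as InvalidArgumentException) are not caught.
 * Hypothetical helper, not part of the package.
 */
function withRetries(callable $operation, int $maxAttempts = 3, int $delayMs = 0)
{
    if ($maxAttempts < 1) {
        throw new InvalidArgumentException('maxAttempts must be at least 1');
    }

    $lastError = null;
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        try {
            return $operation();
        } catch (RuntimeException $e) {
            $lastError = $e;
            if ($attempt < $maxAttempts && $delayMs > 0) {
                usleep($delayMs * 1000); // wait before the next attempt
            }
        }
    }

    throw $lastError;
}

// Fake operation that fails twice, then succeeds.
$calls = 0;
$result = withRetries(function () use (&$calls) {
    $calls++;
    if ($calls < 3) {
        throw new RuntimeException('temporary failure');
    }
    return 'ok';
});
echo "Result: {$result} after {$calls} calls\n";

// With the package installed, the operation could wrap a real call:
// withRetries(function () use ($tts) {
//     $tts->synthesize("Hello", 'en-US-AriaNeural');
// }, 3, 500);
```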

Testing

Run the test suite with PHPUnit:

./vendor/bin/phpunit

Contributing

We welcome contributions! Please read our CONTRIBUTING.md for guidelines on how to contribute to this project.

License

This project is licensed under the GNU General Public License v3 (GPLv3).

Acknowledgments

We would like to thank the developers and contributors of the following project for their inspiration and groundwork:

Edge Tts Laravel Package