API Documentation

📡 API Reference

Authentication

All API requests require Bearer Token authentication. The token contains user identity, permissions, and quota information.

Authorization: Bearer YOUR_JWT_TOKEN

POST /api/tts

Standard TTS API

Suitable for short text synthesis. Returns the complete audio file in a single response.

Request Parameters

text (string, required): Text to convert

voice (string, optional): Voice ID, default 'x5_lingfeiyi_flow'

speed (number, optional): Speech speed 0-100, default 50

volume (number, optional): Volume 0-100, default 50

pitch (number, optional): Pitch 0-100, default 50

streamMode (boolean, optional): Enable streaming, default false
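
The defaults and ranges listed above can be applied on the client before sending a request. The following is a minimal sketch, not part of the API: `buildTtsBody` is a hypothetical helper name, and rejecting empty text or out-of-range values on the client is an assumption, since this page does not say how the server treats them.

```javascript
// Hypothetical client-side helper: fills in the documented defaults and
// checks the documented 0-100 ranges. Server-side validation may differ.
const DEFAULT_VOICE = 'x5_lingfeiyi_flow';

function buildTtsBody({
  text,
  voice = DEFAULT_VOICE,
  speed = 50,
  volume = 50,
  pitch = 50,
  streamMode = false,
} = {}) {
  if (typeof text !== 'string' || text.length === 0) {
    throw new TypeError('text is required and must be a non-empty string');
  }
  for (const [name, value] of Object.entries({ speed, volume, pitch })) {
    if (typeof value !== 'number' || Number.isNaN(value) || value < 0 || value > 100) {
      throw new RangeError(`${name} must be a number between 0 and 100`);
    }
  }
  return { text, voice, speed, volume, pitch, streamMode };
}
```

The returned object can be passed straight to `JSON.stringify` as the request body.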

Response Format

Success: Returns the audio data in audio/mpeg format

Error: Returns a JSON error message

Response Headers:

Content-Type: audio/mpeg

X-User-ID: User ID

X-Characters-Used: Characters consumed
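
The response headers above can be read alongside the audio body. This sketch assumes that `X-Characters-Used` carries a plain integer and that the error JSON has no fixed shape, since neither is specified here. `synthesize` and its `fetchImpl` parameter are illustrative names. The relative URL assumes a same-origin browser context.

```javascript
// Sketch: call /api/tts and surface the documented response headers.
// fetchImpl is injectable so the function can be exercised without a network.
async function synthesize(body, token, fetchImpl = fetch) {
  const response = await fetchImpl('/api/tts', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${token}`,
    },
    body: JSON.stringify(body),
  });

  if (!response.ok) {
    // The error body is documented as JSON; its fields are not specified.
    const error = await response.json().catch(() => null);
    throw new Error(`TTS request failed (HTTP ${response.status}): ${JSON.stringify(error)}`);
  }

  const used = response.headers.get('X-Characters-Used');
  return {
    audio: await response.arrayBuffer(),
    userId: response.headers.get('X-User-ID'),
    // Assumption: the header value is a decimal number.
    charactersUsed: used === null ? null : Number(used),
  };
}
```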

Request Example

// JavaScript Example
const response = await fetch('/api/tts', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer YOUR_JWT_TOKEN'
  },
  body: JSON.stringify({
    text: 'Hello, this is a test text',
    voice: 'x5_lingfeiyi_flow',
    speed: 60,
    volume: 80,
    pitch: 50,
    streamMode: false
  })
});

if (response.ok) {
  const audioData = await response.arrayBuffer();
  const audioBlob = new Blob([audioData], { type: 'audio/mpeg' });
  const audioUrl = URL.createObjectURL(audioBlob);
  
  // Play audio
  const audio = new Audio(audioUrl);
  audio.play();
} else {
  const error = await response.json();
  console.error('Synthesis failed:', error);
}

cURL Example

curl -X POST https://your-domain.com/api/tts \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -d '{
    "text": "Hello, this is a test text",
    "voice": "x5_lingfeiyi_flow",
    "speed": 60,
    "volume": 80,
    "pitch": 50
  }' \
  --output audio.mp3

GET /api/tts-stream (WebSocket)

Streaming TTS API

Real-time streaming over WebSocket. Billing is count-based: each call counts as one request, deducted when the WebSocket URL is issued.

Connection Flow

Step 1: Send a GET request to obtain the direct WebSocket URL. Usage is billed at this point.

Step 2: Connect directly to the WebSocket URL and start streaming.

API Operations

GET: Returns the WebSocket URL and finalizes billing.

Billing: Count-based. One request is deducted per API call.

Plan Limits: Free: 100/day, Starter: 1000/day, Pro: 10000/day.

GET Request Parameters

text (string, required): Text to convert

voice (string, optional): Voice ID, default 'x5_lingfeiyi_flow'

speed (number, optional): Speech speed 0-100, default 50

volume (number, optional): Volume 0-100, default 50

pitch (number, optional): Pitch 0-100, default 50

Authorization (header, required): Bearer JWT token

Simplified Example

// Step 1: Get WebSocket URL (Usage billed on this request)
const getStreamUrl = async () => {
  const text = 'Hello, this is a streaming test with a direct connection.';
  const voice = 'x5_lingfeiyi_flow';
  const speed = 50;
  const volume = 80;
  const pitch = 50;

  const response = await fetch('/api/tts-stream?' + new URLSearchParams({
    text,
    voice,
    speed: String(speed),
    volume: String(volume),
    pitch: String(pitch),
  }), {
    headers: {
      'Authorization': 'Bearer YOUR_JWT_TOKEN'
    }
  });

  if (response.ok) {
    const data = await response.json();
    const wsUrl = data.wsUrl;
    
    // Step 2: Connect to WebSocket and send synthesis parameters
    const socket = new WebSocket(wsUrl);
    
    socket.onopen = () => {
      console.log('WebSocket connection established.');
      // Based on official documentation, construct and send the request JSON
      const requestData = {
        header: {
          app_id: "73d88ee4", // Use this fixed app_id for this service
          status: 2, // 2 indicates this is the final and only chunk of text
        },
        parameter: {
          tts: {
            vcn: voice,
            speed: speed,
            volume: volume,
            pitch: pitch,
            audio: {
              encoding: "lame",
              sample_rate: 24000,
              channels: 1,
              bit_depth: 16,
              frame_size: 0
            }
          }
        },
        payload: {
          text: {
            encoding: "utf8",
            compress: "raw",
            format: "plain",
            status: 2, // 2 indicates the final text chunk for this request
            // Text must be base64 encoded
            text: btoa(unescape(encodeURIComponent(text)))
          }
        }
      };
      
      socket.send(JSON.stringify(requestData));
    };

    socket.onmessage = (event) => {
      // The server will send back audio data, likely in a JSON object.
      // The specific format depends on the service implementation.
      console.log('Received data:', event.data);
      // Example of handling a JSON response with base64 audio:
      // const response = JSON.parse(event.data);
      // if (response.type === 'audio' && response.data) {
      //   const audioChunk = atob(response.data);
      //   // Your logic to play or buffer the audio chunk
      // }
    };

    socket.onclose = (event) => {
      console.log('WebSocket connection closed.', event.code, event.reason);
    };

    socket.onerror = (error) => {
      console.error('WebSocket error:', error);
    };

  } else {
    const error = await response.json();
    console.error('Failed to get WebSocket URL:', error);
  }
};

getStreamUrl();

WebSocket Request Parameters

After establishing a WebSocket connection, you must send a JSON object with the following structure to initiate the synthesis.

`header` (object)

app_id (string, required): Your application ID. Use the fixed value 73d88ee4.

status (int, required): The request status, indicating the position of the text chunk.

0: Start of the stream.
1: Middle of the stream.
2: End of the stream.

For a single, complete text message, this should be set to 2.

`parameter` (object)

tts (object, required): Contains TTS engine parameters like voice, speed, pitch, and the desired audio format.

`payload` (object)

text (object, required): Contains the text to be synthesized. The nested text field inside this object must be Base64 encoded.
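
The structure described above can be assembled with a small helper. This is a sketch under assumptions: the `audio` settings are copied from the Simplified Example on this page, `buildSynthesisFrame` and `toBase64Utf8` are hypothetical names, and setting the same status in `header` and `payload.text` follows that example rather than a stated rule.

```javascript
const APP_ID = '73d88ee4'; // fixed value given in this document

// Base64-encode a string as UTF-8 bytes (handles non-ASCII text).
function toBase64Utf8(text) {
  const bytes = new TextEncoder().encode(text);
  let binary = '';
  for (const byte of bytes) binary += String.fromCharCode(byte);
  return btoa(binary);
}

// Build the JSON message to send once the WebSocket is open.
// status: 0 = start, 1 = middle, 2 = end (use 2 for a single complete text).
function buildSynthesisFrame(text, {
  voice = 'x5_lingfeiyi_flow',
  speed = 50,
  volume = 50,
  pitch = 50,
  status = 2,
} = {}) {
  return {
    header: { app_id: APP_ID, status },
    parameter: {
      tts: {
        vcn: voice,
        speed,
        volume,
        pitch,
        // Audio settings taken from the example on this page.
        audio: { encoding: 'lame', sample_rate: 24000, channels: 1, bit_depth: 16, frame_size: 0 },
      },
    },
    payload: {
      text: { encoding: 'utf8', compress: 'raw', format: 'plain', status, text: toBase64Utf8(text) },
    },
  };
}
```

Inside `socket.onopen`, the frame would be sent with `socket.send(JSON.stringify(buildSynthesisFrame('Hello')))`.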

POST /api/tts-proxy (HTTP Stream)

HTTP Streaming TTS Proxy API

Perform secure, real-time speech synthesis by sending a single HTTP POST request. Audio is returned incrementally as audio/mpeg chunks, allowing immediate playback without waiting for the full file. The proxy layer hides vendor credentials and performs quota billing on the server so your token cannot be abused by the client.

Request Body

text (string, required): Text to convert

voice (string, optional): Voice ID, default 'x5_lingfeiyi_flow'

speed (number, optional): Speech speed 0-100, default 50

volume (number, optional): Volume 0-100, default 50

pitch (number, optional): Pitch 0-100, default 50

Response

Content-Type: audio/mpeg

The response body is a streamed MP3. Use the ReadableStream API together with MediaSource for gap-free playback.

JavaScript Streaming Example

// Create an <audio> element for playback
const audioElement = new Audio();

await (async () => {
  const response = await fetch('/api/tts-proxy', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': 'Bearer YOUR_JWT_TOKEN'
    },
    body: JSON.stringify({
      text: 'Hello via HTTP streaming!',
      voice: 'x5_lingfeiyi_flow'
    })
  });

  if (!response.ok || !response.body) throw new Error('Request failed');

  const mediaSource = new MediaSource();
  audioElement.src = URL.createObjectURL(mediaSource);
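  mediaSource.addEventListener('sourceopen', async () => {
    const sourceBuffer = mediaSource.addSourceBuffer('audio/mpeg');
    const reader = response.body.getReader();
    // Start playback immediately; browsers may reject this without a prior user gesture
    audioElement.play().catch((err) => console.error('Playback failed:', err));
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      // appendBuffer() is asynchronous: wait for 'updateend' before appending the next chunk
      await new Promise((resolve) => {
        sourceBuffer.addEventListener('updateend', resolve, { once: true });
        sourceBuffer.appendBuffer(value);
      });
    }
    mediaSource.endOfStream();
  });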
})();
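
MediaSource support for 'audio/mpeg' varies between browsers and can be checked with `MediaSource.isTypeSupported('audio/mpeg')`. Where it is unavailable, one possible fallback is to read the whole stream and play it as a Blob, at the cost of waiting for the full file. The helper names below are illustrative and not part of the API.

```javascript
// Join a list of Uint8Array chunks into one contiguous Uint8Array.
function concatChunks(chunks) {
  const total = chunks.reduce((sum, chunk) => sum + chunk.byteLength, 0);
  const joined = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    joined.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return joined;
}

// Drain a stream reader (e.g. response.body.getReader()) into one buffer.
async function readAll(reader) {
  const chunks = [];
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(value);
  }
  return concatChunks(chunks);
}

// Browser usage (sketch):
//   const bytes = await readAll(response.body.getReader());
//   const url = URL.createObjectURL(new Blob([bytes], { type: 'audio/mpeg' }));
//   new Audio(url).play();
```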


POST /api/tts-stream-manual (Manual Control)

Manual Control Streaming TTS API

Provides precise control over streaming TTS sessions with manual state management. Perfect for complex applications requiring fine-grained control over audio generation flow.

🎯 Key Features

• Manual session state control (0→1→2)

• Real-time audio streaming

• Precise text segmentation control

• Session persistence and recovery

• Custom sequence number management

• Character-based quota tracking

Request Parameters

text (string, required): Text segment to convert

status (number, required): Session status (0=start, 1=continue, 2=end)

sessionId (string, required for status 1,2): Session identifier

voice (string, optional): Voice ID, default 'x5_lingfeiyi_flow'

speed (number, optional): Speech speed 0-100, default 50

volume (number, optional): Volume 0-100, default 50

pitch (number, optional): Pitch 0-100, default 50

seq (number, optional): Manual sequence number control

Status Flow

0 (Start): Create a new session and begin streaming

1 (Continue): Add more text to an existing session

2 (End): Send the final text segment and close the session
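
The 0→1→2 flow can be planned on the client before any request is sent. The sketch below only builds request bodies. It assumes at least two segments, because this page does not describe a single-segment session, and it leaves out how the server returns the `sessionId`, which is not documented here. `planSegments` and `toRequestBody` are hypothetical names.

```javascript
// Assign a status to each text segment: first = 0, last = 2, others = 1.
function planSegments(segments) {
  if (!Array.isArray(segments) || segments.length < 2) {
    throw new Error('at least two segments are needed for a 0 -> 2 session');
  }
  const last = segments.length - 1;
  return segments.map((text, index) => ({
    text,
    status: index === 0 ? 0 : index === last ? 2 : 1,
  }));
}

// Build the JSON body for one step. sessionId is required for status 1 and 2.
function toRequestBody(step, sessionId) {
  if (step.status === 0) return { ...step };
  if (!sessionId) throw new Error('sessionId is required for status 1 and 2');
  return { ...step, sessionId };
}
```

Optional fields such as `voice`, `speed`, or `seq` can be merged into the body returned by `toRequestBody` before it is posted to /api/tts-stream-manual.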

Supported Voices

Voice Categories

• Free: Available for all users without additional cost

• Premium: High-quality voices with emotional expression (paid)

• Multilingual: Support for multiple languages

• Emotional: Voices with advanced emotional capabilities

Voice ID	Name	Language	Gender	Category	Permission Required
x5_lingfeiyi_flow	Lingfeiyi	zh-CN	Female	Free	`tts:basic`
x4_EnUs_Grant_emo	Grant	en-US	Male	Premium	`tts:premium`
x4_EnUs_Lila_emo	Lila	en-US	Female	Premium	`tts:premium`

Usage Notes

• Default Voice: If no voice is specified, 'x5_lingfeiyi_flow' is used

• Free Users: Can only use the /api/tts endpoint; streaming interfaces are not available

• Premium Voices: Require 'tts:premium' permission and paid subscription

• Free Voice: 'x5_lingfeiyi_flow' is available for all users without additional cost

• Emotional Voices: Voices with '_emo' suffix support emotional expression
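
The voice table and notes above can be mirrored in a small client-side lookup. This is a sketch: it assumes the caller already holds the user's permissions as an array of strings, since how they are encoded in the token is not described here, and the function names are illustrative.

```javascript
// Voice metadata copied from the table on this page.
const VOICES = new Map([
  ['x5_lingfeiyi_flow', { name: 'Lingfeiyi', language: 'zh-CN', gender: 'Female', category: 'Free', permission: 'tts:basic' }],
  ['x4_EnUs_Grant_emo', { name: 'Grant', language: 'en-US', gender: 'Male', category: 'Premium', permission: 'tts:premium' }],
  ['x4_EnUs_Lila_emo', { name: 'Lila', language: 'en-US', gender: 'Female', category: 'Premium', permission: 'tts:premium' }],
]);

// True when the voice is known and the user holds its required permission.
function canUseVoice(voiceId, permissions) {
  const voice = VOICES.get(voiceId);
  return voice !== undefined && permissions.includes(voice.permission);
}

// Voices with the '_emo' suffix support emotional expression.
function supportsEmotion(voiceId) {
  return voiceId.endsWith('_emo');
}
```

A client could use `canUseVoice` to hide Premium voices from users without 'tts:premium'. The server remains the authority on permissions.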
