API Documentation

Comprehensive API documentation with complete request parameters, response formats, and code examples.

📡 API Reference

Authentication

All API requests require Bearer Token authentication. The token contains user identity, permissions, and quota information.

Authorization: Bearer YOUR_JWT_TOKEN
POST /api/tts

Standard TTS API

Suitable for short text synthesis. Returns the complete audio file in a single response.

Request Parameters
text (string, required): Text to convert
voice (string, optional): Voice ID, default 'x5_lingfeiyi_flow'
speed (number, optional): Speech speed 0-100, default 50
volume (number, optional): Volume 0-100, default 50
pitch (number, optional): Pitch 0-100, default 50
streamMode (boolean, optional): Enable streaming, default false
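The defaults and ranges listed above can be applied on the client before a request is sent. The following is a minimal sketch, not part of the API: buildTtsRequestBody and TTS_DEFAULTS are hypothetical names, and rejecting an empty text string is an assumption, since the documentation only states that text is required.

```javascript
// Hypothetical helper: fills in the documented defaults and checks the 0-100 ranges.
const TTS_DEFAULTS = {
  voice: 'x5_lingfeiyi_flow',
  speed: 50,
  volume: 50,
  pitch: 50,
  streamMode: false,
};

function buildTtsRequestBody(params) {
  // Assumption: an empty string is treated as a missing text.
  if (typeof params.text !== 'string' || params.text.length === 0) {
    throw new TypeError('text is required and must be a non-empty string');
  }
  // Caller-supplied values override the defaults.
  const body = { ...TTS_DEFAULTS, ...params };
  for (const key of ['speed', 'volume', 'pitch']) {
    const value = body[key];
    if (typeof value !== 'number' || Number.isNaN(value) || value < 0 || value > 100) {
      throw new RangeError(`${key} must be a number between 0 and 100`);
    }
  }
  return body;
}
```

The returned object can be passed to JSON.stringify and used as the request body. The server still performs its own validation, so this check only catches mistakes earlier.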
Response Format
Success: Returns the synthesized audio as audio/mpeg data
Error: Returns a JSON object containing the error message
Response Headers:
Content-Type: audio/mpeg
X-User-ID: User ID
X-Characters-Used: Characters consumed
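The response headers listed above can be read from the fetch response to track consumption. This is a rough sketch: readUsageHeaders is a hypothetical helper, and it assumes X-Characters-Used carries a plain decimal integer, which the documentation does not state explicitly.

```javascript
// Hypothetical helper. `headers` is any object with a get(name) method,
// such as the `response.headers` object returned by fetch.
function readUsageHeaders(headers) {
  const used = headers.get('X-Characters-Used');
  return {
    contentType: headers.get('Content-Type'),
    userId: headers.get('X-User-ID'),
    // Assumption: a decimal integer; null when the header is absent or malformed.
    charactersUsed: typeof used === 'string' && /^\d+$/.test(used) ? Number(used) : null,
  };
}
```

For example, readUsageHeaders(response.headers).charactersUsed could be added to a running total after each successful call. For cross-origin requests, browsers only expose custom headers that the server lists in Access-Control-Expose-Headers.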
Request Example
// JavaScript Example
const response = await fetch('/api/tts', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer YOUR_JWT_TOKEN'
  },
  body: JSON.stringify({
    text: 'Hello, this is a test text',
    voice: 'x5_lingfeiyi_flow',
    speed: 60,
    volume: 80,
    pitch: 50,
    streamMode: false
  })
});

if (response.ok) {
  const audioData = await response.arrayBuffer();
  const audioBlob = new Blob([audioData], { type: 'audio/mpeg' });
  const audioUrl = URL.createObjectURL(audioBlob);
  
  // Play audio
  const audio = new Audio(audioUrl);
  audio.play();
} else {
  const error = await response.json();
  console.error('Synthesis failed:', error);
}
cURL Example
curl -X POST https://your-domain.com/api/tts \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -d '{
    "text": "Hello, this is a test text",
    "voice": "x5_lingfeiyi_flow",
    "speed": 60,
    "volume": 80,
    "pitch": 50
  }' \
  --output audio.mp3
GET /api/tts-stream (WebSocket)

Streaming TTS API

Real-time streaming over WebSocket. Billing is per call: one request is deducted from your quota when you obtain the WebSocket URL.

Connection Flow
Step 1: Send a GET request to obtain the direct WebSocket URL. Usage is billed at this step.
Step 2: Connect directly to the returned WebSocket URL and start streaming.
API Operations
GET: Returns the WebSocket URL and finalizes billing.
Billing: Count-based. One request is deducted per API call.
Plan Limits: Free: 100/day, Starter: 1000/day, Pro: 10000/day.
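Because each GET call is billed, a client may want to check its remaining daily allowance before requesting a WebSocket URL. The sketch below uses the plan limits listed above. The plan keys ('free', 'starter', 'pro') and the remainingCalls helper are illustrative names, and the count of calls already used is assumed to be tracked by the caller.

```javascript
// Daily request limits from the plan list; the keys are illustrative names.
const DAILY_LIMITS = new Map([
  ['free', 100],
  ['starter', 1000],
  ['pro', 10000],
]);

// Hypothetical client-side check: how many calls remain today for a plan.
function remainingCalls(plan, usedToday) {
  const limit = DAILY_LIMITS.get(plan);
  if (limit === undefined) throw new Error(`Unknown plan: ${plan}`);
  return Math.max(0, limit - usedToday);
}
```

The server remains the source of truth for quota. A local check like this only avoids sending requests that are expected to be rejected.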
GET Request Parameters
text (string, required): Text to convert
voice (string, optional): Voice ID, default 'x5_lingfeiyi_flow'
speed (number, optional): Speech speed 0-100, default 50
volume (number, optional): Volume 0-100, default 50
pitch (number, optional): Pitch 0-100, default 50
Authorization (header, required): Bearer JWT token
Simplified Example
// Step 1: Get WebSocket URL (Usage billed on this request)
const getStreamUrl = async () => {
  const text = 'Hello, this is a streaming test with a direct connection.';
  const voice = 'x5_lingfeiyi_flow';
  const speed = 50;
  const volume = 80;
  const pitch = 50;

  const response = await fetch('/api/tts-stream?' + new URLSearchParams({
    text,
    voice,
    speed: String(speed),
    volume: String(volume),
    pitch: String(pitch),
  }), {
    headers: {
      'Authorization': 'Bearer YOUR_JWT_TOKEN'
    }
  });

  if (response.ok) {
    const data = await response.json();
    const wsUrl = data.wsUrl;
    
    // Step 2: Connect to WebSocket and send synthesis parameters
    const socket = new WebSocket(wsUrl);
    
    socket.onopen = () => {
      console.log('WebSocket connection established.');
      // Based on official documentation, construct and send the request JSON
      const requestData = {
        header: {
          app_id: "73d88ee4", // Use this fixed app_id for this service
          status: 2, // 2 indicates this is the final and only chunk of text
        },
        parameter: {
          tts: {
            vcn: voice,
            speed: speed,
            volume: volume,
            pitch: pitch,
            audio: {
              encoding: "lame",
              sample_rate: 24000,
              channels: 1,
              bit_depth: 16,
              frame_size: 0
            }
          }
        },
        payload: {
          text: {
            encoding: "utf8",
            compress: "raw",
            format: "plain",
            status: 2, // 2 indicates the final text chunk for this request
            // Text must be base64 encoded
            text: btoa(unescape(encodeURIComponent(text)))
          }
        }
      };
      
      socket.send(JSON.stringify(requestData));
    };

    socket.onmessage = (event) => {
      // The server will send back audio data, likely in a JSON object.
      // The specific format depends on the service implementation.
      console.log('Received data:', event.data);
      // Example of handling a JSON response with base64 audio:
      // const response = JSON.parse(event.data);
      // if (response.type === 'audio' && response.data) {
      //   const audioChunk = atob(response.data);
      //   // Your logic to play or buffer the audio chunk
      // }
    };

    socket.onclose = (event) => {
      console.log('WebSocket connection closed.', event.code, event.reason);
    };

    socket.onerror = (error) => {
      console.error('WebSocket error:', error);
    };

  } else {
    const error = await response.json();
    console.error('Failed to get WebSocket URL:', error);
  }
};

getStreamUrl();
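The commented-out handler in the example above suggests that audio may arrive as JSON messages with a Base64 data field. Under that assumption, a message can be decoded into raw bytes as sketched below. The message shape ({ type: 'audio', data: ... }) is taken from that comment only and is not a confirmed format, and extractAudioChunk is a hypothetical helper.

```javascript
// Assumption: messages look like {"type":"audio","data":"<base64>"}.
// The real format depends on the service implementation.
function extractAudioChunk(messageText) {
  try {
    const message = JSON.parse(messageText);
    if (!message || message.type !== 'audio' || typeof message.data !== 'string') {
      return null; // not an audio message
    }
    const binary = atob(message.data); // Base64 -> binary string
    return Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  } catch {
    return null; // malformed JSON or invalid Base64
  }
}
```

Inside socket.onmessage, the result of extractAudioChunk(event.data) could be pushed into a buffer or handed to a player whenever it is not null.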
WebSocket Request Parameters

After establishing a WebSocket connection, you must send a JSON object with the following structure to initiate the synthesis.

header (object)

app_id (string, required): Your application ID. Use the fixed value 73d88ee4.

status (int, required): The request status, indicating the position of the text chunk.

  • 0: Start of the stream.
  • 1: Middle of the stream.
  • 2: End of the stream.
For a single, complete text message, this should be set to 2.
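When text is sent in several chunks, the status value follows from each chunk's position. A small sketch of that mapping, where statusForChunk is a hypothetical helper name:

```javascript
// Maps a chunk's zero-based position to the documented status codes.
function statusForChunk(index, total) {
  const valid = Number.isInteger(index) && Number.isInteger(total)
    && total >= 1 && index >= 0 && index < total;
  if (!valid) throw new RangeError('index must satisfy 0 <= index < total');
  if (index === total - 1) return 2; // last (or only) chunk ends the stream
  return index === 0 ? 0 : 1;        // first chunk starts it, the rest are middle
}
```

A single message yields 2, matching the note above, and a three-chunk sequence yields 0, 1, 2.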

parameter (object)

tts (object, required): Contains TTS engine parameters like voice, speed, pitch, and the desired audio format.

payload (object)

text (object, required): Contains the text to be synthesized. The nested text field inside this object must be Base64 encoded.
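Since btoa only accepts characters in the Latin-1 range, non-ASCII text has to be converted to UTF-8 bytes before Base64 encoding. The sketch below does this with TextEncoder, as an alternative to the btoa(unescape(encodeURIComponent(text))) idiom in the example above. encodeTextPayload is a hypothetical helper name.

```javascript
// Encodes a string as UTF-8, then as Base64, for the payload.text.text field.
function encodeTextPayload(text) {
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes
  let binary = '';
  for (const byte of bytes) {
    binary += String.fromCharCode(byte); // one Latin-1 character per byte
  }
  return btoa(binary);
}

console.log(encodeTextPayload('Hello')); // prints "SGVsbG8="
```

The same function handles Chinese text, which matters for the default zh-CN voice.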

POST /api/tts-proxy (HTTP Stream)

HTTP Streaming TTS Proxy API

Perform secure, real-time speech synthesis by sending a single HTTP POST request. Audio is returned incrementally as audio/mpeg chunks, allowing immediate playback without waiting for the full file. The proxy layer hides vendor credentials and performs quota billing on the server so your token cannot be abused by the client.

Request Body
text (string, required): Text to convert
voice (string, optional): Voice ID, default 'x5_lingfeiyi_flow'
speed (number, optional): Speech speed 0-100, default 50
volume (number, optional): Volume 0-100, default 50
pitch (number, optional): Pitch 0-100, default 50
Response
Content-Type: audio/mpeg
The response body is a streamed MP3. Use the ReadableStream API together with MediaSource for gap-free playback.
JavaScript Streaming Example
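// Create an audio element to play the stream
const audioElement = new Audio();

(async () => {
  const response = await fetch('/api/tts-proxy', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': 'Bearer YOUR_JWT_TOKEN'
    },
    body: JSON.stringify({
      text: 'Hello via HTTP streaming!',
      voice: 'x5_lingfeiyi_flow'
    })
  });

  if (!response.ok || !response.body) throw new Error('Request failed');

  const mediaSource = new MediaSource();
  audioElement.src = URL.createObjectURL(mediaSource);
  mediaSource.addEventListener('sourceopen', async () => {
    const sourceBuffer = mediaSource.addSourceBuffer('audio/mpeg');
    const reader = response.body.getReader();

    // appendBuffer() throws if a previous append is still in progress,
    // so wait for 'updateend' before appending the next chunk.
    const appendChunk = (chunk) => new Promise((resolve) => {
      sourceBuffer.addEventListener('updateend', resolve, { once: true });
      sourceBuffer.appendBuffer(chunk);
    });

    // Start playback right away; audio begins once enough data is buffered.
    audioElement.play().catch((err) => console.error('Playback failed:', err));

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      await appendChunk(value);
    }
    mediaSource.endOfStream();
  }, { once: true });
})().catch((err) => console.error('Streaming failed:', err));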
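Not every browser accepts audio/mpeg in a MediaSource, which can be checked with MediaSource.isTypeSupported('audio/mpeg'). Where it is unsupported, one possible fallback is to read the whole stream and play it as a single Blob, at the cost of losing incremental playback. The sketch below illustrates this. concatChunks and playBuffered are hypothetical names, not part of the API.

```javascript
// Joins Uint8Array chunks into one contiguous Uint8Array.
function concatChunks(chunks) {
  const total = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.length;
  }
  return out;
}

// Hypothetical fallback: buffer the full response, then play it in one piece.
async function playBuffered(response, audioElement) {
  const reader = response.body.getReader();
  const chunks = [];
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(value);
  }
  const blob = new Blob([concatChunks(chunks)], { type: 'audio/mpeg' });
  audioElement.src = URL.createObjectURL(blob);
  await audioElement.play();
}
```

The joined bytes can also be saved to a file or cached, which is the main reason for keeping concatChunks separate.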
POST /api/tts-stream-manual (Manual Control)

Manual Control Streaming TTS API

Provides precise control over streaming TTS sessions with manual state management. Perfect for complex applications requiring fine-grained control over audio generation flow.

🎯 Key Features
• Manual session state control (0→1→2)
• Real-time audio streaming
• Precise text segmentation control
• Session persistence and recovery
• Custom sequence number management
• Character-based quota tracking
Request Parameters
text (string, required): Text segment to convert
status (number, required): Session status (0=start, 1=continue, 2=end)
sessionId (string, required for status 1 and 2): Session identifier
voice (string, optional): Voice ID, default 'x5_lingfeiyi_flow'
speed (number, optional): Speech speed 0-100, default 50
volume (number, optional): Volume 0-100, default 50
pitch (number, optional): Pitch 0-100, default 50
seq (number, optional): Manual sequence number control
Status Flow
0 (Start): Create new session, begin streaming
1 (Continue): Add more text to existing session
2 (End): Final text segment, close session
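The status flow above implies a few rules that a client can enforce before sending each segment. The following sketch builds a request body under those rules. buildManualRequest is a hypothetical helper, and how the session identifier is obtained after the status 0 call is not specified here, so the sketch simply takes it as an input.

```javascript
// Hypothetical helper for /api/tts-stream-manual request bodies.
function buildManualRequest({ text, status, sessionId, seq }) {
  if (![0, 1, 2].includes(status)) {
    throw new RangeError('status must be 0, 1, or 2');
  }
  // sessionId is documented as required for status 1 and 2.
  if (status !== 0 && !sessionId) {
    throw new Error('sessionId is required for status 1 and 2');
  }
  const body = { text, status };
  if (sessionId) body.sessionId = sessionId;
  if (seq !== undefined) body.seq = seq; // optional manual sequence number
  return body;
}
```

A session would then consist of one body with status 0, any number with status 1, and a final one with status 2, each sent as the JSON body of a POST request.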

Supported Voices

Voice Categories
Free: Available for all users without additional cost
Premium: High-quality voices with emotional expression (paid)
Multilingual: Support for multiple languages
Emotional: Voices with advanced emotional capabilities
Voice ID | Name | Language | Gender | Category | Permission Required
x5_lingfeiyi_flow | Lingfeiyi | zh-CN | Female | Free | tts:basic
x4_EnUs_Grant_emo | Grant | en-US | Male | Premium | tts:premium
x4_EnUs_Lila_emo | Lila | en-US | Female | Premium | tts:premium
Usage Notes
Default Voice: If no voice is specified, 'x5_lingfeiyi_flow' is used
Free Users: Can only use the /api/tts endpoint; the streaming endpoints are not available
Premium Voices: Require 'tts:premium' permission and paid subscription
Free Voice: 'x5_lingfeiyi_flow' is available for all users without additional cost
Emotional Voices: Voices with '_emo' suffix support emotional expression
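The permission column of the voice table can be turned into a simple client-side check before a voice is requested. This is a sketch under assumptions: canUseVoice and VOICE_PERMISSIONS are hypothetical names, the caller is assumed to hold the user's permissions as an array of strings, and permissions are matched exactly, since the documentation does not say whether tts:premium implies tts:basic.

```javascript
// Permission required per voice, copied from the voice table.
const VOICE_PERMISSIONS = new Map([
  ['x5_lingfeiyi_flow', 'tts:basic'],
  ['x4_EnUs_Grant_emo', 'tts:premium'],
  ['x4_EnUs_Lila_emo', 'tts:premium'],
]);

// Returns true when the voice is known and its required permission is granted.
function canUseVoice(voiceId, grantedPermissions) {
  const required = VOICE_PERMISSIONS.get(voiceId);
  return required !== undefined && grantedPermissions.includes(required);
}
```

A user interface could use this to disable premium voices in a picker. The server still enforces permissions on every request.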