Python

This sample script demonstrates how to interact with the Whisperly TTS API to convert text into speech. The script sends a POST request to the /generate-voice endpoint with the required form data, including an optional speaker sample file, and prints the JSON response.

API Endpoints

POST /generate-voice

URL: https://tts.recordly.ai/generate-voice
Form Data Parameters:
- text (required): The text that you want to convert into speech.
- speaker_wav (optional): A speaker sample WAV file to upload. The sample guides the voice synthesis to mimic a particular speaker’s tone or style.
- language: Specifies the language of the generated voice (default is "tr").
- file_path: The name for the output audio file (default is "output.wav").
- speed: Controls the speed of the voice output (default is "2.0").
- split_sentences: Determines whether the text should be split into sentences before processing (default is "false").
Making the Request:
- The script uses the requests library to send a multipart/form-data POST request. If the request is successful, the JSON response containing details about the generated audio file is printed to the console.
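The form fields listed above can be assembled in one place before the request is sent. The following is a minimal sketch, not part of the original script: the helper name `build_form_data` is hypothetical, and the conversion of numbers and booleans to the strings "2.0" and "false" is an assumption based on how the defaults are written in the parameter list.

```python
from typing import Dict


def build_form_data(text: str, language: str = "tr", file_path: str = "output.wav",
                    speed: float = 2.0, split_sentences: bool = False) -> Dict[str, str]:
    """Hypothetical helper: build the form fields using the documented defaults."""
    # `text` is the only required field, so reject empty or whitespace-only input early.
    if not text or not text.strip():
        raise ValueError("text is required and must not be empty")
    return {
        "text": text,
        "language": language,
        "file_path": file_path,
        # Assumption: the API expects string values such as "2.0" and "false".
        "speed": str(float(speed)),
        "split_sentences": "true" if split_sentences else "false",
    }


if __name__ == "__main__":
    print(build_form_data("Hello from the TTS example."))
```

The returned dictionary can be passed as the `data` argument of `requests.post`, with the speaker sample supplied separately through `files`.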

Usage

The example below sends a text-to-speech request to the /generate-voice endpoint and prints the JSON response.

import requests
import json

BASE_URL = "https://tts.recordly.ai"
TOKEN = "<TOKEN>"

def generate_voice(text: str, speaker_wav_path: str, language: str = "en",
                   file_path: str = "output.wav", speed: str = "2.0", split_sentences: str = "false"):
    """
    Generates voice from text using the Whisperly TTS service.

    Parameters:
    -----------
    text : str
        The text to be converted into speech.
    speaker_wav_path : str
        The file path to the speaker sample WAV file.
    language : str, optional
        The language code for the voice output. Default is "en" (the API's own default is "tr").
    file_path : str, optional
        The desired output filename for the generated audio. Default is "output.wav".
    speed : str, optional
        The speed factor for the voice output. Default is "2.0".
    split_sentences : str, optional
        Whether to split the text into sentences before processing. Default is "false".

    Returns:
    --------
    dict
        The parsed JSON response describing the generated audio file.
    """
    endpoint = f"{BASE_URL}/generate-voice"
    headers = {
        "Authorization": f"Bearer {TOKEN}"
    }
    data = {
        "text": text,
        "language": language,
        "file_path": file_path,
        "speed": speed,
        "split_sentences": split_sentences
    }
    print("Sending voice generation request with the following payload:")
    print(json.dumps(data, indent=2))

    # Open the speaker sample in a context manager so the file handle is closed after the upload
    with open(speaker_wav_path, "rb") as speaker_file:
        files = {"speaker_wav": speaker_file}
        response = requests.post(endpoint, headers=headers, data=data, files=files)
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    # The text to be converted into speech
    text_to_convert = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
    
    # Path to the speaker sample WAV file (adjust the path as needed)
    speaker_wav_path = "/path/to/file/speaker.wav"
    
    print("Starting voice generation...")
    result = generate_voice(text=text_to_convert, speaker_wav_path=speaker_wav_path)
    
    # Print the JSON response that describes the generated audio file
    print("Voice generation complete. Response:")
    print(json.dumps(result, indent=2))
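
The script above always uploads a speaker sample, although the introduction describes the sample as optional. The sketch below shows one way to make the upload conditional. The helper name `open_speaker_sample` and the `.wav` extension check are assumptions made for illustration, and whether the service accepts a request without a sample should be confirmed against the API.

```python
from contextlib import ExitStack
from pathlib import Path
from typing import BinaryIO, Dict, Optional


def open_speaker_sample(stack: ExitStack,
                        speaker_wav_path: Optional[str] = None) -> Optional[Dict[str, BinaryIO]]:
    """Hypothetical helper: return a `files` mapping, or None when no sample is given."""
    if speaker_wav_path is None:
        return None
    path = Path(speaker_wav_path)
    # Assumption: only files with a .wav extension are valid speaker samples.
    if path.suffix.lower() != ".wav":
        raise ValueError(f"expected a .wav file, got: {path.name}")
    # The ExitStack closes the handle when the surrounding `with` block exits.
    handle = stack.enter_context(path.open("rb"))
    return {"speaker_wav": handle}


if __name__ == "__main__":
    with ExitStack() as stack:
        files = open_speaker_sample(stack)  # no sample supplied, so files is None
        print(files)
```

Inside `generate_voice`, the returned value could be passed as `files=files` to `requests.post` within the same `with ExitStack()` block, so the upload happens while the file is still open.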
