Python
This sample script demonstrates how to interact with the Whisperly TTS API to convert text into speech. The script sends a POST request to the /generate-voice endpoint with the required form data, including an optional speaker sample file, and prints the JSON response.
API Endpoints
POST /generate-voice
- URL: https://tts.recordly.ai/generate-voice
- Form Data Parameters:
- text (required): The text that you want to convert into speech.
- speaker_wav (optional): A speaker sample WAV file to upload. The sample can guide the voice synthesis to mimic a particular speaker's tone or style.
- language: Specifies the language of the generated voice (default is "tr").
- file_path: The name for the output audio file (default is "output.wav").
- speed: Controls the speed of the voice output (default is "2.0").
- split_sentences: Determines whether the text should be split into sentences before processing (default is "false").
- Making the Request:
- The script uses the requests library to send a multipart/form-data POST request. If the request is successful, the JSON response containing details about the generated audio file is printed to the console.
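The parameter list above can be turned into a small payload builder. The following is a minimal sketch, not part of the Whisperly API: the names FORM_DEFAULTS and build_form_data are hypothetical, and the lowercase "true"/"false" encoding for booleans is an assumption based on the documented default of "false". The defaults are copied from the list above, and every value is sent as a string because multipart form fields are plain text.

```python
from typing import Dict, Optional

# Defaults copied from the documented form data parameters.
FORM_DEFAULTS: Dict[str, str] = {
    "language": "tr",
    "file_path": "output.wav",
    "speed": "2.0",
    "split_sentences": "false",
}


def build_form_data(text: str, **overrides: Optional[object]) -> Dict[str, str]:
    """Hypothetical helper: merge caller overrides into the documented defaults."""
    if not text or not text.strip():
        raise ValueError("'text' is required and must not be empty")

    # Reject misspelled or undocumented fields early instead of sending them.
    unknown = set(overrides) - set(FORM_DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown form field(s): {sorted(unknown)}")

    data = {"text": text, **FORM_DEFAULTS}
    for key, value in overrides.items():
        if value is None:
            continue  # None means "keep the documented default"
        # Assumption: booleans are encoded as lowercase "true"/"false".
        data[key] = str(value).lower() if isinstance(value, bool) else str(value)
    return data


if __name__ == "__main__":
    print(build_form_data("Hello world", language="en", split_sentences=True))
```

The resulting dictionary could be passed as the data argument of requests.post, with the speaker sample supplied separately through files.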
Usage
The example below sends a text-to-speech request to the /generate-voice endpoint and prints the JSON response.
import requests
import json
BASE_URL = "https://tts.recordly.ai"
TOKEN = "<TOKEN>"
def generate_voice(text: str, speaker_wav_path: str, language: str = "en",
                   file_path: str = "output.wav", speed: str = "2.0",
                   split_sentences: str = "false") -> dict:
    """
    Generates voice from text using the Whisperly TTS service.

    Parameters:
    -----------
    text : str
        The text to be converted into speech.
    speaker_wav_path : str
        The file path to the speaker sample WAV file.
    language : str, optional
        The language code for the voice output. Default is "en".
    file_path : str, optional
        The desired output filename for the generated audio. Default is "output.wav".
    speed : str, optional
        The speed factor for the voice output. Default is "2.0".
    split_sentences : str, optional
        Whether to split the text into sentences before processing. Default is "false".

    Returns:
    --------
    dict
        The parsed JSON response with details about the generated audio file.
    """
    endpoint = f"{BASE_URL}/generate-voice"
    headers = {
        "Authorization": f"Bearer {TOKEN}"
    }
    # Form fields are sent as strings alongside the uploaded file.
    data = {
        "text": text,
        "language": language,
        "file_path": file_path,
        "speed": speed,
        "split_sentences": split_sentences
    }
    print("Sending voice generation request with the following payload:")
    print(json.dumps(data, indent=2))
    # Open the speaker sample in a context manager so the file handle
    # is closed once the upload finishes.
    with open(speaker_wav_path, "rb") as speaker_file:
        files = {
            "speaker_wav": speaker_file
        }
        response = requests.post(endpoint, headers=headers, data=data, files=files)
    # Raise an exception for 4xx/5xx responses.
    response.raise_for_status()
    return response.json()
if __name__ == "__main__":
    # The text to be converted into speech
    text_to_convert = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
    # Path to the speaker sample WAV file (adjust the path as needed)
    speaker_wav_path = "/path/to/file/speaker.wav"
    print("Starting voice generation...")
    result = generate_voice(text=text_to_convert, speaker_wav_path=speaker_wav_path)
    # Print the JSON response, which contains details about the generated audio file
    print("Voice generation complete. Response:")
    print(json.dumps(result, indent=2))