voices (alloy, echo, fable, onyx, nova, and shimmer)
openai/whisper: Robust Speech Recognition via Large-Scale Weak Supervision @GitHub
python
from pathlib import Path
from openai import OpenAI

# The client reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

# Write the audio next to this script.
speech_file_path = Path(__file__).parent / "speech.mp3"

response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Today is a wonderful day to build something people love!",
)
response.stream_to_file(speech_file_path)
node.js
import fs from "fs";
import path from "path";
import OpenAI from "openai";

// The client reads the API key from the OPENAI_API_KEY environment variable.
const openai = new OpenAI();
const speechFile = path.resolve("./speech.mp3");

async function main() {
  const mp3 = await openai.audio.speech.create({
    model: "tts-1",
    voice: "alloy",
    input: "Today is a wonderful day to build something people love!",
  });
  console.log(speechFile);
  // Convert the response body to a Buffer and write it to disk.
  const buffer = Buffer.from(await mp3.arrayBuffer());
  await fs.promises.writeFile(speechFile, buffer);
}

main();
solution: create smaller mp3 files, then concatenate them into a single file, e.g. by using this module
fluent-ffmpeg - npm
this in turn requires a separate CLI app (ffmpeg) to be installed on the computer
nothing is simple...
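rough sketch of the "smaller pieces" step, assuming the problem is the input length limit per request (4096 characters is my assumption for the default). splitText is a made-up helper name, not part of the openai package; it only cuts the text into chunks, preferring sentence boundaries:

```javascript
// Hypothetical helper: split text into chunks of at most maxLen characters,
// breaking at sentence boundaries where possible.
// maxLen = 4096 is an assumption about the per-request input limit.
function splitText(text, maxLen = 4096) {
  // Split after sentence-ending punctuation that is followed by whitespace.
  const sentences = text.split(/(?<=[.!?])\s+/);
  const chunks = [];
  let current = "";

  for (const sentence of sentences) {
    let rest = sentence;

    // A single sentence longer than maxLen has to be cut mid-sentence.
    while (rest.length > maxLen) {
      if (current) {
        chunks.push(current);
        current = "";
      }
      chunks.push(rest.slice(0, maxLen));
      rest = rest.slice(maxLen);
    }
    if (!rest) continue;

    // Append to the current chunk if it still fits, else start a new chunk.
    const candidate = current ? current + " " + rest : rest;
    if (candidate.length <= maxLen) {
      current = candidate;
    } else {
      chunks.push(current);
      current = rest;
    }
  }

  if (current) chunks.push(current);
  return chunks;
}
```

each chunk would then go through openai.audio.speech.create as in the node.js snippet above, written to its own mp3 file, and the files merged afterwards (that last step is where fluent-ffmpeg would come in).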