Saturday, August 24, 2024

OpenAI API: text-to-speech

 Text to speech - OpenAI API 

 voices: alloy, echo, fable, onyx, nova, and shimmer

openai/whisper: Robust Speech Recognition via Large-Scale Weak Supervision @GitHub


python

from pathlib import Path
from openai import OpenAI
client = OpenAI()

speech_file_path = Path(__file__).parent / "speech.mp3"
response = client.audio.speech.create(
  model="tts-1",
  voice="alloy",
  input="Today is a wonderful day to build something people love!"
)

response.stream_to_file(speech_file_path)

node.js

import fs from "fs";
import path from "path";
import OpenAI from "openai";

const openai = new OpenAI();

const speechFile = path.resolve("./speech.mp3");

async function main() {
  const mp3 = await openai.audio.speech.create({
    model: "tts-1",
    voice: "alloy",
    input: "Today is a wonderful day to build something people love!",
  });
  console.log(speechFile);
  const buffer = Buffer.from(await mp3.arrayBuffer());
  await fs.promises.writeFile(speechFile, buffer);
}
main();


Max input text length for the OpenAI TTS API: 4096 characters per request
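
To stay under that limit, longer text has to be cut into pieces first. Here is a minimal sketch of one way to do it, splitting at sentence boundaries where possible. The names `split_text` and `MAX_CHARS` are my own, not part of the OpenAI SDK, and the sentence regex is a rough assumption that will not handle every abbreviation or language.

```python
import re

MAX_CHARS = 4096  # input limit noted above


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Prefers sentence boundaries; a single sentence longer than
    `limit` is hard-cut into pieces.
    """
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-cut any sentence that cannot fit in one chunk.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            # Current chunk is full; start a new one with this sentence.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be passed as `input` to `client.audio.speech.create(...)` from the Python example above, writing one mp3 per chunk.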

Solution: create several smaller mp3 files, then concatenate them into a single file, e.g. by using this module:
fluent-ffmpeg - npm

This in turn requires a separate CLI app (ffmpeg) installed on the computer.
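
Since the ffmpeg CLI has to be installed anyway, another option is to call it directly instead of going through the Node module. The sketch below does that from Python using ffmpeg's concat demuxer. It is an untested outline under a few assumptions: `ffmpeg` is on the PATH, all parts were produced with the same model and format, and file paths contain no single quotes. The function names and file names are hypothetical.

```python
import subprocess
from pathlib import Path


def concat_list_text(parts: list[Path]) -> str:
    """Build the text of an ffmpeg concat list: one "file '<path>'" line per part."""
    return "".join(f"file '{p.absolute().as_posix()}'\n" for p in parts)


def build_concat_command(list_file: Path, output: Path) -> list[str]:
    """Build the ffmpeg command that joins the listed files without re-encoding."""
    return [
        "ffmpeg",
        "-f", "concat",   # use the concat demuxer
        "-safe", "0",     # allow absolute paths in the list file
        "-i", str(list_file),
        "-c", "copy",     # copy the audio stream as-is
        str(output),
    ]


def concat_mp3(parts: list[Path], output: Path) -> None:
    """Write the list file next to the output and run ffmpeg on it."""
    list_file = output.with_suffix(".txt")
    list_file.write_text(concat_list_text(parts), encoding="utf-8")
    subprocess.run(build_concat_command(list_file, output), check=True)
```

Usage would look like `concat_mp3([Path("part_000.mp3"), Path("part_001.mp3")], Path("speech.mp3"))`, with the part files listed in playback order.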

nothing is simple...


