Microsoft AI TTS models are currently better (more consistent) than OpenAI's gpt-4o-mini-tts
March 2025: Azure AI Speech’s HD voices are generally available and more | Microsoft Community Hub
Azure AI Speech’s Dragon HD (language model-based TTS) Neural TTS voices are particularly well-suited for voice agents and conversational scenarios, thanks to the following key features:
- Context-Aware and Dynamic Output
The Azure AI Speech’s Dragon HD TTS models are enhanced with Language Models (LMs) to ensure better context understanding, producing more accurate and contextually appropriate outputs. Each voice incorporates dynamic temperature adjustments to vary the degree of creativity and emotion in the speech, allowing for tailored delivery based on the specific needs of the content.
- Emotion-Enhanced Expressiveness
The Azure AI Speech’s Dragon HD TTS voices incorporate advanced emotion detection, leveraging acoustic and linguistic features to identify emotional cues within the input text. The model adjusts tone, style, intonation, and rhythm dynamically to deliver speech with rich, natural variations and authentic emotional expression.
- Improved Multilingual Support
Following the addition of more voice variety and enhanced multilingual capabilities last month, Azure AI Speech’s Dragon HD TTS voices have gained immense popularity across various use cases, including conversational agents, podcast creation, and video content production.
- Cutting-Edge Acoustic Models
GA voices utilize the latest acoustic models, continually updated by Microsoft to ensure optimal performance and superior quality. These models adapt to changing needs, providing users with state-of-the-art speech synthesis.
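The temperature adjustment mentioned above can be illustrated with a small SSML request body. This is only a sketch: the helper `build_hd_ssml` is hypothetical, and both the voice name `en-US-Ava:DragonHDLatestNeural` and the `parameters="temperature=..."` attribute are assumptions that should be checked against the current Azure AI Speech documentation before use. The code uses only the Python standard library and does not call the service.

```python
from xml.sax.saxutils import escape, quoteattr


def build_hd_ssml(text, voice="en-US-Ava:DragonHDLatestNeural",
                  temperature=None, lang="en-US"):
    """Build an SSML string for an HD voice (hypothetical helper).

    The default voice name and the `parameters` attribute are assumptions,
    not confirmed by the excerpt above.
    """
    attrs = "name=" + quoteattr(voice)
    if temperature is not None:
        # Assumed syntax for passing a sampling temperature to the voice.
        attrs += " parameters=" + quoteattr("temperature=" + str(temperature))
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        "xml:lang=" + quoteattr(lang) + ">"
        "<voice " + attrs + ">" + escape(text) + "</voice>"
        "</speak>"
    )


if __name__ == "__main__":
    # Print the request body for a slightly more expressive delivery.
    print(build_hd_ssml("Thanks for calling. How can I help?", temperature=0.8))
```

The resulting string would then be passed to whichever synthesis call is in use, for example an SSML method of the Speech SDK or the REST endpoint. Special characters in the input text are escaped so that they cannot break the markup.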
Microsoft completes acquisition of Nuance, ushering in new era of outcomes-based AI - Source
Dragon Speech Recognition - Get More Done by Voice | Nuance
Nuance Communications - Wikipedia