DistantNews
Google Challenges Polish AI Speech Firm ElevenLabs with New Emotional Voice Model
🇵🇱 Poland /Technology

Google Challenges Polish AI Speech Firm ElevenLabs with New Emotional Voice Model

From Rzeczpospolita · (45m ago) Polish

Translated from Polish, summarized and contextualized by DistantNews.

TLDR

  • Google has launched its new Gemini 3.1 Flash TTS model, challenging ElevenLabs' dominance in realistic speech synthesis.
  • The new model boasts over 200 emotional tags and native support for dialogues, enabling AI voices to whisper, cry, or laugh, mimicking human nuances.
  • This advancement positions Google to compete for podcast creators, developers, and the creative industry, potentially disrupting the market currently led by ElevenLabs.

Google has thrown down the gauntlet to Poland's own ElevenLabs, a startup already valued at $11 billion, with its new Gemini 3.1 Flash TTS model. This isn't just a technical showcase; it's a direct assault on the leader in the speech synthesis sector. By integrating over 200 emotional tags and native dialogue support, Google aims to capture the podcast, developer, and creative markets.

Google is throwing down the gauntlet to the Polish ElevenLabs. The new AI model can whisper and cry.

— Article HeadlineThe headline directly states Google's challenge to ElevenLabs and highlights the new AI's capabilities.

For years, the line between AI-generated speech and human voices was starkly defined by a lack of subtlety. While AI could read text accurately, it struggled to convey the underlying intention or emotion. Google's Gemini 3.1 Flash TTS promises to change that with its proprietary approach to audio. The extensive library of audio tags allows developers to embed instructions directly into the text, enabling the AI to deliver lines with determination, curiosity, or even shift into a whisper, laugh, or cry – capabilities previously synonymous with ElevenLabs.

The new Gemini 3.1 Flash TTS not only speaks Polish but can also convey emotions, which could shake the position of leaders like ElevenLabs.

— Article TextThis sentence elaborates on the threat posed by Google's new model to ElevenLabs.

This new model supports over 70 languages, including challenging markets like Hindi, Japanese, and German, offering 30 base voices for personalization. Crucially for content creators, Gemini 3.1 Flash TTS natively handles multi-person dialogues. Unlike traditional systems requiring separate API calls for each voice, this new model maintains a natural conversational flow within a single process. This significantly lowers technical barriers and production costs for scriptwriters and creators of voice assistant interfaces.

Google wants to take over podcast creators, developers, and the creative industry.

— Article TextThis quote explains Google's strategic target market for its new TTS technology.

Experts see Google's move as a clear signal that the tech giant intends to compete fiercely, even against specialized smaller players. While ElevenLabs has cultivated a strong brand and loyal community, Google possesses a formidable infrastructure and ecosystem. In the prestigious Artificial Analysis TTS Leaderboard, Google AI Studio's model has already achieved a top score, indicating that the race for realistic and emotionally nuanced AI speech is heating up, with significant implications for the global creative industries.

Google is going like a bulldozer – the model supports over 70 languages at launch, including demanding markets like Hindi, Japanese, or German, offering 30 ready-made voices as a base for further personalization.

— Article TextThis quote emphasizes the broad language support and personalization options of the new Gemini model.
DistantNews Editorial

Originally published by Rzeczpospolita in Polish. Translated, summarized, and contextualized by our editorial team with added local perspective. Read our editorial standards.