[Meet The Team] The Future of Voice AI: Insights from Karthik Ganesan at Anyreach

Voice AI is evolving beyond simple commands to sophisticated conversational agents. Building truly human-like voice interactions requires more than just advanced models—it demands purpose-built data and ethical implementation.


ARTICLE HIGHLIGHTS

In this episode of Anyreach Roundtable's "Meet The Team" series, Richard Lin speaks with Karthik Ganesan, CTO and Co-founder at Anyreach, about his journey from using AI to practice conversations to revolutionizing enterprise voice agents. They explore the technical evolution from LSTMs to LLMs, the limitations of foundation models, and why the future of voice AI lies in agentic systems built on ethical data practices.

Key Takeaways

• Personal Problems Drive Innovation – Karthik's journey began with a relatable challenge: using chatbots to practice conversations, which evolved into an eight-year mission to build human-like voice agents.
• Context Over Classification – Rather than trying to classify emotions, effective voice AI understands context and responds naturally, just like humans do.
• Open Source Can Compete – With proper data and techniques, smaller open source models can rival proprietary giants, as proven by Starling Beta outperforming ChatGPT 3.5.
• Foundation Models Aren't Enough – Mixing everything together, the "cocktail problem," loses the specific character that individual use cases need.
• Agentic Systems Win – Multiple specialized speech LLMs working together like a "wolf pack" outperform single monolithic models.
• Ethics Matter – Voice AI requires consent-based data collection and proactive safety measures to handle the 20% of cases where agents fail.

The Unconventional Beginning: From Dating Anxiety to Voice AI Pioneer

In 2017, Karthik Ganesan was a trained computer scientist with a very human problem: conversation anxiety. Rather than taking traditional advice, he channeled this challenge into building chatbots and voice bots for practice.

💡
"I was like, hey, you know what? Like, how do I start off the conversations? How do I need to be sounding? So I started off with chatbots and then voice bots for that."

This personal challenge became the catalyst for envisioning "a thousand times better version of voicemail" as the future of human-AI interaction.

The Technical Evolution: From LSTMs to Contextual Understanding

Karthik's academic journey at Carnegie Mellon positioned him at the forefront of dialogue systems research. Starting with Long Short-Term Memory (LSTM) networks in 2017, he saw the fundamental challenge of early voice AI firsthand: simply understanding user intent was extraordinarily difficult.

His experiences at Robert Bosch and Mercedes-Benz led to a crucial realization about emotion recognition:

💡
"You can never classify emotions. There is no way that humans don't classify in their head that oh, this guy is angry, oh this guy is excited... they just understand context and start speaking."

At Amazon Alexa, he tackled the "rare words" problem—why Alexa played popular artists instead of the less common ones users actually requested.
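One way to picture the rare words problem is as a toy rescoring step, where each candidate's score combines how well it matches the audio with how popular it is. The sketch below is purely illustrative and is not how Alexa works: the function `pick_artist`, the `prior_weight` parameter, and every number are assumptions made for this example.

```python
import math

def pick_artist(candidates, prior_weight):
    """Pick the best candidate from (name, match_score, play_count) tuples.

    match_score: how well the name matches what was heard (0-1, assumed).
    play_count: a stand-in for popularity (assumed).
    prior_weight: how strongly popularity influences the final choice.
    """
    total_plays = sum(plays for _, _, plays in candidates)

    def score(candidate):
        _, match, plays = candidate
        # Log-linear mix of match quality and popularity prior.
        return math.log(match) + prior_weight * math.log(plays / total_plays)

    return max(candidates, key=score)[0]

# Hypothetical data: the rare artist matches the audio better,
# but the popular artist has 1000x the play count.
candidates = [
    ("Rare Artist", 0.80, 1_000),
    ("Popular Artist", 0.60, 1_000_000),
]

print(pick_artist(candidates, prior_weight=1.0))  # Popular Artist
print(pick_artist(candidates, prior_weight=0.0))  # Rare Artist
```

With a strong popularity prior, the better-matching rare artist loses to the popular one. Turning the prior down lets the request the user actually made win.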

💡
"You tried asking Alexa to play a less popular artist and then it tries to pick out another artist who's a little more popular and then sounds similar."

The Open Source Revolution: Democratizing Advanced AI

When ChatGPT emerged in 2022, Karthik did more than observe. Working with UC Berkeley researchers, his team created Starling Beta, one of the first open-source models to outperform ChatGPT 3.5, built on a 7-billion-parameter Llama2 model.

💡
"We democratized the idea of how do you do RL from human feedback."

His subsequent coding models achieved such impressive results that he experienced a surreal moment:

💡
"I started feeling weird after some time that the model that I created was much better in coding than even me. I had to start using it to start doing the rest of the work."

The Data Crisis: Why Foundation Models Fall Short

This success revealed a critical insight about the limitations of foundation models. Karthik argues that trying to build "AI for everyone" creates what he calls the "cocktail problem."

💡
"You go to a bar and then you try to have certain type of drinks... But then suddenly you go to the bar after two weeks and then they've mixed up all the drinks together. There are only cocktails now."

The solution isn't better prompting but purpose-built data, especially the wealth of unspoken knowledge that exists only in people's heads, not in written form online.

Building Anyreach: The Agentic Voice AI Revolution

Rather than relying on single monolithic models, Anyreach pioneers "agentic voice AI systems"—multiple specialized speech LLMs working together.

💡
"We have multiple agents talking to each other. But think about like, what if multiple speech LLMs spoke to each other that they are able to accurately detect above, turn detection."

The key differentiator is spontaneous data collection through role-play scenarios and user simulators, capturing natural conversational flow rather than performative content.

The Ethics and Safety Imperative

Unlike companies using scraped data without consent, Anyreach takes a different approach to data collection and safety.

💡
"We do pay our voice actors, we do take their consent... we hire them, we pay them on an hourly basis for the data and then we get them all their consent."

But ethical data is just the beginning. The real challenge is the 20% of cases where voice agents fail:

💡
"What happens to the 20% man? What happens to those people who fall into the 20% use case?"

Anyreach's solution includes proactive call transfer technology that detects user frustration and automatically routes the call to a human agent when needed.
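To make the idea of proactive transfer concrete, here is a minimal sketch of a frustration check that decides when to hand a call to a human. It is not Anyreach's implementation. The cue phrases, the signal weights, the 0.5 threshold, and the names `Turn`, `frustration_score`, and `should_transfer` are all assumptions chosen for illustration, and a production system would likely learn such signals from data rather than hard-code them.

```python
from dataclasses import dataclass

# Hypothetical cue phrases that suggest the caller wants a person.
FRUSTRATION_CUES = ("agent", "human", "representative", "already told you")

@dataclass
class Turn:
    text: str              # transcribed user utterance
    asr_confidence: float  # 0.0-1.0, assumed to come from the speech recognizer

def frustration_score(turns: list[Turn]) -> float:
    """Return a score in [0, 1] built from three simple, illustrative signals."""
    if not turns:
        return 0.0
    texts = [t.text.lower().strip() for t in turns]
    # Signal 1: the user explicitly asks for a person or complains.
    cue_hits = sum(any(cue in text for cue in FRUSTRATION_CUES) for text in texts)
    # Signal 2: the user repeats the previous utterance word for word.
    repeats = sum(1 for prev, cur in zip(texts, texts[1:]) if prev == cur)
    # Signal 3: the recognizer is unsure what was said.
    low_conf = sum(1 for t in turns if t.asr_confidence < 0.5)
    raw = (2 * cue_hits + 2 * repeats + low_conf) / (2 * len(turns))
    return min(1.0, raw)

def should_transfer(turns: list[Turn], threshold: float = 0.5) -> bool:
    """Decide whether to route the call to a human agent."""
    return frustration_score(turns) >= threshold

calm = [Turn("I want to check my order", 0.9), Turn("order number 1234", 0.95)]
upset = [
    Turn("cancel my subscription", 0.9),
    Turn("cancel my subscription", 0.4),
    Turn("let me talk to a human", 0.9),
]
print(should_transfer(calm))   # False
print(should_transfer(upset))  # True
```

The design choice worth noting is that the check runs on every turn, so the handoff can happen before the caller gives up rather than after the conversation has already failed.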

The Future: Beyond the Hype Cycle

Looking ahead, Karthik warns against the industry's rush to deploy "okay-ish agents" with plans to improve iteratively. Voice AI isn't like mobile apps—it needs to work perfectly from day one, especially for mission-critical applications like emergency services.

💡
"You only want to have amazing duplex conversations. And for those conversations that you're not able to handle really well, you should transfer to a human and keep learning from humans."

Perhaps most importantly, he identifies AI wealth disparity as a growing concern:

💡
"It's almost like there's a whole economy for AI. If that was the case, where the rich companies have amazing AI and then the poor companies or the ones that don't have as much money will have lower quality AI."

Preserving Culture in the Age of AI

At its core, Anyreach's mission is ensuring that technology democratizes access to excellent service rather than creating new barriers.

💡
"Humans are trying to maximize their experience. They want everything instantaneously, they want everything super quick, and they want much, much more value for the same money."

The goal isn't to create an entirely new AI-driven world, but to "checkpoint the world the way it is" and enhance it with AI while preserving human culture, language, and identity.

Conclusion

As voice AI becomes ubiquitous, companies like Anyreach carry the responsibility of ensuring this technology serves humanity rather than replacing it. Through ethical data practices, rigorous safety measures, and a commitment to quality across all customer segments, they're working to make voice AI a tool for human flourishing rather than frustration.

The future belongs to those who can build voice AI that works perfectly from day one while maintaining the human connections that make conversations meaningful—and Karthik Ganesan at Anyreach is leading the way.


Keywords: AI, agentic systems, conversational AI, ethical AI, speech recognition, dialogue systems, open source AI, human-AI interaction
