Voice AI startups are drawing in VC cheques fast. Will they 'differentiate or suffocate'?
56m 9s
The dialogue opens with a lighthearted reference to renting partners during Chinese New Year to manage familial expectations, segueing into AI's evolving role in personal and commercial spheres. It then shifts core focus to the booming voice AI sector in India, where funding has skyrocketed, fueled by key technological advancements: improved large language models (LLMs), highly realistic neural voice synthesis, reduced latency for real-time interaction, and falling operational costs. The conversation features insights from a reporter and a founder, illustrating how voice AI agents are trained by analyzing top human sales performers to replicate effective, emotion-aware conversations, while navigating challenges like preventing mis-selling. The segment concludes by noting the industry's likely future consolidation, as businesses move toward standardized solutions, emphasizing that sustainable success will require differentiation beyond the initial funding rush.
- Ready? What was that about renting fiancées that we were talking about just now, when I asked you about the Chinese Lunar New Year? - Yeah, so we were talking about how it's Chinese New Year and some people get asked about their plans to have children, start a family and expand the family. - The context being that, for the New Year, people go back from wherever they're working to meet their parents, right? And then of course parents, like all Asian parents, end up asking, so what about kids? - Yeah, their parents, their grandparents, their aunts and uncles, basically everybody. So, you know, when that question comes, there's a lot of pressure, I guess. And at least a few years ago, one way to get around that would be to rent a boyfriend, girlfriend or fiancé, just to demonstrate that you're working towards it. I don't know if that still goes on now; maybe it's become too commonly known. But at least-- - Yeah, I'm sure people are renting AI. I mean, we've talked earlier on this podcast about the difference between Chinese AI apps for relationships and American ones. I think the American ones tend to focus more on AI girlfriends and the Chinese ones tend to focus more on AI boyfriends. You wrote about it as well. - Yeah, yeah, that's true. - All right, I mean, thank God we're no longer renting fiancées. And on that note, I'd like to welcome you back to this week's edition of Zero Shot, The Ken's podcast on all things AI. And of course, this is taking place as the AI Impact Summit is taking place in Delhi, to various definitions of taking place. Maybe we'll get to it towards the end of the podcast, depending on the time that we have. Maybe we can just briefly talk about Anthropic, 'cause they just opened an office in Bangalore. And this week, we ended up purchasing a bunch of Claude licenses for our office as well. I couldn't buy one for you though, Brady. - No, I'm in Hong Kong.
But again, like, I get to use it through Perplexity. So, you know, I do what I can. - Again, a reminder to listeners: Brady's the only person in the world who's actually paid for a Perplexity Pro annual license. - That might be true. Hey, but you know what might be fun? If The Ken's office gets one of those vending machines from Anthropic. - That's right. You know what? For those of you who are listening and wondering what the hell this is, this was basically an experiment. If I'm not mistaken, even the Wall Street Journal did this experiment, where they set up an office vending machine and hooked it up to a Claude agent. You can't see me right now, but I'm making air quotes. And people were allowed to determine what price they wanted to pay for the things that they wanted from the vending machine, and the vending machine, it wasn't a great success, right? - Yeah. - But I will tell you another example of an agent. Just earlier this week, my son, he's 16 years old, wanted me to buy him an audio interface. He's a budding guitarist. And I looked it up on Amazon and then I looked it up on this other site, which is one of the leading Indian music sites. I shall not name it right now. And when I went to the site and saw this exact same product at the exact same price that it was on Amazon, there was a notification right up there which said, click the negotiate button to negotiate with us. - Interesting. - Right? Literally on the product page. And I go down, and next to the buy now button, there is an even larger negotiate button. And I click that negotiate button because literally the site asked me to. And a chatbot pops up on my right. And it says, "Hey, welcome. This is the listed price." And it's already dropped it by about 1,500 bucks. "Does this look good to you?" And I'm like, I'm an Indian, right? And I'm like, no, it doesn't. And I reduce the price by another 4,000 and say, how about this?
And it says, no, that's too much. And it increases that by 200 rupees and says, how about this? I'm like, sure, okay. And it adds that product, with a 3,500 rupee discount, into my, you know, basket. And I make that purchase. And I'm thinking, how on earth does this AI bot make sense? 'Cause literally the customer came to your site. You didn't even ask me for my email. You just gave me a discount; that margin just evaporated. - Right. - Since then, I've been checking, but that bot's disappeared from that site. I can't find it anywhere now. - Gave too many discounts so far. (laughing) - So that's the Anthropic example with the Wall Street Journal, where they were dispensing Cokes, and my own intelligent bot, a great example to start with. Lest people get bored of the episode, this is not what the episode is. Today is a great episode on one of the areas in Indian AI which, I think, is seeing the most traction. And that is voice AI. Now, long-time listeners or even new listeners, you know that we tend to have this fairly, I think, journalistic, skeptical view of the AI boom, especially in India. And we keep saying that, look, where are the Indian startups? Where is the action? Where is the funding, right? By and large, that's true. Even if you look at all the funding that's been announced in the last few days of the India Impact AI Summit, it's really going to data centers, right? It's going towards the hardware, which is a fairly, you know, known model by now. But there is one space where the action is really hot in the AI space in India. Over the last year, the amount of cumulative funding that's come into the voice AI space has gone up from, if I'm not mistaken, something like seven crores to about 280 crores. It's a huge jump in funding. There are lots of voice AI startups that have started in the last three, four years. And lots of much older startups that have conveniently pivoted to become voice AI startups right now as well.
As a result of which, the narrative has now shifted to, oh my God, India was always a voice country. And India runs on voice. You know, all those retrofitted narratives around, we are fundamentally a voice country. We don't type as well. We have so many languages. Voice is something everyone is comfortable with. Therefore, voice is where all the action is. Now, all of this is true, which is why you're seeing all of this funding. But there's a lot of nuance buried in it, which is that, like all gold rushes, like all venture funding stampedes, the last people standing, or the sustainable businesses, are not all of the ones that get funded in the beginning. And that's what today's episode is about. And, you know, my first guest is Mrunmayi Kulkarni, who is a reporter with The Ken; she's our colleague. And she reported and wrote a fantastic story on the voice AI space, which we link to in the show notes. Welcome to Zero Shot, Mrunmayi. - Hi. - Great to have you here. - Mrunmayi has a range of interests. She's reported on politics, MSMEs, industrial policy, the EV ecosystem. And at The Ken, she does cover AI as well. I love her stories because they have incredible writing. Her turns of phrase are just, you know, fantastic, very strong characters, et cetera. So I'm really thrilled that Mrunmayi is here on the episode. Joining us from Chennai is Devyani Gupta. Devyani is the CEO of Arrowhead. Arrowhead is a Bangalore-based voice AI company. Remember, this episode is about voice, so how could we not have the founder of a voice AI startup? Arrowhead has over 50 enterprise clients, including Aditya Birla Capital, Paytm, Tournament and many others in the BFSI space. Interestingly, they raised $3 million just last month from Stellaris Venture Partners. They make emotion-aware voice agents; can't even out like me, I don't know, we'll have to ask Devyani. Devyani, welcome to Zero Shot. - Thank you so much, Rohin.
Great to be here. (soft music) - Originally, Devyani was supposed to join us from a cab because she's in full founder mode. She's jetting from one place to the other, but thankfully, I think she managed to find something slightly better than a cab, which is a cafe. You can't make that out? If you can't make that out, that's because Rajiv, our sound engineer, has done some magic in the back end. We'll figure it out. If you can make it out, please write to me and tell me, because then I'll tell Rajiv he didn't do a good job. - Interestingly, all the cafes here in Chennai only serve coffee. So I've been trying to figure out a cafe where I can get some herbal tea, but I'm having my first black coffee in five months because that's all I'm finding. - Really? Did you consciously give up coffee, or is it just that you're a tea person, you've always been a tea person? - Yeah, I actually never drank coffee since I was a child. My first cup of coffee in my life was when I
started my founder journey, which is very out of the circle. - Absolutely, we did that. - So yes, since then I haven't really, but now it seems like I might have to come back. - Glad to have you on this side of the coffee fan equation. And with that, let me turn it over to Mrunmayi. Mrunmayi, what was your story about? What was so fascinating about the voice AI space that made you dive in and report and write that story? - The story begins with a flurry of messages on our Slack channel from Rohin, from PGK, from Seema, all of whom were giving me notes. - PGK is missing. We forgot to mention that. - Yeah, he's supposed to be here, but he's not here today, which is just as fine because we have two other guests. - Thank you. All right. So, as I said, there was a flurry of messages on our Slack channel. Everyone was telling me this voice AI startup has gotten funded, that one has gotten funded. Maybe, Mrunmayi, it's something you should look into because you are an AI reporter. And I kept putting it off. I said, yes, I'll check it out, I'll check it out. And the day I published another story, about the VC ecosystem in Sambhajinagar, I walked into office and three people called out to me and said, here's a space. Why are you not covering it? Go cover it. And like any Paccata app reporter, I sent out a text message to all my techie friends, all my friends in the VC ecosystem. And I met a friend of mine who works in a bank. And she showed me a model she was tweaking. Obviously, she knew all the myths and the facts about the voice AI thing. And she also knew that there are chances of voice AI calls going terribly off script sometimes. So she typed in a few lines of code on her computer. She pressed play. And that's when I heard my first conversation with a voice AI agent. So my friend's bank works very closely with a couple of startups. And she works on the models themselves as well.
And fairly soon, because the use cases of both these startups overlap a lot, one of them will be replaced by an internal team, which is what she told me. And her boss, he put it in way cooler words. He said, differentiate or suffocate. - Whoa. - Yeah. That is not what the voice AI market looks like. Like Rohin said, they have raised about 280 crores of funding in the last one year. So I spoke to analysts at VC firms who were funding these startups. And I was like, guys, what gives? What's going on? They said that voice AI is at an inflection point right now. I mean, latency has improved. Speech quality has become super decent. So right now, it's very realistic. - So what's latency? Just explain it to our lay listeners. - That would be the lag between speaking. See, when you speak, the AI has to process what you're speaking, maybe make a written transcript of it, then think on it, come up with a reply, then convert it to voice and talk to you. If latency improves, that means that gap is getting smaller. So right now, that has improved. Like I said, speech quality has become pretty great. So it's fairly realistic for enterprises to think that there could be a little AI bot that can handle your inbound and outbound calling. And it could require a little supervision. Eventually, maybe it wouldn't even require that. It would take that section off your plate entirely. But what VC firms, what enterprises are telling me now is that this situation is not really going to stay static. A lot of voice AI models will become much more standardized, a lot like how chatbots on Swiggy and Zomato and Zepto all sound the same to you. A bit like that. And enterprises will want to reduce the number of vendors they work with, like my friend's bank is already starting to do now. And that's what's going to happen to-- - Sorry. I mean, we'll get to that part. I get that you're saying that this field is going to thin out over time, right? But before that, I'd like to get in Devyani.
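The lag described above can be pictured as a simple sum over the stages of one conversational turn. The following is a minimal sketch, assuming the speech-to-text, reply-generation and text-to-speech stages run strictly one after another; the stage names, the timings and the budget values are illustrative assumptions, not measurements from any vendor or from this episode's guests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageLatency:
    """One step of a voice turn and how long it takes (hypothetical values)."""
    name: str
    millis: float


def turn_latency_ms(stages):
    """Total lag for one turn when the stages run one after another."""
    return sum(stage.millis for stage in stages)


def within_budget(stages, budget_ms):
    """True if the caller hears a reply within the chosen latency budget."""
    return turn_latency_ms(stages) <= budget_ms


# Illustrative timings only; real numbers depend on models and infrastructure.
EXAMPLE_TURN = [
    StageLatency("speech_to_text", 150.0),
    StageLatency("llm_reply", 400.0),
    StageLatency("text_to_speech", 120.0),
]

if __name__ == "__main__":
    print(f"turn latency: {turn_latency_ms(EXAMPLE_TURN):.0f} ms")
```

In this simplified picture, "latency has improved" means one or more of the stage timings has shrunk. Real systems often stream and overlap the stages, which this sketch deliberately ignores.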
Devyani, tell us, what's Arrowhead and how did it come about? How did you folks land on the voice AI space? - Yeah, definitely. So before building in voice AI, we were actually doing analytics on human sales calls. So for these very large calling teams, identifying where is mis-selling happening, where is fraud happening, and how do we improve the performance of these human sales agents? So basically the large takeaway from that was that one of the biggest pain points these large call centers face is the performance disparity that exists. Your top 10 to 20% of agents contribute 70 to 80% of your sales. And it takes six months plus to get a new agent up to speed. Even then, it's very hard to get them to be the top-performing agent. And this is what we were trying to essentially standardize. How do we get everyone to be like your top-performing agents? And essentially when voice AI started to boom-- and this was a byproduct of lots of different technology facets kind of coming together and advancing at the same time, and I can dive much deeper into what those were exactly as well. - At a simple level, what would you say those were, just for our listeners? - Yeah, so there are four large facets I would look at. Number one is the advancement of LLMs themselves. So the ability of these LLMs to understand the context of what the user is saying, versus what we had earlier, which is IVR, which is very NLP-based processing. Number two would be your neural models, which are able to convert text to speech and speech to text in a very human-like way. Today we've actually cloned my voice for certain use cases, to the point where if I make a real client call, I have customers on the other side asking me if I'm an AI. So that's how real these calls have gotten. Number three is the ability to build an infrastructure and orchestration layer that does all of this in real time, within 500 to 800 ms latency, like what Mrunmayi was saying. And the last one is cost optimizations.
So you've probably seen as well that over the last year, the cost to operate has plummeted in voice AI, the cost per minute to actually execute all of this. And we only see that reducing because models are becoming more efficient. And so we're able to actually pass on a lot of those cost advantages to our customers. So when all of these four facets essentially came together, it enabled voice AI to be very human-like in a way that could also scale from a cost perspective. And that's when you saw a lot of companies start to come up and build this, because it was finally something that was operationally efficient enough for companies to deploy. - Got it. Can I ask you a quick follow-up question? You mentioned earlier in your response that typically the top 10 to 20% of performers in a call center are responsible for 80% of the outcomes, which is the Pareto principle, right? Like, 20% are responsible for 80% of outcomes. When you were doing this research, what is it about those 10, 20% of people that makes them better than the rest, excluding, of course, their own motivations, et cetera? Purely from an outbound sales agent point of view, what are they doing differently that the others aren't doing? - Sales is very much an art. And even at Wharton, when I was studying, we had a class on negotiation, right? Because negotiation also is an art. There are certain techniques and skills that enable you to be a better negotiator, and sales is almost like a negotiation. And so what they have learned over time is, a call center head might give them a script. But often you see the top-performing agents don't even follow the script. Because they realize, what do I need to say in the beginning to make sure that I hook this customer onto the call? Or, how do I create urgency with this customer to make them take this loan today rather than tomorrow? How do I create, actually, a fear in this customer?
To make them feel like, if they don't have insurance, they will go bankrupt because something will come up that will create a massive financial burden. These are all sales tactics that these agents have learned over time that make them very high-quality sales agents. And this is not on a script. This is purely based on their qualitative experience. And so that's what we've seen really makes them very good sales agents. I've actually visited the calling centers of many of these large companies to understand how to emulate that for our bot. And it is an emotional experience, like what they say to these customers and how much fervor they have in the way they speak. So yeah, it's very different between the top and the bottom performing agents. - And just to kind of close this out: like you said, those four points that you mentioned are all evolving very rapidly. Now, it seems conceptually possible. Many of these, especially modern call centers and financial services companies, et cetera, record all the calls, outbound calls, inbound calls, et cetera, and that's been happening for years. So you have this vast corpus of call data that is sitting within these organizations. Now, let's just assume that within this corpus of data, there are these 10%, 20% of calls which are actually the best calls. So now, wouldn't it be possible to just take that entire corpus, train an AI model on it, understand what makes those 10%, 20% of people better, and then just replicate it, so that in the post-human AI call center, essentially, instead of 10%, 20% operating at that level and the rest not, everyone is operating
at that level? I mean, the closest analogy that I can think of is self-driving cars, because when one self-driving car encounters a situation that it doesn't know and learns from it, all the other cars also learn simultaneously from it, because it's a network. So is something similar happening in the voice AI space, where suddenly everyone is at top form? - Yeah, so there are two main things that contribute to a really high-converting bot. One is exactly what you're saying, Rohin, which is, what does the bot actually say on these calls? Right, and that actually is an output of training it on your top agents. One caveat I will make is that with these top-performing agents, there's also sometimes mis-selling or fraud involved, right? So you have to be very careful to not just blindly take those calls. You have to understand what the guardrails are of what they can and cannot say, and then apply that as a layer on top of these top-performing calls. Because that's what we saw when we were doing analytics as well. You know, they'll say, like, we guarantee you a job if you take this course with us, which you cannot guarantee. Right, so that's one caveat I would make. And the second aspect, beyond just what they say on these calls, is also how they say it. So that is what we were saying, which is, how human-like is your bot? Does it sound like a human, so they don't drop off? Does it handle interruptions? Does it have emotion attached to it? So these are the two aspects. So even if a company were to do the first, which is take your top-performing agents and extrapolate that across all your calls, the second still has to be there for you to be able to retain and convert a customer at the end.
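The idea of applying guardrails "as a layer on top of" top-performing calls can be illustrated with a small filter. This is a minimal sketch under stated assumptions: the banned-claim patterns, the call record shape and the function names are all hypothetical, and a real compliance layer would be far richer than keyword matching.

```python
import re

# Hypothetical banned-claim patterns; a real list would come from a compliance team.
BANNED_PATTERNS = [
    re.compile(r"\bguarantee(d|s)?\b.*\bjob\b", re.IGNORECASE),
    re.compile(r"\brisk[- ]free\b", re.IGNORECASE),
]


def violates_guardrails(utterance):
    """True if a single agent utterance matches any banned-claim pattern."""
    return any(pattern.search(utterance) for pattern in BANNED_PATTERNS)


def filter_training_calls(calls):
    """Keep only calls in which no agent utterance breaks a guardrail.

    Each call is assumed to be a dict with an "agent_utterances" list of strings.
    """
    return [
        call
        for call in calls
        if not any(violates_guardrails(u) for u in call["agent_utterances"])
    ]


if __name__ == "__main__":
    sample_calls = [
        {"id": "a", "agent_utterances": ["We guarantee you a job if you take this course with us."]},
        {"id": "b", "agent_utterances": ["This course covers interview preparation."]},
    ]
    print([call["id"] for call in filter_training_calls(sample_calls)])
```

The design choice here is to drop a whole call when any utterance is flagged, so that a high-converting but non-compliant call never becomes training material.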
- Got it. I'm definitely going to come back to you later in the conversation to understand that second part in more depth, because I have a bunch of questions. But my first question right now is back to Brady. Brady, are you aware if you've spoken to a bot customer service agent in the last, I don't know, months, years? (crosstalk) - Yeah, so this is actually something I wanted to ask about as well, but now that the question is directed at me, the answer is no, I have not spoken to any bots, because I don't really pick up my phone at all. So I avoid this completely. But I'm super curious about, you know, whether somebody in India can even realize this when they receive a call. Is there any kind of labeling, is there any kind of signal that says this is not a human that's calling you? - Yeah, so first thing, Brady, actually, people like you who don't pick up calls, that's going to put us out of business; I hope there are not more people like you. But interestingly, on the second point-- - I'm raising my hand, and I'll explain. That's also because Airtel conveniently labels most calls these days as spam, and I don't pick up if Airtel labels it as spam. But that's a separate matter. Sorry, please continue, Devyani.
- Yeah, and also just to add to that thread, actually, TRAI has now released regulation that you have to use 140 numbers, 160 numbers, for sales calls. So this will of course limit connectivity going forward, but yeah, separate conversation probably. But to your point, Brady, in India there's no regulation that you have to declare that you are a voice AI bot. In the US this is the case, which is why in the US you'll see the market for voice AI is very much around customer support, where it's on the intent of the customer to have that conversation. They want to get through to them, and oftentimes it's actually more efficient than talking to a human who might put you on hold. In India you don't have to declare that, which is why that bar for human likeness is so high. - Because you want to sound as human. - Exactly, exactly. You want to convince them that it's a human speaking. - Mrunmayi, what about you? Do you think that you've spoken to a bot anytime? - I have, yes. And all of that happened when I picked up my mom's calls. So I'm a little worried about her algorithm right now, like, why are so many AI agents calling her? I have no clue. The woman doesn't actually have a credit card. But yes, I have spoken to AI agents. I can tell the difference. - What are the nuances that you pick up? - It sounds too nice. It doesn't sound annoyed enough. Generally, with every call center agent whose call I've picked up, they sound a little bit annoyed, a little overworked, maybe a little caffeinated. This one sounds happy to be talking to me, and I'm like, something's wrong with this picture. - Oh, you need friction. - You need, yes, you do, the grumpiness. Something I noticed which was pretty interesting, though, is that there are people who would naturally distrust a call because it came from an AI sometimes, but there is also another school of thought where people are like, okay, maybe it's an AI agent that's calling me, but I still want to hear what they're saying. Maybe out of curiosity, maybe they want
to talk to the agent and try to trip it up. Whatever the case is, they actually want to engage in the conversation. - Like I was negotiating. - Yeah, yeah. Or in the Claude vending machine example which you gave, the journalists at the Wall Street Journal actually interacted with it, and they ordered a live fish, among other things. - But yeah, interesting. This has sort of now brought us to this point where you're sort of wondering. I mean, I'll tell you what one of my markers is, and, you know, I think, Devyani, you will have to tell me how true this is. One of my markers, because occasionally these calls come from a number which is not marked as spam, right: when you pick up the phone, there is a delay of two, three seconds that I used to understand as the switching delay, because you have these automated dialers inside these large organizations. I mean, most of us picture that people are sitting at a phone and punching numbers to make calls. That's not happening, right? It's a very highly automated, efficient system. They're sitting with headsets, et cetera. There's an automated dialer that's making outbound calls. When the call gets connected and someone speaks from the other end is when it, you know, switches the call to an agent, so that the agent is not wasting time on calls that don't get picked up. So for me, that's usually a marker of, I guess I must be speaking to a real agent, because if it was a virtual agent, why would there be a delay? Why would it not instantly connect the moment I pick up? Would that be a fair assumption to make, Devyani? - It's a really interesting point that you mention, because we actually realized that in our bots, the bot is able to speak immediately, as soon as the call is picked up, right? Because all you have to do is send in the TTS. You don't even have an ASR or LLM layer that gets invoked in the beginning, right? And that TTS latency might be like 100 ms or less if it's cached. So it's very quick, and that quickness is actually what makes people think that
it's a bot. So now we're intentionally adding some delay. - This is game theory happening live, right? Devyani is actually going to go back today and tune her offering on the basis of what we discussed. I'm just kidding, of course, I know. I'm guessing that their levels of sophistication will be much, much more than that. But thank you for validating one of the things that I used to take as a marker for speaking to a human agent. Frustration, Mrunmayi, is the other point, which I fully agree with you on, because just last week I got this call from a person. You know, I recently upgraded my car. So he's like, you bought this car? And I'm like, why are you calling me? You bought this car? And he's obviously trying to get me to confirm that factoid, which he has because he bought that database from somewhere, and I'm not confirming it for him, right? And then he's like, I want to send you a gift. And I'm like, I don't want a gift. He's like, how can you not want a gift? It's a free gift. And he started getting more and more frustrated, right? Like, you know, finally I said I don't want it. Fine, don't take your free gift. And he banged the phone. So that frustration part, I get it. I'm guessing a bot is going to maybe try to be nice with me. A bot would probably say, like, okay, well, thank you for your time. That's good. Now let's get back to the point that you had, Mrunmayi, right? We sort of established how this is a fairly large, significant market, and how this operates. We all know intuitively, as Indians and as customers and consumers, that yeah, of course this has to make sense, because there's so much money involved in the things that are being sold, et cetera, and all that kind of stuff, right? But now, to come back to that original premise of, everyone has spotted this opportunity and everyone is getting into it: what's happening inside organizations once startups like Arrowhead-- and of course, Devyani, we'll come to you and ask you the same question as well, right? What
happens when voice AI startups meet enterprise clients? - So this is pretty interesting, because enterprise clients already had someone dealing with the issue of outbound calls and inbound calls. Maybe it was a call center, maybe they had an IVR system or something. And right now, if you have to replace that with a voice AI setup-- again, there's a line I quoted from someone in my article, which is one of my favorite lines to date from a source. He said, we don't increase our budgets just because startups exist. - I remember that. I love that line. But explain that line. What does that line mean to a layperson? - So basically, what he was trying to say-- - That source was, I mean, even if you can't tell us a name, what was his role? - He used to work at WNS, one of the largest outsourcers and BPOs. - Yeah. - Ironically, that was one of the first stories
I covered at The Ken: WNS getting acquired by Capgemini. So he was talking to me and he was saying that yes, we do use voice AI, but we are not essentially creating a new line item in our budget for it. We will use it if it can replace one of our existing systems and if we feel like it could be profitable. Obviously, in most cases it can, but it did give me a lot of insight into how enterprises would approach this. And another perspective I got wasn't from a source, actually; it was from an edit call when I had pitched this story. And Sita Saur said that this could be a feature, why is this a startup? And I realized that unless a voice AI startup had something really differentiated to offer, or it was something that was actually integrating with the company's workflows completely, it would be fairly easy for enterprises to just buy the technology, or they could maybe acquire startups, or they could build the technology on their own. So that's how they are seeing it. - I guess people would just come-- sorry, go ahead, Brady. - Yeah, this actually echoes our last episode, when Nikhil from Trilegal spoke about how any kind of AI tool for the legal industry kind of needs that expertise for it to be integrated into a law firm as well. So the comparison was with, like, Claude Cowork having a legal plug-in as well. So yeah, this kind of mirrors that situation that we covered last week too. - Devyani, what about you? I mean, let's start backwards from the last thing that Mrunmayi said, which is, this is a feature and not a company. I'm sure you disagree. Tell us why. I mean, of course, for listeners, when Mrunmayi said this is a feature and not a company, the underlying context is that if you're a large organization, could you adopt all of the benefits of a voice AI-driven calling system without having to essentially partner with a company? Is it just available off the shelf?
Is it some upgrade that one of your existing vendors is providing, et cetera? Or is this something much more strategic and deeper? Which one is it, Devyani? - This is definitely the question of the hour, I think, in this space: the build versus buy question. Do you build this in-house or do you work with partners? Many of your large players in this ecosystem are experimenting with building this in-house, right? And, complete transparency, we have also had many players who have said, you know, we're not going to work with you because we're building this in-house. Then, months later, they come back saying, we're not able to do it, please work with us. And the reason for this is that there's a difference between creating a voice AI demo and creating a voice AI in production. A voice AI demo is very easy to create, right? Again, you can kind of integrate your various elements, your TTS, STT, to create a bot that is able to have a simple conversation. - What's a TTS? - It's text to speech. That's what the TTS is. So it basically converts whatever your LLM gives you, which says that the bot should say this, to the speech that the bot actually says. So some of your largest TTS companies would be, like, your ElevenLabs; Cartesia is one of the big ones that's coming up, et cetera, right? So these are examples of your TTSs. So a lot of companies are, again, thinking of building this in-house, but a voice AI company is a lot more than just your voice AI bot. What you have to build around that is, you have to build monitoring, to be able to monitor all of the calls to ensure there are no hallucinations or side effects coming up. You have to be able to build scalable infrastructure that enables us to go from one call to one thousand calls concurrently without any power outage. You have to be able to self-host your models and optimize your models constantly. There are new models coming up on a weekly basis.
You have to have a very experimental team that is trying them and seeing how we can constantly improve the efficacy of our bot. So people think, oh yeah, building a voice AI bot is so simple because I'm just connecting all of these models on a Vapi or Retell and then deploying that. But it is a lot more than that to actually create a product company that's able to do that. Okay. So you talked a little bit about build versus buy and running an AI demo versus running something in production. So what happens when a major tech player releases a voice API that actually works well? How does a firm like Arrowhead respond to that? What's your defensibility in that situation? That is actually an advantage to us because we are not tied to any one model or any one API. Right. Companies that decide to build foundational models along with building the app layer, those we find are slightly at a disadvantage because then you're tied to using your models, but each model has its own advantages and disadvantages. There's some that are better at reasoning. There's some that are more optimized for latency. You have to think of these models like clay that you essentially mold to accommodate your use case, right? They are a means to an end and not the end in and of itself. So when someone releases a new API, what we're able to do is say, hey, which use cases does this fit really well in? Plug this into one part of that use case. Before we deploy that, we test. We have a scenario testing model in-house that actually runs this testing across thousands of scenarios concurrently to check if there are any adverse effects. And if it looks good, then we deploy it. Right. So actually when they release this, that's great for us. That just means that our voice bots are now going to be able to perform better. 
And because we are enterprise first and we work with large enterprise companies, we focus on that app layer where it's very white glove and we build these custom solutions for them, which these API companies would not do. They sell to tech teams. We sell to business teams. Okay. Then I have a question for Renmey actually. So in your article about voice AI companies, I believe one of the points you make is that eventually there will be some consolidation. There will be fewer players. Is that in contrast? Does that run against what Devyani just described? Not entirely in contrast. She did describe that some models would have a better advantage over other models. So what I did find in my article is that while this is in no sense a death knell for all voice AI startups, the ones that can actually make it through the consolidation wave would be startups that offer something else along with just their voice AI services. Maybe they offer to integrate workflows. Maybe they offer compliance services as well. So the idea was that if you're very deeply integrated, sorry, embedded in an enterprise, you would be harder to take out of the workflow. Devyani, if I could ask you to imagine yourself not as a founder, but as a VC, say you were Rahul or Ritesh at Stellaris and you're trying to take a call, a long call, on, look, five years from now, seven years from now, which of this current cohort of so many voice startups is going to be not just still around, but actually in a strong position. What might be the things that you look for, which might be the startups as an investor that you look for, saying that look, we know that not everyone is going to make it because a lot of this is a gold rush, but the ones that have ABC will be long-term winners. And I get that I'm giving you a great opportunity to talk up your own, but hey. Yeah, no, definitely. So two main things that a VC would look at in this space: number one is the product. 
How good is your product, and number two is GTM. How quickly are you able to get into these companies and are you able to convert them from a POC to go-live? Those are the two aspects they look at. Double-clicking on each of those, I'll start with the second actually. This space is a land grab market. It is about how quickly you can get into these companies because, as you rightly said, Renmey, they work with a few vendors, and in banks, you have to go through infosec approvals. Once they've already done that with vendors, it's almost impossible to get in after that. So timing is imperative in this space. So that's why it's about whether you are able to hustle to get into these companies. And then once you get in, normally what they do is they have POCs where they have five to six voice AI companies competing and you have to be able to prove that your product is the best to be able to convert that to go-live. Till today, every POC that we've done, we have won. And every POC that we've done has converted to a multi-year go-live contract. And that is what is indicative of how good the product is. To date, even though people say it's commoditized, there's actually a massive difference between your really good performing voice AI companies and your less performing ones who are still good, but maybe not as high performing. And ultimately, conversion rate is the main KPI that every company will look at to be able to drive this forward. What is that? If you stay with that product aspect and you sort of
drag the slider to let's say two, three, four, five years. What about the product, or what aspects of a voice AI product, are the ones that allow it to not just be good now, but good a year later, good three years from now, good five years from now? Because in the AI space, in sector after sector, we've seen how your definition of what was good 12 months ago changes six months ago, changes three months ago. So the things that were on top in the past aren't on top today, right? So what is it? I mean, it's great that you guys are winning everything right now. What are you doing at a product level that will allow you to win two years from now, one year from now, three years from now? Especially because we know the underlying threat of these giant models, who are now trying to kind of eat upwards and get into a direct relationship with business. So that's always there. So what's your moat? What's your product strategy moat? Yeah, to be honest, the AI space is evolving so quickly that it's not about what is that product that you can create that's going to survive the next three to five years. It's what is the thing that you're going to create that can enable you to survive those next three to five years? The reason I say this is there are behemoth companies today in India who have the best GTM in the country that we are competing against, right? They're already in with every single bank with their WhatsApp offerings or their CRM offerings, et cetera, right? If they were to deploy a very strong voice AI agent, they would be able to capitalize on the entire market. But do they have the DNA within their team to be able to be very tech first and product first in the way that they deliver their voice AI products? We've seen this be kind of a question mark with some of those companies. 
Whereas if you have a very early stage company that is very agile, very experimental, and we see this in many of our competitors as well, you're able to keep up with the performance and constantly improve your bots such that you are always number one, which means also there's a sense of agility that's required because if at any point you start to become lax, you will no longer be at the forefront. So yeah, it's much more the DNA of the team rather than the product itself that determines long-term success. What's your team's DNA? Where are you folks from? How did you folks get together? I mean, just at the founder level or early team level? Yeah, so I mean, my CTO and tech co-founder, Vengod, is probably one of the best tech leaders I've ever worked with. He actually worked at, he was one of the first five employees at AWS's Elasticsearch. He grew that to about 200 people. He was also at Airbnb, Uber, Rippling, so some of the largest tech companies in the world. And when you're working with banks, it's really about not just building a product, but building confi and scalable products. And he has so much expertise in that. Beyond that, I think our team, the main things we hire for, I think skill set is something that can be built. You have to have a base level of skills, but the main thing that we hire for is someone who is very intellectually curious, wants to continuously learn and also just has a massive hustle mindset. So I mean, it's interesting, in the last year, we've obviously scaled with many banks and NBFCs. We did this with 10 people. And I think a lot of people were like, how is that possible? Now of course, we're scaling our team much more, but it's incredible. How many folks are there at Arrowhead now? So by end of this year, we expect to get to about 40, 50 people. Well, it's still a fairly small team. So that thing that they mention about AI enabling much smaller teams to punch way above their weight is fairly true then. 100%. 
Renmey, what's going to happen to all of these? Now, one of the things when a venture capitalist invests in a startup, whether it be a voice AI startup or any other startup, is that they're always thinking, even before they've invested, what's the exit? Because they need to return money to their own LPs, they need to get an exit as well. So what's going to be the exit? I suspect it's not that all of these voice AI startups are going to be doing IPOs 10 years from now. Maybe a few might, maybe they won't. We don't know. So what's going to happen? What explains this gold rush? Everyone is rushing in for the gold rush. Everyone wants to sell shovels, to use that cliche. How many will be left standing at the end of the day? And what's everyone's exit? What's everyone secretly hoping will be their exit plan, even if they don't say it publicly? So this was pretty interesting because this was the one point where enterprises, VCs and a few of the voice AI startups I spoke to actually agreed, in different words if anything. And they did say that the funding rush for voice AI startups is because it tells a very simple story. They are replacing labor. It's super easy to say that if you use one AI agent, you won't have to pay 15 more people to do that job for you. It looks like instant ROI. And no matter what your technology is, no matter what sector you're working in, that's a rule that holds and VCs tend to fund it anyway. VCs know that not everyone is going to survive in this rush. But again, this is a space which is expanding very fast and they'd rather bet on somebody and be wrong, than be left out and be wrong. It's like that cliche of I'd rather have a horse in the race. Rather than the horses. Yeah, they're no horses. Exactly. So one of the analysts I spoke to for my article actually described a very scaled down version of the future where there would be a few voice AI startups offering very differentiated services. They would exist. 
Everybody else would be bought out by enterprises, or would be bought out by BPMs like Genpact, which is very big on AI acquisitions and tech acquisitions. And maybe infrastructure providers like Nvidia's PersonaPlex would exist on their own as a platform on which people can build more voice AI agents and startups. Interesting. It feels, I guess, limited to me. The way we're thinking about voice AI is like something calls you over the phone, you interact with it. But I just feel like to really get to that point where the returns make sense, it's got to be integrated into things like the cockpit of cars and smart speakers in every home. No, I think you're thinking too far out. And look where it got Amazon, the opportunity to integrate smart speakers. Look where it got Apple with Siri. But I think most of these startups are just going after a very sharp, very real market where they can go to an existing business that does calls, whether inbound support or outbound sales, and just say that, look, you're doing this with people. You can now do this with software at one x of the cost. Sure. And we'll do it better, right? I don't think many of them. And perhaps rightly so, right? Because when that's such a real and addressable market, which is also like, you know, I mean, which has revenue, then why would you try to kind of expand into like, you know, so many other areas, etc. In the future, who knows how some of those, but I suspect they'll all be, I suppose, extensions or adjacencies of this, right? It's like, oh, I'm using this software to make outbound sales. Can I also use it to handle inbound support or some other such stuff, right? Exactly. And one of the points that also came up in the reporting was that it's easy for enterprises to ask their existing vendors, hey, this is a service that I've heard of. There are maybe three startups offering it to me, but I've already integrated with you guys. Can you guys do it for me? Okay. You guys just buy one of these. 
Can you buy one of them? Can you invent a technology? Do something. I just don't want to switch my systems. So that, I mean, that is one way things would definitely work. Makes sense. Imagine a scary world where every voice AI agent that's calling you is the best in the world. Just get them to talk with each other. I don't need to be there. That's exactly. Maybe I install OpenClaw on my phone and get my agent to talk with. And did you know that the founder of OpenClaw has joined OpenAI? Yes, I read about that. Two chaos creators merge. Yeah. Entropy, entropy in the AI universe continues to increase, not decrease. Brady, you were looking to get him on the pod because I guess we can't reach him now that he's joined this big company. He'll never talk to us, ever. Never say never. Today's my day of cliches. All right, looks like we are near the end and we did promise you that we're going to talk about the AI Impact Summit, the most successful, the most ambitious, the most AI-enabled AI summit in the world out there. Sadly, we're sitting here in Bengaluru, not Delhi, where the action is, for various definitions of action. Renmey, have you heard about the AI Impact Summit? So like I said, I did make friends with a lot of startup founders for this voice AI piece and everyone who's anyone has gone for the AI summit right now. So I was texting them and saying things like I'm so jealous you get
to go there, why am I not there? And that's when I saw the tweets starting to come in of people actually being kept out when they already had booths in there. Some of-- - Yeah, I saw one of those tweets where it was a person who was supposed to give a talk. And that person was like, I can't get in, they've closed the gate. So if any of you want to listen to my talk, come to Connaught Place where I'm giving a talk. - Yes. This is a voice AI founder who we have-- - I've quoted in our article basically. - Oh, really? - Yeah, we have. - Oh, awesome. - And there was another person who had devices, some kind of AI devices. And they had set up a booth inside. And they had set up the booth and then the security-- - They don't even-- - You couldn't carry the devices inside. - No, no, no, the devices were inside. They were already inside. The security folks came and said, the Prime Minister was supposed to visit yesterday and he did. So they had to come and like, you know, clear out a bunch of folks as part of the security clearance. So they said that you have to leave for a few hours. And I think that guy was like, but what about all the stuff that's lying around here? They said, don't worry, people have left laptops and all, what's gonna happen to your devices. And I think they were gone for almost six hours and they came back everything was bigger. - They were taken out at 12 in the afternoon. They were let back inside at 6. People were very frustrated. There were some people who actually texted and went, "Do they really think we'll clone a bot of our Prime Minister? What are they doing? Why are they leaving us out?" - You know, I mean, it's a-- With our insides, it's a good thing that none of-- Do you think that they used AI to organize the conference and the flows, et cetera? - Absolutely not. - I hope a person did this. - Oh, so now we're past this thing where we're saying, an AI can't be this bad. - I don't want people to be this bad. 
- It's getting fun, it's a beat I'm covering. If it's this bad, how much should we-- - Imagine if there was a line on the India AI summit webpage that said, AI powered by and one of these large, can you imagine the amount of bad PR that would be right now? But I don't know, I hope the rest of the sessions sort of made up. So from whatever I've read, looks mostly like a schmoozefest. - All the interesting people are in line. So all the interesting conversations are offsite. Like at most conventions, most summits. You know what, I've seen versions of this play out in India over the years, like a couple of decades ago when India's IT services boom was at its peak. You would have this procession of, you know, you have these massive NASSCOM events, which is the Indian association, the industry association for all the software service companies. Their annual events used to be like this. It was just like, everyone is coming there, the biggest CEOs. And it's a certain script, right? Because I've briefly spent a few years in PR as well, right? The script is, you know, I mean, foreign companies before coming to India, they decide that, all right, so what's the announcement that we're making? - Sure. - How many people are we hiring? How much, how many millions of dollars or hundreds of millions of dollars are we investing? And how great is the India market, right? Like, you know, everyone would come and say the same thing and the press would lap it up. And it's like this company to hire so many people, this company to invest so many hundreds of millions of dollars. And you see some version of that played in India right now with AI, where every AI founder is coming and telling us exactly what we want to hear. India is where planet-scale AI adoption is going to take place. India is where AI is going to go from A to B. India is where we're going to see the most interesting use cases emerge. Like, it's literally that. India is one of the most critical markets. 
India, like, I mean, it's just literally that, right? Like, it's sort of this kind of, everybody has their place in this Kabuki play and, you know, everyone is doing their role and we'll read about it. - Yeah, well, sorry. I mean, this is why journalists are cynical, you know? Sorry, skeptical, skeptical. I must correct myself. It's skeptical, not cynical. - Yeah, a lot of us are cynical too. That's a different conversation. (laughing) I just want a new book. It's called Hope for Cynics. Hope for cynical people. It's written by this professor at, I think, Stanford. Jamil Zaki. Some of you may have heard about him. I think he teaches something. I don't have any sense. I try and all of that. Very highly recommended book. Not a recommendation for this AI podcast, of course. Hope. What hope? - You know, the full title of this is Hope for Cynics: The Surprising Science of Human Goodness. - Yeah. - A cynic would be surprised, yes. (laughing) - Thanks so much, Devyani, for joining us. It was wonderful understanding the voice AI space from you and best wishes. - Thank you so much, guys. Great to connect with you and we'll talk soon. - Thank you for listening to Zero Shot. Next week, PGK will be back. And who's the lead host for next week? Is it PGK? - It is PGK. - All right, let him better this episode. Standards have been set. - That was Zero Shot, The Ken's weekly podcast on the biggest developments in artificial intelligence. Our hosts and commentators are Praveen Gopal Krishnan, Rohin Dharmakumar, and me, Brady. (humming) Our sound engineer is Regif CN, who makes everything sound spectacular. Don't miss our Zero Shot columns, which are published every Saturday. We'll be back with more for this podcast next week. (upbeat music)
Podcast Summary
Key Points:
The conversation begins with a cultural anecdote about renting partners during Chinese New Year to avoid family pressure, then transitions into discussing AI's role in relationships and business.
A significant focus is on the growth of voice AI in India, highlighting a surge in funding and startup activity, driven by technological advances in LLMs, neural voice models, low-latency infrastructure, and cost reductions.
The discussion explores the practical applications and challenges of voice AI, such as emulating top-performing human agents in sales, ensuring ethical guardrails, and the market's eventual consolidation as enterprises seek fewer, more standardized vendors.
Chinese AI relationship apps tend to focus more on AI boyfriends, while American ones often emphasize AI girlfriends, reflecting cultural preferences in digital companionship.
AI funding in India has surged, with cumulative investment rising significantly, especially in voice AI startups, driven by India's multilingual landscape and comfort with voice-based interactions.
Key advancements include improved LLMs for context understanding, neural models for human-like speech synthesis, reduced latency for real-time interactions, and cost optimizations making voice AI more operationally efficient.
Top agents often deviate from scripts, using advanced sales tactics like creating urgency or emotional appeals, which are learned through experience and contribute to higher conversion rates.
Voice AI startups risk becoming standardized, similar to chatbots, leading enterprises to reduce vendors and potentially replace external solutions with internal teams, emphasizing the need for differentiation.