The neXt Curve reThink Podcast

XC Webcast 2025: Snapdragon Summit 2025 with Vinesh Sukumar

Leonard Lee Season 7 Episode 38


Exploring AI and Generative AI Innovations at Snapdragon Summit 2025

In this episode, Leonard Lee, executive analyst at neXt Curve, interviews Vinesh Sukumar, Vice President of Product Management AI at Qualcomm, during the Snapdragon Summit 2025 in Maui. The discussion delves into Qualcomm's bold vision of AI as the next user interface, the evolution of AI experiences, and the critical role of heterogeneous computing involving NPU, GPU, and CPU. Vinesh elaborates on the advancements in generative AI, the challenges of inference on edge devices, and the concept of Agentic AI, a personalized AI experience. The episode highlights Qualcomm's commitment to overcoming these challenges and shaping the future of digital experiences with AI.

00:00 Welcome to Snapdragon Summit 2025
00:27 Introduction to AI and Generative AI
01:32 AI Experiences and Infrastructure
03:56 The Role of NPU, GPU, and CPU in AI
06:04 Challenges and Future of AI Computing
07:36 Agentic AI and Personalization
10:11 Concluding Thoughts and Future Prospects

Leonard Lee:

Hey everyone, this is Leonard Lee, Executive Analyst at neXt Curve, and I am here in Maui, day two of Snapdragon Summit 2025. And it's a pleasure to be here. Nothing like being in person at Snapdragon Summit. I just wanna, number one, thank the Qualcomm organization for inviting me out, flying me out here to be here live. And I'm here with Vinesh Sukumar, who is the VP, and, oh, by the way, you're a PhD, so he's a doctor. Yeah. And he is the Vice President of Product Management, AI and Generative AI. Right, not just AI, generative AI as well. And he just got off stage a little while ago and did a wonderful presentation on AI and how the next UI is going to be AI, right? Which is a pretty bold claim, ambition, and vision. So what I'd love to do here is have a chat with you about some of those salient points and key points in Qualcomm's vision of AI really becoming this next stage in the user experience and the user interface. So really glad to have this opportunity to talk to you.

Vinesh Sukumar:

Thanks Leonard, for having me today.

Leonard Lee:

There's so much to unpack, right? And I guess, let's start with the NPU.

Vinesh Sukumar:

No, absolutely. I understand. Yeah, to your point here: you look at some of the AI experiences these days, which are happening, let's say, in the mobile space, in the PC space. You can generally bucketize them into either perception-based experiences or generative experiences. Perception has always been about detection, classification, segmentation, enhancements. They're still popular, they're still important for camera, video, and audio-related use cases, and we put a lot of investment into making sure we improve on the speeds and feeds and the quality. But the perception-related use cases have been commoditized; they have been around for quite some time. A couple of years ago, I think when DALL-E and ChatGPT were announced, it completely revitalized the experience portfolio, where it was all about creating synthetic data, either in the image, video, or even textual space, right? So we have been working with a lot of ecosystem partners to put a lot more emphasis on making sure we have the right infrastructure for a higher token rate and better latency.

Now, what has also happened at the same time, in the last, I would say, nine to twelve months, is that most of these experiences, especially in the generative space, have transitioned from on-demand to ubiquitous. Ubiquitous meaning it's running in the background at all times, and the moment you're trying to run it in the background at all times, then your performance per watt, or power efficiency, becomes extremely critical. So by definition, you have to choose an IP that does not compromise on performance and does not compromise on power efficiency. And the only IP that's able to do both of them happens to be the Hexagon NPU. Right. And that is your foundation for most of the experiences, either on the PC or on the mobile platform.
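To put a number on why performance per watt, rather than raw throughput, decides which IP block gets an always-on workload, here is a back-of-envelope sketch in Python. The throughput and power figures are invented purely for illustration and do not describe any real Snapdragon IP block.

```python
# Hypothetical comparison of IP blocks for an always-on generative workload.
# All numbers are made up for illustration; only the metric matters.
candidates = {
    "CPU": {"tokens_per_s": 8.0,  "watts": 4.0},
    "GPU": {"tokens_per_s": 30.0, "watts": 6.0},
    "NPU": {"tokens_per_s": 25.0, "watts": 1.5},
}

for ip, m in candidates.items():
    efficiency = m["tokens_per_s"] / m["watts"]  # tokens per joule
    print(f"{ip}: {efficiency:.1f} tokens/s per watt")

# For a task running in the background at all times, the block with the
# best tokens/s per watt wins even if its peak throughput is not the highest.
```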

Leonard Lee:

That's largely for power-constrained devices, right? So when we hear about the NPU, it's largely for portable laptops, as well as the smartphone and other device classes closer to earbuds. There's a whole diversity of devices that are now capitalizing on low-power AI compute, and the NPU is one of those classes of IP, as you mentioned. Correct. But when we look at these emerging model architectures and the models that are coming down the pike, they are presenting diversity themselves, right? These things are looking different from what we started with when DALL-E and ChatGPT came on the scene. So maybe you can help the audience understand why heterogeneity, and the roles of the NPU, the GPU, and the CPU, are important as we look at the future of AI computing.

Vinesh Sukumar:

Yeah, I think it's a great question. When you look at most of the experiences today, be it perception or generative, what's most important is: what is your problem statement? What are you trying to solve for? What are your key performance indicators? There are going to be experiences mostly focused on latency, especially prediction or planning networks. Those are quite small in nature; you can absolutely use the Oryon CPU for that purpose. A common example would be if you're trying to work on tabular data and you want to create, let's say, Excel graphs. You want to come up with ten different recommendations based on the type of the tabular data, which means the recommended graphs need to be instantaneous. These are small prediction networks, and they can absolutely run on a CPU. No problem with that. If you're looking at workloads in the reconstruction or rendering domain, especially in the gaming space, where frame rate becomes extremely important and you want to keep transitioning between the GPU and the NPU, those AI workloads can absolutely run on the GPU. And as you mentioned before, where you're operating in a very constrained form factor and power efficiency is critical, then you position it, again, on the Hexagon NPU. So it really depends on what you want to push, what your metrics of success are, and then we push for it.

Now, that being said, there are also certain use cases where you absolutely have to use multiple IP blocks. A simple example would be, let's say, a conversation. Traditionally, how it has been set up is you have the front-end component of automatic speech recognition, which translates your speech into text. The text goes to a large language model, and the large language model then goes to text-to-speech to come back as speech. When you look at the very first component, automatic speech recognition, the ASR component, you can absolutely run this on the CPU. No problem with that. When you want large language models, especially when you want a higher token rate, you can push that onto your NPU to get this done. And when you get to the text-to-speech, you can run that on either the GPU or the CPU. So if you happen to have the compute portfolio, these are three different, unique models, but they map together towards a single experience, and they can totally drive towards heterogeneity.
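As a rough sketch of how the three models in that conversational pipeline might be mapped onto different IP blocks, the following uses ONNX Runtime-style execution providers. The model file names and input names are placeholders, and running the QNN (NPU) and DirectML (GPU) providers side by side in one process is an assumption made for illustration, not a description of Qualcomm's actual stack.

```python
# Illustrative only: ASR on CPU, LLM on NPU, TTS on GPU, composing one
# voice-assistant experience across heterogeneous compute.
import onnxruntime as ort

# Model files and input names below are hypothetical placeholders.
asr = ort.InferenceSession("asr.onnx", providers=["CPUExecutionProvider"])
llm = ort.InferenceSession("llm.onnx",
                           providers=["QNNExecutionProvider",   # Hexagon NPU
                                      "CPUExecutionProvider"])  # fallback
tts = ort.InferenceSession("tts.onnx",
                           providers=["DmlExecutionProvider",   # GPU
                                      "CPUExecutionProvider"])  # fallback

def converse(audio):
    text_in = asr.run(None, {"audio": audio})[0]      # speech -> text on CPU
    text_out = llm.run(None, {"prompt": text_in})[0]  # LLM tokens on NPU
    return tts.run(None, {"text": text_out})[0]       # text -> speech on GPU
```

Three unique models, one experience: each stage lands on the IP block whose strengths match its key performance indicator.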

Leonard Lee:

Right. And then when we start to talk about agentic, which is this next layer on top of all the AI conversations, the generative AI conversations, we've had to date, how does that change the requirement set? When people think about AI, they typically think, for some reason, about training, and that it's all about GPUs up in the hyperscale, uber-ridiculous cloud, right? But when we look at the edge, and where AI matters, we're looking at a completely different dynamic. And then you have agentic, which is likely going to be a whole collection of different types of AI applications and functions chained together in order to provide these new functions and features that, in concert, constitute this future of the user interface that Cristiano has been talking about for the last two days. Correct. So maybe help us understand: how does agentic AI shape the UI of the future? And how does that correlate with the type of compute that's going to be required to enable it?

Vinesh Sukumar:

Yeah, great question. I'll probably make a controversial statement here. Please. From my personal experience, I believe inferencing on the edge is very hard. It's very tough compared to training, because you have to do inference on multiple form factors, and each form factor has its own set of challenges. So from my experience, inferencing on the edge is hard, but at Qualcomm we find ways to solve the problem.

Answering your second question, of what this means from an agentic AI standpoint: we are now moving towards an app-less environment, where we are trying to make sure most of the conversations with any device that is portable in nature are based on voice. And when you want to do that, it's a completely app-less environment where you want to understand the intent of the user and then translate that into actions that are explainable, repeatable, safe, and accurate. And that's not easy. The very first step is: do you happen to have a good enough model that is able to classify the intent of the user into certain classes? Is it mostly a navigation-domain request, or an entertainment-domain request, or more of a finance-related request? Classifying them into different domains is very critical. So the very first step is to put a lot more investment into that model, so that it is accurate enough to do the right amount of classification. Once you have a classification, then you need a small language model that is specifically trained for that specific domain to take a certain action, which, again, is repeatable and accurate.

The difficult problem that we have yet to solve, and I think we are doing the work, working with partners, is that there are always going to be limitations to small language models. These small language models, at some point in time, will not be able to answer the user query, and at that point you have to intelligently route it to the cloud, to an MCP protocol, and then carry on the conversation. Now, how do you make that router dynamic? It's not predefined, right? It needs to understand that after maybe five or six multi-turn conversations the agent is not able to help you, and at that point it needs to transition to a certain cloud model. Making that intelligence very dynamic is something that we have to work on and get done. So that's, I think, problem area number one that we're trying to focus on and get resolved.

The other element of it is, as Cristiano mentioned when he laid the foundation for this agentic AI experience, it has to be a lot more personal. And to be personal, you have to create a knowledge graph. The knowledge graph is trying to understand your emotional intelligence, your patterns, your tone, so that you are able to create a vector database that can augment the prompt every time you invoke a large model. The challenge has always been that you have multiple applications out there, and every application is not willing to share the user profile data so that you can create a much stronger knowledge graph that spans multiple applications. So that is something that we're trying to work on with the industry, to see if there's a consortium we can agree upon so that user data can be shared securely and safely. The intention is to make sure that, at the end of the day, the human user experience is not compromised. These are some of the elements that we still have to go figure out and make sure we get done. But again, these are tough challenges, and as I've always mentioned, we love challenges. And we'll find a way to solve them.
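As a thought experiment, the dynamic router Vinesh describes might look something like the Python sketch below. Every name here, the classifier, the domain models, the five-turn threshold, the knowledge-graph retrieval, is hypothetical, assembled only to illustrate the classify, answer-on-device, escalate-to-cloud flow, not an actual Qualcomm API.

```python
# Hypothetical sketch of an on-device agent with dynamic cloud escalation.
from dataclasses import dataclass, field

MAX_FAILED_TURNS = 5  # escalate after ~5-6 unresolved multi-turn exchanges

@dataclass
class Conversation:
    history: list = field(default_factory=list)
    failed_turns: int = 0

def handle_turn(conv, utterance, classify, slm_for, cloud_llm, knowledge_graph):
    domain = classify(utterance)  # navigation / entertainment / finance ...
    persona = knowledge_graph.retrieve(utterance)  # vector-DB lookup to personalize
    prompt = f"{persona}\n{utterance}"  # augment the prompt with user context

    # Try the domain-specific small language model on device first.
    reply, resolved = slm_for(domain).generate(prompt, conv.history)
    conv.failed_turns = 0 if resolved else conv.failed_turns + 1

    # Dynamic escalation: route to a cloud model (e.g. via MCP) once the
    # local agent has failed to resolve the request for several turns.
    if conv.failed_turns >= MAX_FAILED_TURNS:
        reply = cloud_llm.generate(prompt, conv.history)
        conv.failed_turns = 0

    conv.history.append((utterance, reply))
    return reply
```

The open problem Vinesh points to is making that escalation threshold adaptive, a learned judgment that the local agent is stuck, rather than a fixed constant like the one above.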
Leonard Lee:

The first step in getting to a solution is recognizing the problems and the challenges. Yes. And if you don't do that, you never move forward. Absolutely. You've very obviously thought this stuff through. I'm really looking forward to what your team is going to be doing in the future to resolve a lot of these challenges and issues and bring about an era of agentic AI, as well as AI as the UI. So thank you so much. I really appreciate you jumping on, and having me, because it's really a pleasure to be here experiencing how Qualcomm is looking to change experiences in our digital future. Absolutely.

Vinesh Sukumar:

Yeah,

Leonard Lee:

Absolutely, right? Not just digital, AI. Right. It's a pleasure. Thank you, thank you so much.

Vinesh Sukumar:

Thank you.

Leonard Lee:

All right. Take care everyone. Thank you.
