
Building Generative AI Applications with Serverless – Six Five Webcast – AWS Serverless Series

On this episode of the Six Five Webcast – AWS Serverless Series, hosts Daniel Newman and Patrick Moorhead are joined by Amazon Web Services’ Uma Ramadoss and Eric Johnson, Principal Solutions Architect and Principal Developer Advocate, respectively. Together, they dive into the fascinating world of building generative AI applications with serverless technology, shedding light on the advancements and best practices facilitated by AWS.

Their discussion covers:

  • The foundational elements of generative AI applications in a serverless environment.
  • Key benefits and challenges associated with building AI applications without server limitations.
  • Real-world examples of how businesses are leveraging AWS serverless solutions for generative AI.
  • Best practices and tips for developers starting with serverless AI applications.
  • Future directions for generative AI and serverless computing with AWS.

Learn more at Amazon Web Services, and join AWS to learn about GenAI trends within the serverless landscape.

Watch the video below, and be sure to subscribe to our YouTube channel, so you never miss an episode.

Or listen to the audio here:

Disclaimer: Six Five Webcast – AWS Serverless Series is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors, and we ask that you do not treat us as such.

Transcript:

Patrick Moorhead: The Six Five is back. And Daniel, we are talking about two really awesome topics here. This is one interview in a series of serverless interviews. We kicked it off with Holly Mesrobian, the VP of Serverless Compute, with a really broad-based look at serverless compute, talking Lambda, ECS, Fargate, and event-driven architectures. And then Steven and Keith did some double-clicks on there about developing future-proof integration strategies and then even touched on security. But here we are, Dan, talking about our favorite topic, generative AI and how it relates to serverless.

Daniel Newman: Pat, you knew we’d end up here.

Patrick Moorhead: Of course.

Daniel Newman: You can’t do a series of five or six podcasts in the tech industry now and not have some reversion to the mean, as I would like to say. And right now there’s only one topic that everybody on the planet cares about. And that is?

Patrick Moorhead: AI or generative AI, right?

Daniel Newman: I don’t know, I just wanted to see what you’d come up with. At least you didn’t say something silly.

Patrick Moorhead: Exactly. Hey, let’s introduce our guests, Uma and Eric, first time Six Five guests. Welcome. Let’s talk gen AI and serverless.

Eric Johnson: All right, thank you. Good to be here.

Daniel Newman: Yeah, it’s great to have you both here. And the preamble, jokes aside, it has been a really, really exciting 18, 19 months since that October, November moment in ’22 when ChatGPT dropped and it just changed the whole trajectory. Pat and I love to talk about four plus decades of AI, four plus decades of algorithms. AWS has been a leader in machine learning. Some of the innovations around SageMaker, the work you’ve been doing with your partners. It’s not necessarily new, but it’s been an inflection. It’s been a significant transformation that’s gone on.

And so we’ve been covering it closely. We’ve been talking to your peers, we’ve been talking to other companies in your space, we’ve been talking to your customers. And of course we’ve been opining on our own. But hearing from you both, and Eric, I’d love to start with you. We are in this period now where it’s all about the adoption. Everyone’s saying, “Okay. Good to see all this CapEx investment, great to see all these companies building data centers.” But we want to talk about people using generative AI. Where you sit in AWS, you’re looking at organizations that are using it, they’re adopting it. What are some of the trends you’re seeing as this is starting to take place?

Eric Johnson: Oh, man. Just on this one question, we could go for hours, but I’ll try to keep it limited. I told Uma she would have her chance to talk as well. By the way, I’m Eric, that’s Uma over there if you’re confused. Yeah.

Daniel Newman: It’s all in the lower third, my friend.

Eric Johnson: That’s right, that’s right.

Patrick Moorhead: All there.

Eric Johnson: There you go. The trends we’re seeing, like I said, there’s a ton of them, but I’ll pull out some of the ones that rise to the top for me, the ones that I’m seeing. One is really the obvious one: domain-specific information. With the introduction of these LLMs, these large language models that do everything, we look at them and say, “Ah, that’s going to do everything I ask it to,” but they don’t have some of the domain-specific information that folks are looking for. And that’s what companies want to capitalize on. They want to say, “Look, we’re going to give you the power of the LLM, but we’re going to give you this private data that only we have and we’re going to put that to use to provide answers that only we can provide.” And so, they’re scrambling to build with these powerful LLMs while still making them very specific to their data, to their information, to everything they have to offer. That’s the first one, and I think that’s the really obvious one, and we see that a lot.

The second two I’m going to talk about are more about the technical side of it. We’re seeing some shifts in how folks are getting data. I’ve done this talk several times, and I’m not going to go into it here, but we talk about developers utilizing data: they just want to use the LLM, add some data to it, and have the LLM do some stuff. And for a long time we were seeing this pattern of agents, where the LLM would get an answer and then use the agent to get more, and it would just be grinding and grinding. But we’re seeing a trend leaning towards functions, or function calling. Rather than having an agent go get the data, come back to the LLM, go and come back, go and come back, what we’re asking the LLM for is a deterministic answer. LLMs are designed to be non-deterministic, but here it says, “Give me a deterministic answer and then I’ll have the agent do the work.” It’s a much quicker and more cost-efficient way of getting that out. I won’t go into all the tech behind it, but this function calling idea instead of agents, we’re seeing that trend happening.
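
For readers who want to see the shape of the function-calling pattern Eric describes, here is a minimal, framework-agnostic sketch: the model is asked for a single structured, deterministic tool request, and the calling code then executes that function itself. The names here (get_weather, call_model) are hypothetical placeholders, not a specific AWS or model-provider API.

```python
# Sketch of function calling: one model round-trip returns a structured tool
# request; the application runs the matching function deterministically.
import json

def get_weather(city: str) -> dict:
    # Deterministic business logic the application owns (stubbed for illustration).
    return {"city": city, "forecast": "sunny", "high_c": 24}

TOOLS = {"get_weather": get_weather}

def call_model(prompt: str, tool_specs: list[dict]) -> dict:
    # Placeholder for an LLM invocation via whatever SDK you use. A
    # function-calling-capable model would return a structured tool request.
    return {"tool": "get_weather", "arguments": {"city": "London"}}

def answer(prompt: str) -> str:
    tool_specs = [{"name": "get_weather", "parameters": {"city": "string"}}]
    request = call_model(prompt, tool_specs)                   # one model call
    result = TOOLS[request["tool"]](**request["arguments"])    # deterministic execution
    return json.dumps(result)

print(answer("What's the weather like in London tomorrow?"))
```

Compared with an agent loop that keeps going back to the model, this costs one model round-trip plus one deterministic function call, which is where the speed and cost benefit Eric mentions comes from.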

And finally, another big one, this is number three that looks like a one, but it is a three: multi-model. And multi-model, we can look at a couple of ways. One is multi-purpose, one model doing multiple things. It used to be you went to this model to get text and you went to that model to get images, but now we have models that handle both of those, this multifaceted return, or artifacts, coming out of one model. The second is the use of multiple models, this idea of using the right model, or the right tool, for the job and saying, “Okay, we’re going to use this model to do something. We found that that’s a better one.” It’s interesting, and I won’t throw names out yet, but we see these models flip-flopping: “Oh, now this one’s jumping ahead. Now this one does this a little better. This one does that a little better.” As companies see that, they’re starting to fine-tune. We have a great example with TUI, which is a tour agency out of the UK, where they’re actually using Llama to get the text, but they’re using… And Uma, help me out. What are they using for the second part of it? Do you remember? She’s going to throw it in there.

Uma Ramadoss: They’re using Anthropic.

Eric Johnson: That’s right. They’re using Anthropic Claude. Thank you very much. They’re using Llama to get the text, but using Anthropic to make it more thematic, or within the tone that they want to talk in. It’s this multi-model idea of getting the best of each of those. That was a little long-winded, but I hope that gives… Again, we could go on this forever, but those are the trends I’m seeing.
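
As an illustration of the multi-model idea described here (one model drafts the text, a second restyles it in the brand’s tone), below is a hedged sketch assuming the Amazon Bedrock Converse API via boto3. The model IDs and prompts are illustrative only, not TUI’s actual implementation.

```python
# Sketch: chain two Bedrock models, a drafting model and a tone-of-voice model.
import boto3

bedrock = boto3.client("bedrock-runtime")

def converse(model_id: str, prompt: str) -> str:
    # The Converse API gives a common request/response shape across models.
    response = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"]

# Step 1: draft the copy with one model (illustrative model ID).
draft = converse("meta.llama3-70b-instruct-v1:0",
                 "Write a short description of a three-day trip to Lisbon.")

# Step 2: have a second model rewrite the draft in the brand's tone (illustrative model ID).
final = converse("anthropic.claude-3-haiku-20240307-v1:0",
                 f"Rewrite the following in a warm, upbeat travel-brand tone:\n\n{draft}")

print(final)
```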

Patrick Moorhead: No, I appreciate that. And all good points. It’s so funny, regardless of what year you intercept IT, it’s been garbage in, garbage out. And using your own proprietary data to make the results even better just makes sense. That was true in early-day analytics. It was true in machine learning. It certainly is true with generative AI. And yes, there are some differences in the types of data that generative AI can light up beyond the domain-specific. But listen, with all the options and all the opportunities, there also come challenges. I’m going to ask you, Uma: sometimes choices can be confusing, sometimes having all these options can be challenging. What are your customers running into?

Uma Ramadoss: Yeah. Before I talk about the challenges, I want to highlight that customers are primarily using serverless computing for building applications that consume foundation models. These foundation models could have been deployed in Amazon Bedrock, Amazon SageMaker, Amazon EKS, or even outside AWS. And so, I want to talk particularly about the challenges, or maybe considerations, that revolve around building these applications.

The first one: we all obviously know this space is new, but it is rapidly evolving. New features and new models are coming to the market regularly. Like Eric said, new features like multimodal, models being able to understand both text and images, change the way we process documents, and features like function calling change the way we perceive AI agents or are going to use AI agents. With this rapid evolution, we need to be able to react quickly; we need to be building applications that react quickly to the changes. The second consideration or challenge, Eric mentioned it and you also mentioned it: garbage in, garbage out. If you want these models to generate accurate responses, you should be giving them accurate, unbiased, domain-specific information. Having a good data architecture is really a must: a data lake where they can periodically review, make sure they have quality information, have good access control, and sometimes also a human in the loop. And that’s a growing pain point for customers because this data, typically unstructured data, is siloed across lines of business.

And the third one I would say is building asynchronous and event-driven applications. For many of us, this is a mind-shift change; we have been building synchronous applications for years. In my mind, asynchronous, event-driven applications are critical for generative AI for two reasons. One, these models are trained with billions of parameters, so they’re going to take time to generate the response. Think about it: if you are using these models for interactive applications like a chatbot, it’s going to take time to generate a response, and that latency is going to be noticeable. That’s going to result in poor user experience. The second point is scale: these models have limited throughput. When you are experimenting, which is what many of our customers are doing, you don’t recognize or realize the scaling challenge. But once you go to production and your user base grows, this becomes a challenge. And that’s also going to result in failures, as well as poor user experience.

And the last thing is about interpretability. What I mean by that is being able to identify the root cause of how a particular decision was made, or finding the lineage of that data. That is something that enterprise customers especially will be looking at. This may be simple or easy to do for document-summarization kinds of use cases, but how about if you’re relying on an LLM for complex use cases where you’re depending on it for decision making and reasoning support? So I would say these are some of the challenges, or really considerations, every developer must be aware of, and they should be making conscious decisions in terms of technology choices, architectural choices, or even design choices.

Daniel Newman: Yeah. There’s a lot there to unpack, Uma, and I appreciate you running through all of that. And of course, we know serverless has a number of different attractive qualities as well. You’ve got cost efficiencies, you’ve got scalability, you’ve got rapid time to value. These are all reasons that, of course, people go to the cloud. There are a lot of parallels there. But Eric, I gave you the high levels, but as it pertains to gen AI, is it the same trajectory? Are the reasons people used serverless in the cloud one era going to be the same in what I like to refer to as the cloud two, or gen AI, era? Is it the same long tail of reasons, or what’s really driving people to go serverless with their gen AI projects?

Eric Johnson: Yeah, I would say yes and then some. Yes plus maybe is my answer.

Patrick Moorhead: Yeah, Good answer. I like that.

Eric Johnson: You like that? All right. We’ll move on. That yes-plus answer is that serverless accelerates building applications. We’ve seen this in how fast you can build: I can sit at my laptop and get a Lambda function with an API up and going in roughly five minutes, depending on how badly I fat-finger it. And yeah, maybe that’s a hello world, but I’ve taken out a lot of the “I’ve got to provision a server, I’ve got to put the RAM in.” I can’t even name some of the things that go into servers now because I’ve been doing serverless so long, and we can move quickly. It’s very, very fast.
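
For a sense of what that five-minute “hello world” looks like, here is a minimal Lambda handler that could sit behind API Gateway or a function URL; the event shape below assumes the API Gateway proxy integration and would need adjusting for other triggers.

```python
# Minimal Lambda handler: returns JSON for an HTTP request routed through
# the API Gateway proxy integration.
import json

def lambda_handler(event, context):
    name = (event.get("queryStringParameters") or {}).get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```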

But the other aspect is, yes, you can come up fast with serverless, and the same applies to gen AI when we’re building endpoints. And again, I go back to the story, I have no data behind this, but it’s my opinion that 97, 99, whatever percent of developers are mostly consuming gen AI. They’re not the ones training models, so to speak. We have a lot of people doing that as well, but developers are usually consuming, so they have to prepare the data going in and they have to use the data coming out. So how do you orchestrate that data? How do you handle that data? It’s just like any other application we’re building: we’re moving data around, we’re responding, we’re manipulating, we do all those kinds of things. Serverless really makes sense here, the flexibility of it, how fast you can come up. But the next step of that is this evolvability. And I’m not sure that’s a word, but we’ll use it for the moment.

Patrick Moorhead: Sounds good.

Eric Johnson: The evolvability of serverless. How quickly can I change from, “Okay, a Lambda function made sense here, but now I’m doing a lot of orchestration in my code, so I think I’m going to go to Step Functions”? Well, it’s not taking down racks and stacks; it’s changing how my infrastructure works. And we’re seeing this a lot. As we see different aspects of gen AI coming in, as it evolves, serverless evolves with it. It’s this tool base that says, “Okay, this is the right tool. This fits there. That makes sense.” And so, I think those two things. And then you also get into, when you’re working with gen AI, you have to handle a lot of data.

Again, going back to this idea of, “Look, I’m putting this together, so what if I am training or crunching or pulling all this in?” One of the cool things in the way we’ve built serverless is that it’s really well known for the rich integrations into the different services. These are just a few of thousands of examples, but if I’m using Step Functions, I can directly integrate with SageMaker, I can directly integrate with Bedrock, I can directly integrate with Lambda functions and so many other things. I can kick off batches, I can do all kinds of things without having to code for that. We’ve literally got it to a drag and drop: you just configure it and off you go. And so as this evolves, I can quickly say, “Oh, you know what? This needs to happen first. Let’s change that order. Let’s re-orchestrate that.” This direct integration lets us move very, very quickly as we’re doing that.

Patrick Moorhead: Yeah. I appreciate-

Eric Johnson: Finally… I’m sorry?

Patrick Moorhead: Sorry. Go right ahead. Sorry.

Eric Johnson: If you wait till I’m done talking, you’ll never get a chance, but I’ll just throw this in real quick. Finally, the scale. The scale of serverless, the way we can scale up and scale back down, is really critical to any app, and even more so when you’re building gen AI apps. It allows us to come up and down. And I said finally, but I will throw this tag in as well: serverless leads the developer to build asynchronously. One of the things with gen AI is there’s a lot of waiting right now, isn’t there? I sent something, and we’ve trained our users that if I don’t get an answer back, it’s broken. With serverless, it allows you to say, “Hey, I acknowledge your request, but I have to wait for a while.” Serverless is just inherently asynchronous and allows you to build that architecture quickly, easily, and efficiently.
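
A minimal sketch of that “acknowledge now, answer later” pattern might look like the following: the front-end Lambda queues the request and returns a job ID immediately, and a separate worker invokes the model later and stores the result for the client to fetch. The queue URL, table, and handler names are placeholders, not a prescribed architecture.

```python
# Sketch of an asynchronous gen AI request flow with Lambda and SQS.
import json
import uuid
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/genai-jobs"  # placeholder

def submit_handler(event, context):
    """Accept the request, queue it, and acknowledge with a job ID right away."""
    job_id = str(uuid.uuid4())
    prompt = json.loads(event["body"])["prompt"]
    sqs.send_message(QueueUrl=QUEUE_URL,
                     MessageBody=json.dumps({"job_id": job_id, "prompt": prompt}))
    return {"statusCode": 202, "body": json.dumps({"job_id": job_id})}

def worker_handler(event, context):
    """Process queued jobs; the client polls (or is notified) for results."""
    for record in event["Records"]:          # SQS batch delivered to Lambda
        job = json.loads(record["body"])
        # ... invoke the model here and persist {job_id: result} for retrieval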

Patrick Moorhead: Yeah, I appreciate you filling in the blanks there, and you even walked us through some common use cases developers are using serverless for. I’d like to shift to Uma here, and we’ve got room for one more question. Let’s do the double-click on one use case. Uma, I’m going to pick RAG because it seems to be a very popular one out there. Maybe dive deeper into how you’ve seen developers solve RAG with serverless.

Uma Ramadoss: Yeah. RAG is also Eric’s favorite topic. Yeah.

Patrick Moorhead: We’re going to give this question to you though. It might be his favorite, but you know.

Daniel Newman: He’ll sit there and he’s going to have to just shake a little while he wants to talk.

Uma Ramadoss: Right. Right. Let me start with an intro to RAG. RAG is very popular these days; it’s a popular and cost-effective way to provide LLMs with domain-specific information. As the name says, it has three parts: retrieval, augmentation, and generation. You have a prompt, you retrieve relevant data for that prompt, you augment the prompt with the retrieved data, and then you ask the LLM to generate. Obviously it’s going to give you a more accurate response because you’ve augmented it with relevant information. To me, the retrieval part is really, really important. It actually happens at a database, and that’s typically a vector database.
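
To make those three steps concrete, here is a minimal sketch of retrieve, augment, generate; retrieve() and generate() are hypothetical stand-ins for a vector-database query and a foundation-model call, not a specific AWS API.

```python
# Sketch of the three RAG steps: retrieval, augmentation, generation.
def retrieve(prompt: str, k: int = 3) -> list[str]:
    # Embed the prompt and run a similarity search against the vector database (stubbed).
    return ["Policy doc excerpt 1", "Policy doc excerpt 2", "FAQ excerpt"]

def generate(augmented_prompt: str) -> str:
    # Call the foundation model with the augmented prompt (stubbed).
    return "Answer grounded in the retrieved excerpts."

def rag_answer(prompt: str) -> str:
    context = "\n".join(retrieve(prompt))                                     # retrieval
    augmented = f"Use only this context:\n{context}\n\nQuestion: {prompt}"    # augmentation
    return generate(augmented)                                                # generation

print(rag_answer("What is our refund policy for group bookings?"))
```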

And it’s not just one document stored in that vector database; it’s a large amount of documents. Customers typically run a pipeline, it’s called a RAG pipeline, to build this database. And this pipeline consists of many steps. It generally starts with sourcing the documents from many different places. Eric talked about rich integration; serverless’s rich integrations help here. You source documents from many places and then you iterate on these documents. Then you transform the data to image or text format, because that’s what’s understandable by the embedding model that’s going to create the vector data. The transformation is important because this data is unstructured data: audio files, video files, PowerPoint presentations. You transform the data, you use an embedding model to generate the vector data, and you store that vector data into a database. And it doesn’t stop there. You also need to validate that this new data is now giving you a more accurate response, so there is also a validation step. As we can see, there are a number of steps in this process, and these are distinct steps, but they need to be orchestrated in a sequence.
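
Here is a compact sketch of that ingestion pipeline; every helper is a hypothetical placeholder for the real integration (for example S3 for sourcing, an embedding model, and a vector database), shown only to make the sequence of steps visible.

```python
# Sketch of the RAG ingestion pipeline: source -> transform -> embed -> store -> validate.
def source_documents() -> list:
    return []                      # e.g., list objects from an S3 bucket

def transform_to_text(doc) -> str:
    return str(doc)                # e.g., extract text from PDFs, transcribe audio/video

def embed(text: str) -> list:
    return [0.0]                   # e.g., call an embedding model to get a vector

def store_vector(doc_id, vector, text) -> None:
    pass                           # e.g., upsert into the vector database

def validate_sample() -> None:
    pass                           # e.g., run test queries to confirm answer quality

def ingest() -> None:
    for doc_id, doc in enumerate(source_documents()):   # iterate over the corpus
        text = transform_to_text(doc)                    # unstructured data -> text
        store_vector(doc_id, embed(text), text)          # embedding -> vector database
    validate_sample()                                    # check retrieval quality
```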

And so our customers use AWS Step Functions for orchestration. Eric talked about a number of benefits of AWS Step Functions, but I want to call out two features that are the reasons customers choose it for building RAG pipelines. Number one: I talked about a vast amount of data. You need to be able to iterate on that data effortlessly and also manage concurrency. Managing concurrency is important here because you don’t want to inundate these models or the vector database with too many requests. Customers use Step Functions Distributed Map to iterate on these documents, and they can also manage concurrency with that.
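
For orientation, here is a hedged sketch of what such a Distributed Map state can look like, written as a Python dict of its Amazon States Language definition. MaxConcurrency caps the parallel work so the embedding model and vector database aren’t inundated, and the Retry block illustrates the per-item failure handling discussed next. The bucket, prefix, and Lambda function name are placeholders.

```python
# Sketch of a Step Functions Distributed Map state (Amazon States Language as a dict).
process_documents = {
    "Type": "Map",
    "ItemReader": {
        "Resource": "arn:aws:states:::s3:listObjectsV2",
        "Parameters": {"Bucket": "my-docs-bucket", "Prefix": "raw/"},   # placeholders
    },
    "ItemProcessor": {
        "ProcessorConfig": {"Mode": "DISTRIBUTED", "ExecutionType": "STANDARD"},
        "StartAt": "EmbedAndStore",
        "States": {
            "EmbedAndStore": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": "embed-and-store"},       # placeholder Lambda
                "Retry": [{"ErrorEquals": ["States.ALL"],
                           "MaxAttempts": 3, "BackoffRate": 2.0}],       # per-item retries
                "End": True,
            }
        },
    },
    "MaxConcurrency": 50,   # throttle calls to the embedding model / vector DB
    "End": True,
}
```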

And number two: the scale of these documents is really vast. What happens if there is a failure when you’re processing the 1,000th file? If you’re writing it as a program and running it on an EC2 instance, would you restart the processing for all the 100,000 files that you are processing? Failures are natural, especially when you are dealing with data. And so, Step Functions offers very sophisticated failure handling, even being able to resume from the point of failure, from whichever file or processing step failed. Yeah, I think those are the two reasons. But I also want to call out that RAG is not the answer for everything. You can augment in various other ways as well. For example, you may have dynamic data, like inventory or price; that kind of information better comes from an API than from RAG.

Patrick Moorhead: Totally. Yeah. RAG might be the panacea for most, but it doesn’t fix everything, huh? Yeah. Sorry. Sorry, Dan.

Daniel Newman: No, I mean look, you gave the playbook. And the playbook is that there are these various best-understood use cases. I think we are still in the infancy. I think what we all agree on from an opportunistic standpoint is that finding the way to pair the power of a large language model, whether that is Anthropic’s or any of the others that we’ve discussed here, with your high-value proprietary data remains the killer workload right now for so many businesses. The techniques, RAG and fine-tuning and the combinations of these things, are all going to continue to evolve. And hopefully technologies like AWS’s serverless computing either make it more financially accessible or lower the barrier of entry for people to take advantage of these capabilities. And of course, as many of our enterprises are aware, we don’t maximize our data. Generative AI and these tools can get us there.

We could talk about this for a long time, but Eric and Uma, we do have to run. I did want to say thank you both. I like the amount of detail and preparation that you put into this. Clearly, you’re doing this every day and the world needs to hear about that. The world needs to hear about not just how this can work, but how it is working. Because I do believe, and I think Pat would agree that AI is one of the most formidable and important inflection points in our lifetime and it’s only going to get faster from here. We’ll keep talking. Eric, Uma, hope to have you back on the show sometime soon. Thanks so much for joining us here on The Six Five.

Uma Ramadoss: Thank you.

Eric Johnson: Thank you.

Daniel Newman: All right, everybody. Hit that subscribe button. You are here with The Six Five. We are On the Road. This is part of a multi-part AWS serverless Six Five series. Stick with us. We appreciate you tuning in. See you all later.

Author Information

Daniel is the CEO of The Futurum Group. Living his life at the intersection of people and technology, Daniel works with the world’s largest technology brands exploring Digital Transformation and how it is influencing the enterprise.

From the leading edge of AI to global technology policy, Daniel makes the connections between business, people and tech that are required for companies to benefit most from their technology investments. Daniel is a top 5 globally ranked industry analyst and his ideas are regularly cited or shared in television appearances by CNBC, Bloomberg, Wall Street Journal and hundreds of other sites around the world.

A 7x best-selling author, most recently of “Human/Machine,” Daniel is also a Forbes and MarketWatch (Dow Jones) contributor.

An MBA and Former Graduate Adjunct Faculty, Daniel is an Austin Texas transplant after 40 years in Chicago. His speaking takes him around the world each year as he shares his vision of the role technology will play in our future.
