The Futurum Group at Marvell’s Industry Analyst Day 2023 with Loi Nguyen, PhD – Futurum Tech Webcast

On this episode of the Futurum Tech Webcast – Interview Series, host Daniel Newman welcomes Loi Nguyen, PhD, EVP of Optical & Copper Connectivity at Marvell Technology, for a conversation on how AI is reshaping Marvell’s optical business, including the bandwidth and power demands involved and the key technologies enabling these shifts.

Their discussion covers:

  • The evolution of Marvell’s Optical business, with the increased networking demands of AI
  • How optical modules have developed in terms of bandwidth and power
  • The infrastructure needs of inference, within data centers and beyond
  • Silicon photonics: what it is and why it matters

Watch the video below, and be sure to subscribe to our YouTube channel, so you never miss an episode.


Disclaimer: The Futurum Tech Webcast is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded, and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors, and we ask that you do not treat us as such.

Transcript:

Daniel Newman: Hi, everyone. Welcome to the Futurum Tech podcast. I’m your host, Daniel Newman, CEO of The Futurum Group, here in beautiful Santa Clara by the Bay at Marvell’s offices. I’m joined by Loi today. We’re going to be talking a little bit about … Well, first of all, we’re going to talk about AI, but that’s not the only thing. We’re going to talk a little bit about optical. We’re going to talk about light. We’re going to talk about some of the info that you shared here at the Industry Analyst Day. But first of all, thanks so much for joining. How are you doing today?

Loi Nguyen: Good. Good. Excellent.

Daniel Newman: It’s great to have you here. First of all, I always love coming out this way, so much energy. It is December of 2023, and Silicon Valley is booming, with announcements, it feels like, every single day. So many of them are around generative AI, which has kind of taken the year by storm, but there’s so much to be done to make this work. I had the chance to sit down with you, watch your presentation here at the event.

Loi Nguyen: Oh. Good.

Daniel Newman: Really appreciate you doing that. I’m going to hit you on all that a little bit. You came in over the last couple of years and, excuse me, I don’t have the exact date in front of me, but Inphi was acquired …

Loi Nguyen: April 21st, 2021.

Daniel Newman: April ’21. I actually did a podcast with … I think you were on it with me. We were still in semi-shutdown time.

Loi Nguyen: Yes.

Daniel Newman: I was in the back of an SUV and we were on video together recording, and I was catching up.

Loi Nguyen: Yes.

Daniel Newman: Because it was a big piece of news, but we weren’t traveling yet. We couldn’t come into an office and sit down like this.

Loi Nguyen: Right. Exactly.

Daniel Newman: It was a huge moment. Big for Marvell. Big for Inphi. But give a little bit of your background and tell us a little bit about what you’re working on here right now at Marvell.

Loi Nguyen: Okay. Great. So I was a co-founder of Inphi. I started the company 20-some years ago and grew it into a leader in optical interconnect. And then COVID hit, and Inphi brought out all these wonderful chips that connect people around the world together, and Marvell made an offer that the Inphi board couldn’t refuse. And so, I joined Marvell as executive VP reporting to Matt Murphy, the CEO, and I’m running the optical group.

Daniel Newman: I did like your story though when you were up on stage. You were talking about how, when you got into the business, you were looking for a company to acquire that was doing what you ended up doing.

Loi Nguyen: Yes.

Daniel Newman: You said, “Well, we didn’t really find anyone,” so that kind of was the beginning of your trajectory. By accident, as you were looking for a company to buy, you found that there was an opportunity, that no one was really doing what you saw as the biggest opportunity.

Loi Nguyen: Yeah. That was around 2010 to 2012. My dream was to build an optical integrated circuit that could integrate hundreds, then thousands, and eventually hundreds of thousands of components that would process light the way an integrated circuit processes electrons today. At the time, there were three or four startups, but they were all struggling, and they did not really have a lot of technology. So I decided I was going to go and build it myself. Of course, not by myself. I had to go and find a team, build it one person at a time, and turn it into a really world-class team in silicon photonics.

Daniel Newman: Even Woz had a team. Nobody does it all by themselves. Some people know how to handle a soldering iron, but there’s a little bit more to it than that.

Loi Nguyen: Exactly.

Daniel Newman: Well, congratulations on the rocket ship growth. It’s always a great indicator that you were right when the bigger company comes and knocks on your door. We spent all the time here today talking about … It was almost all about AI, all about accelerated computing. You focused a lot on the optics business. Accelerated computing and big GPUs, that’s been the story of the year. Everyone’s talking about that, but there’s a lot more that’s required to do what’s going to need to be done with AI. Talk a little bit about what else the industry should be focusing on besides just the compute.

Loi Nguyen: Absolutely. In the presentation, both Raghib and I discussed, and Noam too, that generative AI model sizes have gotten so big now. 10X a year over the last 10 years. Today, to train a model like GPT-3, no single computer or AI server can actually handle the workload. Even with 1,000 GPUs, it still takes more than a month to train one single GPT-3 model. That’s why there is a race to build larger and larger AI clusters, to reduce the training time to something very manageable. To do that, connectivity is really what allows thousands of servers to be connected together. And in fact, today, in the largest AI clusters that have thousands or tens of thousands of these GPUs, the performance of the connectivity of the network, the bandwidth, the latency, determines in large part the performance of the whole cluster. So the pillars of AI are compute and connectivity.
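
As a rough illustration of the scaling Nguyen describes, here is a back-of-envelope sketch in Python. The one-month, 1,000-GPU GPT-3 baseline comes from the conversation; the network-efficiency factor is a hypothetical knob standing in for the bandwidth and latency effects he mentions, not a Marvell figure.

```python
# Illustrative back-of-envelope model, not Marvell data: how cluster size and
# network efficiency interact to set training time. The 1,000-GPU, ~30-day
# GPT-3 baseline is from the conversation; "network_efficiency" is a
# hypothetical stand-in for bandwidth/latency effects.

BASELINE_GPUS = 1_000
BASELINE_DAYS = 30  # "more than a month" with 1,000 GPUs

def training_days(gpus: int, network_efficiency: float) -> float:
    """Ideal linear speedup, discounted by how well the network keeps the
    GPUs fed (1.0 = perfect interconnect; lower = more time stalled)."""
    ideal_speedup = gpus / BASELINE_GPUS
    return BASELINE_DAYS / (ideal_speedup * network_efficiency)

for gpus in (1_000, 10_000, 30_000):
    for eff in (0.9, 0.6):
        print(f"{gpus:>6,} GPUs at {eff:.0%} network efficiency: "
              f"{training_days(gpus, eff):6.1f} days")
```

Under perfect scaling, 10X the GPUs cuts training time 10X; any network inefficiency eats directly into that speedup, which is exactly the point being made here.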

Daniel Newman: We hear the term cluster a lot in the industry and it’s kind of thrown around sometimes. I don’t fully think people appreciate it, but what goes into that? What goes into making one of these big clusters? Especially, in regards to the stuff you’re working on.

Loi Nguyen: So a cluster is just a collection of servers. A server could have eight GPUs. It could have 10 GPUs.

Daniel Newman: But you’re talking hundreds or thousands at one point, right?

Loi Nguyen: Exactly. Exactly. A cluster is a way to connect the different servers, the different GPUs, together to make them behave as a single supercomputer, a data center-class computer, that does certain things in tandem. Each of the servers has one chunk of the workload. They do a lot of computation, then they have to wait and exchange data. They do a little bit more, they stop, they exchange data. That is why the performance of the network, the connectivity, the bandwidth, the latency, determines the overall performance. Because the servers, the GPUs, need data from other GPUs. Inside the largest clusters today, there are 1,000 AI servers, tens of thousands of GPUs, thousands of switches to connect them, and tens of thousands of active cables.

And the cable could be an electrical active cable, or it could be an active optical cable for the shorter-reach interconnect. And then, as the AI cluster gets bigger and bigger to occupy the whole data center, for the longer reaches over, say, five meters, you need to go entirely to optics. And so, there will be tens of thousands of optics. For the clusters being built today, basically, the next target is to get to 100 exaFLOPS of computing power, which needs 100 petabits per second of bandwidth to connect it all. And such a cluster will probably consume somewhere around 100 megawatts.
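
A quick sanity check on these cluster-scale numbers, using only figures quoted in the interview (the 1.6 Tb/s per-module rate is the pluggable speed Nguyen cites a little later):

```python
# Sanity check using figures from the conversation: a 100-exaFLOPS-class
# cluster needs ~100 petabits/s of connectivity; at 1.6 Tb/s per pluggable
# optical module, that lands squarely in "tens of thousands of optics."

total_bandwidth_tbps = 100_000   # 100 Pb/s expressed in Tb/s
module_rate_tbps = 1.6           # one 1.6T pluggable optical module

modules = total_bandwidth_tbps / module_rate_tbps
print(f"~{modules:,.0f} optical modules for 100 Pb/s")  # ~62,500
```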

Daniel Newman: But when you break it down into, say, a server with two GPUs, you can kind of say, “Okay. The two GPUs need to talk.” And then you create another server that has two, and then those two have to talk to the first two, and you’ve got to connect them all. I don’t think people can always visualize it, but when you start to get into thousands … What I’m saying is it’s usually a mix of electrical and optical, and it scales very quickly.

Loi Nguyen: Yes. Yes.

Daniel Newman: It was a really nice explanation you gave about how they do a little bit of work and then they stop. Because it’s not a real stop; the goal is to make that period in which they stop as close to zero as possible. Right?

Loi Nguyen: Yes.

Daniel Newman: Removing all the latency out. That’s exactly right. That is it.

Loi Nguyen: You’re talking milliseconds, and you want fractions. Exactly.

Daniel Newman: That’s been possible … Especially with large clusters spanning big distances, the only way that becomes possible is you’ve got to move at the speed of light.

Loi Nguyen: Yes.

Daniel Newman: Talk a little bit about how much optical has changed over the past 10 years, because it’s become commercially viable in many cases where it wasn’t a decade ago.

Loi Nguyen: Absolutely. Optical communication has really been the backbone, first of the data center, then cloud computing, and now moving to AI. Today, for anything more than a few meters, more than half the size of this room, you need optics. Because the bandwidth is growing so fast, you need to travel at the speed of light, and only fiber has the bandwidth to do that. The bandwidth of optical, especially in pluggable optics, which is the dominant form factor today, has grown about 40 times over the past 10 years. 10 years ago, people were excited when they got 40 gigabits in a pluggable optic. Today, it’s 1.6 terabits per second. So a 40X increase in 10 years.

And the industry has done a pretty decent job of keeping the power consumption from scaling as much as the bandwidth. The power consumption has effectively grown only about eight times over that 10-year span while the bandwidth went up 40X. But nevertheless, because AI uses so much more bandwidth, the power of connectivity is also becoming important. When I talk about a cluster of 10,000-plus GPUs consuming 100 megawatts of power, the power that you need for connectivity, meaning the optics and the switches, is about 12%, which is quite substantial.
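
The arithmetic implied here is worth spelling out: bandwidth up 40X while power rose only about 8X means energy per bit fell roughly 5X over the decade, and 12% of a 100 MW cluster is about 12 MW just for optics and switching. A minimal sketch, using only the figures from this exchange:

```python
# Spelling out the efficiency math behind this exchange. The figures (40X
# bandwidth, ~8X power, 12% of a 100 MW cluster) are from the conversation;
# the arithmetic is ours.

bandwidth_growth = 40      # 40 Gb/s -> 1.6 Tb/s over ~10 years
power_growth = 8           # module power grew ~8X in the same span
cluster_power_mw = 100     # a next-generation AI cluster
connectivity_share = 0.12  # optics + switches

print(f"Energy per bit improved ~{bandwidth_growth / power_growth:.0f}X")
print(f"Bandwidth CAGR ~{bandwidth_growth ** 0.1 - 1:.0%} per year")
print(f"Connectivity power: ~{cluster_power_mw * connectivity_share:.0f} MW")
```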

Daniel Newman: One of the things I talk quite a bit about is we’re seeing this really high growth rate for training. There’s a lot of focus on that right now. But the real application, the killer app when you talk about GPT, is inference. The killer app that everybody’s consuming day in and day out is, “Hey, I’m using natural language talking to this thing.” And so, inference is something that is going to require more infrastructure too. Right now, we talk a lot about training. Kind of the whole example you gave. It’s training, training, training. We will use cloud-optimized silicon to do inference. FPGAs will do inference. You’ll be doing inference on general-purpose compute, and we’re seeing lots of different variants. How do we see the infrastructure and the accelerated infrastructure shift in the era when inference growth becomes more and more important?

Loi Nguyen: That’s a really good question. When you talk about the training itself, let’s just say a hyperscale provider needs to train X number of models. They may need X number of training clusters, but those are … I don’t know. Maybe 10. Maybe 100. When you talk about inference, now you are talking about not even thousands, but tens of thousands or even more inference machines. An inference machine will come in all kinds of different sizes, for different workloads. There are inference machines that process natural language and still require a lot of computing power. But there are inference machines that just distinguish between a dog and a cat. That doesn’t require much power. It can be on your phone.

You can be on your phone and take a picture: “What’s this plant? What is it?” It will tell you what plant it is. That is a much smaller inference machine. But nevertheless, inference is sensitive to the end-to-end latency between the user and the machine. When you ask it to do something, or you’re talking to it, the inference machine needs to start processing while you’re talking, so that it can anticipate what your next word is going to be and start preparing the answer, whatever that is. Latency is critical, so inference will drive the infrastructure much wider globally. You have to think about inference machines being deployed globally, to local markets, with different inference machines doing different things.
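
A toy model of the point about processing while the user is still talking: if inference can overlap the incoming speech, only the residual work after the last word contributes to perceived latency. All the parameters below are made up for illustration; nothing here reflects any particular system.

```python
# Toy latency model, purely illustrative: compare waiting for the full
# utterance before processing ("batch") with processing while the user is
# still talking ("streaming"). All parameters below are made up.

speech_seconds = 5.0   # the user talks for five seconds
process_seconds = 3.0  # total compute the request needs

# Batch: nothing starts until the user stops talking.
batch_latency = process_seconds

# Streaming: most work overlaps the speech; assume 90% of the overlappable
# work completes cleanly before the last word arrives.
overlap = min(process_seconds, speech_seconds)
streaming_latency = process_seconds - 0.9 * overlap

print(f"Batch:     {batch_latency:.1f} s after you stop talking")
print(f"Streaming: {streaming_latency:.1f} s after you stop talking")
```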

Daniel Newman: That’s right. We’re hearing about an era of a new kind of personal computing. New AI PCs, because that’ll be about being able to do more inference locally. Token by token. We’ve all seen with some of the different models … Some would be writing as you were talking to them; they’d be starting the answer. Others would wait and you’d see them spinning. We’re not very patient people.

Loi Nguyen: Yes. We’re not.

Daniel Newman: We’re like, “I want an answer now. I don’t want to wait 10 seconds.” So the idea is right. You’ve got to bring the compute close. You’ve got to have compute at the edge, in the regional data center, and in the core data center, and you’ve got to optically connect all of these things to make that work. Because what we want is a completely ubiquitous experience. We want it to be like I’m talking to you.

Loi Nguyen: Absolutely.

Daniel Newman: You’re going to react just as quickly.

Loi Nguyen: But I’m not smart enough to have a trillion models in my head.

Daniel Newman: A trillion parameters. A trillion parameters with Loi. Look, I’ve got just a couple of minutes left, but I would love to spend the end of this conversation on silicon photonics and the need for silicon photonics. Marvell’s story.

Loi Nguyen: Absolutely.

Daniel Newman: How important do you see that as part of Marvell’s future?

Loi Nguyen: I think with AI, as we just talked about, optics is the key to connecting different machines together, whether it’s training or inference or whatever. Bandwidth demands continue to rise, even more so with AI, and silicon photonics is the technology that can scale, because it fundamentally is an integrated circuit technology. Optical, but still integrated. Compared to the traditional optics of today, with discrete lasers and chip-and-wire assembly … That’s pretty hard to continue to scale forever. The industry has done a good job scaling bandwidth 40X over the past 10 years, but I think it’ll be hard to scale another 40X in the next five years.

The pace of innovation will accelerate. You need an integrated optics technology, and that’s what silicon photonics brings to the table. I’ve been working on silicon photonics for 10 years, and I’m sincerely happy to see that now there’s a catalyst really driving it, to integrate, basically, a terabit class of bandwidth on a single chip. And I think it will be a very important part of Marvell’s toolbox going forward. Today, silicon photonics is already enabling a class of products for Marvell: very high-performance pluggable optics connecting data center to data center. AI will be the next wave, an emerging application for silicon photonics. So I’m really happy to see that.

Daniel Newman: I don’t know if there’s ever been a forcing function for technology innovation that’s come on the way AI has in such a short period of time.

Loi Nguyen: Absolutely.

Daniel Newman: Having said that, I always say in this part of the world … We’re here in Silicon Valley talking, and there’s nothing better than when the entire industry as a whole finds that next pot of gold and starts running toward it, because the innovation is just overwhelming and it changes people’s lives. It doesn’t just change them here; it changes them everywhere in the world. Thank you, Loi, for your contributions. Appreciate you sitting down with me today.

Loi Nguyen: Thank you very much for talking to me today. It’s been a pleasure.

Daniel Newman: Absolutely. All right, everyone. Hit that Subscribe button. Join us for all of our episodes here on the Futurum Tech podcast. We’re here at Marvell, by the Bay in Santa Clara, for Industry Analyst Day, getting all the inside scoop on where we are heading: generative AI, AI-accelerated computing and infrastructure, and so much more. For this episode, though, it’s time to say goodbye. See you all really soon.

Author Information

Daniel is the CEO of The Futurum Group. Living his life at the intersection of people and technology, Daniel works with the world’s largest technology brands exploring Digital Transformation and how it is influencing the enterprise.

From the leading edge of AI to global technology policy, Daniel makes the connections between business, people, and tech that are required for companies to benefit most from their technology investments. Daniel is a top-five globally ranked industry analyst, and his ideas are regularly cited or shared in television appearances on CNBC, Bloomberg, the Wall Street Journal, and hundreds of other outlets around the world.

A seven-time best-selling author, most recently of “Human/Machine,” Daniel is also a Forbes and MarketWatch (Dow Jones) contributor.

An MBA and former graduate adjunct faculty member, Daniel is an Austin, Texas transplant after 40 years in Chicago. His speaking takes him around the world each year as he shares his vision of the role technology will play in our future.
