Gemma and Building Your Own LLM AI – Google Cloud AI at AI Field Day 4

Introduction

The Google Cloud AI team presented at AI Field Day 4, telling us about the Gemma large language model (LLM) and what Google Cloud infrastructure you could use to build your own LLM AI.

Google Cloud has long been known as an excellent platform for analytics and AI. The Google Cloud AI team builds models like the newly released Gemma family. Gemma uses the same technology as the Gemini LLMs but with a smaller parameter count to reduce the resources required for inference. Gemma also continues Google’s work on AI safety, aiming to avoid chatbots that turn radical or generate inappropriate representations in art or video.

We heard a little about Gemma at AI Field Day 4. At least one delegate had already had some hands-on time, even though Gemma had only been released the previous day. The Google Cloud AI presentation focused more on how these models are trained on the same type of infrastructure that Google Cloud offers customers.
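
For anyone wanting similar hands-on time, the Gemma weights are published through channels such as Hugging Face. Below is a minimal sketch of loading the 2-billion-parameter model with the Hugging Face transformers library; it assumes the google/gemma-2b checkpoint, which is gated behind accepting Google’s terms, and a machine with enough memory for the weights.

# Minimal sketch: text generation with Gemma 2B via Hugging Face transformers.
# Assumes `pip install transformers torch` and a Hugging Face token with Gemma access.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b"  # the larger variant is "google/gemma-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Explain what a TPU is in one sentence.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))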

Google Kubernetes Engine (GKE) is central, allowing the massive scale-out of compute required to train an LLM. In particular, training an LLM needs enormous amounts of accelerated processing, such as attaching a Tensor Processing Unit (TPU) to each training node. GKE supports this TPU-equipped computing at scale: a single cluster can handle 15,000 compute nodes and 50,000 TPUs. You can imagine that not every Google Cloud location has that much capacity, and customers occupying that many resources for weeks to train a new foundation LLM could cause scheduling challenges. Google expects only model training to require massive resourcing; inference should have far more modest requirements.
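
To make that concrete from the customer side, here is a minimal sketch of adding a TPU node pool to an existing GKE cluster with the google-cloud-container Python client. The project, location, cluster, and node pool names are placeholders, and the ct4p-hightpu-4t machine type (a TPU v4 shape) is an assumption for illustration; check the GKE documentation for the TPU shapes available in your region.

# Minimal sketch: add a TPU-equipped node pool to an existing GKE cluster.
# Assumes `pip install google-cloud-container` and application-default credentials.
from google.cloud import container_v1

client = container_v1.ClusterManagerClient()

# Placeholder project/location/cluster; ct4p-hightpu-4t is an assumed TPU v4 shape.
parent = "projects/my-project/locations/us-central2-b/clusters/training-cluster"

node_pool = container_v1.NodePool(
    name="tpu-pool",
    initial_node_count=4,  # scale out much further for foundation-model training
    config=container_v1.NodeConfig(machine_type="ct4p-hightpu-4t"),
)

operation = client.create_node_pool(request={"parent": parent, "node_pool": node_pool})
print(f"Node pool creation started: {operation.name}")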

The idea that training a model and running inference with it have different resource requirements was repeated throughout AI Field Day 4. The Google Cloud AI team sees the vast majority of inference using CPUs, which are available in various configurations in every Google Cloud location. Some LLM development focuses on reducing the cost of inference, and Gemma is one example. The two Gemma models have 2 billion and 7 billion parameters yet can, according to Google, provide results as good as Meta’s Llama 2 model, which uses up to 70 billion parameters. Fewer parameters mean less memory and CPU for the model and a lower cost per answer. This smaller footprint and cost will be vital as we start to see AI and LLMs built into products rather than being the product. Running these LLMs in the cloud will remain the most efficient use of resources because these workloads have significant variations in resource usage. It is easy to see why Google is developing and embedding LLMs into products.
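
A quick back-of-envelope calculation shows why the parameter count matters. Holding just the model weights at 16-bit precision, and ignoring the KV cache, activations, and runtime overhead, the footprints differ by an order of magnitude:

# Back-of-envelope memory for model weights at 16-bit (2 bytes per parameter).
# Illustrative only; real serving also needs KV cache, activations, and overhead.
BYTES_PER_PARAM = 2

for name, params in [("Gemma 2B", 2e9), ("Gemma 7B", 7e9), ("Llama 2 70B", 70e9)]:
    gib = params * BYTES_PER_PARAM / 2**30
    print(f"{name}: ~{gib:.0f} GiB of weights")

On that arithmetic, the 7-billion-parameter Gemma fits comfortably in the memory of a commodity CPU server, while a 70-billion-parameter model generally does not.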

One of the LLM-based products we saw in the Google presentation was Duet AI, an AI-based assistant for various computing tasks. The demo we saw at AI Field Day was of Duet AI for Developers, which assists with software development and troubleshooting. There is code suggestion, helping developers avoid the repetitive work of writing initialization code and get familiar quickly with new services and APIs. We saw Duet identify the cause of an error message in a log by analyzing the source code that generated the message. Although it was not demonstrated, automated unit testing is a handy component; good testing is vital to maintaining DevOps velocity without compromising safety. Another interesting Duet feature is code explanation. Hopefully, that will tell me what the code I wrote last year is supposed to do! I can see why Jensen Huang of NVIDIA says kids don’t need to learn to code; AI will do the coding. With all this automated AI, kids and adults need to learn critical thinking and AI prompt engineering.

Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.

Other Insights from The Futurum Group:

Google Cloud Engineering Exec: Welcome to Generative Engineering

Google Enhances GKE With Advanced Security, “Cluster Fleet” Management

Google Cloud Set to Launch NVIDIA-Powered A3 GPU Virtual Machines

Author Information

Alastair has made a twenty-year career out of helping people understand complex IT infrastructure and how to build solutions that fulfil business needs. Much of his career has included teaching official training courses for vendors, including HPE, VMware, and AWS. Alastair has written hundreds of analyst articles and papers exploring products and topics around on-premises infrastructure and virtualization and getting the most out of public cloud and hybrid infrastructure. Alastair has also been involved in community-driven, practitioner-led education through the vBrownBag podcast and the vBrownBag TechTalks.
