Gemma and Building Your Own LLM AI – Google Cloud AI at AI Field Day 4

Introduction

The Google Cloud AI team presented at AI Field Day 4, telling us about the Gemma large language model (LLM) and the Google Cloud infrastructure you could use to build your own LLM AI.

Google Cloud has long been known as an excellent platform for analytics and AI. Google Cloud AI builds models such as the newly released Gemma family. Gemma uses the same technologies as the Gemini LLMs but with a smaller parameter count, reducing the resources required for inference. Gemma also continues Google’s work on AI safety, aiming to avoid chatbots that become radicalized or generate inappropriate representations in art or video.

We heard a little about Gemma at AI Field Day 4. At least one delegate had already had some hands-on time, even though Gemma had only been released the previous day. The Google Cloud AI presentation focused more on how these models are trained on the same type of infrastructure that Google Cloud offers customers.

The Google Kubernetes Engine (GKE) is central, allowing massive scale-out of the compute required to train an LLM. In particular, training an LLM needs enormous scale-out of accelerated processing, such as attaching a Tensor Processing Unit (TPU) to each training node. GKE supports TPU-equipped nodes and can scale to 15,000 compute nodes and 50,000 TPUs in a single cluster. Naturally, not every Google Cloud location has all that capacity, and customers occupying that amount of resource for weeks to train a new foundation LLM can create scheduling challenges. Google expects only model training to require massive resourcing; inference should have more modest requirements.
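To make the GKE-plus-TPU idea concrete, the sketch below shows how a workload might request TPU capacity in a GKE cluster. This is a minimal, hedged example, not from the presentation: the node-selector labels and the `google.com/tpu` resource name follow Google's published GKE TPU conventions, but the pod name, image, and topology values here are placeholders.

```yaml
# Hypothetical GKE pod spec requesting a TPU v4 slice for a training job.
# Labels and resource names follow GKE's documented TPU conventions;
# the image and topology are illustrative placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: example-tpu-training
spec:
  nodeSelector:
    cloud.google.com/gke-tpu-accelerator: tpu-v4-podslice
    cloud.google.com/gke-tpu-topology: 2x2x1
  containers:
    - name: trainer
      image: example.com/my-training-image:latest  # placeholder image
      resources:
        limits:
          google.com/tpu: 4   # four TPU chips on this node
```

Scaling the training job then becomes a matter of scheduling many such pods across a TPU-equipped node pool, which is exactly the scale-out problem GKE is built to handle.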

The idea that training, which creates the model, and inference, where the model is used, have different requirements was repeated throughout AI Field Day 4. The Google Cloud AI team sees the vast majority of inference using CPUs, available in various configurations in every Google Cloud location. Some LLM development will focus on reducing the cost of inference, and Gemma is one example. The two Gemma models have 2 billion and 7 billion parameters yet can, according to Google, provide results as good as Meta’s Llama-2 model, which uses up to 70 billion parameters. Fewer parameters mean less memory and CPU for the model and less cost per answer. This smaller footprint and cost will be vital as we start to see AI and LLMs built into products rather than being the product. Running these LLMs in the cloud will continue to be the most efficient use of resources because these workloads will have significant variations in resource usage. It is easy to see why Google is developing and embedding LLMs into products.
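The memory saving from a smaller parameter count is easy to estimate with back-of-the-envelope arithmetic. The sketch below is illustrative only: it assumes half-precision (fp16, 2 bytes per parameter) weights and counts only the weights themselves, ignoring activation memory and runtime overhead.

```python
def model_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Rough memory needed just to hold model weights.

    Assumes fp16 weights (2 bytes per parameter) by default and
    ignores activations, KV cache, and runtime overhead.
    """
    return num_params * bytes_per_param / 1024**3


# Approximate weight footprints at fp16:
gemma_2b = model_memory_gb(2e9)    # a few GB - fits on a modest CPU host
gemma_7b = model_memory_gb(7e9)    # roughly 13 GB
llama_70b = model_memory_gb(70e9)  # roughly 130 GB - needs far larger hardware
```

The order-of-magnitude gap between the 7-billion and 70-billion cases is why a smaller model that matches a larger one's quality translates directly into cheaper inference.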

One of the LLM-based products we saw in the Google presentation was Google Duet, an AI-based assistant for various computing tasks. The demo we saw at AI Field Day was of Duet AI for Developers, which assists with software development and troubleshooting. There is code suggestion, helping developers avoid the repeated work of writing initialization code or quickly get familiar with new services and APIs. We saw Duet identify the cause of an error message in a log by analyzing the source code that generated the message. Although it was not demonstrated, automated unit testing is a handy component; good testing is vital to maintaining DevOps velocity without compromising safety. Another interesting Duet feature is code explanation. Hopefully, that will tell me what the code I wrote last year is supposed to do! I can see why Jensen Huang of NVIDIA says kids don’t need to learn to code; AI will do the coding. With all this automated AI, kids and adults need to learn critical thinking and AI prompt engineering.

Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.

Other Insights from The Futurum Group:

Google Cloud Engineering Exec: Welcome to Generative Engineering

Google Enhances GKE With Advanced Security, “Cluster Fleet” Management

Google Cloud Set to Launch NVIDIA-Powered A3 GPU Virtual Machines

Author Information

Alastair has made a twenty-year career out of helping people understand complex IT infrastructure and how to build solutions that fulfil business needs. Much of his career has included teaching official training courses for vendors, including HPE, VMware, and AWS. Alastair has written hundreds of analyst articles and papers exploring products and topics around on-premises infrastructure and virtualization and getting the most out of public cloud and hybrid infrastructure. Alastair has also been involved in community-driven, practitioner-led education through the vBrownBag podcast and the vBrownBag TechTalks.
