Gemma and Building Your Own LLM AI – Google Cloud AI at AI Field Day 4

Introduction

The Google Cloud AI team presented at AI Field Day 4, telling us about the Gemma large language model (LLM) and what Google Cloud infrastructure you could use to build your own LLM AI.

Google Cloud has long been known as an excellent platform for analytics and AI. The Google Cloud AI team builds models like the newly released Gemma family. Gemma uses the same technologies as the Gemini LLMs but with a smaller parameter count to reduce the resources required for inference. Gemma also continues Google’s work on AI safety, aiming to avoid chatbots that become radicalized or generate inappropriate representations in art or video.

We heard a little about Gemma at AI Field Day 4. At least one delegate had already had some hands-on time, even though Gemma had only been released the previous day. The Google Cloud AI presentation focused more on how these models are trained on the same type of infrastructure that Google Cloud offers customers.

Google Kubernetes Engine (GKE) is central, allowing the massive scale-out of compute required to train an LLM. In particular, training an LLM needs enormous scale-out of accelerated processing, such as attaching a Tensor Processing Unit (TPU) to each training node. GKE supports TPU-equipped nodes and can handle 15,000 compute nodes and 50,000 TPUs in a single cluster. You can imagine that not all Google Cloud locations have that much capacity, and customers occupying that amount of resource for weeks to train a new foundation LLM could create scheduling challenges. Google expects only model training to require massive resourcing; inference should have more modest requirements.
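To give a flavor of how a training workload asks GKE for TPUs, here is a minimal sketch of a Kubernetes pod spec, built in Python. The resource name `google.com/tpu` and the `cloud.google.com/gke-tpu-accelerator` node selector follow GKE's documented conventions, but the accelerator value, image name, and chip count are illustrative assumptions, not a recipe from the presentation.

```python
import json

def tpu_training_pod(name: str, image: str, tpu_chips: int) -> dict:
    """Build a hypothetical pod spec requesting TPU accelerators on GKE."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "nodeSelector": {
                # Assumed accelerator type; real clusters expose specific TPU types.
                "cloud.google.com/gke-tpu-accelerator": "tpu-v5-lite-podslice",
            },
            "containers": [{
                "name": "trainer",
                "image": image,
                "resources": {
                    # GKE schedules the pod onto a node with this many TPU chips.
                    "requests": {"google.com/tpu": tpu_chips},
                    "limits": {"google.com/tpu": tpu_chips},
                },
            }],
        },
    }

print(json.dumps(tpu_training_pod("llm-train-0", "example.com/trainer:latest", 4), indent=2))
```

Scaling training is then a matter of scheduling many such pods across the cluster, which is exactly the scale-out problem GKE is built to manage.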

The idea that training a model and running inference with it have different requirements was repeated throughout AI Field Day 4. The Google Cloud AI team sees the vast majority of inference using CPUs, available in various configurations in every Google Cloud location. Some LLM development will focus on reducing the cost of inference, and Gemma is one example. The two Gemma models have 2 billion and 7 billion parameters yet can, according to Google, provide results as good as Meta’s Llama 2 model, which uses up to 70 billion parameters. Fewer parameters mean less memory and CPU for the model, and less cost to get an answer. This smaller footprint and cost will be vital as we start to see AI and LLMs built into products rather than being the product. Running these LLMs in the cloud will continue to be the most efficient use of resources because these workloads will have significant variations in resource usage. It is easy to see why Google is developing and embedding LLMs into products.
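To make the footprint difference concrete, a rough back-of-envelope calculation of the memory needed just to hold model weights. The 2-bytes-per-parameter figure assumes 16-bit weights; real deployments often quantize further, so treat these numbers as upper-end estimates.

```python
def weight_memory_gb(params_billions: float, bytes_per_param: int) -> float:
    """Approximate memory to hold model weights, in decimal GB.

    params_billions * 1e9 parameters * bytes_per_param bytes / 1e9 bytes-per-GB
    simplifies to params_billions * bytes_per_param.
    """
    return params_billions * bytes_per_param

# At 2 bytes per parameter (16-bit weights):
for name, params in [("Gemma 2B", 2), ("Gemma 7B", 7), ("Llama 2 70B", 70)]:
    print(f"{name}: ~{weight_memory_gb(params, 2):.0f} GB of weights")
# → Gemma 2B: ~4 GB, Gemma 7B: ~14 GB, Llama 2 70B: ~140 GB
```

A 7B model that fits comfortably in the RAM of an ordinary CPU server, where a 70B model would not, is precisely what makes CPU-based inference plausible.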

One of the LLM-based products we saw in the Google presentation was Duet AI, an AI-based assistant for various computing tasks. The demo we saw at AI Field Day was of Duet AI for Developers, which assists with software development and troubleshooting. Code suggestion helps developers avoid the repetitive work of writing initialization code and quickly get familiar with new services and APIs. We saw Duet identify the cause of an error message in a log by analyzing the source code that generated the message. Although it was not demonstrated, automated unit testing is a handy feature; good testing is vital to DevOps velocity without compromising safety. Another interesting Duet feature is code explanation. Hopefully, that will tell me what the code I wrote last year is supposed to do! I can see why Jensen Huang of NVIDIA says kids don’t need to learn to code; AI will do the coding. With all this automated AI, kids and adults alike need to learn critical thinking and AI prompt engineering.

Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.

Other Insights from The Futurum Group:

Google Cloud Engineering Exec: Welcome to Generative Engineering

Google Enhances GKE With Advanced Security, “Cluster Fleet” Management

Google Cloud Set to Launch NVIDIA-Powered A3 GPU Virtual Machines

Author Information

Alastair has made a twenty-year career out of helping people understand complex IT infrastructure and how to build solutions that fulfil business needs. Much of his career has included teaching official training courses for vendors, including HPE, VMware, and AWS. Alastair has written hundreds of analyst articles and papers exploring products and topics around on-premises infrastructure and virtualization and getting the most out of public cloud and hybrid infrastructure. Alastair has also been involved in community-driven, practitioner-led education through the vBrownBag podcast and the vBrownBag TechTalks.
