Gemma and Building Your Own LLM AI – Google Cloud AI at AI Field Day 4

Introduction

The Google Cloud AI team presented at AI Field Day 4, telling us about the Gemma large language model (LLM) and what Google Cloud infrastructure you could use to build your own LLM AI.

Google Cloud has long been known as an excellent platform for analytics and AI. The Google Cloud AI team builds models like the newly released Gemma family of AI models. Gemma uses the same technologies as the Gemini LLMs but with a smaller parameter count to reduce the resources required for inference. Gemma also continues Google’s work on AI safety, trying to avoid chatbots that become radical or generate inappropriate representations in art or video.

We heard a little about Gemma at AI Field Day 4. At least one delegate had already had some hands-on time, even though Gemma had only been released the previous day. The Google Cloud AI presentation focused more on how these models are trained on the same type of infrastructure that Google Cloud offers customers.

Google Kubernetes Engine (GKE) is central, allowing the massive scale-out of compute required to train an LLM. In particular, training an LLM needs enormous scale-out of accelerated processing, such as adding Tensor Processing Units (TPUs) to each training node. GKE supports TPU-equipped nodes and can scale them out: it can handle 15,000 compute nodes and 50,000 TPUs in a single cluster. You can imagine that not all Google Cloud locations have all that capacity, and customers occupying that amount of resources for weeks to train a new foundation LLM might cause some scheduling challenges. Google expects only model training to require massive resourcing; inference should have more modest resource requirements.

The idea that training, where the model is created, and inference, where the model is used, have different requirements was repeated throughout AI Field Day 4. The Google Cloud AI team sees the vast majority of inference using CPUs, which are available in various configurations in every Google Cloud location. Some LLM development will focus on reducing the cost of inference, and Gemma is one example. The two Gemma models have 2 billion and 7 billion parameters yet can, according to Google, provide results as good as Meta’s Llama-2 model, which uses up to 70 billion parameters. Fewer parameters mean less memory and CPU for the model and less cost to get an answer. This smaller footprint and cost will be vital as we start to see AI and LLMs built into products rather than being the product. Running these LLMs on the cloud will continue to be the most efficient use of resources because these workloads will have significant variations in resource usage. It is easy to see why Google is developing and embedding LLMs into products.
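To put rough numbers on the link between parameter count and memory, here is a minimal back-of-the-envelope sketch. It assumes that the memory needed to hold a model's weights is approximately the parameter count multiplied by the bytes used per parameter, and it ignores activations, caches, and runtime overhead. The parameter counts are the nominal figures quoted above; the numeric precisions and the function names are illustrative assumptions, not anything Google presented.

```python
# Back-of-the-envelope estimate of the memory needed just to hold model weights.
# Assumption: memory ~= parameter_count * bytes_per_parameter.
# Activations, caches, and runtime overhead are deliberately ignored.

# Common numeric precisions and their size in bytes (illustrative choices).
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}

# Nominal parameter counts as quoted in the article.
MODELS = {
    "Gemma 2B": 2_000_000_000,
    "Gemma 7B": 7_000_000_000,
    "Llama-2 70B": 70_000_000_000,
}


def weight_memory_gb(param_count: int, dtype: str) -> float:
    """Return the approximate weight memory in gigabytes (10^9 bytes)."""
    return param_count * BYTES_PER_PARAM[dtype] / 1e9


def footprint_table() -> dict:
    """Map each model name to its estimated weight memory per precision."""
    return {
        name: {dtype: weight_memory_gb(count, dtype) for dtype in BYTES_PER_PARAM}
        for name, count in MODELS.items()
    }


if __name__ == "__main__":
    for name, row in footprint_table().items():
        for dtype, gb in row.items():
            print(f"{name:<12} {dtype:<9} {gb:7.1f} GB")
```

Under these assumptions, a model with ten times fewer parameters needs about ten times less memory for its weights at the same precision, which is the cost argument behind the smaller Gemma models.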

One of the LLM-based products we saw in the Google presentation was Duet AI, an AI-based assistant for various computing tasks. The demo we saw at AI Field Day was of Duet AI for Developers, which assists with software development and troubleshooting. Code suggestion helps developers avoid the repeated work of writing initialization code and get familiar with new services and APIs quickly. We saw Duet used to identify the cause of an error message in a log by analyzing the source code that generated the error. Although it was not demonstrated, automated unit testing is a handy component. Good testing is vital to DevOps velocity without compromising safety. Another interesting Duet feature is code explanation. Hopefully, that will tell me what the code I wrote last year is supposed to do! I can see why Jensen Huang of NVIDIA says kids don’t need to learn to code; AI will do the coding. With all this automated AI, kids and adults need to learn critical thinking and AI prompt engineering.

Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.

Other Insights from The Futurum Group:

Google Cloud Engineering Exec: Welcome to Generative Engineering

Google Enhances GKE With Advanced Security, “Cluster Fleet” Management

Google Cloud Set to Launch NVIDIA-Powered A3 GPU Virtual Machines

Author Information

Alastair has made a twenty-year career out of helping people understand complex IT infrastructure and how to build solutions that fulfil business needs. Much of his career has included teaching official training courses for vendors, including HPE, VMware, and AWS. Alastair has written hundreds of analyst articles and papers exploring products and topics around on-premises infrastructure and virtualization and getting the most out of public cloud and hybrid infrastructure. Alastair has also been involved in community-driven, practitioner-led education through the vBrownBag podcast and the vBrownBag TechTalks.

