Gemma and Building Your Own LLM AI – Google Cloud AI at AI Field Day 4

Introduction

The Google Cloud AI team presented at AI Field Day 4, telling us about the Gemma large language model (LLM) and what Google Cloud infrastructure you could use to build your own LLM AI.

Google Cloud has long been known as an excellent platform for analytics and AI. The Google Cloud AI team builds models such as the newly released Gemma family. Gemma uses the same technologies as the Gemini LLMs but with a smaller parameter count, which reduces the resources required for inference. Gemma also continues Google's work on AI safety, aiming to avoid chatbots that become radical or generate inappropriate representations in art or video.

We heard a little about Gemma at AI Field Day 4. At least one delegate had already had some hands-on time, even though Gemma had only been released the previous day. The Google Cloud AI presentation focused more on how these models are trained on the same type of infrastructure that Google Cloud offers customers.

Google Kubernetes Engine (GKE) is central, allowing the massive scale-out of compute that training an LLM requires. In particular, LLM training needs an enormous scale-out of accelerated processing, such as attaching a Tensor Processing Unit (TPU) to each training node. GKE supports TPU-equipped nodes and scaling out: it can handle 15,000 compute nodes and 50,000 TPUs in a single cluster. Not every Google Cloud location will have that much capacity, and customers occupying those resources for weeks to train a new foundation LLM could cause scheduling challenges. Google expects only model training to require massive resources; inference should have more modest requirements.
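
As an illustration of what "TPU-equipped computing" looks like from the Kubernetes side, here is a minimal Python sketch that builds a Pod manifest asking GKE to place a training container on a TPU node. It assumes GKE's TPU node labels (cloud.google.com/gke-tpu-accelerator, cloud.google.com/gke-tpu-topology) and the google.com/tpu extended resource; the accelerator type, topology, chip count, Pod name, and image are hypothetical values to check against current GKE documentation. A real training run would scale out by creating many such Pods through a Job or similar controller rather than a single Pod.

```python
import json


def tpu_training_pod(name: str, image: str, accelerator: str,
                     topology: str, chips: int) -> dict:
    """Build a Pod manifest (as a dict) that requests TPU chips on GKE.

    Assumption: GKE labels TPU nodes with the accelerator type and slice
    topology, and exposes TPU chips as the extended resource google.com/tpu.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # nodeSelector restricts scheduling to nodes carrying these labels.
            "nodeSelector": {
                "cloud.google.com/gke-tpu-accelerator": accelerator,
                "cloud.google.com/gke-tpu-topology": topology,
            },
            "containers": [
                {
                    "name": "trainer",
                    "image": image,
                    # Extended resources must have equal requests and limits.
                    "resources": {
                        "requests": {"google.com/tpu": chips},
                        "limits": {"google.com/tpu": chips},
                    },
                }
            ],
            "restartPolicy": "Never",
        },
    }


manifest = tpu_training_pod(
    name="llm-train-0",                   # hypothetical Pod name
    image="example.com/trainer:latest",   # hypothetical container image
    accelerator="tpu-v5-lite-podslice",   # assumed TPU type; verify availability
    topology="2x2",                       # assumed slice topology
    chips=4,                              # assumed chips per node for that topology
)

print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed manifest could be piped to kubectl apply -f - to submit it to a cluster that has a matching TPU node pool.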

The idea that training (creating the model) and inference (using the model) have different requirements was repeated throughout AI Field Day 4. The Google Cloud AI team expects the vast majority of inference to run on CPUs, which are available in various configurations in every Google Cloud location. Some LLM development will focus on reducing the cost of inference, and Gemma is one example. The two Gemma models have 2 billion and 7 billion parameters yet, according to Google, can provide results as good as Meta's Llama 2 model, which uses up to 70 billion parameters. Fewer parameters mean less memory and CPU for the model and a lower cost per answer. This smaller footprint and cost will be vital as AI and LLMs become built into products rather than being the product. Running these LLMs in the cloud will remain the most efficient use of resources because these workloads have significant variations in resource usage. It is easy to see why Google is developing LLMs and embedding them into products.
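
To make the parameter-count argument concrete, here is a back-of-the-envelope Python sketch of the memory needed just to hold each model's weights. It assumes 16-bit weights (2 bytes per parameter) and uses the nominal parameter counts quoted above; it ignores activations, caches, and runtime overhead, so real memory use will be higher.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumption: 16-bit weights (e.g. bfloat16), i.e. 2 bytes per parameter.
BYTES_PER_PARAM_16BIT = 2


def weight_memory_gb(num_params: float,
                     bytes_per_param: int = BYTES_PER_PARAM_16BIT) -> float:
    """Approximate memory for model weights in gigabytes (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Nominal parameter counts as quoted in the text.
models = {
    "Gemma 2B": 2e9,
    "Gemma 7B": 7e9,
    "Llama 2 70B": 70e9,
}

for name, params in models.items():
    print(f"{name}: ~{weight_memory_gb(params):.0f} GB of weights at 16-bit precision")
```

At this precision the estimates come to about 4 GB, 14 GB, and 140 GB, so the 70-billion-parameter model needs roughly ten times the weight memory of the 7-billion-parameter model. That gap is what drives the difference in inference cost.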

One of the LLM-based products we saw in the Google presentation was Duet AI, an AI-based assistant for various computing tasks. The demo at AI Field Day was of Duet AI for Developers, which assists with software development and troubleshooting. Code suggestion helps developers avoid repeatedly writing initialization code and get familiar with new services and APIs quickly. We saw Duet used to identify the cause of an error message in a log by analyzing the source code that generated the message. Although it was not demonstrated, automated unit testing is another handy component; good testing is vital to DevOps velocity without compromising safety. Another interesting Duet feature is code explanation. Hopefully, that will tell me what the code I wrote last year is supposed to do! I can see why Jensen Huang of NVIDIA says kids don't need to learn to code because AI will do the coding. With all this automated AI, kids and adults alike need to learn critical thinking and AI prompt engineering.

Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.

Other Insights from The Futurum Group:

Google Cloud Engineering Exec: Welcome to Generative Engineering

Google Enhances GKE With Advanced Security, “Cluster Fleet” Management

Google Cloud Set to Launch NVIDIA-Powered A3 GPU Virtual Machines

Author Information

Alastair has made a twenty-year career out of helping people understand complex IT infrastructure and how to build solutions that fulfil business needs. Much of his career has included teaching official training courses for vendors, including HPE, VMware, and AWS. Alastair has written hundreds of analyst articles and papers exploring products and topics around on-premises infrastructure and virtualization and getting the most out of public cloud and hybrid infrastructure. Alastair has also been involved in community-driven, practitioner-led education through the vBrownBag podcast and the vBrownBag TechTalks.
