Build Your Own AI Infrastructure Using Google Cloud

Analyst(s): Alastair Cooke
Publication Date: May 19, 2025

What is Covered in this Article:

  • Google Cloud offers multiple options for building and deploying AI applications. The presentations focused on building an infrastructure using VMs and managed storage.
  • Cluster Director and the Cluster Toolkit can deploy complete cluster environments from a simple blueprint document. The deployed cluster is designed for long-term operation.
  • New object storage options bring data closer to AI compute nodes. Managed Lustre provides massively scalable file storage for AI servers.

Recommendations:

  • Evaluate the newest available options for creating and serving AI models. Newer VM types and accelerators are far more capable than previous generations.
  • Carefully compare managed service capabilities against your needs, and only build infrastructure if there is an overwhelming need.
  • When you choose to build infrastructure, use automation for deployment and operations. Use managed services within that infrastructure.

Analyst Take: The Google Cloud Platform offers numerous options for building an AI application, ranging from managed services to Infrastructure-as-a-Service (IaaS). Matching business requirements to these options is the only way to ensure a cost-effective deployment of your AI application on the Google Cloud. In previous Tech Field Day presentations, Google Cloud focused on managed services such as Vertex AI and Cloud Run to deliver AI applications. At AI Infrastructure Field Day, Google shifted to showcasing the lower-level services that customers can use to build a customized AI solution. The portfolio includes specialized compute instances, storage options that accelerate applications, and networking that integrates your cloud-based AI application with on-premises infrastructure.

Cluster Operations

The complexity of building an AI infrastructure, even on the Google Cloud, can inhibit new projects. Cluster Director and its Cluster Toolkit provide a straightforward way to deploy and operate an AI cluster at scale, using either Google Kubernetes Engine (GKE) or the Slurm cluster scheduler. Customers create a simple blueprint document, a declarative source file, from which the toolkit creates a Terraform template and deploys a cluster. Cluster Director then manages the cluster and its workloads, providing observability and troubleshooting tools to simplify ongoing cluster management. Using the Cluster Toolkit and Cluster Director lowers the barrier to entry for deploying AI infrastructure on Google Cloud, whether for AI training or model serving for inference.
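
To make the blueprint workflow concrete, the sketch below assembles a minimal blueprint document in Python and hands it to the toolkit CLI, which expands the blueprint into Terraform and deploys the cluster. This is a sketch under assumptions: the field names, module sources, machine type, and the gcluster command are drawn from public Cluster Toolkit examples rather than the Field Day presentation, so verify them against the current Cluster Toolkit documentation.

```python
# Illustrative sketch only: compose a minimal, Cluster Toolkit-style blueprint
# in Python and hand it to the toolkit CLI. Field names, module sources, the
# machine type, and the "gcluster" command are assumptions based on public
# Cluster Toolkit examples; check the current documentation before relying on them.
import subprocess
import yaml  # pip install pyyaml

blueprint = {
    "blueprint_name": "a4-slurm-demo",
    "vars": {
        "project_id": "my-project",          # hypothetical project
        "deployment_name": "ai-training",
        "region": "us-central1",
        "zone": "us-central1-a",
    },
    "deployment_groups": [
        {
            "group": "primary",
            "modules": [
                {"id": "network", "source": "modules/network/vpc"},
                {
                    "id": "gpu_nodeset",
                    "source": "community/modules/compute/schedmd-slurm-gcp-v6-nodeset",
                    "use": ["network"],
                    "settings": {"machine_type": "a4-highgpu-8g", "node_count_dynamic_max": 4},
                },
            ],
        }
    ],
}

# Write the declarative blueprint document the toolkit consumes.
with open("blueprint.yaml", "w") as f:
    yaml.safe_dump(blueprint, f, sort_keys=False)

# The toolkit generates a Terraform deployment folder from the blueprint and applies it.
subprocess.run(["gcluster", "deploy", "blueprint.yaml"], check=True)
```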

Hardware Accelerators

An AI training cluster needs accelerators for the heavy lifting of building or refining a model. Google Cloud offers both NVIDIA GPUs and its own Tensor Processing Units (TPUs) as accelerators. The latest A4 VM shapes contain NVIDIA B200 Blackwell GPUs, and the A4X uses water-cooled variants for higher performance and density. These new machines offer NVMe SSDs and Google’s Titanium-accelerated networking; the combined upgrades provide significantly increased performance and throughput for demanding AI workloads. On the TPU side, the seventh-generation TPU, known as Ironwood, is expected to be available before the end of 2025. Ironwood will enable a superpod of over 9,000 TPU chips to function as a single unit for massive AI training jobs. Each generation of TPU has brought larger pod scaling and more data throughput from High Bandwidth Memory (HBM); Ironwood will provide over 7 TB/s of memory bandwidth per chip.
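
To illustrate why that memory bandwidth figure matters, the back-of-envelope calculation below estimates the lower bound on per-token latency for a memory-bandwidth-bound inference workload. The model size is an assumed example for illustration, not an Ironwood specification.

```python
# Back-of-envelope sketch: decoder-style inference must stream roughly all of the
# active model weights out of HBM for every generated token, so per-token latency
# is bounded below by weights_bytes / memory_bandwidth. The model size below is
# an illustrative assumption, not a published Ironwood figure.

weights_gb = 140          # e.g. a ~70B-parameter model at 16-bit precision (assumed)
hbm_bandwidth_tb_s = 7.0  # ~7 TB/s of HBM bandwidth per chip, per the article

weights_tb = weights_gb / 1000
min_seconds_per_token = weights_tb / hbm_bandwidth_tb_s

print(f"Lower bound: {min_seconds_per_token * 1000:.0f} ms per token "
      f"(~{1 / min_seconds_per_token:.0f} tokens/s) on a single chip, "
      f"before any compute or interconnect overhead")
```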

Storage Performance at Scale

Google Cloud announced several new storage options at Google Cloud Next and used the AI Infrastructure Field Day presentations to detail how to implement and use these services in an AI infrastructure. The new shared file system service is Google Cloud Managed Lustre, which provides a file system of up to 1 PB with sub-millisecond latency. Lustre is a parallel file system that supports thousands of concurrent connections without compromising performance, making it ideal for AI model data and checkpointing during training.

Two of Google Cloud’s new object storage capabilities are designed to place object data in the same zone (data center) as the AI compute cluster. Anywhere Cache, as the name suggests, is a cache for an existing bucket. The cache is configured in the zone containing your AI cluster and provides read-only access to the bucket contents at high throughput and low latency. Anywhere Cache can be enabled on any existing bucket and is suitable for training or reference data, as it is read-only; writes are not accelerated and are sent directly to the permanent bucket location. Rapid Storage is a new bucket option that services up to 20 million requests per second for read-write access in a single zone. It is an excellent option for checkpoint data or any use where low latency is critical and the availability limitations of a single zone are acceptable. Both Anywhere Cache and Rapid Storage can be used with Cloud Storage FUSE (gcsfuse), a driver that presents object storage as a file system, enabling unmodified applications to work with object storage data.
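
Anywhere Cache is transparent to applications: they keep using the standard Cloud Storage API (or a gcsfuse mount) while reads are served from the zonal cache. The minimal sketch below shows this with the Cloud Storage Python client; the project, bucket, and object names are hypothetical.

```python
# Minimal sketch: read training data from a Cloud Storage bucket with the standard
# Python client. If Anywhere Cache is enabled for the bucket in the cluster's zone,
# these reads are served from the zonal cache with no code changes; writes still go
# to the permanent bucket location. Project, bucket, and object names are hypothetical.
from google.cloud import storage  # pip install google-cloud-storage

client = storage.Client(project="my-project")
bucket = client.bucket("training-data-bucket")

# Stream a shard of training data into memory (use download_to_filename for disk).
blob = bucket.blob("datasets/shard-0001.tfrecord")
data = blob.download_as_bytes()
print(f"Read {len(data)} bytes from gs://training-data-bucket/datasets/shard-0001.tfrecord")

# With Cloud Storage FUSE (gcsfuse) the same bucket can instead be mounted as a file
# system, so unmodified, file-based data loaders work too, for example:
#   gcsfuse training-data-bucket /mnt/training-data
# with open("/mnt/training-data/datasets/shard-0001.tfrecord", "rb") as f:
#     data = f.read()
```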

Networking at Planet Scale with Extreme Local Performance

Network connectivity, both between AI accelerators and for ingesting training or inference data, is always a critical design element to avoid bottlenecks. Titanium is the hardware offload technology at the core of delivering high-bandwidth, low-latency virtual machine (VM) and container networking on the Google Cloud platform. GPUs and TPUs are connected through low-latency, non-blocking networks to ensure training and fine-tuning jobs complete rapidly. The GKE Inference Gateway is a new application-aware load balancer for inference. The gateway differs from a network load balancer because it knows the state and load of the containers serving LLMs, so it spreads requests more evenly across the available container instances, resulting in more consistent, low-latency responses for users. Google’s testing showed 60% lower latency and 40% higher throughput than a conventional load balancer.

Google also announced Cloud WAN, an option to use Google’s planet-scale network as your WAN, connecting on-premises and multi-cloud networks through a fully managed network service. Cloud WAN utilizes the Google global network and does not require a VPC-based network to join on-premises locations. It can also provide better Internet routing for SD-WAN platforms by handing traffic onto the Google network closer to the source, rather than closer to the SD-WAN headend.
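
From an application’s point of view, the GKE Inference Gateway is simply an HTTP endpoint in front of the model-serving pods; its model and load awareness is invisible to the caller. A minimal client sketch follows, assuming an OpenAI-compatible serving stack behind the gateway; the gateway address, path, and model name are hypothetical.

```python
# Illustrative client sketch: send an inference request to the GKE Inference Gateway
# endpoint. The gateway routes the request to a model-serving pod based on its view
# of pod state and load; the client just sees an ordinary HTTP service. The address,
# path, and model name are hypothetical and assume an OpenAI-compatible server
# (such as vLLM) is deployed behind the gateway.
import requests  # pip install requests

GATEWAY_URL = "http://10.0.0.100/v1/completions"  # hypothetical gateway IP and path

payload = {
    "model": "llama-3-8b-instruct",  # hypothetical served model name
    "prompt": "Summarize the benefits of zonal object storage for AI training.",
    "max_tokens": 128,
}

resp = requests.post(GATEWAY_URL, json=payload, timeout=30)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```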

What to Watch:

  • Most businesses will not build foundation AI models; fine-tuning and model serving for inference are the main business uses for AI infrastructure. Many use cases for AI will involve predictive AI, also known as AI/ML, which has far smaller resource requirements than the generative AI presented here.
  • An increasing number of Neocloud providers are offering public clouds specifically for AI workloads. These providers do not provide the same range of services and options as larger platforms like Google Cloud; however, their specialization may be suitable for specific use cases.
  • The Google Cloud is an excellent platform for AI experimentation, as managed services can reduce the effort and risk involved in testing potential solutions to business problems. Once inference delivers proven business value, it makes sense to evaluate whether to use managed services, cloud infrastructure, or on-premises infrastructure for long-term operations. Often, the best location for inference is alongside your existing data and applications.

Google’s presentations on the AI Hypercomputer, clustering, GKE, and object storage are on the morning appearance page. The presentations on Managed Lustre, GPU and TPU acceleration, and networking are on the afternoon appearance page.

You can watch all the presentations from the four days of AI Infrastructure Field Day on the Tech Field Day website.

Disclosure: Futurum is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of Futurum as a whole.

Other insights from Futurum:

Google Cloud Next 2025: The Yellow Brick Road to AI Transformation

At Google Cloud Next, Google Brings Its Databases to Bear on Agentic AI Opportunity

Why Organizations Are Switching to Google Cloud

Author Information

Alastair has made a twenty-year career out of helping people understand complex IT infrastructure and how to build solutions that fulfil business needs. Much of his career has included teaching official training courses for vendors, including HPE, VMware, and AWS. Alastair has written hundreds of analyst articles and papers exploring products and topics around on-premises infrastructure and virtualization, and getting the most out of public cloud and hybrid infrastructure. Alastair has also been involved in community-driven, practitioner-led education through the vBrownBag podcast and the vBrownBag TechTalks.
