Analyst(s): Brad Shimmin
Publication Date: April 24, 2026
Google Cloud is moving beyond the concept of static data estates to help its customers build systems that are not just intelligent but capable of taking action. By standardizing on Apache Iceberg and integrating agentic engineering directly into the storage and catalog layers, the company aims to bridge the gap between raw infrastructure and AI-driven autonomous action. This re-architecture is designed to eliminate data silos and automate the heavy lifting of data engineering through several new brand families led by Gemini Enterprise, Agentic Data Cloud, and the Gemini Enterprise Agent Platform.
What is Covered in This Article:
- Google Cloud’s architectural evolution from passive data estates to active systems of intelligence capable of action, leveraging the Gemini and Gemma model families.
- The evolution of Dataplex into the Knowledge Catalog, acting as a dynamic semantic engine for agentic AI.
- The expansion of the Cross-Cloud Lakehouse through Apache Iceberg, enabling zero-copy data integration across AWS, Azure, and major SaaS platforms.
- The introduction of Smart Storage in Google Cloud Storage (GCS) to natively tag, embed, and extract value from unstructured information.
- The transition toward agentic data engineering via the Data Agent Kit and Conversational Analytics, moving practitioners from manual coding to intent-driven orchestration.
The News: Google Cloud recently announced a sweeping update to its data portfolio, signaling a transition from providing data storage and analytics to delivering integrated systems of intelligence and action. At the heart of this announcement is the Knowledge Catalog, an evolution of Dataplex designed to provide large language models (LLMs) with the business context required for agentic workflows. Simultaneously, Google is doubling down on open standards by utilizing Apache Iceberg as the foundation for its Cross-Cloud Lakehouse, which now supports zero-copy queries against data residing in AWS and Azure.
Uniquely, to address the challenge of unstructured data, Google introduced Smart Storage capabilities natively within Google Cloud Storage (GCS), allowing for automatic metadata tagging and embedding of files as they land. Finally, the company is introducing agentic tooling through the Data Agent Kit for developers and Conversational Analytics for business users, all unified under the Gemini Enterprise Agent Platform. These updates are intended to reduce the friction of data movement and replace manual ETL pipelines with autonomous, intent-driven engineering.
Going Beyond the Data Graveyard With Google’s Agentic Data Cloud as the New Semantic Core for Agentic AI
Analyst Take: For the last decade, the enterprise has struggled to cope with messy data estates—a massive, often unwieldy collection of lakes, warehouses, systems of record, SaaS apps, and desktop data silos (e.g., spreadsheets) that require constant maintenance and complex plumbing. However, the rise of agentic AI has shone an uncomfortable light on this situation: static data repositories cannot keep up with the reasoning requirements of a modern generative model. Google hopes to address this by reframing and, in many cases, re-engineering its broad data and analytics portfolio, recognizing that storing data is insufficient if the infrastructure cannot also understand and act upon it. This transition represents a significant maturation of Google Cloud’s data intelligence, finally bringing the company’s data infrastructure in line with its industry-leading, highly innovative AI capabilities on display in the Gemini and Gemma model families.
From my vantage point, Google’s move to help customers transform their existing systems of intelligence into systems of action is a shrewd display of technical pragmatism, one that paves a clear path from data chaos to agentic autonomy. By solving some very straightforward problems, including data silos, the inaccessibility of unstructured information, and the inefficiencies of manual coding, Google’s data portfolio is at last catching up with the company’s own AI ambitions, showing its primary hyperscale rivals that the search leader still has the technical chops to solve complex problems.
The Iceberg Standard and the Cross-Cloud Lakehouse
One of the most impactful moves in this announcement is the elevation of Apache Iceberg to a first-class citizen within the Google ecosystem. By standardizing on this open table format, Google is effectively neutralizing the data gravity problem that has long plagued multi-cloud strategies. The Cross-Cloud Lakehouse enables BigQuery to query data in AWS S3 or Azure Data Lake Storage without the traditional overhead and cost of moving it. This zero-copy data integration directly challenges the walled-garden approach of some competitors and signals a move toward a more fluid query fabric.
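To make the zero-copy idea concrete, the sketch below shows what such a query can look like from the standard google-cloud-bigquery Python client, assuming an Iceberg-backed external table over an S3 bucket has already been defined; the project, dataset, and table names are hypothetical.

```python
# A minimal sketch of a cross-cloud query using the standard
# google-cloud-bigquery client. The project, dataset, and table names
# are hypothetical, and an Iceberg-backed external table over the S3
# bucket is assumed to have been defined already.
from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")  # hypothetical project

# The query itself carries no hint that the underlying bytes live on
# another cloud; BigQuery resolves the table like any other.
sql = """
    SELECT region, SUM(order_total) AS revenue
    FROM `my-gcp-project.sales_lakehouse.orders_iceberg`
    GROUP BY region
    ORDER BY revenue DESC
"""

for row in client.query(sql).result():
    print(f"{row.region}: {row.revenue:,.2f}")
```

The design point is that the query text stays identical whether the table’s bytes live in GCS, S3, or Azure Data Lake Storage; only the table definition knows where the data physically resides.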
According to the Futurum Group 1H 2026 DIAI Market Sizing & Five-Year Forecast Report, the demand for integrated AI and data governance platforms is a top priority for enterprise leaders. Google’s approach meets this demand by acknowledging that data will always be distributed. Instead of forcing customers to consolidate all their information into Google Cloud Storage, Google is providing a unified layer that reaches out to wherever the data lives, including platforms like ServiceNow, SAP, and Workday, which have partnered with Google to streamline direct data access. This creates a unique and enticing gravity well for Google. While the data may physically reside on a competitor’s disks, the intelligence and reasoning can readily happen within the Google Cloud environment.
Moving From Basic Metadata to Semantics With the Knowledge Catalog
The evolution of Dataplex into the Knowledge Catalog goes far beyond a simple rebrand. Traditional data catalogs have historically served as metadata storehouses, places where schemas and column names are recorded but rarely used effectively by end users or autonomous applications. The updated Knowledge Catalog functions as a dynamic context engine rather than a static list of tables. It provides the grounding necessary for models like Gemini to understand how a table relates to specific business processes, such as supply chain management or customer churn.
This semantic layer is critical for agentic AI. If an agent is tasked with optimizing inventory, it needs to know which tables represent current stock, which represent pending orders, and how those entities interact across different systems. By mapping these complex business relationships natively, Google is providing the connective tissue that has been missing from the AI stack. In doing so, Google is moving the conversation from the physical location of data to the business meaning of that data.
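Google has not published an agent-facing API for the Knowledge Catalog, so the sketch below is purely illustrative of the grounding pattern described above: every name in it (SemanticContext, the hand-coded catalog entries, build_grounding_prompt) is hypothetical, and in practice the mappings would be maintained by the catalog rather than hand-coded.

```python
# An illustrative sketch of semantic grounding for an agent. All names
# here are hypothetical; the point is the shape of the business context
# an agent needs before it can reason over physical tables.
from dataclasses import dataclass

@dataclass
class SemanticContext:
    table: str                 # physical table name
    business_entity: str       # what the table means to the business
    relationships: list[str]   # how it connects to other entities

# Hand-coded for illustration; a real catalog would maintain this map.
CATALOG = {
    "inventory.stock_levels": SemanticContext(
        table="inventory.stock_levels",
        business_entity="current on-hand stock per SKU and warehouse",
        relationships=["joins procurement.open_orders on sku_id"],
    ),
    "procurement.open_orders": SemanticContext(
        table="procurement.open_orders",
        business_entity="pending replenishment orders not yet received",
        relationships=["joins inventory.stock_levels on sku_id"],
    ),
}

def build_grounding_prompt(task: str) -> str:
    """Fold the business meaning of each table into the agent's prompt."""
    context = "\n".join(
        f"- {c.table}: {c.business_entity} ({'; '.join(c.relationships)})"
        for c in CATALOG.values()
    )
    return f"Task: {task}\n\nAvailable data and its business meaning:\n{context}"

print(build_grounding_prompt("Optimize inventory ahead of the Q3 promotion"))
```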
Illuminating Dark Data With Smart Storage
The most technically aggressive component of this strategy is the introduction of Smart Storage within Google Cloud Storage (GCS). Historically, unstructured data (e.g., PDFs, images, and call logs) has remained largely opaque, hidden away in storage buckets and requiring manual pipelines to process, vectorize, and index it before an AI model could touch it. Google is now embedding these capabilities directly into the storage layer as autonomous data enrichment.
As soon as a file lands in GCS, Smart Storage can automatically tag it, generate embeddings, and extract entities. This effectively removes vector database complexities that plague many organizations. Instead of managing a separate infrastructure for Retrieval-Augmented Generation (RAG), the storage layer itself becomes aware of the content it holds. This represents a significant reduction in architectural complexity and overhead, making it far easier for enterprises to bring their proprietary, semi- and unstructured information into the AI fold. And when paired with the Knowledge Catalog, this service actively creates a dynamic map of how data relates to the business, providing the vital meaning that agentic processes require.
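For contrast, here is a rough sketch of the manual enrichment step that Smart Storage is meant to absorb into the storage layer. The custom-metadata calls are the real google-cloud-storage API; the tag values and the embed_text() helper are hypothetical stand-ins for whatever classification and embedding services a team would otherwise wire up.

```python
# A sketch of today's hand-rolled enrichment pipeline: tag a newly landed
# object and compute an embedding for it. Smart Storage's pitch is that
# this work happens natively as the file lands, with no pipeline to run.
from google.cloud import storage

def embed_text(text: str) -> list[float]:
    """Placeholder for a real embedding call (e.g., a Vertex AI text
    embedding model); returns a dummy vector so the sketch runs."""
    return [0.0] * 768

def enrich_object(bucket_name: str, object_name: str) -> list[float]:
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(object_name)

    # Custom metadata is an existing GCS feature: key/value pairs stored
    # alongside the object and patched in place without rewriting data.
    blob.metadata = {
        "doc_type": "call_log",           # hypothetical tag
        "pipeline": "manual-enrichment",  # provenance marker
    }
    blob.patch()

    # In a hand-rolled RAG setup this vector would then be written to a
    # separate vector store, which is exactly the infrastructure Smart
    # Storage aims to make unnecessary.
    return embed_text(blob.download_as_text())
```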
Agentic Data Engineering and the Evolution of Manual Pipelines
Perhaps the most disruptive change we are tracking in this market is the transition toward agentic data engineering. For years, data engineers have been the plumbers of the enterprise, writing thousands of lines of Python, Spark, and SQL code to move and transform data. Google’s Data Agent Kit, which integrates directly into developer environments such as VS Code, signals the company’s intent to replace these manual, deterministic pipelines with autonomous orchestration built on nothing more than user intent.
Practitioners can now use intent-driven engineering to describe their desired outcome, leaving the agent to handle the underlying code generation and execution. This is not about replacing data professionals, however. Rather, it’s about elevating their role. When paired with Conversational Analytics, which allows business users to query data trapped in spreadsheets or local files using natural language, the entire data lifecycle becomes more fluid, more adaptive, and much easier to maintain. All of this is orchestrated through the Gemini Enterprise Agent Platform, which serves as a single control plane for managing these autonomous entities.
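Since Google has not detailed the Data Agent Kit’s interface, the following is a conceptual sketch of intent-driven engineering rather than real API usage: the practitioner declares the outcome and its constraints, and a hypothetical agent entry point (submit_intent) owns the code generation and execution. Every name here is illustrative.

```python
# A conceptual sketch only: every name below is illustrative. The contrast
# with manual pipelines is that nothing here says *how* to move or
# transform the data, only what outcome the business needs.
from dataclasses import dataclass, field

@dataclass
class PipelineIntent:
    goal: str
    sources: list[str]
    destination: str
    constraints: list[str] = field(default_factory=list)

intent = PipelineIntent(
    goal="Daily revenue by region, deduplicated and currency-normalized",
    sources=["sap.billing_documents", "salesforce.opportunities"],
    destination="bq://analytics.daily_revenue",
    constraints=["freshness < 24h", "PII columns masked"],
)

def submit_intent(i: PipelineIntent) -> None:
    """Stub for handing the declared intent to an orchestration agent."""
    print(f"Agent received goal {i.goal!r} with {len(i.sources)} sources")

submit_intent(intent)
```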
Don’t Ignore Egress Fees and Intelligence Lock-In
While Google’s technical vision with Agentic Data Cloud is compelling, there are tactical risks that enterprises must consider. First, the zero-copy promise of the Cross-Cloud Lakehouse is still subject to the physical realities of cloud economics. Even if Google can query data in AWS without moving it, the user is still likely to encounter egress and transaction fees, along with the inherent latency of cross-cloud traffic. In this regard, the technology is currently ahead of the cloud providers’ billing practices. This will, of course, change as companies like Google transition toward a pure utility-based pricing model.
More strategically, there is the risk of intelligence lock-in. While Google is using open formats like Apache Iceberg for storage and the Model Context Protocol (MCP) for agentic access to its entire portfolio, the semantic mappings, agentic logic, and prompt engineering that make the system intelligent are deeply tied to Google’s proprietary Gemini models. Moving data might be easy, but moving the brain that knows how to use that data can be much harder. Organizations will need to carefully weigh the convenience of this unified and increasingly vertically integrated system against the long-term flexibility of their AI strategy. Still, Google does not necessarily need to displace its competitors to win here. It only needs to make its ecosystem the most logical and advantageous place to run these high-value workloads.
What to Watch:
- The evolution of the agentic data engineer role (a role Futurum labels the AI Shepherd). Watch for a change in hiring requirements as data engineering moves from manual coding to the management and oversight of autonomous agents.
- Competitor responses in the lakehouse space. Databricks, AWS, Microsoft, and Snowflake will likely accelerate their own zero-copy and semantic layer features to counter Google’s move into the cross-cloud query space.
- Cloud egress fee wars. As multi-cloud architectures become the technical norm, there will be increasing pressure on cloud providers to lower or eliminate egress fees for cross-cloud querying.
- The maturity of the Knowledge Catalog. The success of Google’s strategy hinges on how well its Knowledge Catalog can automatically map business relationships without requiring massive manual effort from data stewards.
See the complete press release on the Agentic Data Cloud on Google’s website.
Disclosure: Futurum is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.
Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of Futurum as a whole.
Other Insights from Futurum:
Can Databricks Out-Iceberg the Competition?
Teradata Set to Turn Data Gravity Into AI Gold With Enterprise AgentStack
Author Information
Brad Shimmin is Vice President and Practice Lead, Data Intelligence, Analytics, & Infrastructure at Futurum. He provides strategic direction and market analysis to help organizations maximize their investments in data and analytics. Currently, Brad is focused on helping companies establish an AI-first data strategy.
With over 30 years of experience in enterprise IT and emerging technologies, Brad is a distinguished thought leader specializing in data, analytics, artificial intelligence, and enterprise software development. Consulting with Fortune 100 vendors, Brad specializes in industry thought leadership, worldwide market analysis, client development, and strategic advisory services.
Brad earned his Bachelor of Arts from Utah State University, where he graduated Magna Cum Laude. Brad lives in Longmeadow, MA, with his beautiful wife and far too many LEGO sets.
