Cohesity Gaia Uses RAG to Unlock Valuable Secondary Data

The News: Cohesity introduces Gaia, an AI-powered conversational assistant. Additional detail is available in Cohesity’s press release.

Analyst Take: With the hype around generative AI, it is easy to lose sight of the fact that the content generated by AI engines is only as factual and reliable as the data on which it is based. For this reason, retrieval-augmented generation (RAG) is emerging as an important tool for improving the accuracy and factual consistency of the responses that large language models (LLMs) generate. RAG augments generative AI with private, corporate data, while retaining the ability to manage the security of, and access to, that information.

Why RAG for Secondary Data?

Specifically, RAG can pull from a variety of knowledge sources, effectively allowing LLMs to “look things up” before answering an inquiry. For example, it can be used to identify files that are relevant to a business inquiry, inspect the content of these documents and files, and then use this information to generate a response. As a result, the subsequent response is more reliable and grounded in real-world knowledge.
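The "look things up" step can be illustrated with a minimal sketch. It is not Cohesity's implementation: the sample documents, the function names (tokenize, cosine, retrieve, build_prompt), and the simple word-count similarity are all assumptions made for illustration, and a production system would typically use vector embeddings and a real LLM call in place of the final prompt string.

```python
import math
import re
from collections import Counter

# Hypothetical sample corpus; file names and contents are invented for illustration.
DOCS = {
    "backup_policy.txt": "Mailbox backups are retained for ninety days under the retention policy.",
    "onboarding.txt": "New employees receive a laptop and a badge on their first day.",
    "incident_report.txt": "The March incident exposed two mailbox backups before access was revoked.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=2):
    """Return the names of up to k documents most similar to the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), name) for name, text in documents.items()]
    scored.sort(reverse=True)
    # Drop documents that share no words with the query at all.
    return [name for score, name in scored[:k] if score > 0]


def build_prompt(query, documents, k=2):
    """Assemble the grounded prompt that would be sent to an LLM."""
    names = retrieve(query, documents, k)
    context = "\n".join(f"[{n}] {documents[n]}" for n in names)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    print(build_prompt("How long are mailbox backups retained?", DOCS))
```

The design point is that the model only sees the retrieved passages, so the answer can be traced back to specific files rather than to whatever the model learned during training.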

While structured and unstructured primary data stores have largely been the focus of generative AI to date, they are only the tip of the iceberg. Secondary data contained in emails, files, and virtual machines represents a significantly larger volume of data. This data is a massive and untapped opportunity from an AI standpoint.

What to Look for in a Solution for RAG-Enhanced AI

When looking at an architecture for AI that uses RAG, the ability to bring compute resources to the data is important for several reasons, including optimizing performance, bandwidth, and costs while minimizing latency. This is where a distributed architecture can come in: spreading compute resources improves resource utilization and increases fault tolerance for resiliency.

Additionally, data integrity and security need to be managed. This management is especially critical considering the sensitive nature of the data that may need to be utilized to substantiate answers to business inquiries. A decentralized architecture allows for data to be processed locally, avoiding risks inherent in moving data to a centralized server. What’s more, responsible access to data must be enforced with capabilities such as role-based access control (RBAC) and by embracing a zero-trust approach. Finally, API extensibility makes it easier to add the diverse capabilities that may be required to support experimentation and innovation and to integrate with existing infrastructure and workflows.

Introducing Cohesity Gaia

For its part, Cohesity aims to provide what it describes as an “easy button” for adopting RAG AI for secondary data with Gaia. With its Data Cloud offering, Cohesity has already been working to offer an end-to-end platform for data protection, security, mobility, access, and insights that is based on its scalable, distributed file system and that uses its data indexing capabilities. Gaia will provide an AI-powered conversational interface that allows users to gain contextual and valuable LLM-based insights from enterprise data, with Cohesity hosting the LLM vector database, initially in the form of Azure OpenAI with others to follow. While initially supporting Microsoft 365 and OneDrive, Cohesity aims to expand to support other data types including unstructured NAS backups.

To facilitate responsible and secure utilization of data, which is an important and growing concern as customers increasingly adopt all forms of AI, the customer determines which data is indexed by Gaia. This approach helps to avoid indexing any data that should not be indexed, whether for privacy, security, or compliance reasons. Additionally, content filtering, configurable guardrails, and RBAC control the data that specific users can access.
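The two controls described here, customer-selected indexing and role-based filtering, can be sketched as two successive filters. This is an illustration under stated assumptions rather than a description of Gaia's internals: the Document fields, the source labels, the role names, and the functions build_index and visible_to are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """A backed-up item with hypothetical metadata used for access decisions."""
    name: str
    source: str               # e.g. "mail" or "hr_share" (invented labels)
    allowed_roles: frozenset  # roles permitted to read this item
    text: str


def build_index(documents, indexed_sources):
    """Index only the sources the customer has opted in; everything else is never indexed."""
    return [d for d in documents if d.source in indexed_sources]


def visible_to(index, user_roles):
    """Return the indexed documents that at least one of the user's roles may read."""
    roles = set(user_roles)
    return [d for d in index if roles & d.allowed_roles]


# Hypothetical sample data.
ALL_DOCS = [
    Document("q1_audit.eml", "mail", frozenset({"compliance"}), "Audit findings for Q1."),
    Document("team_lunch.eml", "mail", frozenset({"compliance", "staff"}), "Lunch is at noon."),
    Document("salaries.xlsx", "hr_share", frozenset({"hr"}), "Salary bands by grade."),
]

if __name__ == "__main__":
    index = build_index(ALL_DOCS, indexed_sources={"mail"})
    for doc in visible_to(index, user_roles={"staff"}):
        print(doc.name)
```

Applying the role filter before retrieval, rather than after a response is generated, means restricted content never reaches the model's context for a user who is not entitled to see it.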

An example use case is streamlining compliance using generative AI. Data is indexed from the Cohesity backups, and responses are produced by an LLM, with RAG enhancing the output by supplying relevant corporate data for business context. A multi-turn chat interface allows users to do further investigative digging. For example, a user might ask whether patient names and treatment plans may have been exposed over a certain period of time, and then dig into specific examples that are uncovered.

Conclusion and Looking Ahead

Cohesity is the first mover when it comes to bringing RAG to secondary data. Gaia lets users ask business questions and obtain context-rich responses, using RAG and LLMs to search data, identify relevant information, and generate a response. This capability is a game-changer in terms of unlocking value from the exabytes of secondary and protection data that exist today and continue to grow exponentially. Key customer outcomes that The Futurum Group anticipates include improving the speed and accuracy of decision making and streamlining compliance and risk management. As table-stakes criteria, the solution incorporates key features required to facilitate safe and responsible access to and utilization of indexed data. The solution is SaaS-based and includes a free trial option, which will support customer adoption.

Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.

Other Insights from The Futurum Group:

Cohesity Acquires Veritas Data Protection Assets

Early Access for Cohesity Turing Integration with Amazon Bedrock

Cohesity at AWS re:Invent 2023 – The Six Five on the Road

Author Information

Russ brings over 25 years of diverse experience in the IT industry to his role at The Futurum Group. As a partner at Evaluator Group, he built the highly successful lab practice, including IOmark benchmarking.

Prior to Evaluator Group he worked as a Technology Evangelist and Storage Marketing Manager at Sun Microsystems. He was previously a technologist at Solbourne Computers in their test department and later moved to Fujitsu Computer Products. He started his tenure at Fujitsu as an engineer and later transitioned into IT administration and management.

Russ possesses a unique perspective on the industry through his experience as both a product marketer and an IT consumer.

A Colorado native, Russ holds a Bachelor of Science in Applied Math and Computer Science from University of Colorado, Boulder, as well as a Master of Business Administration in International Business and Information Technology from University of Colorado, Denver.

Krista Case brings over 15 years of experience providing research and advisory services and creating thought leadership content. Her vantage point spans technology and vendor portfolio developments; customer buying behavior trends; and vendor ecosystems, go-to-market positioning, and business models. Her work has appeared in major publications including eWeek, TechTarget and The Register.
