Cloudera: De-Mystifying Data Architectures with Unique Platform Solutions


State of the Ecosystem – Data Architecture Clarity and Selection is Challenging Today

Organizations across the data ecosystem are struggling to identify the data architecture combination best suited to meeting their intricate data demands. Today’s data teams are tasked with the immense challenge of delivering and administering all their organization’s data and workloads across their on-premises and cloud environments while also assuring minimal to no latency. In essence, they are focused on advancing the main business objective of becoming a data-driven organization by delivering everything, everywhere, all at once across their evolving data architecture.

As a result, data decision makers are evaluating data fabric, data lakehouse, and data mesh trends to keep up with organization-wide data demands. We believe that supplying definitions of these data architectures can provide better understanding of these options and why decision makers are contemplating them in fulfilling the goal of data architecture optimization.

Data Mesh: An approach used to help scale a company’s data footprint in a manageable way through the decentralization of data and workloads. Data mesh is a set of practices around people, process, and technology choices that allow companies to elastically scale their data systems. Key data mesh design principles include self-serve data discovery, full data security, data lineage, data auditing, and data cataloging. We find large organizations with a domain-tailored architecture benefit the most from adoption, since data meshes preserve the data and its ownership in the domain where it originated, thereby avoiding IT chokepoints and assuring domain-based scaling.

Data Fabric: A collection of technologies used to ingest, store, process, and govern data anywhere at any time. Data fabric can be regarded as the technology part of data mesh: concepts in data mesh map to real-world artifacts in data fabric implementations, and one way to implement a data mesh is to make technology choices within the framework of a data fabric. For instance, only when data is properly understood through a fabric can a mesh sensibly divide into domains and know what data is at its disposal. We see data fabric adoption picking up across organizations that look to accelerate integration between their data silos, make data readily available to business users regardless of location, and advance fulfillment of their data compliance and security goals.

Data Lakehouse: Data lakehouses integrate and unify the capabilities of data warehouses and data lakes with the goal of supporting artificial intelligence (AI), machine learning (ML), business intelligence, and data engineering on a unified platform. Specifically, open data lakehouses help organizations run rapid analytics on all data — both structured and unstructured — at massive scale. Today we see organizations swiftly embracing open data lakehouses to attain interoperability across different analytic engines and vendors, leveraging community-driven innovation to avoid vendor lock-in, and solving their real-world business problems in pragmatic ways with best-of-breed capabilities.

For additional clarification, we view hybrid architectures as the technology decisions made to ingest, store, process, govern, and visualize data in different form factors, encompassing on premises and multiple clouds, also replicating data according to need. As such, hybrid architectures can be viewed as an implementation of a data fabric that spans multiple form factors.

We find there is a wide variance of perspective on what constitutes a hybrid architecture. Establishing a single official industry-wide definition is unlikely, and it is simply not as important as meeting enterprise demand for using a hybrid architecture to avoid architectural lock-in and the potential constraints imposed by the specific technologies implemented or the location of data production and consumption. Regardless of the hybrid architecture used, we see enterprises giving top priority to hybrid architecture flexibility and choice, especially toward improving their business outcomes.

We see data decision makers grappling with a great deal of marketing noise advocating the superiority of one of these data trends, which makes the decision to adopt a single trend or a combination of them more vexing. Overall, we do not believe the data trend selection process is an either/or choice. Rather, data decision makers can optimize and modernize their data architecture by using an open-source data platform that brings built-in versatility and flexibility.

Data Architecture Trends: What to Expect

We see key data trends emerging that are shaping and driving the data architecture optimization process. For instance, data contracts are emerging as a new approach to data mesh because they can provide transparency over data usage and dependencies. In the near term, we anticipate that decision makers will proceed cautiously by initially focusing on standardization support and technical stability. In this nascent stage, data governance is integral, although avoiding excessive overhead merits extra scrutiny. As more confidence in data contracts is gained, we expect organizations to automate more of their data mesh processes, including data mesh contracting.

Key to the enduring success of data meshes is assuring that the metadata, both dynamic and static, is consistent across all data products. This entails that the data model of the metadata must be consistent regardless of the underpinning technologies used. This data model functions as the contract structure which is defined between the producers and consumers of the data. In sum, consumers gain more flexibility to subscribe to data products that are generated by the data producers.
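To make the idea of a metadata model acting as a contract more concrete, the following is a minimal sketch in Python. It is an illustration under our own assumptions, not a Cloudera API or an industry standard: the class names (`StaticMetadata`, `DynamicMetadata`, `DataContract`), the fields, and the sample data product are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StaticMetadata:
    """Hypothetical static metadata: changes rarely (ownership, schema)."""
    domain: str
    owner: str
    schema: Dict[str, str]  # column name -> type name


@dataclass
class DynamicMetadata:
    """Hypothetical dynamic metadata: changes with each refresh."""
    row_count: int
    last_updated: str  # ISO 8601 timestamp


@dataclass
class DataContract:
    """One consistent structure shared by every data product,
    regardless of the technology that stores the data."""
    product_name: str
    static: StaticMetadata
    dynamic: DynamicMetadata
    subscribers: List[str] = field(default_factory=list)

    def subscribe(self, consumer: str, required_columns: List[str]) -> bool:
        """Register a consumer only if the contract offers every column it needs."""
        missing = [c for c in required_columns if c not in self.static.schema]
        if missing:
            return False
        if consumer not in self.subscribers:
            self.subscribers.append(consumer)
        return True


# Illustrative data product published by a hypothetical payments domain.
contract = DataContract(
    product_name="payments.daily_settlements",
    static=StaticMetadata(
        domain="payments",
        owner="payments-data-team",
        schema={"settlement_id": "string", "amount": "decimal", "settled_on": "date"},
    ),
    dynamic=DynamicMetadata(row_count=0, last_updated="1970-01-01T00:00:00Z"),
)

accepted = contract.subscribe("risk-analytics", ["settlement_id", "amount"])
rejected = contract.subscribe("marketing", ["customer_email"])
```

In this sketch, a consumer can subscribe only when the producer's published schema covers the columns it requires, which is one simple way the same contract structure could mediate between producers and consumers across different underlying technologies.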

From our viewpoint, data decision makers are also investigating combining the data mesh with the data exchanges being built such as the Snowflake data exchange, Amazon data exchange, and others. This trend could further enlarge how data meshes are defined and understood. However, the future of this approach is currently unsettled as the data exchanges are designated primarily as producer and consumer marketplaces that usually do not have an analytics workload associated with them.

Cloudera: Meeting the Challenges and Easing the Selection of the Best Data Architecture

We believe that Cloudera’s portfolio is well suited to meet the demands of today’s rapidly evolving data architectures. This applies especially to its role as a trusted partner for data decision makers who are selecting the data trends, very likely in combination, that are best suited to optimizing their data architecture journey.

The Cloudera Data Platform (CDP) enables modern data architectures on a data anywhere and anytime basis, all according to the customer’s scale requirements. By supporting all the major data models in play today (data mesh, data fabric, and data lakehouse), Cloudera assures customers can avoid being locked into one trend and have the flexibility vital to optimizing their data architecture through data trend selectivity.

For example, the integrated security and governance capabilities available through Cloudera’s Shared Data Experience (SDX) already have a proven track record in the delivery of successful data meshes across tightly regulated industries such as financial services. Additionally, the versatility of the Cloudera Data-in-Motion product and broader integration of CDP enable intricate use cases that extend beyond the data mesh in areas such as the ingestion and processing of IoT data for customer analytics and real-time cybersecurity analytics. This gives customers the overall data architecture flexibility key to optimizing their data model combinations.

We are also encouraged by Cloudera’s extensive support for open data lakehouse use cases over the last several years. Through open-source support, Cloudera customers can gain the confidence to advance their data trends selections with the knowledge that any choice they make maintains architectural flexibility and avoids lock-in. These deployments use open-source engines on open data and table formats, allowing for easy use of data engineering, data science, data warehousing, and machine learning in the data architecture optimization process.

From our perspective, Cloudera’s hybrid data platform provides the building blocks key to demystifying and deploying all modern data architectures. While technology in and of itself is insufficient to deploy any architecture, we believe there is tremendous benefit in having a single platform that meets the requirements of all architectures. Organizations can streamline their data trend selection process by minimizing the workforce training required to use, manage, and administer multiple systems. In addition, a single platform eliminates the need to replicate key capabilities such as governance across multiple trends throughout different locations and infrastructures.

Ultimately, we believe that Cloudera can provide the technological component of the solution to support any organization’s data-driven initiative by implementing the data mesh, data fabric, and data lakehouse trends according to customer selection and prioritization.

Disclosure: Futurum Research is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of Futurum Research as a whole.

Author Information

Daniel is the CEO of The Futurum Group. Living his life at the intersection of people and technology, Daniel works with the world’s largest technology brands exploring Digital Transformation and how it is influencing the enterprise.

From the leading edge of AI to global technology policy, Daniel makes the connections between business, people and tech that are required for companies to benefit most from their technology investments. Daniel is a top 5 globally ranked industry analyst and his ideas are regularly cited or shared in television appearances by CNBC, Bloomberg, Wall Street Journal and hundreds of other sites around the world.

A seven-time best-selling author, most recently of “Human/Machine,” Daniel is also a Forbes and MarketWatch (Dow Jones) contributor.

An MBA and Former Graduate Adjunct Faculty, Daniel is an Austin Texas transplant after 40 years in Chicago. His speaking takes him around the world each year as he shares his vision of the role technology will play in our future.

