In this episode of the Futurum Tech Webcast, host Steven Dickens speaks with IBM’s Ryan Yackel, GTM PM and Growth Leader, IBM Databand, about the evolving landscape of data management and AI. They discuss the recent acquisition of Databand by IBM, highlighting Databand’s role in data observability within the modern data stack. Yackel explains how data observability is becoming increasingly important due to the challenges faced by data engineering teams and the proliferation of diverse tool stacks. He also delves into how data observability complements data governance, emphasizing its role in improving detection, resolution times, and data SLAs.
Their discussion covers:
- IBM’s acquisition of Databand and its integration into IBM’s data fabric team
- Why data observability has become a critical trend, driven by the increasing demands on data engineering teams and the complexity of modern tool stacks
- How data observability enhances data governance, reliability, and quality within organizational data strategies
- The intersection of data management practices with AI deployment, emphasizing the importance of quality and governance in AI strategies
To learn more, and to download The Futurum Group’s white paper produced in partnership with IBM, visit the company’s website.
Watch the video below, and be sure to subscribe to our YouTube channel, so you never miss an episode.
Disclaimer: The Futurum Tech Webcast is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded, and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors, and we ask that you do not treat us as such.
Transcript:
Steven Dickens: Hello, welcome. My name’s Steven Dickens and I’m your host and you’re joining us here on another episode of the Futurum Tech Webcast. I’m joined by Ryan Yackel, formerly the CMO of Databand, now with IBM. Hey Ryan, welcome to the show.
Ryan Yackel: Hey man, what’s going on? Good to talk to you today.
Steven Dickens: So we’ve been looking forward to recording this for a while. Maybe let’s get the listeners and viewers orientated here first. Tell people a little bit about your role and then maybe also expand on the Databand acquisition by IBM. I think that’s still relatively fresh in people’s minds, so maybe just drill down there as well for us.
Ryan Yackel: Yeah. So yeah, thanks for having me. About a year ago, or close to a year ago, IBM acquired Databand. Databand.ai is what the website was, and it’s still probably going to be up for a while, but we were a leader in data observability, specifically in the modern data stack space. And IBM obviously is a leader in data fabric and a lot of the things they’re doing around governance, data integration, replication, lineage, all those different areas of establishing a data foundation, but observability just wasn’t a part of their data stack. And so they saw an opportunity to acquire Databand, and we’ve been a part of the data fabric team for going on about a year now. So it’s been great, because we’re able to really tap into different audiences that IBM may not have had communications with, or may not have had an opportunity to discuss solutions with, particularly more code-based data engineering and data platform teams that are struggling with pipelines breaking and data quality breaking, and they have no idea where the problem is. Having data observability as part of their overall data engineering workflow has been a tremendous value add that we’ve seen. So that’s a quick overview. I was CMO over there, and I’m leading go-to-market strategy for IBM Databand right now within the IBM data fabric.
Steven Dickens: Fantastic. So let’s dive straight in. In your introduction there, you covered data observability. Why is that becoming a hot trend? I hear a lot about it in the data platform space where customers are looking, and you’re obviously closer to it than I am. Why is it becoming such a hot trend, and why are people starting to really engage with you and your team?
Ryan Yackel: Yeah, I think there have been a couple of, I guess, macro things that have gone on in the space. One of them is the fact that data engineering teams and data platform teams are just insanely underwater constantly. That’s what we hear. We hear things like, “Hey, we can’t be told a week from now, or even an hour from now, ‘the data pipeline has failed,’ and not know about it. We can’t be blindsided by schema changes, because then the people consuming the data downstream are going to have a problem.” So one reason data observability has come out is this overloading of throughput that the data engineering team needs to deliver at a high-velocity clip, and deliver reliably; that’s one of the problems that’s blowing things up. The second thing is there are so many different tool stacks that people are a part of nowadays. It’s hard to keep track of all the different tools. They’re using open source, they’re using something like Databricks or Snowflake or Airflow or DBT; whatever you’re using out there, it’s a hodgepodge of best-of-breed tooling. If you don’t have sensors and monitoring in place across all your different tool stacks, you’re going to run into a problem. So I think those are the two main things: the proliferation of the modern data stack, and the overload on data engineering.
So out of that really came a focus on how we can deliver value to this underserved data engineering team out there with some tooling. If you look at the way application observability has gone over the past 5 to 10 years, with companies like Datadog and Instana and New Relic, everyone in the software space, every software engineer, already has monitoring and observability tools. That’s already table stakes when they’re monitoring their stuff in production. Data engineering teams don’t have that. So if you take a lot of the core concepts from software delivery teams and software delivery engineering processes, which are applied to applications and cloud infrastructure in prod, we’re doing the same thing now for data engineering teams, with data pipelines, data infrastructure, datasets, data tables. All those things are now part of observability for data engineering teams, not just for software engineering teams. I think that’s the hodgepodge of things that have gone on that’s really made this category take off.
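To make the parallel concrete, here is a minimal sketch of what instrumenting a pipeline step with application-style observability signals can look like: duration, success or failure, and row volume per step. This is an illustration only, not Databand’s API; the `emit_metric` sink and the metric names are hypothetical stand-ins for whatever backend collects the signals.

```python
import functools
import time

def emit_metric(name: str, value: float, tags: dict) -> None:
    """Hypothetical metric sink; in practice this could write to StatsD,
    Prometheus, or a vendor agent."""
    print(f"metric={name} value={value} tags={tags}")

def observed_step(step_name: str):
    """Decorator that gives a pipeline step the same signals APM tools
    collect for services: duration, failures, and output volume."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                result = fn(*args, **kwargs)
                emit_metric("pipeline.step.rows", len(result),
                            {"step": step_name, "status": "success"})
                return result
            except Exception:
                emit_metric("pipeline.step.failure", 1, {"step": step_name})
                raise
            finally:
                emit_metric("pipeline.step.duration_s",
                            time.monotonic() - start, {"step": step_name})
        return wrapper
    return decorator

@observed_step("load_orders")
def load_orders():
    # Stand-in for a real extract/load step.
    return [{"order_id": 1}, {"order_id": 2}]

if __name__ == "__main__":
    load_orders()
```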
Steven Dickens: Yeah, it’s interesting. We see that observability space, as you say, primarily serving the security and operations teams, so it’s interesting to see it now bleeding across into the data space. I think the needs are crucial. One of the things I wanted to drill down on is that, especially with AI, people are starting to pull together what they’re doing, and one of the big areas we’re seeing is data quality. Where does this play in from a data quality perspective?
Ryan Yackel: Yeah, there’s some pretty cool memes out there around your AI strategy getting swatted down by poor data quality, right?
Steven Dickens: You’ve got to love a good meme to be able to describe this industry for sure.
Ryan Yackel: Oh yeah, I love them, and they’re all over Instagram and LinkedIn if you follow any of the data creators out there. So I mean, it’s very much garbage in, garbage out, right? That’s a thing that’s been going on in the software delivery world forever. I used to work at a company called Tricentis, which was around software test automation. We constantly talked about the need for testing, the right kind of testing at the right time, and the different layers of testing you have, so that you don’t have a production issue when it goes out into the wild. It’s 10 times, a hundred times, more expensive to fix something in production than if you had caught it earlier. The same thing is going on in the data space. You have all this data being passed through, and data downtime, data bugs, data issues occur so late in the process, because there’s nothing there that can monitor things in motion or at rest, alert you, and tell you the root cause so you can go fix it before it gets into production.
So observability is really just another layer on top of your overall data quality strategy, data reliability strategy, and also data governance. We work very well with data cataloging tools, IBM Knowledge Catalog being one of those, and with other DQ solutions for testing. We also work well with the built-in, DIY monitoring tools you already have, as an extra layer on top of the data quality you’re wanting to push out. An example is, in AI, we have a customer that’s using Airflow to trigger different SageMaker Pipelines that are part of their predictive models for advertising. One of the things they need to have is observability around the Airflow processes and SageMaker Pipelines, so that in case anything goes bad, they can go and correct it right away. Because the only way they’re going to know if something’s wrong with the predictive model is if a pipeline is not sending the right amount of data, it’s not sending it correctly, it’s sending it at the wrong time, or it didn’t even run at all. And so that’s an example where you have a huge AI strategy going on the one hand, but if the data feeding all those models goes down, then there’s a problem.
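The failure modes Yackel lists for that feed (didn’t run, ran late, sent too little data) map onto simple checks. Here is a hedged sketch of that logic; the pipeline name, thresholds, and run metadata are hypothetical, and a real system would pull them from the orchestrator’s API or an observability agent rather than a hard-coded dict.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical run metadata for one model-feeding pipeline.
run = {
    "pipeline": "ad_predictions_daily",
    "last_run_at": datetime(2023, 11, 1, 6, 15, tzinfo=timezone.utc),
    "rows_emitted": 12_000,
}

EXPECTED_MIN_ROWS = 40_000           # baseline volume for this feed
MAX_STALENESS = timedelta(hours=24)  # the model retrains on daily data

def check_feed(run: dict, now: datetime) -> list:
    """Cover the three failure modes: never ran, ran late, or ran thin."""
    alerts = []
    if run["last_run_at"] is None:
        alerts.append("pipeline never ran")
    elif now - run["last_run_at"] > MAX_STALENESS:
        alerts.append("data is stale; the model is consuming old inputs")
    if run["rows_emitted"] < EXPECTED_MIN_ROWS:
        alerts.append(f"volume anomaly: only {run['rows_emitted']} rows")
    return alerts

now = datetime(2023, 11, 1, 12, 0, tzinfo=timezone.utc)
print(check_feed(run, now))  # ['volume anomaly: only 12000 rows']
```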
Steven Dickens: Ryan, one of the things you mentioned, I want to take you back, you mentioned governance. I think that’s a really key point. People are starting to think about where is this data coming from? What’s the provenance of that data? Maybe if you could just drill down on what you mean. You glossed over it really quick as part of the rest of the discussion, but I think it’s something worthwhile coming back to. Can you just expand?
Ryan Yackel: Yeah. So from a data governance standpoint, that really entails a lot of things, from data access and data privacy, to security, to making sure you have certain quality scores around your data at rest and in motion. With observability, we’re not taking away any of those things. When we talk about data quality, a lot of times we’re talking about it in the midst of your overall data quality strategy, and part of that strategy needs to cover the reliability of, let’s say, data that’s in motion. So I mentioned the pipelines that are feeding these datasets, the schemas that are actually in motion within those pipelines, and then the impact analysis, or lineage, that tells you that if this pipeline failed, then downstream there’s another problem that could occur. And so when we talk about governance, we’re not talking about replacing or chipping away at the current governance strategy you may have around your data. We’re telling you, “Hey, this is a totally separate thing that can add to your overall governance posture to make it even better.” That’s why, when we go in and talk to customers, they may have a cataloging solution, but it may be geared mainly towards data access, or towards correcting the data as it sits in data tables and making automated data quality corrections. We’re talking about things that are even further up the stream, before the data even gets to that point.
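The impact analysis he describes boils down to walking a lineage graph from the failed asset to everything downstream of it. A toy sketch of that traversal, with hypothetical asset names rather than any real catalog’s model:

```python
from collections import deque

# Hypothetical lineage edges: upstream asset -> downstream assets.
LINEAGE = {
    "orders_pipeline": ["orders_table"],
    "orders_table": ["revenue_dashboard", "churn_features"],
    "churn_features": ["churn_model"],
}

def downstream_impact(failed_asset: str) -> set:
    """Breadth-first walk of the lineage graph, collecting every asset
    affected by a failure -- the core of an impact-analysis report."""
    impacted, queue = set(), deque([failed_asset])
    while queue:
        for child in LINEAGE.get(queue.popleft(), []):
            if child not in impacted:
                impacted.add(child)
                queue.append(child)
    return impacted

print(downstream_impact("orders_pipeline"))
# -> orders_table, revenue_dashboard, churn_features, churn_model
```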
So that’s where I say it’s complementary, and obviously the white paper we wrote recently with Databand dives into that. The reason we did that is because it’s annoying, but this is the way it is: if you look across every observability solution out there, they’re going to use language like data quality, data reliability, schema changes, pipeline issues, data quality problems, data downtime. They’re going to use a lot of the same terminology that traditional data quality tools already use, because observability isn’t something that’s front of mind with a VP or an executive. They know about data quality, but they don’t know that data observability can lead to better data quality. And so we like to make the differentiation between this observability and data quality thing: it’s not an either/or, it’s a both/and. It’s a combination of things, and quite honestly, that’s why you’re seeing other companies out there that traditionally only talked about data quality now talking about observability, because observability has become a hot trigger item, or term, that people are now talking about. So there’s a little bit of keyword gamesmanship going on, but at the end of the day, we’re all trying to add to your overall data quality and data governance posture. At the end of the day, that’s what we’re trying to do.
Steven Dickens: The market is always good at jumping on a new trend and trying to get optimized for it, but I think to make this real for clients, can you maybe talk about, I mean the technology’s great, good to understand the data observability. We’re hearing a lot about data fabric. Can you just expand on some of the use cases? So maybe let’s take the technology and make it real. Where is this being deployed and maybe talk about some of the use cases?
Ryan Yackel: Yeah, for sure. So within the overall data fabric, we integrate with a lot of the current IBM solutions you have, but our main integration point is to tie into the tools that are part of an overall data fabric strategy, which may not be the exact tooling that comes from one vendor. So for example, one of the things we found when we got acquired and would talk to the different account executives… Obviously IBM has some of the largest companies in the world as customers, and we came to find out that, although they’ve deployed data fabric within these organizations, they have other teams and departments using tools that are completely different from what IBM is providing. For example, a major bank was using IBM DataStage for some of their workflow orchestration and data integration needs, but they had a whole other team with around 5,000 Airflow pipelines that they were deploying in a totally separate group. So that was an example where we could go in and say, “Hey, are you having problems with your Airflow DAGs blowing up? Are you having problems with not seeing the visibility you want in a single pane of glass across all the Airflow pipelines you have, so you can know exactly when things go wrong, or if schemas have changed within those datasets, or there’s pipeline latency?” All those things, again, at scale, are harder and harder to monitor. It was an easy way for us to go in and say, “Hey, we’re not replacing Airflow, we’re not replacing your Python scripts, we’re not replacing the DBT runs you’re using. All we’re doing is adding a layer of insurance on top of these, so that you can know exactly where the problems are when they occur, and give you peace of mind so you’re not having to manually track these things.”
And so that’s an easy way for Databand to come into either existing or new customers that are looking to have more data quality within their overall governance posture: just to say, “We’re not taking away anything that you have. We’re just adding to it.” Because all observability tools are absolutely worthless unless you have tooling and processes already in place; all we’re doing is monitoring those things, sitting on top of them, and giving you better peace of mind about how those pipelines are running and how those tables are behaving. Versus saying, “Hey, we’re going to come in and replace all your Airflow with a new orchestration tool we’re trying to sell you.” That’s not how our go-to-market works. So within the overall data fabric we have those two plays. We have the play for existing customers that are using full-on IBM products, where we can tie into those, and we have even those same companies saying, “Hey, we’re using part of this, but we’re actually using some of these other solutions over here. How do you integrate with that?” And we can show them value.
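Since the bank example centers on thousands of Airflow pipelines, it is worth seeing the kind of hook an observability layer typically sits on. The sketch below uses Airflow’s standard `on_failure_callback`; the `notify_observability_backend` function is a hypothetical stand-in for however a tool routes alerts, not Databand’s actual integration.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def notify_observability_backend(context):
    """Hypothetical alert sink. Airflow passes a context dict containing
    the failed task instance, the run ID, and the exception."""
    ti = context["task_instance"]
    print(f"ALERT dag={ti.dag_id} task={ti.task_id} "
          f"run={context['run_id']} error={context.get('exception')}")

def flaky_transform():
    # Simulated failure, e.g. an unexpected upstream schema change.
    raise ValueError("schema drift: unexpected column 'discount_pct'")

with DAG(
    dag_id="orders_etl",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    # Every task in the DAG fires the callback on failure.
    default_args={"on_failure_callback": notify_observability_backend},
) as dag:
    PythonOperator(task_id="transform", python_callable=flaky_transform)
```

The appeal of this pattern matches what Yackel describes: nothing about the DAG itself changes, the monitoring hook simply sits on top of the pipelines already in place.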
Steven Dickens: So maybe help us quantify some of that value. What are customers seeing as they deploy Databand? What’s the before and after picture look like? Is that people savings? Is it cost savings? Is it just data quality improvements or is it all of the above?
Ryan Yackel: Yeah, I would say it’s all of the above, but I’ll give you some examples. We measure things in really three different areas. One is mean time to detection, MTTD, and again, that’s a term that application observability tools like Instana, Datadog, and New Relic use as well, because it’s very similar. I keep saying that: if you understand what observability is in the application space, a light bulb should go off in your head: “Oh, we just need to apply that to our data processes and we’ll see the same benefits as these application observability tools.” So one is mean time to detection, which is how fast we can improve the time for you to detect something, or even detect things you didn’t know about at all. When we set up a customer on Databand, we first get an inventory of all the different pipelines they have. We immediately set those up and set baselines against them, so then we can do anomaly detection around anything that looks weird outside the normal ranges, baselined over a few days or weeks, and it’ll adjust over time based on that anomaly detection. So that’s one.
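The baselining and anomaly detection described here can be approximated, in toy form, with a z-score over recent run durations. Real tools use more sophisticated, seasonality-aware models that adjust over time; this sketch just shows the shape of the idea, with made-up numbers.

```python
import statistics

def is_anomalous(history: list, latest: float,
                 z_threshold: float = 3.0) -> bool:
    """Flag a run whose duration deviates from the baseline by more than
    z_threshold standard deviations."""
    if len(history) < 5:  # not enough data to establish a baseline yet
        return False
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return latest != mean
    return abs(latest - mean) / stdev > z_threshold

# Durations (seconds) of the last two weeks of daily runs.
baseline = [312, 298, 305, 321, 290, 310, 301,
            295, 308, 315, 299, 304, 311, 297]
print(is_anomalous(baseline, 306))   # False: within the normal range
print(is_anomalous(baseline, 1480))  # True: something changed upstream
```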
The other one is mean time to resolution, MTTR. Again, another acronym used in the observability space, which is, “Okay, we detected something; how fast can we resolve it?” That’s really around the root cause analysis that we do with Databand. So it’s telling you not just, “Hey, something broke,” but, “Hey, this is why it broke. This is the problem you’re running into. This is how many times it’s occurred. Here’s the trend analysis of why this keeps occurring. Here’s how it affects things downstream.” All those things help you figure out how to resolve that issue, with everything in one place. An example would be, “Hey, I had some machine learning pipelines that were feeding a downstream DBT job, which kicked off some tests, ran, and then pushed some data into Snowflake.” Well, we can track all of that, end to end, to show you exactly where the problems occurred, what the typical root cause is, and then help you fix it.
And then the last one is really this: if you detect earlier and resolve faster, you’re going to end up delivering higher data SLAs. So you’re going to meet and guarantee the data SLAs you have with both internal and external customers or stakeholders. A lot of times, these data engineering and platform teams have very, very clear service level agreements around when the data needs to get to a certain place, at a certain time, and at a certain quality. Well, by adding in Databand, you can more reliably guarantee those, because anytime something breaches, or is about to breach, a data SLA you’ve set up, our system will tell you right away. Then hopefully, as you learn more of the nuances of what’s been going on and how things continually break, and you fix them more and more, those processes get better and those data SLAs are met more and more.
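A data SLA of the “data must land in a certain place by a certain time” kind reduces to a deadline check, and the “about to breach” warning comes from forecasting the finish time of a still-running pipeline. A minimal sketch, with a hypothetical deadline and typical runtime rather than anything learned from real history:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

SLA_DEADLINE_UTC = "07:00"               # data must land by 7am UTC daily
TYPICAL_RUNTIME = timedelta(minutes=40)  # learned from historical runs

def sla_status(started_at: datetime, finished_at: Optional[datetime],
               now: datetime) -> str:
    """Classify a run against its SLA: met, breached, at risk, on track.
    'At risk' fires before the breach by forecasting the finish time of
    a still-running pipeline from its typical runtime."""
    hour, minute = map(int, SLA_DEADLINE_UTC.split(":"))
    deadline = now.replace(hour=hour, minute=minute,
                           second=0, microsecond=0)
    if finished_at is not None:
        return "met" if finished_at <= deadline else "breached"
    if now > deadline:
        return "breached"
    if started_at + TYPICAL_RUNTIME > deadline:
        return "at risk"  # forecast finish is past the deadline
    return "on track"

now = datetime(2023, 11, 1, 6, 45, tzinfo=timezone.utc)
started = datetime(2023, 11, 1, 6, 30, tzinfo=timezone.utc)
print(sla_status(started, None, now))  # "at risk": 6:30 + 40m > 7:00
```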
So we like to say that Databand is going to do a lot of alerting and cleanup for you initially, when you first use it, to say, “Okay, here are all the problems we have in our pipelines and datasets. Let’s fix those.” Then you would hope that alerting goes down over time; we don’t want to just keep adding more and more alerts for you. Hopefully it goes down over time and only grows in proportion to how many more workloads you put into the system, right? Then it becomes more of a monitoring and observability layer that should only go off when something really, really bad happens. It’s like the engine light on your car. You don’t want the engine light constantly going off, but if you’ve gone to the dealership and fixed it, you would hope the monitoring within the car is going to alert you when there really is a next problem. And by the way, you can’t afford to just take out an observability tool because you think everything’s fixed, because as we know, if you ignore problems and think everything is great, something happens and then it’s a disaster.
Steven Dickens: It’s always the challenge. We talk about it in the… Alert fatigue. You’ve got to map the right level of the right type of alerts at the right frequency, but not over-alert, because the teams start to switch off.
Ryan Yackel: Yeah.
Steven Dickens: Well, Ryan, it’s 2023. We’re recording this podcast and we’ve gone 20 minutes without mentioning AI.
Ryan Yackel: Oh man.
Steven Dickens: I know.
Ryan Yackel: AI, it’s in right now.
Steven Dickens: I don’t think it’s entirely possible to record a podcast in 2023 without it. Obviously, the data and the data pipelines and the data quality and the governance are all crucial as people start to think about their AI journey. What do you see as the implications? As people start to think, “Hey, I’m looking to deploy my enterprise AI platform,” and I think that’s where a lot of clients are, where do you see the Databand piece fitting into that overall equation?
Ryan Yackel: Yeah, so I mentioned an example where we had a customer that was using AWS SageMaker for a lot of their ML pipelines and the creation of their predictive models and things like that. Also, obviously this isn’t news, but it’s a big deal: IBM relaunched Watson as watsonx, the enterprise edition of our overall AI strategy, and that comes in three different facets, for the most part. There’s watsonx.ai, which is all about a studio for building and maintaining those models. You’ve got watsonx.data, which is basically our open lakehouse to store the data and push the data to these models. And we have watsonx.governance, which is all about monitoring those models, monitoring the AI, figuring out if there are any biases or issues within them that could actually be detrimental to your AI strategy.
So all that’s really awesome. Then on top of that, again, is the overall data fabric. And so what I like to… There’s an analogy I give, and Databand’s a part of it. If you look at Formula One, Formula One is a very high-stakes sport. You’re driving a car at hundreds of miles an hour, in very tight and very dangerous conditions. If anything goes wrong with the car or the operations or the driver, or somebody that’s racing against you, it could go really bad. But also, it’s one of the most exciting sports out there, and it’s all about the constant fine-tuning of the car to make it as good as it can be, pre-race, post-race, and even during the race, with the different ways they can adjust the car’s mechanics. So the way I like to describe it is: in an overall AI strategy, if Formula One is your overall AI strategy, the car would be all of the awesome things that watsonx does, right?
It’s the steering wheel, it’s the chassis, it’s the driver, it’s all the things that actually build those AI models. And then there are all the operational components that go into the car: the building of the car, the fuel injectors, the team around the car, the radio notifications into the car to let the driver know where you need to go faster or slower, take this turn, slow down, notifications around engine failures, things like that. All of those things come together for an overall AI strategy. So that’s what I would say in terms of the way these two come together, and obviously the data engineering team is at the forefront of that. They’re the ones pushing the data into these models so they can be used for whatever AI initiative you may have.
Steven Dickens: A Formula One analogy’s always a good place to wrap up. Ryan, thank you so much for joining us. It’s been really good to drill into this. I think you mentioned it, we’ve written a research report here. Please click and subscribe. We’ll put the link to the report. But I think some fascinating points here, especially the AI piece that you talked about and where that’s fitting in. So thank you very much for joining us today.
Ryan Yackel: Yeah, thank you so much.
Steven Dickens: You’ve been listening to another episode of the Futurum Tech Webcast. Please click and subscribe and do all those things to help the algorithm. We’ll put a link to the research that we’ve done with the Databand team at IBM down in the show notes, and we’ll see you next time. Thank you very much for watching.
Other Insights from The Futurum Group:
The Six Five Insider at IBM Analyst Day with Rob Thomas and Dr. Dario Gil
IBM Helping Riyadh Air Get into the Skies in 2025 with Digital Tech
Growing the IBM-AWS Alliance – The Six Five on the Road at AWS re:Invent 2023
Author Information
Regarded as a luminary at the intersection of technology and business transformation, Steven Dickens is the Vice President and Practice Leader for Hybrid Cloud, Infrastructure, and Operations at The Futurum Group. With a distinguished track record as a Forbes contributor and a ranking among the Top 10 Analysts by ARInsights, Steven's unique vantage point enables him to chart the nexus between emergent technologies and disruptive innovation, offering unparalleled insights for global enterprises.
Steven's expertise spans a broad spectrum of technologies that drive modern enterprises. Notable among these are open source, hybrid cloud, mission-critical infrastructure, cryptocurrencies, blockchain, and FinTech innovation. His work is foundational in aligning the strategic imperatives of C-suite executives with the practical needs of end users and technology practitioners, serving as a catalyst for optimizing the return on technology investments.
Over the years, Steven has been an integral part of industry behemoths including Broadcom, Hewlett Packard Enterprise (HPE), and IBM. His exceptional ability to pioneer multi-hundred-million-dollar products and to lead global sales teams with revenues in the same echelon has consistently demonstrated his capability for high-impact leadership.
Steven serves as a thought leader in various technology consortiums. He was a founding board member and former Chairperson of the Open Mainframe Project, under the aegis of the Linux Foundation. His role as a Board Advisor continues to shape the advocacy for open source implementations of mainframe technologies.