Bigger Isn’t Always Better: It’s All About Lean Data

Does your company need to go on a data diet? If you’re like many businesses today, you’re collecting far more data than you need or use, a habit that can create security and privacy risks as well as unnecessary storage costs. It turns out that being agile and lean in digital transformation doesn’t require more data; it requires smarter data. And it’s time for companies to learn how to make the distinction.

First, let’s deal with the elephant in the room, or in the cloud. In an age when “big data” is everything, it’s easy to believe that the more data you gather, the better your company will perform. We need data lakes after all, right? But the truth is that more than half of the data collected by companies goes “dark.” Gartner defines dark data as information assets that a company collects, processes, and stores but generally fails to use. Those lakes turn into swamps.

Surveys show that, on average, 55 percent of the data being collected is either data companies don’t know how to use or data companies aren’t sure they captured accurately. That number goes up to 75 percent for more than one-third of companies in the United States. Shocking, right? And irresponsible as well, because every piece of data collected is a piece of data that could be targeted or exploited at a customer’s expense.

The following are a few tips for determining what data you should actually be collecting, and why your customers should be actively involved in the data collection process.

Focus on Minimal Viable Data

In digital transformation, less is more. Rather than focusing on how much data you can gather, focus on how little you can gather to get the most meaningful results. Do you really need a birth year or will month and day suffice? Do you really need an address or will the zip code work? Question your data collection process to find out what you can live without.
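As a rough illustration of the birth-year and zip-code questions above, here is a minimal Python sketch of trimming a customer record before it is stored. The field names (`birth_date`, `street_address`, `zip_code`, `email`) and the choice of what to keep are assumptions made for the example, not a recommendation for any particular business.

```python
from datetime import date


def minimize_record(record: dict) -> dict:
    """Return a reduced copy of a customer record.

    Field names here are hypothetical; adapt them to your own schema.
    """
    minimized = {}

    birth = record.get("birth_date")
    if birth is not None:
        # Keep only month and day (enough for a birthday offer); drop the year.
        minimized["birthday"] = f"{birth.month:02d}-{birth.day:02d}"

    if record.get("zip_code"):
        # Keep the zip code; the street address is deliberately not copied.
        minimized["zip_code"] = record["zip_code"]

    if record.get("email"):
        minimized["email"] = record["email"]

    return minimized


raw = {
    "email": "pat@example.com",
    "birth_date": date(1985, 3, 14),
    "street_address": "123 Main St",
    "zip_code": "60601",
    "net_worth": 250000,
}

print(minimize_record(raw))
```

Anything not explicitly copied into the reduced record, such as the street address and net worth in this example, never reaches storage, which is the point of the exercise.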

Create a Data Map

Do you even know how much data you are collecting? Do you know who has access to it? Make a clear data map to ensure no “dark data” is making its way into your system.
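A data map can start as something very simple. The sketch below, in Python, assumes a hypothetical inventory in which each collected field records where it comes from, why it is collected, who can access it, and which teams actually use it. The structure and the example entries are illustrative assumptions; the only idea taken from the text is flagging data that is collected but has no stated purpose or no one using it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataMapEntry:
    """One row of a hypothetical data map."""
    name: str                      # the field being collected
    source: str                    # where it is collected
    purpose: str = ""              # why it is collected (empty if unknown)
    accessible_by: List[str] = field(default_factory=list)  # who can read it
    used_by: List[str] = field(default_factory=list)        # who actually uses it


def find_dark_data(entries: List[DataMapEntry]) -> List[str]:
    """Return names of fields with no stated purpose or no active users."""
    return [e.name for e in entries if not e.purpose or not e.used_by]


data_map = [
    DataMapEntry("email", "signup form", "send weekly coupon",
                 accessible_by=["marketing", "support"], used_by=["marketing"]),
    DataMapEntry("in_store_location", "mobile app",
                 accessible_by=["analytics"]),
    DataMapEntry("net_worth", "third-party list", "segmentation",
                 accessible_by=["marketing"]),
]

print(find_dark_data(data_map))
```

In this example, `in_store_location` and `net_worth` are flagged: the first has no stated purpose, and the second has a purpose on paper but nobody using it.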

Ask, “What Would Happen If We Didn’t Collect X?”

Lots of companies focus on what new data they can gather. What about asking which data you can live without? As I said above, ask whether you really need the data. Do you really need your customer’s last name? What about marital status? Net worth? Do you really need to calculate how much time they spend on your website? The answer might be “yes,” but it might also be “no.” If you don’t see an immediate value in the data you are collecting, stop collecting it, especially if your customers don’t even know you’re collecting it in the first place!

Be Transparent

To play off the point above, do your customers know how much data you are collecting about them? Do they understand that you aren’t just gathering their name and email address when they sign up for your weekly coupon on your mobile app, but also tracking their every movement through your store? If not, you need to take a moment and recognize the “cringe” factor associated with gathering data without your customers’ knowledge. Maybe it could help you improve customer experience in the long term, but does it earn you loyalty points? Rather than collecting the information surreptitiously, be open. Be explicit. Let your customers know the kind of experience you’re trying to create for them, and why. And then give them the option of receiving it, or not.

As I said in my recent post, if you’re a CMO, you need to be involved in creating your privacy policy for this exact reason. You need to know what data you can collect while also telling your customers why you’re collecting it.

Consider Both Sides of the Data Relationship

As a company, you’re getting something from the data you collect, or at least you’re hoping to. But what are you giving back to your customer, really? For most of us, the goal is to use the data to give customers a more personalized shopping experience: more tailored offers and incentives, easier shopping, and so on. But is the data we’re collecting really allowing us to do that? And have we really kept our eye on that CX prize? Oftentimes, we become so obsessed with the goal of collecting data that we lose sight of the ultimate endpoint: improving the customer journey. Take a look at your data relationship and be honest about where you are giving back (or not) with the data you’re gathering.

Remember: every point of data you collect is a point of data you need to store and keep secure, or destroy when the time is right. A recent breach of 200 million records (allegedly originating from Experian but purchased by third-party marketers) included things like religion, income, net worth, gender, and phone number. Did the marketing companies need all of that information? No! But all of that information is now available on the dark web. By eliminating the collection of unnecessary data, you eliminate potential security issues and ballooning storage budgets from the get-go, freeing you up to focus on more meaningful work.

Yes, sometimes algorithms are tricky. Sometimes you aren’t quite sure what types of data you’ll need to collect until you figure the “winning algorithm” out. But once you do, regroup. Unclick some boxes. Commit to collecting only what you need and communicate to your customers about why you’re doing it. You’ll be surprised how much more loyal they’ll be when harvesting data is not your company’s only goal.

Futurum Research provides industry research and analysis. These columns are for educational purposes only and should not be considered in any way investment advice.

The original version of this article was first published on Forbes.

Author Information

Daniel is the CEO of The Futurum Group. Living his life at the intersection of people and technology, Daniel works with the world’s largest technology brands exploring Digital Transformation and how it is influencing the enterprise.

From the leading edge of AI to global technology policy, Daniel makes the connections between business, people and tech that are required for companies to benefit most from their technology investments. Daniel is a top 5 globally ranked industry analyst and his ideas are regularly cited or shared in television appearances by CNBC, Bloomberg, Wall Street Journal and hundreds of other sites around the world.

Daniel is a 7x best-selling author whose most recent book is “Human/Machine.” He is also a Forbes and MarketWatch (Dow Jones) contributor.

An MBA and former graduate adjunct faculty member, Daniel is an Austin, Texas, transplant after 40 years in Chicago. His speaking takes him around the world each year as he shares his vision of the role technology will play in our future.
