Apple Using YouTube To Train Its Models?

The Six Five team discusses Apple using YouTube to train its models.

If you are interested in watching the full episode, you can check it out here.

Disclaimer: The Six Five Webcast is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors and we ask that you do not treat us as such.

Transcript:

Patrick Moorhead: Let’s move to Apple potentially using YouTube to train its models. Apple has what some would say is a pristine reputation for not stealing stuff and using it for themselves. Others might say Apple steals an incredible amount of stuff, like IP doesn’t pay its vendors, like Qualcomm tries to make them go bankrupt, but this is an interesting one, Dan. Break this down for us.

Daniel Newman: They've only done that a couple of times, and never to anyone quite the size of Qualcomm. I mean, look, there was this investigation that came out making claims that Apple Intelligence had been trained on YouTube data. Something like 170,000 videos is the number that's running around the internet. A lot of these are from super-influencer types like MKBHD and MrBeast. It was the subtitles of those roughly 170,000 videos, the language itself, that the models were trained on. Now, the long and short of it, Pat, is: is Apple doing this? Yes. Is Apple doing it for Apple Intelligence? What they're saying is no. This is really important, because Apple launched what it called the OpenELM model, and that was a research contribution. So they were starting to be involved in this LLM development and advancing open source. Apple calls OpenELM a state-of-the-art open language model.

Now, when Apple talks about OpenELM, they tell the markets that this was only done for research purposes, so it wasn't designed to power Apple Intelligence. So I guess the first question, Pat, is … By the way, there's a group, and I'm not sure if you're super familiar with it, called EleutherAI. Now, EleutherAI basically maintains a large collection of data nicknamed the Pile, and this is supposed to be a pile of real human-created data. Now, one of the sub-issues that's going to come up as large language model development continues to proliferate is that we're going to run out of real human-created data to train models on. So what's going to start to happen is we're going to start training models on synthetic data, data that's already been created by AI. So EleutherAI's Pile was created to be a corpus of real human-created content and data that can be used for research, and by companies like Anthropic, Apple, NVIDIA, and others to train models on.

That was a little bit of an audible from the Apple play, but the long and the short of it is, yes, Apple did train on YouTube. It took subtitles from 170,000 videos that were made by high-profile, influential voices on YouTube, but it used them to train a research-focused open source model that is not being used to feed the Apple Intelligence platform. So does that make it okay? I don't know that it does. I don't know that all these companies using this content for free, without any royalty or licensing agreement, for the benefit of research, makes it okay. But, Pat, look, this is a fundamental issue that sits at the very top of the LLM development cycle: where did the data all these models were trained on come from, right? We all remember the very, very popular video of the OpenAI CTO being asked whether or not the Sora model was trained on YouTube, which was probably one of the most awkward interview interactions I've ever seen, when you know somebody is looking at you and lying straight to your face. Though I guess you could call that a lie of omission.

So the long and short is that Apple's not not guilty, but they might not be guilty either. They're kind of following the protocol that everyone seems to, which is trying to find high-value, real human-created content to train models on, to then deploy into research ecosystems to try to create better models. But Pat, here's my caveat. Do we really know? Do we ever really know? I mean, they can say they're not using it in Apple Intelligence. How does the average person ever really know that's true? I think right now there's a lot of trust involved, and we've found out, whether it's been Apple, Adobe, OpenAI, Microsoft, or Google, that there is a lot of gray area as to what data can be used, what should be used, and what we're seeing in the results that we're getting.

Patrick Moorhead: Listen, I think Apple is brilliant in how they responded to this, right? I mean, when in doubt, you blame it on your research group, you blame it on junior people, you blame it on somebody. You point to a process, to somebody you fired. I mean, it's-

Daniel Newman: Pat’s about to go hard. I can feel it.

Patrick Moorhead: No.

Daniel Newman: Was I too nice? Was I too nice?

Patrick Moorhead: No, no, no. You were actually, I think, quite balanced in that.

Daniel Newman: Weird.

Patrick Moorhead: No. I still have PTSD from working in corporate America. I have seen every excuse. I've seen the blame game. I don't know if you've heard the two envelopes joke, where the new CEO comes in and one of the envelopes basically says, "Hey, be sure to blame it on me if things go south." I mean, we see that kind of deflection of responsibility here. But to be fair, blaming it on your research people... I don't think there are, or should be, any restrictions on research groups being able to come in and hoover up data, as long as none of that gives value to something that ends up being a commercial entity. I have a hard time believing that what Apple's research group is doing, and the insights they're gaining, won't help Apple sell something they're going to make money and margin on. I mean, it's kind of like OpenAI being a non-profit corporation that's worth a hundred billion dollars, right? It's kind of farcical and it's kind of funny. Anyways, I think we've drained this topic as much as we can.

Daniel Newman: Listen, man, I thought as we launched into that one that you were just going to come crushing down on them. But basically, the TLDR, if I wanted to give the sound bite here, is that they're all equally as full of, or not full of, (beep) as the others when it comes to this.

Patrick Moorhead: Beep.

Daniel Newman: Sorry, I just ruined this one. We’re going to get a censorship on this one.

Patrick Moorhead: We’re going to get explicit. We’re going to be explicit.

Daniel Newman: We hope that everybody out there will stay with us and realize that I was just saying that out of love for the industry, because I am a techno-optimist, Pat. I like to think it's all good.

Author Information

Daniel is the CEO of The Futurum Group. Living his life at the intersection of people and technology, Daniel works with the world’s largest technology brands exploring Digital Transformation and how it is influencing the enterprise.

From the leading edge of AI to global technology policy, Daniel makes the connections between business, people and tech that are required for companies to benefit most from their technology investments. Daniel is a top 5 globally ranked industry analyst and his ideas are regularly cited or shared in television appearances by CNBC, Bloomberg, Wall Street Journal and hundreds of other sites around the world.

A 7x best-selling author, most recently of "Human/Machine," Daniel is also a Forbes and MarketWatch (Dow Jones) contributor.

An MBA and Former Graduate Adjunct Faculty, Daniel is an Austin Texas transplant after 40 years in Chicago. His speaking takes him around the world each year as he shares his vision of the role technology will play in our future.

