Introduction: Generative AI is widely considered the fastest-moving technology innovation in history. It has captured the imagination of consumers and enterprises across the globe, spawning remarkable innovation and, along with it, a rapidly mutating market ecosystem. Generative AI has also caused a copious amount of FOMO, missteps, and false starts. These are the classic signals of technology disruption: lots of innovation, but also lots of mistakes. It is a rumpus room with a lot of “kids” going wild, and the rumpus room needs adults. Guidance through the generative AI minefield will come from thoughtful organizations that do not panic, that understand the fundamentals of AI, and that manage risk.
Our picks for this week’s Adults in the Generative AI Rumpus Room are Anthropic, Kolena, and IBM.
Anthropic’s Responsible Scaling Policy
The News: On September 19, Anthropic published its Responsible Scaling Policy, which is intended to manage the potentially severe risks posed by its AI models. Per the company’s published paper:
“As AI models become more capable, Anthropic believes that they will create major economic and social value but will also present increasingly severe risks. With this document, we are making a public commitment to a concrete framework for managing these risks…. We focus these commitments specifically on catastrophic risks, defined as large-scale devastation (for example, thousands of deaths or hundreds of billions of dollars in damage) that is directly caused by an AI model and wouldn’t have occurred without it. AI represents a spectrum of risks, and these commitments are designed to deal with the more extreme end of this spectrum.”
Central to the framework is the concept of AI safety levels (ASLs). Anthropic is defining a series of AI capability thresholds that represent increasing potential risk. There are two types of risk: deployment risks, which arise from active use of AI models, and containment risks, which arise from merely possessing an AI model. An example of a containment risk is an AI model that, if stolen, could enable the production of weapons of mass destruction.
Anthropic notes this process is iterative and admits it is “building the airplane while flying it,” in that it is attempting to govern systems and problems that do not yet exist.
You can read Anthropic’s Responsible Scaling Policy on the company website.
Adults because… Anthropic is at the very least taking some responsibility for the AI models it produces, even though those models are general purpose and not designed for any particular use, good or bad. The company will monitor how its models are tuned and used; it will first make it harder for attackers to steal model weights, and second build in misuse-prevention measures that would allow Anthropic to shut a model down. Other AI model makers have some safeguards in place, but as these things go, proactive de facto standards can lead to legislated requirements.
Kolena Offers Tools To Test and Benchmark AI Model Performance
The News: On September 26, Kolena announced it had raised $15 million, bringing its total funding to $21 million. Kolena is focused on building trust in AI by making AI models work better. A TechCrunch article about the funding described Kolena’s approach:
“Kolena can provide insights to identify gaps in AI model test data coverage.… And the platform incorporates risk management features that help to track risks associated with the deployment of a given AI system (or systems, as the case may be). Using Kolena’s user interface (UI), users can create test cases to evaluate a model’s performance and see potential reasons that a model’s underperforming while comparing its performance to various other models.
“’With Kolena, teams can manage and run tests for specific scenarios that the AI product will have to deal with, rather than applying a blanket ‘aggregate’ metric like an accuracy score, which can obscure the details of a model’s performance,’ [co-founder and CEO Mohamed Elgendy] said. ‘For example, a model with 95% accuracy in detecting cars isn’t necessarily better than one with 89% accuracy. Each has their own strengths and weaknesses — e.g., detecting cars in varying weather conditions or occlusion levels, spotting a car’s orientation, etc.’”
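To make the contrast concrete, here is a minimal, hypothetical sketch of scenario-level (sliced) evaluation versus a single aggregate accuracy score. The scenario names, data, and helper functions are invented for illustration; this is not Kolena’s actual API or tooling.

```python
# Hypothetical illustration (not Kolena's API): a single aggregate accuracy score
# can hide weak performance on specific scenarios, which per-scenario slicing reveals.
from collections import defaultdict

def accuracy(results):
    """Fraction of correct predictions in a list of (scenario, correct) records."""
    return sum(correct for _, correct in results) / len(results)

def accuracy_by_scenario(results):
    """Accuracy computed separately for each scenario slice."""
    slices = defaultdict(list)
    for scenario, correct in results:
        slices[scenario].append((scenario, correct))
    return {name: accuracy(records) for name, records in slices.items()}

# Invented example: a car detector evaluated on clear-weather, rain, and occlusion slices.
results = (
    [("clear", True)] * 90 + [("clear", False)] * 2 +   # strong on easy cases
    [("rain", True)] * 3 + [("rain", False)] * 3 +       # weak in rain
    [("occlusion", True)] * 1 + [("occlusion", False)] * 1
)

print(f"Aggregate accuracy: {accuracy(results):.0%}")     # looks impressive overall
for scenario, acc in accuracy_by_scenario(results).items():
    print(f"  {scenario:>9}: {acc:.0%}")                  # reveals the weak slices
```

In this invented data, the detector scores 94% overall but only 50% in rain and under occlusion, which is the kind of gap a blanket accuracy metric obscures.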
Read the Kolena TechCrunch story here.
Adults because… By its nature, AI is nebulous: how does it work, and how do we know it is accurate? Rules-based development is traceable; AI is much less so. AI systems therefore must be tested and monitored. Kolena is not the only company offering tools to test and monitor AI models; it joins a growing number of others helping enterprises build and deploy better AI models. Here’s hoping these testing and monitoring tools are widely adopted.
IBM Provides Contractual Protections for AI Models
The News: On September 28, IBM announced the company would provide its standard intellectual property protection to IBM-developed watsonx models. According to the press release: “IBM provides an [intellectual property] indemnity (contractual protection) for its foundation models, enabling its clients to be more confident AI creators by using their data, which is the source of competitive advantage in generative AI. Clients can develop AI applications using their own data along with the client protections, accuracy and trust afforded by IBM foundation models.” IBM provides similar protection for hardware and software products.
Read the full press release on IBM AI model protections on the IBM website.
Adults because… We are squarely in a time when there is a lot of activity and interest in AI, but also a time when most people are just learning about it: what it can and cannot do, as well as what it should and should not do. It is a time of minimal trust in AI outcomes. Consequently, companies that build trust in AI outcomes will help the overall AI market advance. By providing IP protection for AI developed in watsonx, IBM is helping build that trust. Adobe has a similar provision for its AI-fueled Firefly. More and more “Adults” will back the AI they place into production.
Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.
Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.
Other insights from The Futurum Group:
Adults in the Generative AI Rumpus Room: Salesforce, DeepLearning.ai, Microsoft
Adults in the Generative AI Rumpus Room: Gleen, IBM
Adults in The Generative AI Rumpus Room: Arthur, YouTube, and AI2
Author Information
Mark comes to The Futurum Group from Omdia’s Artificial Intelligence practice, where his focus was on natural language and AI use cases.
Previously, Mark worked as a consultant and analyst providing custom and syndicated qualitative market analysis with an emphasis on mobile technology, identifying trends and opportunities for companies like Syniverse and ABI Research. He has been cited by international media outlets including CNBC, The Wall Street Journal, Bloomberg Businessweek, and CNET. Based in Tampa, Florida, Mark is a veteran market research analyst with 25 years of experience interpreting the technology business, and he holds a Bachelor of Science from the University of Florida.