AWS Puts S3 and Next Generation of SageMaker at the Heart of Its Data and Analytics Strategy During Pi Day 2025
Analyst(s): Brad Shimmin
Publication Date: March 21, 2025
AWS’s newly available SageMaker Unified Studio promises to simplify data and AI workflows by providing a central hub for querying and processing data across various AWS services and third-party sources. This is further enhanced by updates to the company’s S3 Tables service that improve integration with SageMaker Lakehouse and adopt the Apache Iceberg REST Catalog standard. These changes underscore AWS’s focus on the data lakehouse architecture for generative AI and analytics.
What is Covered in this Article:
- AWS announced the general availability of SageMaker Unified Studio, a single user experience through which data and AI professionals can access, build, and deploy analytics and AI solutions.
- AWS has streamlined access to S3 Tables through SageMaker Lakehouse (now generally available), opening them up to a broad set of analytics query engines, most notably AWS’s own Athena query engine.
- AWS S3 Tables is now a first-class citizen within AWS’ Management Console, enabling users to discover, query, and manage tabular data housed in S3 and other data stores.
- When launched in December 2024, AWS S3 Tables was only available in three regions. Today, AWS offers this service across eleven regions and plans to reach all S3-supported geographies before the end of this year.
- New compatibility with the Apache Iceberg REST Catalog standard opens up S3 to both internal catalogs like AWS Glue and external catalogs like Apache Hive and Nessie.
The News: On March 13, 2025, AWS announced the general availability of SageMaker Unified Studio, a single data and AI development environment designed to help users find, access, and act on corporate data using best-of-breed technologies from both AWS and third parties. Enhancing this effort, the vendor also announced several updates to its new S3 Tables service, streamlining access to S3 Tables through SageMaker Lakehouse, expanding regional availability, and adopting support for the Apache Iceberg REST Catalog standard.
Analyst Take: Looking back to AWS re:Invent 2024, AWS announced the next generation of Amazon SageMaker, positioning it at the center of data, analytics, and AI workflows. The idea was to combine AWS machine learning (ML) and analytics capabilities by unifying access to disparate tools and data without compromising governance. In this way, AWS hoped to accelerate the path from data to value for enterprise customers. To achieve this goal, AWS also began working on Amazon SageMaker Unified Studio, a single data and AI development environment where users could find and access data and AI assets using best-of-breed tools across virtually any use case.
Fast-forward to International Pi Day on March 14, 2025, when AWS celebrated the 2006 launch of its S3 storage service. As part of that celebration, AWS solidified its unifying vision, first by announcing the general availability of SageMaker Unified Studio and second by announcing several important updates to its emerging S3 Tables service. Those updates include integrating S3 Tables with SageMaker Lakehouse, expanding regional availability, and adopting support for the Apache Iceberg REST Catalog standard.
Together, these updates help the company greatly simplify how users interact with its sizable portfolio of data and analytics products. For example, SageMaker Unified Studio brings together Amazon EMR, AWS Glue, Amazon Athena, Amazon Redshift, Amazon Bedrock, and Amazon SageMaker AI. Users can create projects within the unified studio to securely work with analytics and AI artifacts, including data, models, and generative AI (GenAI) applications – even those employing agentic workflows.
Across these announcements, what is crystal clear is that AWS is doubling down on the data lakehouse architecture as a unifying data layer upon which many upstream analytical tasks depend, particularly those in support of GenAI use cases, where unstructured and semi-structured data reign supreme. Amazon SageMaker Lakehouse provides unified, secure, and open access to enterprise data based on the popular Apache Iceberg open table format. This means that whether data is stored in Amazon S3 data lakes, Redshift data warehouses, or third-party and federated data sources, users can access it from one place and use it with Iceberg-compatible engines and tools.
S3 Tables Enter the Fray
First launched in 2006, Amazon’s Simple Storage Service (S3) has played a vital role in delivering a highly scalable and affordable way for companies to store a wide range of data, not just for data lakes but for virtually any use case, including cloud-native software and mobile apps. Today, S3 houses over 400 trillion objects and handles 150 million requests per second. Looking at tabular data alone, S3 stores exabytes of data, averaging over 15 million requests per second. It’s little wonder, then, that AWS is now pushing more and more high-level database functionality down into this core data stratum. Why rely solely on a full-fledged database management system for any and all data access functionality when it’s much more cost-effective and performant to add those same capabilities (e.g., structured data access and control) to a software layer that’s much closer to the underlying hardware?
In conjunction with Pi Day, the global cloud provider has rolled out several enhancements to S3 Tables that bring it into closer alignment with several key AWS tools, including Amazon SageMaker Lakehouse, Athena, EMR, Glue, Redshift, and QuickSight. Further, SageMaker Lakehouse now integrates with Amazon S3 Tables, which made Amazon S3 the first cloud object store with native Apache Iceberg support. This lets SageMaker Lakehouse users create, query, and process S3 Tables efficiently using various analytics engines in SageMaker Unified Studio as well as Iceberg-compatible engines and libraries like Apache Spark and PyIceberg.
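To make that access path concrete, the following is a minimal sketch of reading an S3 Table from PyIceberg through an Iceberg REST catalog. The endpoint URL, table bucket ARN, namespace, and table name are illustrative placeholders of our own invention; the SigV4 settings follow PyIceberg’s documented REST catalog options, but readers should consult AWS documentation for exact values.

```python
# A minimal sketch: reading an S3 Table from PyIceberg through the
# Iceberg REST Catalog interface. The endpoint, ARN, namespace, and
# table names below are hypothetical placeholders.
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "s3tables",
    type="rest",
    uri="https://s3tables.us-east-1.amazonaws.com/iceberg",  # assumed endpoint
    warehouse="arn:aws:s3tables:us-east-1:111122223333:bucket/example-bucket",
    **{
        # SigV4 request signing, per PyIceberg's REST catalog options.
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "s3tables",
        "rest.signing-region": "us-east-1",
    },
)

# Load a table from a namespace and materialize a scan as an Arrow table.
table = catalog.load_table("example_namespace.daily_sales")
print(table.scan().to_arrow())
```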
First introduced during last December’s AWS re:Invent conference, S3 Tables is rapidly maturing into another strong option within AWS’ large but somewhat cumbersome portfolio of data and analytics solutions. Still, as outlined above, this announcement has many pieces, some of which remain under active development.
Futurum has observed the same “down-stack” data and analytics trend unfold within the data storage marketplace. For example, vendors VAST Data, NetApp, and others are actively building vector store functionality into the storage layer to deliver performance that can scale linearly. This same approach is evident in S3 Tables, which optimizes Apache Iceberg workloads, purportedly delivering up to 10x higher transactions per second than non-managed Iceberg tables stored in general-purpose S3 buckets.
Better Integration Means Faster Time to Value
For AWS, which has historically extolled a best-of-breed approach to data and analytics software, simplicity often plays second fiddle to choice. With 15+ databases available to customers, optimized for different use cases (graph, time-series, documents, etc.), and with several query engines customers can use to access this data, AWS and its customers have historically had to accept complexity as the price of flexibility. The company’s work with Iceberg, S3 Tables, and SageMaker promises to clean up much of that complexity, especially for analytics workloads.
To begin, S3 Tables are now compatible with the Apache Iceberg REST Catalog standard. This opens up S3 to both internal catalogs, such as Glue, and external metastores like Apache Hive and Nessie. It should also encourage third-party providers to treat AWS’ storage layer as a viable foundation upon which to build modern data platforms. Further, the S3 Tables service is now a first-class citizen within AWS’ Management Console, where users can, for example, create, populate, and query S3 Tables using Athena (the company’s serverless, scalable query engine), as sketched below. An interesting aspect of this is that users can manage data access controls at different levels of abstraction. Users can, for example, define broader, database-level security directly within S3 Tables themselves, leaving more fine-grained row- and column-level control to SageMaker Lakehouse.
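For readers who prefer the programmatic route over the console, here is a hedged sketch of that create-populate-query loop using Athena through boto3. The catalog path, database, table, and output location are hypothetical; actual identifiers depend on how a given table bucket is registered with SageMaker Lakehouse and Glue.

```python
# A hedged sketch: create, populate, and query an S3 Table with Athena
# via boto3. Catalog, database, table, and output names are hypothetical.
import time

import boto3

athena = boto3.client("athena", region_name="us-east-1")

def run_query(sql: str) -> str:
    """Submit a query, poll until it finishes, and return its execution ID."""
    qid = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={
            "Catalog": "s3tablescatalog/example-bucket",  # assumed catalog path
            "Database": "example_namespace",
        },
        ResultConfiguration={"OutputLocation": "s3://example-results/athena/"},
    )["QueryExecutionId"]
    while True:
        state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return qid
        time.sleep(1)

# Iceberg tables created through Athena typically need the table_type property.
run_query(
    "CREATE TABLE IF NOT EXISTS daily_sales (sale_date date, amount double) "
    "TBLPROPERTIES ('table_type' = 'ICEBERG')"
)
run_query("INSERT INTO daily_sales VALUES (DATE '2025-03-14', 3.14)")
qid = run_query("SELECT sale_date, SUM(amount) AS total FROM daily_sales GROUP BY sale_date")
for row in athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]:
    print([col.get("VarCharValue") for col in row["Data"]])
```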
In this way, AWS’ next generation of SageMaker brings several established AWS capabilities together, elevating SageMaker Unified Studio as a central hub for data, AI, and analytics professionals. The idea here is to unite disparate data sources and query engines within a single experience. Now users can access unified data across S3 data lakes, Redshift data warehouses, and third-party and federated data sources in SageMaker Lakehouse directly from SageMaker Unified Studio. Built on top of AWS’ data governance platform, Amazon DataZone, SageMaker Unified Studio also does more than just bring together several query engines and data sources. It can be used to orchestrate those engines and sources within complete development workflows for AI app development, data processing, and SQL analytics.
This marks a significant turning point for AWS. No longer should AWS be considered a raw cloud platform housing a rich yet uncoordinated portfolio of data and analytics tools. Instead, we are seeing the emergence of a genuine data intelligence platform with SageMaker acting as the primary control plane for governance and access. This falls squarely in line with work done by AWS competitors Databricks, Snowflake, Microsoft, and others to create a unified yet open platform tuned to the rigors of AI delivery.
Recommendations for Enterprise Buyers
Embrace Apache Iceberg and other open table formats (OTFs), including Databricks Delta Lake and Apache Hudi. Beyond avoiding vendor lock-in and improving operational resilience, each has its own unique strengths. For example, teams with deep Apache Spark experience might find greater performance in using the Delta Lake table format.
Those concerned over potential vendor lock-in related to these announcements should take solace in the fact that SageMaker Unified Studio and Athena, Glue, et al. are not meant to replace other query engines. By building on top of the Apache Iceberg open standard, AWS makes it easier for users to bring in other query engines like Apache Spark, Trino, Dremio, Starburst, and even PyIceberg, as illustrated in the sketch below.
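As a quick illustration of that portability, the following sketch points a second engine, Apache Spark, at the same hypothetical Iceberg REST catalog used in the PyIceberg example above. Package versions, the endpoint, and all identifiers remain assumptions rather than AWS-confirmed values.

```python
# A sketch of Iceberg engine portability: registering the same (assumed)
# REST catalog with Apache Spark. Versions, endpoint, and names are illustrative.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("s3-tables-portability")
    # Iceberg Spark runtime and AWS bundle; versions are illustrative.
    .config(
        "spark.jars.packages",
        "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.1,"
        "org.apache.iceberg:iceberg-aws-bundle:1.6.1",
    )
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    # Register a Spark catalog backed by the Iceberg REST Catalog API.
    .config("spark.sql.catalog.s3tables", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.s3tables.type", "rest")
    .config("spark.sql.catalog.s3tables.uri", "https://s3tables.us-east-1.amazonaws.com/iceberg")
    .config("spark.sql.catalog.s3tables.warehouse", "arn:aws:s3tables:us-east-1:111122223333:bucket/example-bucket")
    # SigV4 signing options from Iceberg's AWS module.
    .config("spark.sql.catalog.s3tables.rest.sigv4-enabled", "true")
    .config("spark.sql.catalog.s3tables.rest.signing-name", "s3tables")
    .config("spark.sql.catalog.s3tables.rest.signing-region", "us-east-1")
    .getOrCreate()
)

# The table written via Athena earlier should now be readable from Spark.
spark.sql("SELECT * FROM s3tables.example_namespace.daily_sales LIMIT 10").show()
```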
AWS’ move to create a unified user experience for data professionals with SageMaker Unified Studio is not just about unified data sources. Because it plugs directly into the company’s AI portfolio (most notably, Amazon Bedrock), this unified IDE is actually meant to provide a complete life-cycle experience for data engineers, data scientists, ML engineers, and IT professionals building AI apps (both predictive and generative in nature).
What to Watch:
- Several key trends and competitive pressures will shape the future of AWS’s data, analytics, and AI business.
- Amazon Q Developer will play a critical role in driving the success or failure of AWS data and analytics tools over the long term as the company shifts from a traditional, imperative approach to a more declarative one, using natural language to ask questions, generate SQL queries, create pipelines, and more.
- Throughout these announcements, AWS continues to push its somewhat controversial data integration approach, branded Zero-ETL. While not a destination unto itself, the idea of doing away with traditional extraction, transformation, and loading pipelines will appeal to AWS customers, particularly with pre-built connectors handling access to data across several sources (e.g., Salesforce, Zoho, ServiceNow, and SAP).
For more detailed insights from the event, please refer to the following press releases:
- Amazon S3 Tables integration with SageMaker Lakehouse is now generally available
- Amazon S3 Tables add Apache Iceberg REST Catalog APIs
- Amazon S3 Tables add create and query table support in the S3 console
Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.
Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.
Other insights from The Futurum Group:
Amazon Delivers Strong Q4 FY 2024 with Record Operating Income, AWS Growth
Does a Group Focused on Agentic AI at AWS Signal Enterprise Prioritization?
Accelerating GenAI Innovation – Six Five On the Road at AWS re:Invent
Author Information
Brad Shimmin is Vice President and Practice Lead, Data and Analytics at Futurum. He provides strategic direction and market analysis to help organizations maximize their investments in data and analytics. Currently, Brad is focused on helping companies establish an AI-first data strategy.
With over 30 years of experience in enterprise IT and emerging technologies, Brad is a distinguished thought leader specializing in data, analytics, artificial intelligence, and enterprise software development. Consulting with Fortune 100 vendors, Brad specializes in industry thought leadership, worldwide market analysis, client development, and strategic advisory services.
Brad earned his Bachelor of Arts from Utah State University, where he graduated Magna Cum Laude. Brad lives in Longmeadow, MA, with his beautiful wife and far too many LEGO sets.