The News: The New York area experienced an earthquake on Friday, prompting PJ Catalano, Test Floor Manager for IBM Mainframe, to post on X after the quake. The press picked up on his post.
Enhancing Critical Infrastructure Resilience: IBM Mainframe’s Approach
Analyst Take: Not all hardware platforms are created equal. As Google, Microsoft, and Amazon Web Services (AWS) enter the custom hardware business at scale, the industry is at an inflection point: enterprises have more options for workload placement than ever before. As enterprise architects navigate the complex landscape of where to land their workloads, one factor stands out for many of them: server and overall system availability. These choices come into even starker contrast when an earthquake hits. Against this backdrop, it is worth exploring three areas where IBM is deploying distinctive architectural approaches to deliver the system availability for which the mainframe is known.
Earthquake Testing Hardware
IBM’s z16 and LinuxONE offerings are distinguished by their certification to NEBS Zone 4, the most demanding seismic tier in a set of standards that govern the durability and reliability of telecommunications equipment under severe conditions.
NEBS stands for Network Equipment-Building System, a set of standards used primarily in the US telecommunications industry. The requirements were originally developed by Bell Labs and are now maintained by Telcordia (formerly Bellcore). Zone 4 testing specifically addresses the equipment’s ability to withstand seismic activity, making it a crucial consideration for installations in earthquake-prone areas. It is the most demanding seismic level within the NEBS criteria and focuses on whether the equipment can continue to operate during and after a seismic event. The criteria cover structural integrity, operational continuity, and safety under earthquake conditions. This certification is particularly important for critical infrastructure and services that require uninterrupted operation, such as emergency response systems, data centers, and the core network components of telecommunications providers. By meeting NEBS Zone 4 requirements, IBM is demonstrating its commitment to delivering dependable products for some of the most challenging environments, a reassurance that matters most for mission-critical infrastructure in regions with a high risk of earthquakes.
Run Book Automation and Holistic System Availability Thinking
Geographically Dispersed Parallel Sysplex (GDPS) is a comprehensive solution developed by IBM, and tailored specifically for IBM Z systems, to enhance the resilience and availability of mainframe environments. GDPS combines clustering, server management, storage replication, and automation so that critical applications and data can withstand various types of outages, including those caused by disasters, thereby minimizing downtime. This framework supports IBM Z’s capability to deliver exceptionally high system availability, reaching 99.999999% (or “eight nines”) in environments that include DB2 data sharing.
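To put an availability figure such as “eight nines” in concrete terms, the short Python sketch below converts a number of nines into expected downtime per year. It is plain arithmetic rather than anything IBM publishes: the function name and the 365-day year are assumptions made for illustration.

```python
# Assumption: a 365-day year (31,536,000 seconds), ignoring leap years.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def downtime_seconds_per_year(nines: int) -> float:
    """Expected annual downtime for an availability with `nines` nines.

    Three nines is 99.9%, i.e. an unavailability of 10**-3.
    """
    unavailability = 10.0 ** (-nines)
    return unavailability * SECONDS_PER_YEAR


if __name__ == "__main__":
    for n in (3, 5, 8):
        print(f"{n} nines: {downtime_seconds_per_year(n):.3f} seconds per year")
```

Under these assumptions, five nines allows a little over five minutes of downtime per year, while eight nines allows roughly a third of a second.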
The cornerstone of GDPS’s effectiveness lies in its data replication strategies and automated disaster recovery processes. These features enable near-instantaneous failover to backup systems without significant data loss or operational downtime, even in the event of catastrophic failures. GDPS’s architecture supports multiple active data centers in an “active-active” configuration, allowing workloads to be balanced across geographically dispersed sites. This setup improves system availability and supports disaster recovery and business continuity, because an alternative site can take over immediately if one site goes down.
Combining synchronous and asynchronous data replication maintains data integrity across distances, safeguarding against data loss and enabling rapid system and data recovery. In a DB2 data-sharing environment, GDPS enhances resilience by managing the storage subsystem and remote copy configurations, automating operational tasks, and coordinating failure recovery from a single point of control. This comprehensive approach to disaster recovery and high availability is pivotal for organizations that rely on IBM Z for their mission-critical operations, providing the infrastructure needed to maintain continuous operation and compliance with regulatory standards.
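The difference between the synchronous and asynchronous replication described above can be illustrated with a toy model. The sketch below is not GDPS code and does not use any IBM API; the class `ReplicatedVolume` and its methods are hypothetical names invented for this illustration. It shows only the general trade-off: a synchronous write is acknowledged after the secondary site has it, so a failover loses nothing, while an asynchronous write is acknowledged first and shipped later, so anything still queued at failover time is lost.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReplicatedVolume:
    """Toy model (hypothetical) of a primary volume mirrored to a second site."""

    synchronous: bool
    primary: List[str] = field(default_factory=list)
    secondary: List[str] = field(default_factory=list)
    in_flight: List[str] = field(default_factory=list)  # async writes not yet shipped

    def write(self, record: str) -> None:
        self.primary.append(record)
        if self.synchronous:
            # The write completes only once the secondary also holds the record.
            self.secondary.append(record)
        else:
            # The write completes immediately; replication happens later.
            self.in_flight.append(record)

    def drain(self) -> None:
        """Ship all queued asynchronous writes to the secondary."""
        self.secondary.extend(self.in_flight)
        self.in_flight.clear()

    def fail_over(self) -> List[str]:
        """Simulate losing the primary site; return records that never arrived."""
        lost = list(self.in_flight)
        self.in_flight.clear()
        self.primary = list(self.secondary)
        return lost


if __name__ == "__main__":
    sync_volume = ReplicatedVolume(synchronous=True)
    sync_volume.write("txn-1")
    sync_volume.write("txn-2")
    print("sync lost:", sync_volume.fail_over())

    async_volume = ReplicatedVolume(synchronous=False)
    async_volume.write("txn-1")
    async_volume.drain()
    async_volume.write("txn-2")  # still queued when the site fails
    print("async lost:", async_volume.fail_over())
```

In practice the choice is driven largely by distance: synchronous mirroring adds latency to every write, which is why designs in this space commonly pair it with asynchronous copies to more distant sites.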
Architecting Out Memory Failures
Redundant Array of Independent Memory (RAIM) in the mainframe ecosystem is conceptually akin to RAID in disk storage systems: it safeguards against memory module failures through redundancy and advanced error correction techniques. It employs additional memory modules and algorithms to protect against module failure, ensuring continuous operation. RAIM is more robust than parity checking and ECC memory alone and can correct multiple DRAM device failures as well as the failure of an entire memory channel. RAIM’s introduction in the IBM zEnterprise 196 server and its continued presence in subsequent models, including the z16 and LinuxONE systems, represent a significant leap in memory error correction and underline IBM’s focus on system reliability and data integrity.
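The idea of using an extra channel to survive the loss of a whole channel can be shown with a deliberately simplified sketch. The Python below uses plain XOR parity across four data channels plus one redundancy channel; this is an assumption chosen for illustration, not IBM’s actual RAIM algorithm, which relies on more sophisticated error-correcting codes. The function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def add_parity(data_channels: List[bytes]) -> List[bytes]:
    """Return the data channels followed by one XOR parity channel."""
    parity = reduce(xor_bytes, data_channels)
    return list(data_channels) + [parity]


def rebuild(channels: List[Optional[bytes]]) -> List[bytes]:
    """Reconstruct at most one failed channel (marked as None)."""
    missing = [i for i, c in enumerate(channels) if c is None]
    if len(missing) > 1:
        raise ValueError("simple XOR parity can recover only one failed channel")
    repaired = list(channels)
    if missing:
        survivors = [c for c in channels if c is not None]
        # XOR of all surviving channels equals the missing channel.
        repaired[missing[0]] = reduce(xor_bytes, survivors)
    return repaired


if __name__ == "__main__":
    stripes = add_parity([b"\x01\x02", b"\x10\x20", b"\xaa\xbb", b"\xff\x00"])
    damaged: List[Optional[bytes]] = list(stripes)
    damaged[2] = None  # simulate the loss of an entire channel
    print(rebuild(damaged) == stripes)
```

The same XOR relationship recovers the parity channel itself if that is the one that fails, which is why a single redundant channel is enough to tolerate any one channel failure in this simplified model.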
Looking Ahead
The IBM mainframe’s architectural philosophy reflects an enduring commitment to engineering excellence and proactive design, aimed at meeting the demands of contemporary enterprises for reliability and availability. This philosophy is demonstrated by NEBS Zone 4 certification, the availability metrics of GDPS, and the memory protection afforded by RAIM. Together, these elements form a robust blueprint for operational resilience in the face of both natural disasters and technological disruptions.
As IBM continues to refine its system architectures, it must also navigate the evolving regulatory environment, notably the Digital Operational Resilience Act (DORA) in the EU. This regulation is designed to enhance digital operational resilience, particularly for financial entities, underscoring the necessity of a robust infrastructure in today’s highly digital and interconnected economy.
The holistic design ethos at the heart of the IBM mainframe, with system availability as its cornerstone, reflects IBM’s sophisticated engineering and foresight. The rigorous NEBS Zone 4 certification of the IBM z16 and LinuxONE systems, alongside the near-perfect system availability achieved through GDPS, exemplifies IBM’s unyielding dedication to ensuring operational continuity and resilience. This focus is further emphasized by the strategic use of RAIM technology, which protects against a wide array of memory failures, ensuring that IBM’s mainframes provide continuous, reliable service for mission-critical operations.
Collectively, these architectural innovations elevate the IBM mainframe to the pinnacle of system availability and reliability, addressing critical concerns around earthquake resilience, disaster recovery automation, and memory failure mitigation. IBM thereby offers enterprises a reliable platform for their most essential applications, surpassing the stringent requirements of the modern, data-driven world and setting new benchmarks for the future of enterprise computing infrastructure.
Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.
Analysis and opinions expressed herein are specific to the analyst individually, based on data and other information that might have been provided for validation, and are not those of The Futurum Group as a whole.
Author Information
Regarded as a luminary at the intersection of technology and business transformation, Steven Dickens is the Vice President and Practice Leader for Hybrid Cloud, Infrastructure, and Operations at The Futurum Group. With a distinguished track record as a Forbes contributor and a ranking among the Top 10 Analysts by ARInsights, Steven's unique vantage point enables him to chart the nexus between emergent technologies and disruptive innovation, offering unparalleled insights for global enterprises.
Steven's expertise spans a broad spectrum of technologies that drive modern enterprises. Notable among these are open source, hybrid cloud, mission-critical infrastructure, cryptocurrencies, blockchain, and FinTech innovation. His work is foundational in aligning the strategic imperatives of C-suite executives with the practical needs of end users and technology practitioners, serving as a catalyst for optimizing the return on technology investments.
Over the years, Steven has been an integral part of industry behemoths including Broadcom, Hewlett Packard Enterprise (HPE), and IBM. His exceptional ability to pioneer multi-hundred-million-dollar products and to lead global sales teams with revenues in the same echelon has consistently demonstrated his capability for high-impact leadership.
Steven serves as a thought leader in various technology consortiums. He was a founding board member and former Chairperson of the Open Mainframe Project, under the aegis of the Linux Foundation. His role as a Board Advisor continues to shape the advocacy for open source implementations of mainframe technologies.