Intersection of DevOps, Platform Engineering, and SREs | DevOps Dialogues: Insights & Innovations

On this episode of DevOps Dialogues: Insights & Innovations, I am joined by CTO Advisor and Signal65 Analyst, Alastair Cooke, for a discussion on the Intersection of DevOps, Platform Engineering, and SREs.

Our conversations cover:

  • DevOps for Continuous Evolution
  • Platform Engineering Orchestrating Infrastructure for Innovation
  • Site Reliability Engineers Are Bridging the Gap
  • The Importance of Each Practice Area

These topics reflect ongoing discussions, challenges, and innovations within the DevOps community.

Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this webcast. The author does not hold any equity positions with any company mentioned in this webcast.

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.

Transcript:

Paul Nashawaty: Hello and welcome to another edition of DevOps Dialogues: Insights and Innovations. My name is Paul Nashawaty and I’m the practice lead for the App Dev Practice at The Futurum Group. And today I’m joined by CTO Advisor and Signal65 Analyst, Alastair Cooke, for a discussion on the intersection of DevOps, platform engineering, and SREs. Alastair, great to have you here today.

Alastair Cooke: Hey Paul, it’s a pleasure to be here and it’s great to be joining you on the show.

Paul Nashawaty: Yeah, it’s really great to have you here, and Tech Field Day was a lot of fun too. So why don’t we start by telling the audience what you do and why you’re on the podcast.

Alastair Cooke: So I always struggle to describe what it is that I do, but the podcast angle on this as DevOps Dialogues, my background is an on-premises infrastructure engineer, but I spent about eight years teaching AWS’s public scheduled training courses starting out in the architecture stuff that corresponds to what we do on-premises, but very rapidly I realized that AWS is a developer enablement platform.

So I was excited to then start teaching the Developing on AWS course and then the DevOps Engineering on AWS course. Between upskilling to teach those courses and talking to so many people who are using AWS as a platform for DevOps, or starting to use it, I learned a huge amount about how people are using AWS in particular for DevOps and some of the disparities between the way AWS themselves or Amazon uses DevOps and the way real customers use DevOps.

Paul Nashawaty: Well, it’s great to have you on the show. I mean, the topic we’re talking about today is definitely one that I think you’re very qualified to talk through based on your background and where we’re going. There’s a lot of changes. We hear organizations talk about DevOps, we hear organizations talk about platform engineering, and we see SREs coming up quite a bit. This is an area where I think there is a lot of confusion. What are your thoughts there?

Alastair Cooke: Yeah. I think there are multiple views of what the future is and, as usual, the future is here, it’s just not evenly distributed. And so there are places like AWS, or Amazon themselves, that found that in order to get the rate of change they needed and the agility that they needed, waterfall development just didn’t work. And so they went down the path of DevOps and, in some ways, AWS or Amazon was foundational to the DevOps practices, to that idea of small teams with full responsibility for the entire lifecycle of the application.

But if you’re coming from maybe an enterprise software development environment where there’s a lot of regulation and control and where organizational structures are very strict, it’s really hard to actually implement that. And then of course as you start taking a large complex application and trying to break it up into a whole bunch of microservices, then you find that building the individual microservice is easy, but building the collection of microservices that together make your application becomes a much greater challenge. And so although you’re apparently solving one problem, you’re creating another, as we so often do with IT.

And then of course there’s the thought that a lot of developers don’t actually want to know about all of that underpinning. The infrastructure side was my bread and butter before I started moving across to the analyst side, but building that infrastructure is not something that most developers want to know about or want to care about, even to the level that it’s delivered by a lot of the AWS infrastructure services. So I think that the key thing is there’s just so much variation in requirements and starting points for organizations. There isn’t a single solution that suits all of them.

Paul Nashawaty: That makes sense, Alastair. When I think about what you’re talking about in our research, we see a lot of growth around acceleration of code development and pushing applications out the door. Actually, in our most recent research, we see that 24% of respondents indicate that they want to push code out on an hourly basis, yet only 8% are able to do so. But before we talk about that, let’s think about it in the context of delivering code and delivering this cadence of applications out the door: there are many different maturity levels when we look at how organizations are evolving.

So when we look at DevOps and continuous evolution, I see from my conversations that there are organizations that have IT and lines of business; IT, DevOps, and lines of business; IT, DevOps, SREs, and lines of business; and IT, platform engineering, and lines of business. So when you take this all into consideration, there really are a lot of different ways organizations are driving this. What are your thoughts around that?

Alastair Cooke: Well, I completely agree that we do see there’s different permutations of how the organization will change. Fundamentally, the hard part of DevOps is the organizational change. It means moving away from that separation of having the business team and the IT team, and having this idea that there is a single team that’s end-to-end responsible for each component and that it’s a small enough team that they can actually coordinate together. People don’t coordinate together in very large groups. If you’ve ever traveled with a group of 20 people, you know that’s very hard. Whereas traveling by yourself or with a small group of people is much easier to coordinate, and so small groups move faster.

But organizations don’t like small groups. Typically in our western culture we have built this top-down hierarchical view. Each manager has between seven and nine, or is it five and 11, direct reports. And there’s these structures that are often siloed by functional areas, and changing from that to this microservices architecture, with teams that are responsible for their individual microservices and are cross-skilled, is a huge organizational change. That’s always been the thing that’s hardest about adopting DevOps. But I want to circle back to one of the things that you said, which was about the rate of deployment of changes. And this is a terrible metric.

And I specifically recall a client on my last DevOps engineering course talking about that they had been incented to have a high rate of change going out and they often had trivial changes going out and trivial breaking changes because they had no incentive on the quality of those changes coming out. The metric that I’m really keen on is latency. So the whole point of getting features out into production is to fulfill some business need. And so that time from the expression of the business need to the feature being out in production, that’s what I call latency. And that’s the thing that I think is the valuable measure. It doesn’t matter how often you’re putting out changes that make no difference to the business, putting out changes that are impactful to business is the only thing that is our job.
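As a rough illustration of the latency measure described here, the sketch below computes the time from a business need being raised to the feature reaching production. The feature names, dates, and the `latency_days` helper are all hypothetical, and the choice of the median as the summary is an assumption, not something stated in the conversation.

```python
from datetime import datetime
from statistics import median

# Hypothetical records: when the business need was expressed and when
# the resulting feature reached production.
features = [
    {"name": "express-checkout", "requested": "2024-01-08", "in_production": "2024-01-29"},
    {"name": "invoice-export", "requested": "2024-01-15", "in_production": "2024-02-26"},
    {"name": "loyalty-points", "requested": "2024-02-01", "in_production": "2024-02-15"},
]


def latency_days(feature):
    """Days from the expression of the need to the feature being live."""
    requested = datetime.strptime(feature["requested"], "%Y-%m-%d")
    shipped = datetime.strptime(feature["in_production"], "%Y-%m-%d")
    return (shipped - requested).days


latencies = [latency_days(f) for f in features]
median_latency = median(latencies)

for feature, days in zip(features, latencies):
    print(f"{feature['name']}: {days} days")
print(f"median latency: {median_latency} days")
```

Unlike a raw count of deployments, this number does not improve when a team ships many trivial changes; it only moves when requested features reach production sooner.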

Paul Nashawaty: Yeah, I couldn’t agree more. And actually when we look at our trending data and we look at the research that we’ve done, one of the things I often reflect on is the data that shows the CI/CD pipeline and getting that code out the door. And one of the things I found that was astonishing to me was in 2022, only 29% of organizations were doing continuous testing, only 29%. To your point, the goal was to push the code out the door: push the big green button, get it out the door, don’t care about the quality, get it out the door. But what we saw in 2023 was this big uptick from 29% to 66%.

So there was an impact because putting the onus on your client to test your software is a bad thing to do and that’s the fastest way to lose a client. And so I talked to organizations about this and I said, “So why do you do that? What’s the rationale behind it?” And they’re like, “Well, we do sprint reviews every two weeks, so if we see a problem, we can rapidly be agile and push the code out the door.” It’s like, “Yeah, but,” I use the analogy of I fly somewhere and then I pull up my phone and I try to get a ride service company. If I pull up one app and I go, “Okay, well,” and it crashes on my phone, where do I go? I might try it again. And if I do it again, if it crashes a second time, I go to another ride services app.

So that’s not a good model, letting your clients do the testing for you. So I agree with you, but I want to pivot a little bit in the discussion here, Alastair. I think that you touched on a number of areas, the teams and the shifting left. You typically hear this with security a lot, but it’s also in the CI/CD pipeline. It’s also in moving closer to the engineering side. Platform engineering, orchestrating infrastructure for innovation, tends to be another topic that comes up quite frequently. Can you speak a little bit about that?

Alastair Cooke: Yeah. It fundamentally comes back to the idea that AWS’s success, and again, my view of these things is colored by my experience, was not from having DevOps and having these small teams that were end-to-end responsible. That’s where they started. And they found pretty quickly that by itself that didn’t speed up building of applications, because every microservices team had to choose the product and choose how they were going to support it, choose what database to use and how to protect its data, and those kinds of things. And AWS had a second discussion of how they were going to transform and realized they needed reusable services that you could assemble your application out of.

And that to me is the bit where AWS is a developer enablement platform. I see the platform engineering practice as bringing that idea of reusable software-defined services that are easy for developers to consume, so that the developer building a microservice doesn’t have to worry about all of the underpinnings or the minutiae of the component and can spend more of their time focusing on writing the business logic, the thing that’s unique to the business. So I see platform engineering as being part of that sort of maturity along the way: the developer who is writing business differentiating code shouldn’t have to also write their own message queue system, for example, or choose a message queue system.

Paul Nashawaty: Yeah, that makes sense. I mean, I get that, but I mean, one of the things that comes up is I’m at a conference this week and we’re having these conversations about how every person in the IT organization is a developer. That was one of the comments that was made. Everybody’s a developer. And I kind of was thinking about it and I go, “Well, let me think that through.” So I’m looking around the conference, pretty sizable conference, and I look around, I go, “I cannot find one person that identifies themselves as a system admin or a network admin or a storage admin. They’re all platform engineers. They’re all moving up. They want to be that platform.” And then it was like, “Okay.”

Then I took that one step further and I said, “Well, if you take infrastructure and you start implementing infrastructure as code, then you start moving towards that developer centric perspective.” And I was saying earlier on the podcast here that 24% of organizations desire to release code on an hourly basis, yet only 8% can do so. Well, for the 8% that can do so, the commonality between them is they do agile software development, they use DevOps methodology, and they use infrastructure as code in order to get code out the door faster. So if there’s infrastructure changes, allowing and enabling your platform engineering team to make those changes dynamically is really about the speed of business, right? It’s really about how to get the business to accelerate faster.

Alastair Cooke: Absolutely. And fundamental to the idea of DevOps is infrastructure as code: that you can build that test environment using the same automation you’re going to use to build the production environment, because that’s the only way to get high fidelity testing. If you’re testing with a three-month-old clone of production, you’re not going to get a valid test for when you deploy into production. So absolutely the delivering of these services, the consumable pieces that the platform engineering team are delivering, absolutely that’s a development process. There shouldn’t be trouble tickets required in order to get a new database instance deployed. This is the sort of thing that gets in the way of deploying new features out into production. So it should be an internal service catalog.
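A minimal sketch of the "same automation for test and production" idea follows. It does not use any real infrastructure-as-code tool; `EnvironmentSpec`, `build_environment`, and the field values are hypothetical stand-ins for whatever definition a team actually maintains. The point it illustrates is that both environments come from one definition, so the only differences are the ones passed in deliberately.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EnvironmentSpec:
    """Hypothetical description of one deployed environment."""
    name: str
    database_engine: str
    queue_type: str
    app_instances: int


def build_environment(name: str, app_instances: int) -> EnvironmentSpec:
    """The single definition used for every environment.

    Only the name and the size are parameters; everything else is fixed
    here, so test cannot quietly drift away from production.
    """
    return EnvironmentSpec(
        name=name,
        database_engine="postgres",  # assumed value for illustration
        queue_type="fifo",           # assumed value for illustration
        app_instances=app_instances,
    )


def differences(a: EnvironmentSpec, b: EnvironmentSpec) -> list:
    """Names of the fields whose values differ between two environments."""
    left, right = asdict(a), asdict(b)
    return sorted(key for key in left if left[key] != right[key])


test_env = build_environment("test", app_instances=1)
prod_env = build_environment("production", app_instances=6)
drift = differences(test_env, prod_env)
print(drift)
```

A three-month-old clone of production has no equivalent of `differences`: nobody can say which settings have diverged, which is why tests against it are low fidelity.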

We heard about these 10 years ago, and the idea of delivering developer enablement services. So one of the things that kind of bugs me about a lot of the vendors is they say, “We’re enabling cloud-native development by giving you a Kubernetes platform.” Okay, but Kubernetes is an infrastructure service. To do cloud-native development on premises, I need application services, I need databases, I need storage, I need queues, I need messaging, I need load balancing. All of these things need to be delivered to me as a service, not just a place that I can put my code, because Docker containers are a software distribution mechanism fundamentally.

Paul Nashawaty: Yeah, yeah, that makes sense. And I want to comment that the days of submitting a help desk ticket and waiting three days for changes to occur are long since gone. I mean, if your organization is still doing that, you’re not competitively advantaged, you’re at a disadvantage actually.

Alastair Cooke: The future is not evenly distributed. There are still people in that pain.

Paul Nashawaty: Yeah, no, and I can appreciate that and I get it. And that goes to that whole maturity point too, because you have some applications that may be in more fast, dynamic, agile development processes, and you might have some heritage systems that can accommodate and deal with that three-day change cycle. But really if you’re trying to be dynamic and fluid, those new systems of engagement really have to come out the door very quickly. We talked about a lot of this in the recent blog that we published, The Intersection of DevOps, Platform Engineering, and SREs. It’s on our website.

You can learn more about it there, alongside this dialogue that we’re having. But I do want to talk about SREs, because I mentioned them and I don’t want to leave that just hanging out there. So when I talk to organizations, because SREs have this kind of more holistic view across multiple instantiations of the environments and such, do SREs bridge the gaps? Do they bridge the gap to help bring things together or are they just a DevOps on steroids type of approach?

Alastair Cooke: Fundamentally, the idea of an SRE came from the thought that if you’re a hyperscaler operating a large web estate, you historically believed that the developers were the gods and that the people who ran production were a bunch of minions who just had to do as they were told. The SRE movement, the start of SRE, was a recognition that production is actually the thing that’s in front of our customers that generates our revenue. So it’s important that we look after and enable those people. And I see SRE, DevOps, and platform engineering as being all very tightly interconnected and different ways of implementing the same objective.

Definitely having that focus on how do we change our production environment? How can we accommodate change and failure in our production environment? The AWS story around SRE is more that it should be part of your fundamental architectural design is to continuously improve how your application architecture works. And to a certain extent, that loops back to my earlier comment that as you go into the microservices architecture, coordinating between the microservices becomes the hardest and most fragile part. It’s that same focus. So definitely having that system-level focus that any one part being broken shouldn’t cause everything to fail, but you should understand the consequences and then you should be mitigating those consequences.

There’s a whole bunch of architecture and application design practices that are around having circuit breakers for overloads, having load balancers, having some sort of stateless compute layer with a much more carefully protected persistence layer. All of these things are interrelated and you can achieve the same business outcome using a variety of these different approaches. But I think there is a muddled message about what’s best, because we definitely saw SRE come up first and then DevOps and more recently platform engineering. And the truth will be all of these things are useful and around in large organizations for a long time to come.
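To make the circuit breaker practice mentioned above concrete, here is a small sketch of the general pattern, not of any particular library. The class name, the thresholds, and the injectable `clock` parameter are assumptions chosen for illustration. After a run of failures the breaker stops calling the struggling dependency for a cool-down period, so one broken part does not drag everything else down with it.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # failures before opening
        self.reset_after = reset_after              # cool-down in seconds
        self.clock = clock                          # injectable for testing
        self.failures = 0
        self.opened_at = None                       # None means closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Still cooling down: fail fast without touching the dependency.
                raise CircuitOpenError("dependency call skipped")
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # Open on reaching the threshold, or re-open if the trial call failed.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the breaker and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each request to a downstream service in `breaker.call(...)` and treat `CircuitOpenError` as the signal to use a fallback, which is one way of understanding and mitigating the consequences of a single failing component.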

Paul Nashawaty: Well, I think you touch on something that’s kind of important to note to the audience here. You’ve mentioned microservices, containerization, Kubernetes, and orchestration several times already. One thing I do see in our research is 88% of net new cloud-native applications are using microservices as part of that deployment because of the elasticity, right? And the ability to have flexibility within your environment. So having that skillset across your DevOps and SREs and platform engineering teams is incredibly important, especially when you start looking at modernization and taking those heritage applications and modernizing them into the cloud-native space.

I mean, some of those applications will not be refactored, we know that. Some of them are going to be encapsulated in Cloud ready states, maybe living in the Cloud and left alone, but you’re going to build net new systems of engagement that will be built on cloud-native microservices architectures. And that important skillset is something that I think the SREs and platform engineering and DevOps teams really need to understand. And not to mention, we can have a whole nother session, Alastair, talking about WebAssembly and Wasm and what that means, because maybe that’s the future state, maybe that’s where this all goes: we kind of leapfrog over microservices and go into WebAssembly instead.

Alastair Cooke: Yeah. And then well also placement of WebAssembly within microservices. And of course one of the big enablers for microservices and elasticity is serverless platforms.

Paul Nashawaty: Exactly.

Alastair Cooke: It’d be really interesting to see development of the serverless platforms that can be run on premises as a hybrid environment, because I don’t see an everything-in-the-Cloud future. For the majority of particularly large organizations, there’s always something that can’t move. And it is often those heritage applications, I like that term far better than legacy, but it means the same thing. These applications that are delivering a lot of production business value but aren’t, as you say, the systems of engagement. This is typically the places that we’re going to want to see the flexibility, the scalability, the elasticity that we get out of full DevOps implementations across public Cloud platforms.

Paul Nashawaty: Yeah. Alastair, I was called out once when I was describing a prospect or client’s environment. I said, “The legacy applications,” and they were like, “It’s not legacy.” I said, “Okay, well, the heritage environment.” “That’s better.” So it’s heritage. So that kind of has staying power now. I use heritage as the way to kind of describe that environment, but you’re right, I mean it’s basically whatever you’re trying to do and move forward with. But I will kind of touch base on something that you did talk about. A lot of times when people think about modernization and cloud-native environments, they think of the Cloud, they think of moving to the Cloud. And what we find in our research on cloud-native development is that only 11% of organizations are refactoring and building cloud-native applications on-prem today.

However, we ask where they’re going in two years and we see over 30% of respondents saying that they’re repatriating back from the public and private clouds to an on-prem instantiation. So on-prem doesn’t necessarily mean that you’re going to decouple from the cloud-native environment. It just means that your on-prem environment will be cloud-like, and you’ll have the ability to take advantage of all that infrastructure. And also to kind of piggyback on that, we also see that application portability is critical. Actually 20% of organizations state that it’s critical. And 67% say it’s very important.

So application portability from your core to your edge to your Cloud, it’s a very key kind of approach. I’m not saying that most organizations will be moving applications back and forth like that, but what they will do is they’ll have the ability to have the flexibility as needed for those burst capacity or whatever they may need to do. So we’re wrapping up towards the end here, Alastair. And I want to just touch, we talked about DevOps, talked about platform engineering and SREs. Why don’t we talk a little bit about the importance of each practice area? Why is it different and why are there different terms?

Alastair Cooke: I think the different terms have been used to describe things along the evolution. The evolution of DevOps from SRE isn’t apparent to me, but the evolution from DevOps to platform engineering is. The idea is that developers don’t want to care about the underlying infrastructure. There is definitely a development practice that’s around let’s write the business logic, the business process, the bit that is really the whole purpose of what we’re writing code for, and that sits in DevOps but doesn’t sit in the platform engineering space. And then there’s a whole piece of having to write the things that aren’t business differentiating, but serve the business differentiating code.

And this is the place where we like serverless platforms, because there’s so much less to write, and the software-defined development services, the everything-as-a-service methodologies. So if you are on premises, you’re going to have to build those things. And this is where the platform engineering team comes in and is going to be responsible for building those services that deliver to the people who then build the business application. Both are sets of developers: the platform engineering team is writing the code that delivers the service that is then consumed by the developers who are writing the business logic.

It’d be interesting to see just how organizations do implement these, and whether they implement all three: whether there’s a huge SRE practice, whether there’s a large platform engineering practice actually sitting on top of clouds where the Cloud maybe doesn’t deliver the service that you need, or the Cloud service doesn’t fit your business requirements. So just because you’re running on public cloud doesn’t mean that everything is going to be consumed as a service and you’re only ever writing business logic code. So I’m not sure that I see them as necessarily being competing areas of practice so much as a stable of skill sets that maybe are going to be differentiated because people who have a particular skill set will fall into one of these buckets.

Paul Nashawaty: That makes a lot of sense. And it really also depends on the maturity of the organization and how they’re structured and such. So I guess your mileage may vary, right? And that’s kind of the takeaway. Well, Alastair, we’re coming up to the end. Is there anything you would like to leave the audience with as final thoughts?

Alastair Cooke: I think fundamentally we’ve all got to come back to what are we doing to improve the business? How is my day-to-day activity making my employer a better organization? And that’s where we need to return to as we’re looking at these practices. How is this going to improve the business? How are we going to engage better with our customers? How are we going to deliver more sales as a result of these actions we’re taking? Keep that bigger view. And again, that’s part of that whole shift left: show more of what’s going on for real customers, for real business, to the people who can actually impact changes on it.

Paul Nashawaty: Makes a lot of sense. And Alastair, I’d like to thank you for your time, your insights, and your perspective. It really is valuable. I hope the audience really gets a lot out of it. I also want to thank the audience for participating in and attending this session today. It’s really exciting. A lot of research here, a lot of data points, and a lot more information can be found at futurumgroup.com. Thank you for your time today and have a great day.

Other Insights from The Futurum Group:

Application Development and Modernization

The Evolving Role of Developers in the AI Revolution

The Intersection of DevOps, Platform Engineering, and SREs

Author Information

At The Futurum Group, Paul Nashawaty, Practice Leader and Lead Principal Analyst, specializes in application modernization across build, release and operations. With a wealth of expertise in digital transformation initiatives spanning front-end and back-end systems, he also possesses comprehensive knowledge of the underlying infrastructure ecosystem crucial for supporting modernization endeavors. With over 25 years of experience, Paul has a proven track record in implementing effective go-to-market strategies, including the identification of new market channels, the growth and cultivation of partner ecosystems, and the successful execution of strategic plans resulting in positive business outcomes for his clients.

Alastair has made a twenty-year career out of helping people understand complex IT infrastructure and how to build solutions that fulfil business needs. Much of his career has included teaching official training courses for vendors, including HPE, VMware, and AWS. Alastair has written hundreds of analyst articles and papers exploring products and topics around on-premises infrastructure and virtualization and getting the most out of public cloud and hybrid infrastructure. Alastair has also been involved in community-driven, practitioner-led education through the vBrownBag podcast and the vBrownBag TechTalks.
