Keep lights on – The SRE way!

Blog: Capgemini CTO Blog

Recently we have seen a surge in the number of clients asking for SRE services. In some cases the ask is conscious and explicit; in others it is an implicit ask for similar outcomes without actually calling it SRE. This article, co-authored with Manoj Tharwani, sets out our point of view on what SRE is. While this is just a teaser, feel free to reach out to us if you are looking for more details. At Capgemini, we have invested in building this capability, which is enabling us to implement the concept at several of our clients.

 

Introduction
In simple terms, reliability is defined as the probability of success. In the application world, however, reliability is usually discussed in terms of availability and measured by the frequency of failures. Reliability matters because it can build or erode confidence in a product and in an organization’s brand reputation.
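To make "availability" and "frequency of failures" concrete, here is a minimal sketch in Python. The formulas are the common textbook definitions (availability as the share of time a service is up, and mean time between failures as uptime divided by the number of failures); they are an assumption for illustration and are not taken from any specific Capgemini method. The function names are hypothetical.

```python
def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Fraction of the observed period during which the service was up."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("observed period must be longer than zero hours")
    return uptime_hours / total


def mean_time_between_failures(uptime_hours: float, failure_count: int) -> float:
    """Average uptime between failures; higher means failures are less frequent."""
    if failure_count <= 0:
        raise ValueError("at least one failure is needed to compute MTBF")
    return uptime_hours / failure_count


if __name__ == "__main__":
    # Illustrative numbers only: a 30-day month (720 hours) with 1 hour of
    # downtime spread over 2 incidents.
    print(f"availability: {availability(719.0, 1.0):.4%}")
    print(f"MTBF (hours): {mean_time_between_failures(719.0, 2)}")
```

Two services can have the same availability but very different failure frequency, which is why both numbers are usually tracked together.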

The current IT systems landscape comprises several moving parts, and a multi-cloud setup adds even more complexity. In such a landscape, a traditional approach based on the philosophy of “prevent the system from failing” doesn’t quite work. With that many moving parts, there is bound to be a disruption somewhere, resulting in failures. The philosophy hence needs to be more like “expect failures to happen; build systems that are resilient to these failures”. That is where the concept of SRE, or Site Reliability Engineering (a.k.a. Service Reliability Engineering), kicks in.
SRE is all about applying a software engineering mindset to system administration. As a software engineer, you look at the business requirements and develop the system. Likewise, a Service Reliability Engineer needs to look at how each disruption can affect the business requirement and then find a solution for it accordingly.
An agile-focused, product-driven approach and IT–OT integration have been key drivers of the growing demand for SRE today.
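As one small illustration of the “expect failures to happen” philosophy, the sketch below retries a failing call with exponential backoff instead of assuming the call always succeeds. This is only one of many resilience patterns and is our own example, not something the SRE literature prescribes in this exact form. The function name `call_with_retries` and its parameters are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying on any exception with exponential backoff.

    The delay doubles after every failed attempt (0.1s, 0.2s, 0.4s, ...).
    The last failure is re-raised so that callers still see the error.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_dependency() -> str:
        # Hypothetical downstream call that fails twice, then recovers.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("transient failure")
        return "ok"

    print(call_with_retries(flaky_dependency))
```

The `sleep` parameter is injectable so the behaviour can be tested without real waiting. A production version would typically retry only on errors known to be transient and add jitter to the delays.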

Originating at Google as early as 2003, the concept started with a team tasked with keeping Google’s website “as available as possible”. They did that by simply applying software engineering concepts to system administration topics, which later formed the basic tenets of SRE as described in the online book published by Google.
As with any enterprise architecture framework, one does not need to “mimic” the exact methods used by Google. While you need to assess these practices in the context of your enterprise, there are certain basic tenets of SRE that must be followed:

DevOps vs SRE
One obvious question that often gets raised is about the crossover between SRE and DevOps, and rightly so. There is a significant overlap between the two concepts. Both tend to address the silos between “dev” and “ops”, and there are a lot of parallels in the practices followed. However, the approach and objectives are quite different in the two cases.

We have seen several recent cases where, in spite of having a complete DevOps implementation, companies continue to bleed millions of dollars when their core systems go down – SRE will help plug that gap!

We do not see SRE as something different from DevOps; rather, we see it as a natural evolution of the DevOps maturity model, as depicted in the graphic below:

 

Another aspect worth mentioning is the scope of SRE. While DevOps focusses on bridging the gap between “development” and “operations” teams, SRE extends that further by bringing in a focus on “architecture” as well. This ensures that resiliency is built into the system by design, so that it can quickly react to and recover from unexpected disruptions.

 

What would it really take to be an SRE?
SREs are expected to cover the entire spectrum of IT systems. The role combines deep awareness of technical infrastructure, operating systems and computer networking with attention to higher-level service level objectives (SLOs), so as to maintain a focus on business-relevant activities.
They focus on solving problems by building software components or features that prevent the problem from recurring in future (or, failing that, at least make it less painful). It is thus often recommended that SREs come from a software engineering background with some awareness of operations, rather than the other way around.
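To show how an SLO ties day-to-day engineering to a business-relevant number, the sketch below converts an availability SLO into an “error budget”, i.e. the amount of downtime the SLO still tolerates in a given window. The error budget idea comes from Google’s SRE practice; the function names, the 30-day window and the sample figures are our own assumptions for illustration.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Downtime (in minutes) that an availability SLO allows within the window.

    `slo_target` is a fraction, e.g. 0.999 for "99.9% available".
    """
    if not 0 < slo_target <= 1:
        raise ValueError("slo_target must be a fraction in (0, 1]")
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    return (1 - slo_target) * window_days * 24 * 60


def budget_remaining_minutes(
    slo_target: float, downtime_minutes: float, window_days: int = 30
) -> float:
    """Budget left after the downtime observed so far; negative means the SLO is breached."""
    return error_budget_minutes(slo_target, window_days) - downtime_minutes


if __name__ == "__main__":
    # Illustrative only: a 99.9% SLO over 30 days with 10 minutes of downtime so far.
    print(f"budget:    {error_budget_minutes(0.999):.1f} min")
    print(f"remaining: {budget_remaining_minutes(0.999, 10.0):.1f} min")
```

A remaining budget close to zero is a signal to prioritise reliability work over new releases; a healthy budget leaves room for change.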
The following technical skills are required for this role:

Additionally, non-technical skills such as problem-solving, teamwork, working under pressure, and strong written and verbal communications are key to their success.

Conclusion
Every enterprise, large or small, has multiple applications under development and multiple code deployments (and re-deployments) at any given point in time. Many of these enterprises have a mix of legacy and modern applications, supported by separate Development and Operations teams. While DevOps ensures a smooth, automated, pipeline-driven approach to these deployments, there also needs to be a dedicated focus on ensuring the availability of end-to-end business functions. This is exactly what SRE brings to the forefront, and why it is garnering a lot of interest in the industry.

Enterprises have long been working on reliability in some shape or form. Hence the first step of the SRE journey should be to look at the applications holistically from an end customer’s perspective, gather what already exists and determine the gaps in service reliability. This gives an organization a view of where it stands on reliability and what needs to be addressed immediately, and helps it plan.
At Capgemini, we help enterprises with a “Reliability Assessment” that gives them a view of where they stand on reliability, which aspects need to be addressed immediately and what the steady state of service should look like.
Beyond that, Capgemini also supports enterprises in addressing critical issues impacting reliability through the “SRE Jumpstart” offering. We ensure improved availability and reduced outages across all applications by addressing performance bottlenecks. We define the necessary processes and build the tools required for service reliability, focus on automation and bring applications to a steady state of service maintainability.

For more details on our SRE service offerings, please feel free to reach out to me or Clifton Menezes and we will be happy to collaborate with you in your journey towards keeping the lights on, the SRE way!
