DevOps to SRE: Making the Desired Culture a Reality
Over the past year at FullContact, the DevOps team has transitioned into an SRE team, and we couldn’t be happier. Our DevOps team functioned like many other DevOps teams and fit a sentiment I hear from fellow attendees at every DevOps-oriented conference I’ve attended, from DevOps Days to KubeCon: “DevOps is just a rebranded SysAdmin/Ops person…”
We primarily handled infrastructure and tooling like CI/CD to make the developers’ lives more efficient and productive. But we still operated apart from our teams, disconnected and sometimes clashing with work other teams had planned.
In my mind, DevOps is a culture, not a role. Sure, it centers around some tooling, and that tooling needs to be maintained by someone, usually someone with a stronger Ops skill set. But with DevOps comes a culture of empowered engineers paving the way to rapid releases, automated deployments, efficiently configurable infrastructure, and more.
For us, the SRE role is the embodiment of DevOps culture applied. When we set out to move towards having SREs, we took a moment to understand what we wanted from someone in the SRE role, what skills they should have or work towards acquiring, and how they would work with the teams they embedded into.
What do we want from an SRE?
To determine what we want from an SRE, we took a step back to understand what their impact would look like on our infrastructure. We want our infrastructure to constantly move towards a highly automated, self-healing system that is resistant to failure yet easy to maintain from an engineer’s point of view. This requires our SREs to be collaborative. They work with their team to design and implement our systems, driven by data, so that we keep progressing towards that more resilient state.
We also want our infrastructure to be visible. Adding to and optimizing our metrics and alerting system so that it provides the correct information at the right time enables us to react faster and make better software development decisions. This requires our SREs to give our services and platform a voice through observability.
In addition to observability and resilience, we also want our infrastructure to be ordered and structured. We need our SREs to give their teams recommendations on designing systems with best practices and in a consistent, repeatable way.
What should an SRE have?
Now that we understand what the outputs of our SREs should look like, we can outline the abilities they should have or attain to be successful. The list we came up with is as follows:
- computer science fundamentals (at least data structures, algorithms, and system design)
- the ability to write in a variety of languages
- the ability to debug, benchmark, and add observability to any system in our stack
- a deep understanding of our infrastructure
Some of these are theoretical in nature, while others are where theory meets the real world. Having or attaining these four abilities will enable our SREs to understand performance tradeoffs, write software in languages outside of our default JVM-based languages (or suggest that it be written in them), assist in debugging services in those languages in real time, and make adjustments and recommendations with the big picture in mind.
How will an SRE work?
Our SREs are embedded into the teams they work with. Functionally, this allows them to fold into that team’s flow and feature work cadence. The SREs also champion the DevOps culture mindset, empowering their co-workers along the way. We still meet as an SRE team to handle global infrastructure needs and keep each other informed about possible changes to improve the way our code is written, tested, deployed, and run.
How is the SRE process going so far?
Making changes like this can be messy. It can take a few iterations. This transformation requires just as much of a mindset shift as it does a role shift. But we relentlessly deliver and improve at FullContact. So far, our teams enjoy having an SRE on the team, and not just because they have an Ops-oriented person to call on. Our engineers strive to improve at their craft, and these days, that means understanding how to create an IAM role, update Terraform, or adjust a pipeline themselves.
We went into this understanding that it can be challenging to acquire all of the skills for an SRE, especially since so many companies still silo their workers heavily by job duties while calling it DevOps. Knowing this, we practice empathy and strive to empower our SREs through ongoing, meaningful investment. We started by spending a solid six months coaching, mentoring, and assisting in the integration of our SREs with their teams. Once the SREs were integrated into their teams, we set up feedback loops to check for impacts and milestones around the outcomes outlined above. Even now, we are enabling our entire SRE team to study for and take the CKA exam to add even more skills to their arsenal. And we won’t stop there as we find new ways to grow ourselves and our teammates.
We are hiring!
Want to work on cutting-edge technology around Person-Centric Identity? FullContact helps brands better understand their customers’ journey to offer a superior experience. Our mission is to do so while honoring one’s data privacy across any channel.
Our technology has a real-time, full spectrum identity graph that spans from the physical world to the digital world. We are building out new capabilities within the graph, expanding linkage datasets, and building integrations to the platforms where our customers live. If this piques your interest and you want to work with a very technically capable team, let me know!