Site Reliability Engineer

Job type:
Start date:
6 months +
Scot Lewis Associates Ltd
Published at:
United Kingdom
Project ID:

This project has been archived and is not accepting more applications.

SRE, DevOps, Azure, AWS, Python, CI/CD

We are working with a leading security, DevOps and DevSecOps consultancy to secure the services of a Site Reliability Engineer for a global energy client based in London. The DevOps & SRE function is at the forefront of the client's Digital & Cloud Transformation. The team uses modern methodologies to create innovative solutions for many teams across the client's development and engineering estate. The team follows flexible working practices and embraces a customer-centric 'design thinking' delivery process. The team aims to enable the client to innovate without barriers using a mix of emergent technologies.

Working within the DevOps & SRE team, the DevOps & SRE Specialist will focus on delivering SRE capabilities that enable customers to shorten release cycles, improve reliability, and stay ahead of the competition while ensuring security and compliance.

You will be a highly capable technical resource working within agile teams to identify, measure, define and automate solutions for customer teams. Placed within these customer teams, you will work with them to identify toil and reduce it through automation and efficient processes.

The successful candidate will proactively diagnose problems and be able to code a more permanent fix and/or re-engineer or rewrite failed or broken processes. They must be able to communicate the lessons learnt to prevent those problems from recurring.


The candidate will:

  • Maintain a continuous focus on quality, leading or taking part in Root Cause Analysis (RCA) where needed
  • Have experience as a DevOps & SRE Engineer, Analyst or Specialist
  • Set and/or advise on SLOs, SLIs or OKRs
  • Have experience in Azure/AWS
  • Have at least 3 years of experience in Python
  • Have a good understanding of CI/CD
  • Gather and analyse metrics & telemetry from both operating systems and applications to assist in performance tuning and fault finding
  • Write reusable, testable, and efficient code
  • Perform in-depth analysis of possible risks and their countermeasures
  • Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve
  • Teach the DevOps teams to follow SRE guidelines and procedures to minimize the number of errors and incidents
  • Partner with development teams to improve services through rigorous testing and release procedures
  • Work directly with clients to gather requirements and deliver top-quality service
  • Improve reliability, quality, and time-to-market of our internal products and tooling, viewing problems as opportunities to improve