Site Reliability Engineer

London - Onsite

This project has been archived and is not accepting more applications.

Keywords

Hardware, Programming Languages, Python, Automation, Stack, Reliability Engineer, Security, AWS, Application, Cloud

Description

Job Title: Site Reliability Engineer

Location: London

Department/Practice:
Job Purpose and primary objectives:

As a Site Reliability Engineer, you will help build systems and tools for internal use that enable you and your fellow AWS Engineers to operate safely at high speed and wide scale. You will have the unique opportunity to participate in and lead architecture workshops, working directly with technical teams and partners.

Key responsibilities (please specify if the position is an individual one or part of a team):
Role Accountabilities

Operate, monitor, and maintain high availability of services running in a cloud environment
Continue to automate, scale, and manage our cloud infrastructure
Work with the team to establish service level objectives and monitor services to ensure the objectives are met
Continually improve cloud operations automation and tooling to monitor and maintain enterprise cloud-based applications
Troubleshoot infrastructure and application issues, and work with the development team to resolve issues
Identify and improve on possible points of failure in the infrastructure/applications
Execute automation for known cloud-operations tasks, and create new automation for new situations or issues you encounter; automate everything
Collaborate with the team to maintain, monitor, and improve cloud-based applications
Facilitate root cause analysis meetings after a production-systems incident so that the team can learn from mistakes and improve our systems and runbooks
Participate in stress, security, and performance testing
Be vigilant about security and adhere to best practices to secure our cloud infrastructure and Real Time platform
Design, write and deliver software and automation to dramatically improve the availability, scalability, latency, and efficiency of services
Plan and perform security patches on our applications and underlying infrastructure
Help secure our data and access policies to reduce risk
Troubleshoot application-related support requests to locate the problem area, resolve those which are within your skill set, and forward the others to the appropriate staff
Perform application-related operations and management tasks to provision new customers, address operational requests, and keep the application running efficiently and effectively

Key Skills/Knowledge:

Capabilities, Competencies And Qualifications

Deep understanding of AWS cloud services and how to leverage them for compute, storage, and managed services including, but not limited to, databases, managed Kubernetes, and Python/Django application services
Experience with modern DevOps engineering practices and comfort with diverse technical problem sets across the entire technology stack, including the virtualized hardware
Understanding of Linux operating systems
Understanding of infrastructure-as-code practices using technologies such as Terraform
Experience using tools such as Ansible, Puppet, and Chef for configuration automation
Proficient in scripting and developing automation in Python and Bash, or similar programming languages
Understanding of modern approaches to software security and of what needs to be done to secure software systems and cloud-based infrastructure
Able to troubleshoot issues effectively across the entire stack, including the operating system and the underlying (virtual) hardware

Experience required: 6 to 8 years of experience
Duration of the Assignment: 12 months

Start date: ASAP
Duration: 6.0 months
From: SidTech LTD
Published at: 04.05.2021
Project ID: 2105043
Contract type: Freelance
