Site Reliability Engineer

Eindhoven - Onsite

Keywords

Reliability Engineering Automation Continuous Integration Distributed Systems Amazon Web Services Teaching Unix Clojure Cloud Computing Computer Programming Databases Data Centers Debugging Linux Elasticsearch Incident Response Engineering Entrepreneurship Scalability Python (Programming Language) Medical Surveillance Software Architecture Redis Process Automation Prometheus Software Engineering System Programming Grafana Kubernetes Low Latency Apache Kafka Kibana Terraform

Description

About this role:

The site reliability engineer has many responsibilities, including: helping the team design a platform that works reliably and with low latency across multiple data centers; helping the team design and implement software that improves stability, scalability, availability, and latency; designing and running tests to verify these capabilities; supporting the team in recovering quickly from outages; implementing automation tools for CI/CD pipelines; and helping the team develop good practices around monitoring and incident response.

Who we are looking for:

We are looking for a site reliability engineer who is excited by the opportunity to contribute to the growth of the Choreograph Create platform. The site reliability engineer must have hands-on experience debugging both automated and human processes, as well as experience in both software engineering and automation. The ideal candidate enjoys teaching and practicing site reliability concepts with team members, can find balance in all things, and has experience managing stateful distributed systems.

Role requirements:
  • Degree in Computer Science (or equivalent);
  • 5+ years of experience (or equivalent) in the field of site reliability and programming;
  • Designing and implementing software that improves stability, scalability, availability, and latency; and designing, building, and running tests to verify this;
  • Setting up system health monitoring and automated processes to prevent outages;
  • Defining corrective actions and supporting quick recovery from actual outages;
  • Implementing automation tools for continuous integration/delivery/deployment;
  • Helping the team develop good practices around monitoring and incident response;
  • Extra points for:
    • The ability to program in one or more high-level languages (such as Python, Go or Clojure), with a proven track record in automation and an algorithmic approach to solving problems.
    • In-depth knowledge and experience in at least one of the following: troubleshooting, host-based networking, Linux or UNIX engineering, systems programming, distributed systems, databases, or cloud computing; and a desire to learn more.
    • Experience with one or more of the following: Terraform, AWS, Kubernetes, Helm, Prometheus, Grafana, Elasticsearch, Kibana, Redis, Kafka.


Success attributes:
  • High energy and passion for the job;
  • Motivated, self-starter, self-reliant, resilient, and ambitious;
  • Comfortable and able to thrive in a fast-paced, entrepreneurial, start-up environment.


Darwin Recruitment is acting as an Employment Agency in relation to this vacancy.
Start date: 04/2024
From: Darwin Recruitment
Published: 12.03.2024
Project ID: 2727491
Contract type: Permanent