Description
Senior DevOps Advisory Technicians
Context
Site Reliability
Site reliability engineering (SRE) is the bridge between developers and IT operations, even in a DevOps culture. Site reliability engineers will be dedicated full time to building software that improves the reliability of production systems, fixing issues, and responding to incidents, including taking on-call responsibilities.
SRE Teams
The SRE teams are responsible for proactively building and implementing services that help IT and support teams work more effectively. This can range from adjustments to monitoring and alerting to code changes in production. A site reliability engineer can also expect to spend time fixing support escalation cases. However, as SRE operations mature, systems will become more reliable, leading to fewer support escalations. Because the SRE team touches many parts of the IT organization, it becomes a valuable source of knowledge and can be vital in routing issues to the right people for effective resolution, both immediate and long-term.
Requirements
Technically experienced individuals in the DevOps space with a proven ability to coach and develop teams in an embryonic DevOps environment, able to liaise with senior technicians and technical management to develop and hone the environment to a high level of maturity
Three roles are available, each requiring demonstrable experience in its discipline: Software Engineering (1 role), Infrastructure Engineering (1 role), and Test Engineering (1 role)
Candidates must show a high level of technical understanding and high credibility when liaising with senior management, both business and technical
Objectives of the Role
Coach and develop the client teams in the DevOps environments through planned meetings and workshops
Liaise with and advise senior technical management on developing DevOps as the environment of choice
Successfully and efficiently oversee the production environment by monitoring availability and taking a holistic view of system health
Continuously build software and systems to manage platform infrastructure and applications
Improve the reliability, quality, and time-to-market of our suite of software solutions
Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve
Provide primary operational support and engineering for multiple large distributed software applications
Regular Responsibilities
Coach and develop the client teams in the DevOps environments through planned meetings and workshops
Liaise with and advise senior technical management on developing DevOps as the environment of choice
Gather and analyse metrics from both operating systems and applications to assist in performance tuning and fault finding
Partner with development teams to improve services through rigorous testing and release procedures
Participate in system design consulting, platform management, and capacity planning
Create sustainable systems and services through automation and uplifts
Balance feature development speed and reliability with well-defined service level objectives
Skills and Qualifications
Ability to program (structured and OO) in one or more high-level languages, such as Python, Java, C/C++, Ruby, or JavaScript
Experience with distributed storage technologies such as NFS, HDFS, Ceph, and S3, as well as dynamic resource management frameworks (Mesos, Kubernetes, YARN)
A proactive approach to spotting problems, areas for improvement, and performance bottlenecks