Site Reliability Engineer (Automation and DevOps)

Dublin - Onsite

Keywords

Automation, DevOps, Continuous Integration, IT Service Management, Reliability Engineering, Application Performance Management, Architecture, Capacity Planning, Consulting, Linux, Geography, Groovy, Information Technology Infrastructure Libraries (ITIL), Mainframe Computing, Ansible, Shell Script, SQL, Databases, Sustainability, Systems Design, YAML, Scripting, Git, Bitbucket, Splunk, Dynatrace, Jenkins

Description

Location: Dublin, Ireland

Key Responsibilities

  • Plan, manage, and oversee all aspects of a production environment
  • Define strategies for application performance monitoring and optimisation in a production environment
  • Respond to incidents
  • Improve the platform based on feedback and measure the reduction in incidents over time
  • Support deployment of code into multiple lower environments
  • Support current processes with an emphasis on automating everything as soon as possible
  • Design, develop and standardise a monitoring and alerting mechanism for the supported applications
  • Take a holistic approach to problem-solving by connecting the dots, during a production event, across the technology stacks that make up the platform, in order to optimise mean time to recovery
  • Engage in and improve the whole life cycle of services - from inception and design, through deployment, operation and refinement
  • Analyse ITSM activities of the platform and provide a feedback loop to development teams on operational gaps or resiliency concerns
  • Support services before they go live through activities such as system design consulting, capacity planning and launch reviews
  • Support the application CI/CD pipeline for promoting software into higher environments through validation and operational gating, and lead in DevOps automation and best practices
  • Maintain services once they are live by measuring and monitoring availability, latency and overall system health
  • Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity
  • Work with a global team spread across tech hubs in multiple geographies and time zones
  • Share knowledge and explain processes and procedures to others
  • Mentor junior team members
  • Perform on-call duties on a rotational basis
  • Work occasional off-hours as required

Skills Required

Must have:

  • Linux
  • Mainframe
  • Shell Scripting
  • ITIL/ITSM
  • Application troubleshooting
  • SQL
  • Any monitoring tool (Splunk/Dynatrace preferred)
  • Jenkins - CI/CD
  • Groovy Scripting/YAML (basic)
  • Git (basic)/Bitbucket (basic)

Good to have:

  • Ansible/Chef
  • Event framework architecture

Start date: ASAP
Duration: 12 months
From: Tiger Resourcing Group
Published at: 29.04.2025
Project ID: 2873834
Contract type: Freelance