- Company Name
- MHR
- Job Title
- Site Reliability Engineering Manager
- Job Description
**Job Title:** Site Reliability Engineering Manager
**Role Summary:**
Lead and scale the Cloud Operations team responsible for the reliability, performance, and automation of a microservices SaaS platform hosted on Microsoft Azure. Drive SRE practices, foster a culture of continuous improvement, and ensure high availability, observability, and cost‑effective operation of the People First HR/payroll solution.
**Expectations:**
- Build and mentor a high‑performing engineering team.
- Define and enforce SRE standards, SLIs/SLOs, and incident‑response processes.
- Align operational work with product roadmaps and architecture guidelines.
- Deliver measurable improvements in MTTR, platform health, and cost efficiency.
**Key Responsibilities:**
- Lead, coach, and develop Cloud Operations engineers.
- Own monitoring, alerting, and observability frameworks (Dynatrace, Azure Monitor, Application Insights, Grafana).
- Design and implement automation for environment provisioning, deployments, configuration drift correction, and resilience routines using IaC (Terraform, Bicep) and scripting (PowerShell).
- Partner with Platform, Development, and Architecture teams to ensure solutions are operable, resilient, and disaster‑recovery ready.
- Manage on‑call rotation, lead incident response, conduct root‑cause analyses, and drive preventive actions.
- Optimize CI/CD pipelines for Java/.NET services, integrating quality gates, automated testing, performance checks, and release observability.
- Contribute to governance, cost‑optimization, and capacity‑planning initiatives.
**Required Skills:**
- Proven experience leading and mentoring engineering teams in a cloud‑native environment.
- Strong understanding of Java and/or .NET service architectures, release management, and rollback safety.
- Hands‑on expertise with Azure services, autoscaling, fault tolerance, and SaaS operational readiness.
- Proficiency in monitoring and observability tools (Dynatrace, Azure Monitor, Application Insights, Grafana).
- Deep experience with Infrastructure as Code (Terraform, Bicep) and automation scripting (PowerShell).
- Demonstrated ability to design, implement, and improve CI/CD pipelines for Java/.NET codebases.
- Solid incident‑management skills, including RCA, post‑mortem facilitation, and implementation of preventive measures.
- Excellent communication, collaboration, and stakeholder‑management abilities.
**Required Education & Certifications:**
- Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent practical experience).
- Relevant certifications are a plus (e.g., Microsoft Certified: Azure Solutions Architect Expert, Certified Kubernetes Administrator, or SRE/DevOps certifications).
Ruddington, United Kingdom
On site
05-02-2026