- Company Name
- Department for Business and Trade
- Job Title
- Senior Site Reliability Engineer
- Job Description
**Job Title:** Senior Site Reliability Engineer
**Role Summary:**
Design, implement, and maintain a highly available, scalable product platform on AWS, ensuring reliable operation of DBT digital services through observability, CI/CD pipelines, and service‑level objectives. Lead on‑call support, mentor junior engineers, and collaborate closely with development teams to embed resilient infrastructure practices.
**Expectations:**
- Deliver robust, tested infrastructure as code at enterprise scale.
- Actively participate in a rotating on‑call schedule, maintaining service reliability.
- Provide clear, actionable feedback on the impact of infrastructure changes to technical and non‑technical stakeholders.
- Mentor junior SREs and drive continuous improvement of the SRE function.
**Key Responsibilities:**
- Build and scale AWS‑based services (ECS, ECR, RDS, Elasticsearch, Redis).
- Design, implement, and maintain the observability stack (metrics, logs, dashboards).
- Develop and manage CI/CD pipelines, ensuring automated deployments and rollbacks.
- Define, monitor, and enforce service‑level objectives (SLOs), and manage incidents that breach them.
- Participate in incident response, root‑cause analysis, and post‑mortem practices.
- Collaborate with development, security, and product teams to align on architecture and security.
- Mentor and coach junior SREs, fostering knowledge sharing and career growth.
**Required Skills:**
- Proficiency in AWS (or Azure/Google Cloud) with hands‑on experience designing large‑scale solutions.
- Strong IaC experience using Terraform, CloudFormation, or Pulumi.
- Programming fluency in Python (including frameworks such as Django) or an equivalent language.
- Experience with Docker, ECS/ECR, and the ELK stack (Elasticsearch, Logstash, Kibana).
- Database experience with PostgreSQL (RDS) and Redis.
- Deep understanding of Linux/Unix fundamentals, TCP/IP networking, and distributed systems architecture.
- Ability to analyze and troubleshoot complex, multi‑service environments.
- Excellent written and verbal communication; able to explain technical concepts to non‑technical audiences.
**Required Education & Certifications:**
- Bachelor’s degree or higher in Computer Science, Engineering, or a related field (or equivalent professional experience).
- Relevant cloud certifications (e.g., AWS Certified Solutions Architect, Azure Solutions Architect, Terraform Associate) preferred.
- Security clearance: Security Check (SC) clearance required; candidates must meet UK residency criteria.