- Company Name: Sanderson
- Job Title: Site Reliability Engineer
- Job Description:
**Job Title:** Lead Site Reliability Engineer – Observability
**Role Summary:**
Lead the design, deployment, and governance of end‑to‑end observability for a cloud‑native digital platform. Drive observability‑by‑design throughout the software development lifecycle, define metrics and alerts, and enable engineering teams to self‑serve insights into performance, reliability, and customer experience.
**Expectations:**
• 10+ years of engineering experience, including 5+ years in SRE, observability, or DevOps.
• Proven track record of implementing observability stacks in AWS, Azure, or GCP environments.
• Strong automation skills (IaC, CI/CD) and proficiency in at least one programming language, such as Python or Go.
• Ability to influence cross‑functional teams, set standards, and mentor stakeholders in observability best practices.
• Demonstrated experience with distributed systems, microservices, Kubernetes, Docker, and enterprise‑grade monitoring tools.
**Key Responsibilities:**
1. Define and own the observability roadmap aligned with business objectives.
2. Establish and maintain SLOs, SLIs, and error budgets, and lead capacity planning and predictive analytics.
3. Design, implement, and document monitoring and runbooks for metrics, logs, traces, synthetics, and customer journeys.
4. Enable teams with synthetic monitoring, health checks, observable CI/CD pipelines, distributed tracing, and cloud‑native patterns.
5. Champion a data‑driven culture, drive adoption of governance standards, and promote operational excellence across engineering.
**Required Skills:**
- Observability tools: Datadog, Grafana, Prometheus, OpenTelemetry.
- Container orchestration: Kubernetes, Docker.
- Infrastructure as Code: Terraform, Ansible.
- CI/CD platforms: GitHub Actions, GitLab CI, Jenkins, Argo CD.
- Scripting: Python, Go, Bash.
- Performance & capacity planning, resilience testing.
- Secrets management, RBAC, audit logging, compliance.
**Required Education & Certifications:**
- Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience.
- Certifications: AWS Certified Solutions Architect / Azure Solutions Architect, Certified Kubernetes Administrator, Google Cloud Professional Cloud Architect, or equivalent.
- Optional: Certified Kubernetes Security Specialist, Splunk Certified.