Workonomics

www.workonomics.co.uk

1 Job

4 Employees

About the Company

WE HELP COMPANIES

We recruit for tech firms of all shapes & sizes.

Start-up | Scale-up | Grown-up

We help companies attract game-changing technical talent.


WE HELP CANDIDATES

We recruit engineers across a range of disciplines.

Software | Product | Infrastructure | Data | ML

We help talented technologists find fulfilling mission-oriented work.

Listed Jobs

Company Name
Workonomics
Job Title
Senior Site Reliability Engineer
Job Description
**Job title:** Senior Site Reliability Engineer

**Role Summary:** Senior SRE leading reliability and observability initiatives across a high-throughput, real-time decision platform. Drives infrastructure modernization (Terraform + Kubernetes), rebuilds the observability stack, and explores AI-assisted operations to meet five-nines (99.999%) reliability targets for a multi-team SaaS product.

**Expectations:**
- Deliver scalable, secure, highly available infrastructure and observability.
- Own end-to-end incident response, post-mortem analysis, and continuous improvement cycles.
- Collaborate cross-functionally with product, security, and DevOps teams.
- Publish best-practice guidelines and tooling for use by 150+ engineers.
- Demonstrate measurable improvements in reliability and observability coverage.

**Key Responsibilities:**
- Lead the migration from legacy CloudFormation/EC2 stacks to Terraform-based, Kubernetes-oriented infrastructure.
- Design and implement modular, reusable Terraform modules and Kubernetes operators.
- Build and maintain the observability architecture (logging, metrics, tracing, and alerting), moving from ELK to a modern stack (e.g., Loki, Prometheus/Thanos, Tempo, or equivalent).
- Set up comprehensive instrumentation for microservices in Python, Go, or JavaScript.
- Develop and enforce operational guardrails, SLO/SLA definitions, and error budgets.
- Pilot AI/ML tools to reduce toil (incident classification, root-cause discovery, automated remediation).
- Mentor junior SREs and engineering teams on best practices.
- Participate in the on-call rotation and lead post-mortem documentation.

**Required Skills:**
- Deep expertise with AWS services (EKS, EC2, S3, RDS, etc.).
- Advanced Terraform skills: module design, state management, CI/CD integration.
- Kubernetes fundamentals (cluster ops, Helm, CRDs, RBAC, networking).
- Modern programming in Python, Go, or JavaScript (API clients, automation scripts).
- Proven experience building observability solutions (logs, metrics, traces).
- Incident response, root-cause analysis, SLO/SLA/BLP implementation.
- Familiarity with CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins).
- Strong command-line, scripting, and debugging abilities.

**Bonus Skills:**
- Experience optimizing observability cost/performance (data retention strategies, sampling).
- Contributions to open-source monitoring or reliability tools.
- AI/ML experimentation in an SRE context (e.g., incident chatbot, anomaly detection).
- Knowledge of chaos engineering and reliability throughput testing.

**Required Education & Certifications:**
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field (or equivalent professional experience).
- Professional certifications such as AWS Certified Solutions Architect, AWS Certified DevOps Engineer – Professional, or Certified Kubernetes Administrator (CKA) preferred.
London, United Kingdom
Hybrid
Senior
13-03-2026