Workonomics

Senior Site Reliability Engineer

Hybrid

London, United Kingdom

£120,000/year

Senior

Full Time

13-03-2026

Skills

Python, JavaScript, Go, Incident Response, Kubernetes, Monitoring, Architecture, AWS, Microservice, Terraform

Job Specifications

Keeping a real-time decision system running at five-nines reliability is easy, until 150 engineers and billions of dollars' worth of transactions depend on it.

Company | SaaS, Product, B2B2C

Size | ~600 globally, ~60 in London

Role | Senior Software Engineer (SRE)

Areas | Reliability, Observability, Platform Infrastructure

Skills | AWS, Terraform, Kubernetes, Python / Go / JavaScript

Based | Zone 1, London

Hybrid | 2–3 days a week in-office

Offer | up to £120k + bonus + RSUs + 4.5-day work week

We're partnering with a company you likely haven’t heard of by name, but whose systems quietly sit inside the flow of millions of online purchases.

Their platform helps decide, in real time, whether a transaction should be trusted, operating under a five-nines reliability target while supporting 150+ engineers across more than 20 teams.

The challenge isn’t just keeping systems running today - it’s maintaining reliability as services, traffic patterns, and engineering teams continue to grow.

They’re hiring a Senior Software Engineer (SRE) in London to help shape how reliability and observability work across the entire organisation.

The work focuses on three key areas:

Accelerating a major infrastructure modernisation

Parts of the platform still rely on legacy infrastructure tooling (CloudFormation, internal tooling, EC2-heavy workloads)
The direction is clear: Terraform + Kubernetes as the foundation
This role is hands-on: helping drive that transition and building the operational guardrails that make it safe at scale

Rebuilding observability from the ground up

The company is consolidating its observability tooling and raising the reliability bar for every engineering team
That includes moving away from an ELK-based stack and designing a more modern approach to logging, metrics, and tracing that works across a large microservice architecture
This work will shape how every engineer in the company understands and operates their systems

Exploring how SRE teams actually use AI

The team is actively experimenting with AI to reduce operational toil and improve incident response
They’re looking for someone curious about how AI can realistically help reliability teams work smarter

Why this role is interesting

Improvements here compound across the entire engineering organisation
The systems sit inside real-time decision paths where reliability genuinely matters
The platform modernisation is meaningful engineering work, not housekeeping
You’ll be working with strong engineers who care deeply about how systems behave in production

What they’re looking for

Expertise in cloud and infrastructure (AWS, Terraform, Kubernetes)
Software engineering background in a modern language (Python, Go, JavaScript)
Experience improving reliability, observability, and operational tooling in production

Bonus points if you’ve:

worked on observability tooling or monitoring systems
optimised observability cost and performance
contributed to open-source monitoring or reliability tools

If this sounds like you, apply to learn more about the company and the role.

About the Company

WE HELP COMPANIES
We recruit for tech firms of all shapes & sizes.
Start-up | Scale-up | Grown-up
We help companies attract game-changing technical talent.

WE HELP CANDIDATES
We recruit engineers across a range of disciplines.
Software | Product | Infrastructure | Data | ML
We help talented technologists find fulfilling mission-oriented work.