Job Specifications
Location: Remote
Salary: £70,000 - £80,000
About us
At Arbor, we're on a mission to transform the way schools work for the better.
We believe in a future of work in schools where being challenged doesn't mean being burnt out and overworked. Where data guides progress without overwhelming staff. And where everyone working in a school is reminded why they got into education every day.
Our MIS and school management tools are already making a difference in over 7,000 schools and trusts. Giving time and power back to staff, turning data into clear, actionable insights, and supporting happier working days.
At the heart of our brand is a recognition that the challenges schools face today aren't just about efficiency, outputs and productivity, but also about creating happier working lives for the people who drive education every day: the staff. We want to make schools more joyful places to work, as well as learn.
About the role
We are looking for an enthusiastic and proactive Site Reliability Engineer to join our SRE team and help us ensure we provide world-class resilience and performance across the platform. The remit and focus of the role is to advise on all aspects of site reliability, including availability, scalability, observability and capacity planning. It's a broad and exciting role, so we're looking for someone up for a challenge - if you're an energetic and collaborative Site Reliability Engineer, this is the role for you.
Core responsibilities
Proactively monitor and analyse platform performance
Collaborate with engineering teams to address performance bottlenecks and ensure scalability
Assist engineering teams with implementing and reviewing SLOs
Continually improve observability through monitoring, alerting and dashboards, using tools such as Datadog or Prometheus
Work with other teams to ensure observability is effective and provides full coverage
Ensure the service is highly available and resilient
Champion best practices in design for high availability
Devise runbooks and run game sessions to test our DR plan, H/A and backups
Conduct assessments of capacity and plan for scaling to meet current and future business needs
Work closely with the Head of Platform Engineering and Head of SRE to strategise and implement scalable solutions
Work closely with the Platform team, feature teams, 2nd line support and other stakeholders to embed SRE practices and ensure a good level of service is provided for our customers
Play a key role in responding to and troubleshooting incidents, ensuring rapid resolution and minimising downtime
Participate in blameless postmortems to identify root causes and corrective actions
Develop and maintain playbooks and documentation
Requirements
About you
Experience in performance monitoring and analysis
Capacity planning experience
Scripting and automation skills, with experience in relevant technologies
Experience with Infrastructure as Code, in particular, Terraform
Understanding of relational database technologies and their cloud versions (e.g. AWS Aurora)
Experience with messaging and distributed asynchronous workloads
Experience with nginx or similar technologies
Familiarity with SRE processes
Awareness of DevOps principles such as the Three Ways and the Five Ideals
Bonus Skills
Experience with other database technologies and cloud platforms
Past experience with Enterprise solutions running at scale
Familiarity with Kanban and Agile development processes
Experience with containerisation, for example Docker
Familiarity with software best practices such as Refactoring, Clean Code, Domain-Driven Design and Test-Driven Development
Benefits
What we offer
The chance to work alongside a team of hard-working, passionate people in a role where you'll see the impact of your work every day. We also offer:
A dedicated wellbeing team who champion initiatives such as mindfulness, lunch and learns, manager training, mental health first aid training and much more!
32 days holiday (plus Bank Holidays). This is made up of 25 days annual leave plus 7 extra company-wide days given over Easter, Summer & Christmas
Life Assurance paid out at 3x annual salary
Comprehensive wellness benefit provided by AIG Smart Health, which includes a 24/7 virtual GP service, mental health support, counselling and personalised health checks
Private Dental Insurance with Bupa
Salary sacrifice Pension provided by Scottish Widows
Enhanced maternity and adoption pay (20 weeks full pay) and paternity pay (6 weeks full pay)
5 free return-to-work maternity coaching sessions, helping you adapt to this new exciting time of life!
Access to services such as Calm and Bippit (financial wellbeing coaching)
All of our roles champion flexible working and we are happy to discuss what this means to you
Social committees that plan team, office and company-wide events to bring people together and celebrate success
Dedicated professional development training budget (CPD courses, upskilling resources, professional membersh
About the Company
With Arbor, over 7,000 schools and trusts reclaim hours every week, see the data that matters clearly, and support their staff with the tools, time and insight to work at their best.