- Company Name
- Podium
- Job Title
- Site Reliability Engineer
- Job Description
-
Job title: Site Reliability Engineer
Role Summary:
Design, deploy, and maintain highly available, scalable production systems for a cloud‑native SaaS platform. Work at the intersection of software and infrastructure engineering to ensure platform stability, performance, and security while collaborating closely with product and DevOps teams.
Expectations:
Own reliability goals, participate in on‑call rotations, mentor junior engineers, and continually improve system observability, automation, and incident response processes.
Key Responsibilities:
- Deploy and manage Kubernetes clusters, Helm charts, Docker containers, and Linux servers across AWS (and optionally GCP/Azure) using Terraform and Ansible.
- Build and maintain CI/CD pipelines with GitLab/GitHub and automate release workflows.
- Implement monitoring and observability with Datadog, Honeycomb, Prometheus, and logging pipelines; respond to alerts and production incidents.
- Design and enforce capacity planning, scaling strategies, and disaster‑recovery plans.
- Collaborate with cross‑functional teams to minimize downtime and optimize platform resilience.
- Mentor and coach less experienced engineers on SRE principles and best practices.
- Maintain compliance with SOC 2, HIPAA, PCI DSS, or similar frameworks as required.
Required Skills:
- 4+ years supporting production systems in a software or systems engineering role.
- 3+ years of Linux server deployment, operation, and debugging.
- Proficiency in Kubernetes, Helm, Docker, AWS, Terraform, Ansible, and CI/CD tooling.
- Experience with observability tools (Datadog, Honeycomb, Prometheus).
- Strong programming skills in Python, Go, or Ruby.
- Familiarity with distributed systems, microservices, and system design principles.
- Ability to participate in on‑call rotations and handle production incidents.
- Excellent written and verbal communication.
Required Education & Certifications:
- Bachelor’s degree in Computer Science or Engineering, or equivalent technical experience.
- Relevant certifications (e.g., AWS Certified DevOps Engineer, Certified Kubernetes Administrator, or Terraform Associate) are advantageous but not mandatory.