**Company Name**
BitKernel
**Job Title**
DevOps/SRE Leader
**Role Summary**
Strategic technical leader responsible for designing, implementing, and optimizing end‑to‑end DevOps and Site Reliability Engineering practices across a multi‑cloud, Kubernetes‑based platform. Drives automation, reliability, security, and continuous improvement while leading a high‑performing team and aligning delivery with business objectives.
**Expectations**
- Deliver measurable gains in operational excellence and reliability, tracked against defined SLOs.
- Build and scale a world‑class DevOps organization that integrates tightly with product and engineering teams.
- Champion DevSecOps, IaC, and GitOps principles, ensuring compliance, cost efficiency, and rapid innovation.
- Maintain a 24/7 operations mindset, overseeing on‑call, incident response, and post‑mortem processes.
**Key Responsibilities**
- Define and execute the DevOps strategy and roadmap, and own related portfolio decisions.
- Architect, provision, and manage multi‑cloud infrastructure (AWS, Azure, GCP) using Terraform, Pulumi, and Crossplane.
- Design and maintain Kubernetes (or other container orchestration platforms) for globally distributed, highly available services.
- Build scalable CI/CD pipelines (Jenkins, GitLab CI, ArgoCD) with progressive delivery (blue/green, canary, feature flags).
- Implement the observability stack (Prometheus, Grafana, Loki, Datadog), SLO/SLI dashboards, and runbooks.
- Partner with Security to embed DevSecOps, secrets management, and compliance controls.
- Lead on‑call rotations, incident management, and root‑cause analyses.
- Mentor and grow a multidisciplinary DevOps/SRE team.
- Collaborate with backend, QA, and product leadership on capacity planning, architecture reviews, and delivery alignment.
**Required Skills**
- Deep expertise in cloud infrastructure (AWS/Azure/GCP).
- Cloud‑native tooling: Kubernetes, Terraform, Pulumi, Crossplane.
- CI/CD & automation: Jenkins, GitLab CI, ArgoCD, GitOps.
- Observability: Prometheus, Grafana, Loki, Datadog.
- Linux system administration, networking, high‑availability design.
- IaC, GitOps, and DevSecOps best practices.
- Strong leadership, mentorship, and cross‑functional collaboration.
- Incident response, SLO/SLI definition, and post‑mortem culture.
**Education & Certifications**
- Bachelor’s degree (or higher) in Computer Science, Engineering, or a related field.
- Relevant cloud certifications (e.g., AWS Certified Solutions Architect, Azure Solutions Architect Expert, Google Cloud Professional Cloud Architect) preferred.
- Certifications in Kubernetes (CKA/CKAD) or IaC (e.g., HashiCorp Certified: Terraform Associate) desirable.