- Company Name
- Avenue Code
- Job Title
- Senior SRE
- Job Description
**Job Title**: Senior Site Reliability Engineer (SRE)
**Role Summary**:
Drive the design, construction, and operation of a cloud platform that powers engineering teams. Lead reliability, performance, and security initiatives for production‑critical systems, ensuring robust, scalable, and cost‑effective operations across AWS, Kubernetes, and supporting services.
**Expectations**:
- Proven experience managing high‑availability, production‑critical environments.
- Deep expertise with AWS Cloud and cloud‑native architecture.
- Extensive experience with Kubernetes (EKS, GKE) at scale.
- Strong background in Terraform‑based IaC and CI/CD pipelines (GitHub Actions, ArgoCD, Jenkins, etc.).
- Solid understanding of network, web, and security protocols (HTTP, REST, TLS, DNS).
**Key Responsibilities**:
- Automate infrastructure provisioning and deployments with Terraform, integrating best‑practice CI/CD pipelines.
- Define and monitor SLIs/SLOs, manage error budgets, and create dashboards/alerts for proactive system health.
- Enforce least‑privilege IAM policies, automate vulnerability scans, and maintain audit logs for compliance.
- Instrument services with metrics, logs, and tracing; build alerts, custom metrics, and dashboards for rapid troubleshooting.
- Own on‑call rotation, lead incident response, conduct post‑mortems, and drive continuous improvement.
- Implement cost‑optimization tactics: tagging, right‑sizing, and data‑driven spend control.
- Author runbooks, standards, and best‑practice guides; mentor dev teams on DevOps, reliability, and security patterns.
**Required Skills**:
- AWS Cloud (EC2, VPC, RDS, ElastiCache, etc.)
- Kubernetes (EKS, GKE) and container orchestration at scale
- Terraform (Terragrunt knowledge a plus)
- Git workflows, branching strategies, CI/CD integration
- Database management (Redis, PostgreSQL)
- Monitoring/observability tools (Prometheus, Grafana, Loki, Jaeger, etc.)
- Networking: VPC, VPN, Load Balancers, Cloud networking components
- Programming/scripting: Python, Golang, Shell; Helm templating; Node.js a plus
- Security fundamentals for cloud infrastructure, least‑privilege IAM, audit logging
**Required Education & Certifications**:
- Bachelor’s degree in Computer Science or Engineering, or equivalent professional experience.
- Relevant AWS certifications (e.g., AWS Certified Solutions Architect, DevOps Engineer) are preferred.
Mountain View, United States
Hybrid
Senior
17-02-2026