- Company Name
- Gridware
- Job Title
- Senior Site Reliability Engineer
- Job Description
Role Summary: Design, build, and maintain scalable, secure, and highly available cloud-native infrastructure on AWS. Lead Kubernetes (EKS) operations, GitOps deployments (ArgoCD), CI/CD pipeline automation (GitHub Actions), event streaming (Amazon MSK), and relational database management (RDS). Drive Infrastructure as Code with Terraform, enforce security and cost‑optimization practices, and provide observability with Grafana, Loki, and Prometheus.
Expectations:
- 5+ years in DevOps/SRE/Platform Engineering with AWS production experience.
- Proven expertise in Kubernetes administration, GitOps, Terraform IaC, CI/CD automation, and distributed systems (Kafka/MSK, RDS).
- Strong networking, security, and IdP integration knowledge (Okta, Auth0, etc.).
- Ability to troubleshoot complex multi‑layer production issues and optimize performance and cost.
Key Responsibilities:
- Architect, implement, and maintain scalable AWS infrastructure (EKS, EC2, RDS, MSK, S3, VPC, etc.).
- Manage Kubernetes clusters and deploy applications via ArgoCD, following GitOps best practices.
- Build and maintain CI/CD pipelines using GitHub Actions for rapid, automated releases.
- Operate Amazon MSK for high‑throughput event streaming and ensure data pipeline reliability.
- Implement IaC with Terraform, enforce security controls, maintain IdP integrations, and drive cost optimization.
- Design, deploy, and monitor observability solutions (Grafana, Loki, Prometheus) to ensure system health and performance.
- Collaborate with Cloud Security to enforce compliance and security standards across the stack.
Required Skills:
- AWS (EKS, EC2, RDS, MSK, S3, VPC)
- Kubernetes administration and GitOps (ArgoCD)
- Infrastructure as Code – Terraform (Terragrunt optional)
- CI/CD automation – GitHub Actions
- Distributed systems – Kafka/MSK, relational databases (RDS)
- Identity & access management – Okta, Auth0, or similar IdP
- Observability – Grafana, Loki, Prometheus
- Networking, security best practices, and cost‑optimization
Required Education & Certifications:
- Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent experience).
- AWS Certified Solutions Architect – Associate or Professional (preferred).
- Certified Kubernetes Administrator (CKA) or Certified Kubernetes Application Developer (CKAD), or equivalent (preferred).
- HashiCorp Certified: Terraform Associate or AWS Certified DevOps Engineer – Professional (optional, but a plus).
San Francisco, United States
Hybrid
Senior
30-10-2025