- Company Name
- ClientMind Recruiting Inc.
- Job Title
- Site Reliability Engineer
- Job Description
-
**Job Title:** Site Reliability Engineer
**Role Summary:**
Design, develop, and maintain shared infrastructure-as-code (IaC) constructs in AWS CDK and CDK8s, ensuring reliability, scalability, and consistency across the organization’s cloud platform. Collaborate with backend and platform teams to secure, monitor, and automate the underlying cloud and Kubernetes services.
**Expectations:**
- Deliver robust, reusable IaC templates that enable rapid service deployment.
- Drive automation of infrastructure provisioning, scaling, and troubleshooting.
- Enforce SRE practices (SLIs/SLOs, observability, fault tolerance).
- Maintain high uptime and performance of shared services and cluster add‑ons.
**Key Responsibilities:**
- Design and evolve shared CDK/CDK8s constructs for networking, EKS, node groups, RDS, OpenSearch, and MSK.
- Operate and expand Kubernetes add‑ons: ingress controllers, cert‑manager, cluster autoscaler, monitoring/logging stacks.
- Implement alerting, autoscaling, and recovery patterns using Prometheus, Grafana, and CloudWatch.
- Publish baseline templates, configuration schemas, and operational documentation.
- Own CI/CD pipelines for IaC repositories and platform releases.
- Diagnose infrastructure incidents, identify root causes, propose fixes, and improve resilience.
- Manage IAM roles, secrets, and tenant isolation across multi‑service deployments.
**Required Skills:**
- 5+ years of SRE/infrastructure experience with AWS (VPC, IAM, RDS, MSK, S3) and Kubernetes (Helm, RBAC, ServiceAccounts).
- Proficiency in Python and in IaC with AWS CDK or CDK8s, including the ability to write clean, reusable code.
- Strong knowledge of Prometheus, Grafana, alert routing, and observability patterns.
- Experience designing internal developer platforms or reusable infrastructure patterns.
- Proven track record in automating reliability improvements through monitoring and ops best practices.
**Required Education & Certifications:**
- Bachelor’s degree in Computer Science, Engineering, or related technical field.
- Optional: AWS Certified Solutions Architect (Associate or Professional) or Certified Kubernetes Administrator (CKA) certification.