- Company Name
- Curative AI, Inc.
- Job Title
- Senior DevOps Engineer
- Job Description
**Job Title:**
Senior DevOps Engineer
**Role Summary:**
Architect, build, and manage scalable cloud infrastructure and CI/CD pipelines for an AI‑powered healthcare platform. Drive reliability, security, and performance across multi‑cloud environments, and lead automation and observability initiatives to support rapid delivery of AI models and services.
**Expectations:**
- Deliver highly available, secure, and cost‑efficient cloud architectures that meet HIPAA, GDPR, and other compliance requirements.
- Own the end‑to‑end CI/CD lifecycle, ensuring fast, repeatable deployments and zero‑downtime releases.
- Maintain operational excellence through continuous monitoring, incident response, and proactive performance tuning.
- Collaborate closely with development, data science, and product teams in Agile settings.
- Participate in the on‑call rotation and provide clear incident documentation.
**Key Responsibilities:**
1. Design, provision, and operate AWS, Azure, and GCP resources for AI workloads.
2. Implement and maintain CI/CD pipelines using GitHub Actions and related tooling.
3. Deploy and manage Kubernetes clusters, configure Istio service mesh, and maintain Helm charts.
4. Automate infrastructure using Terraform and enforce IaC best practices.
5. Embed DevSecOps: apply security controls, scanning, and compliance checks throughout deployment.
6. Design and deploy a telemetry stack (Prometheus, Grafana, ELK, etc.) for observability, reliability, and root‑cause analysis.
7. Monitor system health, performance, and cost; optimize resource utilization and uptime.
8. Troubleshoot infrastructure and application faults; lead incident post‑mortems.
9. Stay current with emerging DevOps tools, cloud services, and AI infrastructure trends.
**Required Skills:**
- 5+ years of DevOps engineering experience.
- Proficiency with at least one cloud provider (AWS, Azure, or GCP) and familiarity with multi‑cloud strategy.
- Expertise in IaC with Terraform, plus Kubernetes (including Helm and Istio), Docker, and GitHub Actions.
- Advanced scripting in Python, Bash, or JavaScript.
- Strong understanding of monitoring, logging, and observability tools: Prometheus, Grafana, ELK.
- Experience with DevSecOps, security best practices, and regulatory compliance (HIPAA, GDPR).
- Networking fundamentals and protocol knowledge.
- Familiarity with AI/ML workloads and MLflow is a plus.
- Excellent problem‑solving, communication, and teamwork skills.
**Required Education & Certifications:**
- Bachelor’s degree in Computer Science, Engineering, or related discipline, or equivalent practical experience.
- Relevant certifications (e.g., AWS Certified Solutions Architect, Google Cloud Professional Cloud Architect, Certified Kubernetes Administrator) are advantageous but not mandatory.