- Company Name
- KEMIO Consulting
- Job Title
- Head of MLOps and AI Infrastructure
- Job Description
**Role Summary**
Lead the design, development, and operations of large‑scale machine‑learning infrastructure for biotechnological research. Build distributed training systems, data pipelines, and researcher‑friendly platforms that accelerate foundation‑model development on complex biological datasets. Grow and mentor a high‑impact engineering team while controlling cost and ensuring reproducibility.
**Expectations**
- Own end‑to‑end AI infrastructure strategy and delivery
- Deliver scalable, cost‑effective compute across cloud & on‑prem environments
- Drive rapid experimentation for foundation models in genomics, multi‑omics, and clinical data
- Maintain high standards for data quality, privacy, and reproducibility
- Foster a culture of engineering excellence and continuous improvement
**Key Responsibilities**
1. Design & scale distributed training pipelines for large ML models (PyTorch, DeepSpeed, Ray, etc.)
2. Build robust data ingestion, transformation, and storage pipelines from raw biological sources to ML‑ready formats
3. Develop researcher‑centric platforms for experiment tracking, versioning, and reproducibility
4. Manage cloud and on‑prem resources, monitor usage, and optimize spend
5. Lead, coach, and expand a team of ML/platform engineers
6. Collaborate closely with scientists to translate research needs into infrastructure solutions
7. Establish and enforce best practices for CI/CD, testing, and deployment of ML workloads
**Required Skills**
- 5+ years of experience building ML or research infrastructure
- 10+ years of commercial engineering leadership
- Deep expertise in distributed training frameworks (PyTorch, DeepSpeed, Ray, etc.)
- Strong software engineering fundamentals (Python, Go, etc.)
- Proficient with cloud services (AWS, GCP, Azure) and Kubernetes for ML workloads
- Experience managing compute budgets and cost controls
- Ability to thrive in fast‑moving, research‑driven environments
- Excellent communication, mentorship, and stakeholder‑management skills
- Bonus: prior work with scientific or biological data
**Required Education & Certifications**
- Bachelor’s degree in Computer Science, Engineering, or a related field (advanced degree or equivalent experience preferred)
- Optional certifications: Cloud Professional (AWS, GCP, Azure), Kubernetes (CKA/CKAD), or MLOps
---