Company Name: Generis Tek Inc
Job Title: ML Ops Lead
Role Summary:
Lead the design, development, and operation of end‑to‑end machine learning pipelines on AWS. Drive continuous integration, delivery, and deployment of ML models, ensuring reproducibility, scalability, and reliability across the model lifecycle.
Expectations:
- Deliver robust, automated ML pipelines and model deployment processes.
- Ensure production readiness, performance, and compliance with governance standards.
- Collaborate closely with data scientists, data engineers, and platform architects to turn model artifacts into production services.
- Maintain cost‑effective, secure, and highly available cloud infrastructure.
Key Responsibilities:
- Design, build, and maintain CI/CD pipelines for model training, testing, versioning, and deployment using AWS services (SageMaker, ECS/EKS, Lambda, S3, SQS, SNS, IAM).
- Containerize and orchestrate ML workloads with Docker, ECS, and EKS.
- Implement infrastructure as code (IaC) with Terraform and manage IaC deployments.
- Develop and enforce model and data versioning, auditability, and access controls.
- Set up monitoring, alerting, and logging for model performance, data integrity, and pipeline health; detect and remediate data and model drift.
- Create reusable tooling and frameworks to accelerate data science workflows.
- Document processes, configurations, and best practices.
- Communicate status and technical concepts to cross‑functional teams.
Required Skills:
- Deep experience with AWS ML services (SageMaker, SageMaker Pipelines, SageMaker Model Registry).
- Proficiency in Python; strong automation skills with GitHub Actions, Terraform, and other CI/CD tooling.
- Expertise in containerization (Docker) and orchestration (ECS, EKS).
- Familiarity with ML Ops frameworks (MLflow, Weights & Biases, Kubeflow) and workflow orchestration tools (Airflow, Argo Workflows).
- Knowledge of ML model lifecycle, testing, validation, and performance monitoring.
- Understanding of ML frameworks such as PyTorch or TensorFlow.
- Strong communication and documentation abilities.
- Ability to work in a fast‑paced contract environment and to transition into a full‑time role.
Education & Certifications:
- Required: Bachelor’s degree in Computer Science, Software Engineering, Data Science, or a related field.
- Preferred: AWS certifications (e.g., AWS Certified Machine Learning – Specialty, AWS Certified DevOps Engineer).