- Company Name
- InfoVision Inc.
- Job Title
- Machine Learning Engineer
- Job Description
-
**Job title:** Machine Learning Engineer (MLOps)
**Role Summary:**
Design, build, and maintain end‑to‑end MLOps pipelines and FastAPI microservices for scalable model inference, deployment, and monitoring. Lead architecture of a self‑serve MLOps platform, implement CI/CD, GitOps, and observability across Azure Kubernetes Service (AKS) and other cloud environments. Collaborate cross‑functionally to ensure reproducible, secure, and high‑availability model delivery.
**Expectations:**
• At least 2 years of production ML engineering experience
• Proven track record in MLOps, CI/CD, and containerized deployments
• Strong Python programming skills, including FastAPI, and experience packaging and serving models (e.g., scikit‑learn, PyTorch, XGBoost)
• Hands‑on expertise with Docker, Kubernetes, AKS, GitHub Enterprise, and GitOps (Argo CD)
• Experience with deployment strategies (blue/green, canary, shadow) and release promotion pipelines
**Key Responsibilities:**
- Design, develop, and maintain MLOps pipelines for data preparation, training, validation, packaging, and deployment.
- Build FastAPI microservices for model inference, ensuring clear API contracts, versioning, and comprehensive documentation.
- Implement deployment strategies on AKS using GitOps with Argo CD; manage blue/green, canary, shadow, and champion/challenger workflows.
- Architect and evolve a self‑serve MLOps platform (standards, templates, CLI / scaffolds) for repeatable, secure model delivery across teams.
- Operationalize machine‑learning frameworks (scikit‑learn, PyTorch, XGBoost) for low‑latency, scalable serving.
- Develop CI/CD pipelines for ML, integrating automated testing, security scanning, build, packaging, and promotion steps.
- Integrate telemetry and observability (logging, metrics, tracing) into model services; establish SLOs for them.
- Monitor model and data drift, automate retraining, evaluation, and safe rollout/rollback workflows.
- Collaborate with software engineers, data scientists, and platform/SRE teams to integrate ML services into applications and shared platforms.
- Champion best practices for code quality, reproducibility, and governance (model registry, artifact approval, security).
**Required Skills:**
- Python (production services, FastAPI)
- MLOps: packaging, serving, scaling, monitoring, drift detection
- CI/CD (GitHub Enterprise, automated pipelines)
- Containerization & orchestration: Docker, Kubernetes, AKS
- GitOps: Argo CD, deployment strategies (blue/green, canary, rollback)
- RESTful API design, microservice patterns, API contract governance
- Helm/Kustomize, secrets management, security scanning
- Cloud platforms: Azure (preferred), familiarity with GCP and managed ML services
- Familiarity with ML lifecycle tools (MLflow, feature stores)
- Agile development practices
**Required Education & Certifications:**
- Bachelor’s degree in Computer Science, Software Engineering, Data Science, or a related field (or equivalent practical experience).
- Professional certifications are optional (e.g., Microsoft Certified: Azure Solutions Architect Expert).