- Company Name
- ALTER SOLUTIONS FRANCE
- Job Title
- AI Platform Engineering
- Job Description
**Job Title**
AI Platform Engineer
**Role Summary**
Design, operate, and continuously improve a Kubernetes/OpenShift‑based AI platform that deploys, monitors, and manages machine‑learning models in production. Ensure reliability, scalability, and security while automating the model lifecycle through CI/CD pipelines and MLOps tooling.
**Expectations**
- Join the Digital Transformation Division supporting an industrial client.
- Work in an agile team focused on delivering stable AI platform operations.
- Maintain day‑to‑day platform health, apply upgrades, and troubleshoot issues.
- Automate and optimize model deployment, scaling, retraining, and inference workflows.
**Key Responsibilities**
1. **Platform Operations** – Monitor cluster health, manage upgrades, and apply configuration changes.
2. **Model Deployment & Supervision** – Deploy models in Kubernetes/OpenShift, automate scaling, manage resource allocation, and oversee operational metrics.
3. **CI/CD Pipeline Management** – Build and maintain Tekton/Kubeflow pipelines for training, retraining, inference, and related tasks; integrate version control, artifact repositories, and automated testing.
4. **Troubleshooting** – Diagnose pod crashes, resource contention, and pipeline failures, and implement corrective actions.
5. **Customization & Security** – Build and rebuild custom runtime images, integrate third‑party libraries, and enforce security policies and compliance in all production environments.
6. **Collaboration** – Work closely with data scientists, DevOps, security, and client teams to align platform capabilities with business objectives.
**Required Skills**
- **Container & Orchestration**: Docker, Kubernetes (OpenShift preferred).
- **DevOps & IaC**: Terraform, Helm, Ansible, GitOps workflows.
- **CI/CD & MLOps**: Tekton, Kubeflow, Elyra, LLM tooling.
- **Programming**: Python, Go.
- **ML Frameworks**: PyTorch, TensorFlow, or equivalent.
- **Tooling**: Artifactory, Prometheus, Grafana, security scanning tools.
- **Soft Skills**: Problem‑solving; collaboration in cross‑functional, agile environments; effective communication in English.
**Required Education & Certifications**
- Bachelor’s or Master’s degree in Computer Science, Software Engineering, DevOps, or a related field.
- Professional certifications (CKA/CKAD, Docker Certified Associate, CNCF Cloud Native Associate, or MLOps Practitioner) are preferred.