**Company Name**: team.blue
**Job Title**: AI/ML - Platform Lead
**Role Summary**: Lead the development and maintenance of an enterprise AI/ML platform supporting workloads across an organization spanning 22 European countries. Focus on architecting scalable infrastructure, driving adoption of GenAI technologies, and mentoring cross-functional teams to operationalize LLMs and AI/ML workflows.
**Expectations**:
6+ years of hands-on experience building AI/ML platforms; deep expertise in cloud infrastructure, AI/ML operations, and cross-functional collaboration. Proficiency in modern LLM toolchains, model optimization, and backend development required.
**Key Responsibilities**:
- Architect and implement scalable AI/ML infrastructure for GPU-accelerated workloads, including private cloud environments, Kubernetes, and observability frameworks.
- Build core GenAI application platforms to power LLM workflows (e.g., RAG, inference pipelines, vector databases).
- Design and optimize services for training/inference scalability, model versioning, and latency management in production LLMs.
- Drive cross-functional adoption of ML/AI tooling via reusable components, automation, and metrics-driven dashboards.
- Define technical roadmaps aligning innovation with compliance, fairness, and security standards for AI/ML systems.
- Mentor teams of engineers, scientists, and domain experts to operationalize LLMs, implement RAG systems, and integrate multi-modal data processing.
**Required Skills**:
- **Technical**: Kubernetes, Python (FastAPI), Go, Docker, Terraform; expertise in AIOps/MLOps practices, ETL pipelines, and observability frameworks.
- **AI/ML Specialization**: LLM toolchains (LangChain, LlamaIndex), model optimization (quantization, ONNX), RAG systems, vector databases (Qdrant, Pinecone), LLM inference engines (vLLM, TensorRT-LLM).
- **Architecture**: Microservices, event-driven systems (Kafka, server-sent events), and production-grade ML pipeline design.
- **Problem Solving**: Model fine-tuning, prompt engineering, semantic search, and latency-critical deployment strategies.
**Required Education & Certifications**:
Bachelor’s or Master’s in Computer Science, Engineering, or equivalent; certifications in cloud (AWS/GCP/Azure) or AI/ML (e.g., TensorFlow, PyTorch) preferred but not mandatory.