**Company Name:** Trigma.AI
**Job Title:** Lead AI Engineer
**Role Summary:**
Design, build, and maintain the end‑to‑end AI/ML platform that powers autonomous, multi‑agent systems and foundation models. Own production code, define architecture, mentor an engineering team, and ensure operational excellence across multi‑cloud environments.
**Expectations:**
- Senior leadership of AI architecture and delivery.
- Hands‑on coding, CI/CD, and production rollout.
- Mentorship, code review, pair programming, and knowledge sharing.
- Cross‑functional collaboration with product, data science, and business stakeholders.
- Commitment to security, governance, and compliance (SOC 2, ISO 27001).
**Key Responsibilities:**
1. **Agentic & Autonomous AI** – design stateful multi‑agent stacks, tool‑driven execution, interrupt/resume logic, and recovery checkpoints.
2. **RAG & Retrieval** – build retrieval pipelines featuring hierarchical chunking, hybrid retrieval, relevance scoring, and context‑window management.
3. **ML Platform** – architect modular pipelines and platform components: data ingestion, model fine‑tuning, distributed training, inference, feature stores, a metadata registry, and experiment tracking.
4. **MLOps & Infrastructure** – implement end‑to‑end pipelines with SageMaker, Vertex AI, Azure ML; manage Kubernetes, Docker, Terraform, ArgoCD, Helm, GitHub Actions.
5. **Observability & Governance** – deploy real‑time monitoring (Prometheus, Grafana, Arize AI, Evidently); enforce RBAC, encryption, and GDPR data‑access policies; maintain data lineage.
6. **Technical Leadership** – establish architectural standards, design patterns, and best practices; produce internal documentation, guidelines, and evaluation frameworks.
**Required Skills:**
- 10+ years of software/ML engineering experience, with 3+ years in a senior or lead AI role.
- Proficiency in Python plus one of Go, Scala, or Java.
- Proven production experience with multi‑agent AI, long‑context reasoning, and tool integration.
- Deep knowledge of RAG and vector stores (Pinecone, Weaviate, FAISS).
- Frameworks: PyTorch, TensorFlow, Keras, scikit‑learn, XGBoost, LightGBM, Hugging Face, LangChain, MLflow, Ray, Weights & Biases.
- Foundation & multimodal models (CLIP, Flamingo, DALL·E, Stable Diffusion, Perceiver IO).
- Cloud MLOps: AWS (SageMaker, Lambda, EKS, Bedrock), GCP (Vertex AI, BigQuery), Azure ML.
- Orchestration & IaC: Kubernetes, Docker, Terraform, ArgoCD, Helm, GitHub Actions.
- Serving: Triton, TorchServe, BentoML.
- Data pipelines: Kafka, Spark, Airflow, dbt, Flink.
- Observability tools: Prometheus, Grafana, Arize AI, Evidently.
- Security & compliance: SOC 2, ISO 27001, GDPR, RBAC, data lineage.
**Required Education & Certifications:**
- Bachelor’s or Master’s degree in Computer Science, Engineering, Machine Learning, or related field.
- Relevant cloud certifications (e.g., AWS Certified Machine Learning, GCP Professional ML Engineer, Azure AI Engineer Associate) preferred.
---