Job Specification - Lead AI Engineer
Autonomous Systems • Multi-Agent Architectures • Long-Context RAG • MLOps & Cloud Infrastructure
Location: London, UK (Hybrid)
Level: Lead / Principal
Type: Full-Time, Permanent
Salary: Competitive + Equity
Start: Immediate
About Trigma.ai
Trigma.ai is a next-generation AI product and services company building intelligent,
autonomous systems at scale. We work across healthcare, enterprise automation, and public
sector domains, delivering production-grade AI infrastructure that is robust, governed, and
impactful. We are growing our core engineering team and are looking for an exceptional Lead AI
Engineer to shape the direction of our AI platform and mentor a high-calibre team.
Role Overview
As Lead AI Engineer at Trigma.ai, you will own the design and delivery of our core AI and ML
platform—spanning agentic systems, foundation model infrastructure, MLOps pipelines, and
multi-cloud architecture. You will bring deep technical expertise alongside the ability to lead,
mentor, and set architectural direction. This is a hands-on leadership role: you will write
production code, define system design, and drive engineering excellence across the team.
1. Key Responsibilities
Agentic & Autonomous AI Systems
• Design and implement stateful, multi-agent architectures supporting long-horizon
reasoning, tool-driven execution, interrupt/resume logic, and checkpoint-based recovery.
• Build graph-based agent controllers, decision state machines, and adaptive execution frameworks for complex, multi-step AI workflows.
• Develop agent-optimized RAG systems with hierarchical chunking, hybrid retrieval, relevance scoring, and context-window management strategies.
• Create evaluation and reliability frameworks for non-deterministic agents, including failure simulation, regression tracking, and behavioral stability metrics.
ML Platform & Foundation Model Infrastructure
• Lead development of modular ML platforms covering data ingestion, pipeline orchestration, and automated fine-tuning for large foundation models.
• Architect scalable, distributed training and inference infrastructure across AWS, GCP, and Azure, optimizing for GPU utilization and cost-efficiency.
• Build and maintain feature stores, metadata registries, and experiment tracking systems to ensure reproducibility and team-wide consistency.
• Productionize agentic and ML workloads using containerization, autoscaling, and execution isolation strategies.
MLOps & AI Governance
• Design and implement end-to-end MLOps frameworks using SageMaker, Vertex AI, Terraform, CI/CD pipelines, and container orchestration (Kubernetes, ArgoCD).
• Develop real-time observability frameworks for model drift, bias, performance monitoring, and cost tracking using tools such as Prometheus, Grafana, Arize AI, and Evidently.
• Embed governance, safety, and auditability controls into AI pipelines—including RBAC, data lineage, encryption, and GDPR-aligned access policies.
• Ensure compliance with relevant standards (SOC 2, ISO 27001) across cloud AI workloads.
Technical Leadership
• Define and champion architectural standards, design patterns, and engineering best practices across the AI engineering team.
• Mentor and develop a team of ML and data engineers, conducting code reviews, pair programming, and structured knowledge-sharing sessions.
• Collaborate with product, data science, and business stakeholders to translate complex requirements into scalable, production-ready AI solutions.
• Author internal documentation, architectural guidance, and evaluation strategies to build organisational AI capability.
2. Required Experience & Skills
Core Technical Skills
• 10+ years of software and ML engineering experience, with at least 3 years in a senior or lead AI/ML engineering role.
• Deep expertise in Python and at least one additional language (Go, Scala, Java).
• Proven experience designing and shipping agentic AI systems in production, including multi-agent coordination, tool use, and long-context reasoning.
• Strong grasp of RAG architectures, vector stores (Pinecone, Weaviate, FAISS), and embedding-based retrieval systems.
ML & GenAI Frameworks
• Hands-on experience with PyTorch, TensorFlow, Keras, Scikit-learn, XGBoost, and LightGBM.
• Practical experience with Hugging Face Transformers, LangChain, MLflow, Ray, and Weights & Biases.
• Familiarity with foundation and multimodal models (CLIP, Flamingo, DALL·E, Stable Diffusion, Perceiver IO).
MLOps & Infrastructure
• Extensive hands-on experience with AWS (SageMaker, Lambda, EKS, Bedrock), GCP (Vertex AI, BigQuery), and Azure ML.
• Proficiency with Kubernetes, Docker, Terraform, ArgoCD, Helm, and GitHub Actions.
• Experience with model serving frameworks such as Triton Inference Server, TorchServe, or BentoML.
• Strong background in building and operating real-time data pipelines using Kafka, Spark, Airflow, dbt, and Flink.
Observability, Security & Co