Skills

Leadership, Python, Java, JavaScript, C#, TypeScript, Encryption, CI/CD, Docker, Monitoring, Stakeholder Management, Architecture, Solution Architecture, Cloud Architecture, Windows, Regression, Programming, Databases, Azure, AWS, Cost Management, GCP, Microservices, gRPC

Job Specifications

Role: AI Architect

Location: London, UK

Experience level: 15+ years

Job Description:

1) Architecture & Solution Design

Define reference architectures for GenAI systems: RAG, agentic orchestration, tool/function calling, multi-step reasoning workflows, memory patterns, and context strategies.
Design multi-tenant and enterprise-scale GenAI platforms with clear separation of concerns: UI, orchestration, retrieval, inference, evaluation, and observability.
Select model strategies: hosted LLMs, open-weight models, fine-tuning vs. prompt/RAG, latency and cost tradeoffs, and deployment patterns.

2) Agentic AI Orchestration & Tooling

Architect agent systems (single/multi-agent) including:
Task decomposition, planners/executors, reflection/verification loops
Tool use patterns (APIs, databases, search, workflow engines)
Guardrails to prevent unsafe tool actions and hallucinated commands
Build reliable flows for “human-in-the-loop” decision points and approvals (e.g., procurement, customer comms, incident triage).

3) Retrieval, Knowledge Systems & Data Design

Lead design of knowledge ingestion pipelines:
document parsing, chunking strategies, embeddings, metadata, lineage, freshness SLAs
Architect vector search and hybrid retrieval:
semantic + keyword, reranking, filtering, ACL-aware retrieval
Ensure retrieval respects access control, PII handling, data residency, and auditability.

4) Production Engineering, Reliability & Cost

Set non-functional requirements for GenAI workloads:
SLOs, latency budgets, fallback models, caching, rate limiting
Design cost controls: prompt/token optimization, model routing, batching, and usage governance.
Implement resiliency patterns: circuit breakers, retries, queue-based orchestration, idempotency.

5) Security, Risk & Responsible AI

Establish AI security posture:
prompt injection defenses, data exfiltration controls, tool sandboxing
Define policies and controls for:
sensitive data, logging, redaction, encryption, secret management, and auditing
Collaborate with risk/compliance to drive:
model governance, content safety, bias/quality monitoring, and regulatory alignment

6) Evaluation, Observability & Continuous Improvement

Create evaluation frameworks:
offline evals (golden sets), automated regression, and scenario-based testing
Instrument systems for observability:
traces, prompt versioning, retrieval diagnostics, tool-call logs, and outcome metrics
Run A/B tests and iterate on prompts, retrieval, and agent policies based on measurable outcomes.

7) Leadership & Stakeholder Management

Partner with product leaders to identify high-value use cases and define the roadmap.
Mentor engineers and data scientists on best practices for LLM apps.
Produce architecture artifacts: ADRs, threat models, system diagrams, runbooks.

Required Skills & Experience

Core Technical Skills (Must Have)

8+ years in software/solution architecture, including 2+ years delivering GenAI/LLM solutions in production.
Strong knowledge of LLMs: prompting patterns, context windows, tool/function calling, model limitations, and safety risks.
Agentic AI design experience:
orchestrators, workflows, multi-step reasoning, tool usage, HITL patterns
RAG expertise:
embeddings, vector DBs, hybrid retrieval, reranking, chunking strategies, evaluation
Cloud architecture (Azure/AWS/GCP) with production engineering rigor:
microservices, containers (Docker/K8s), serverless, CI/CD
Solid programming skills (one or more):
Python, TypeScript/JavaScript, Java, C#
Experience with APIs and integration patterns:
REST/gRPC, event-driven systems, queues, workflow engines

Security & Governance (Must Have)

Understanding of GenAI-specific threats:
prompt injection, data leakage, jailbreaks, insecure tool calling
Familiarity with enterprise controls:
IAM, key management, encryption, network isolation, audit logging
Responsible AI practices:
evaluation, content moderation, privacy, and compliance-by-design

Architecture & Systems Skills (Must Have)

Distributed system design:
scalability, fault tolerance, caching, performance tuning
Observability:
logging/metrics/tracing, prompt/version tracking, monitoring SLIs/SLOs
Cost management and performance optimization:
model selection/routing, token reduction, caching, batching

Preferred / Nice-to-Have Skills

Fine-tuning approaches:
LoRA/QLoRA, instruction tuning, adapters, distillation (when appropriate)
Experience with:
knowledge graphs, semantic layers, enterprise search
Advanced evaluation:
LLM-as-judge with safeguards, rubric scoring, adversarial testing
MLOps/LLMOps toolchains:
experiment tracking, feature stores, model registries, data quality tools
Domain experience:
customer support automation, developer productivity copilots, IT ops agents, finance or healthcare compliance
Experience building platforms:
reusable agent frameworks, reusable RAG components, multi-team enablement


About the Company

HCLTech is a global technology company, home to more than 220,000 people across 60 countries, delivering industry-leading capabilities centered around digital, engineering, cloud and AI, powered by a broad portfolio of technology services and products. We work with clients across all major verticals, providing industry solutions for Financial Services, Manufacturing, Life Sciences and Healthcare, Technology and Services, Telecom and Media, Retail and CPG, and Public Services.