**Company**
The AES Group
**Job Title**
Sr. AI DevOps Engineer
**Role Summary**
Design, develop, and deliver an AI‑powered toolkit that integrates Large Language Models (LLMs) with Retrieval‑Augmented Generation (RAG) to automate and enhance technical support operations across hybrid Kubernetes/virtualization environments. Drive end‑to‑end automation, high‑availability design, and AI‑assisted troubleshooting for enterprise‑scale systems.
**Expectations**
- Operate independently in a fully remote setting (US Central or Eastern time zone).
- Consistently meet defined deliverables on a long‑term contract basis.
- Translate complex infrastructure concepts into usable AI‑driven support tools.
- Collaborate virtually with cross‑functional teams to ensure toolkit alignment with support processes.
**Key Responsibilities**
- Architect and implement AI/LLM components (RAG, chatbot pipelines) within the support toolkit.
- Build and maintain automation for provisioning, scaling, and orchestration of Kubernetes (OpenShift 4.18) and VM workloads.
- Develop modules for Kubernetes fundamentals, OpenShift Virtualization, KVM, and mixed workload management.
- Design resource‑optimization, networking, storage, and high‑availability solutions (e.g., F5 Service Proxy for Kubernetes (SPK), user‑defined networks (UDNs), and HA replication).
- Implement live‑migration and fault‑tolerance mechanisms for VMs and databases.
- Create AI‑powered workflows for real‑time log analysis, alerting, and patching.
- Document architecture, APIs, and usage guides for global technical support teams.
- Conduct performance testing and continuous improvement of AI‑driven features.
**Required Skills**
- Deep expertise in Kubernetes (pods, deployments, namespaces, network policies) and OpenShift 4.x.
- Strong background in virtualization technologies (OpenShift Virtualization, KVM, VM orchestration).
- Proven experience designing and integrating LLM/RAG solutions into production tools.
- Proficiency with infrastructure automation (IaC, CI/CD pipelines, Terraform, Ansible, Helm).
- Solid understanding of networking (K8s networking, F5, UDNs) and storage systems (persistent volumes, replication, HA).
- Programming/scripting skills (Python, Go, Bash) and familiarity with AI/ML libraries.
- Excellent problem‑solving ability and capacity to work autonomously in a remote contract role.
**Required Education & Certifications**
- Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent experience).
- Certified Kubernetes Administrator (CKA) or an equivalent Kubernetes certification.
- Red Hat OpenShift certification (e.g., Red Hat Certified OpenShift Administrator) or Red Hat Certified Engineer (RHCE) preferred.
- Cloud certifications (AWS, Azure, or GCP) and/or AI/ML certifications are a plus.