Job Specifications
Role Overview - SRE – Services Lead / Onsite Technical Lead (Nutanix/Wintel/Citrix):
The SRE – Services Lead / Onsite Technical Lead will be responsible for leading end-to-end operations, reliability engineering, automation, and continuous improvement of the enterprise Wintel and Hyperconverged Infrastructure (HCI) landscape. This includes managing large-scale Windows Server estates, Nutanix HCI platforms, Citrix environments, and virtualization platforms while ensuring the availability, performance, and resilience of core infrastructure services. The role requires strong technical leadership, ownership of critical incidents, and the ability to guide onsite teams in delivering high-quality operational outcomes aligned with SRE best practices.
Key Responsibilities:
Ensure availability, performance, and health of Windows servers, Nutanix clusters, virtual machines, Citrix servers, and Dell infrastructure.
Oversee monitoring, alert management, capacity planning, and proactive remediation across compute, storage, virtualization, and Citrix workloads.
Lead root-cause analysis for P1/P2 infrastructure incidents and drive preventative improvements.
Manage operations of Nutanix platforms including cluster health, node maintenance, firmware upgrades, storage management, and replication.
Oversee VM lifecycle management, failover clustering, high availability configuration, snapshots, backups, and DR readiness.
Support VM migrations, OS upgrades, and environment refreshes at scale.
Lead patch management, security hardening, OS recovery, configuration changes, and performance optimization across Windows Server estates.
Guide troubleshooting of logon issues, GPO failures, service failures, and OS-level incidents.
Ensure compliance with configuration baselines, CMDB updates, and operational standards.
Provide leadership for XenApp/XenDesktop, StoreFront, NetScaler Gateway, Citrix policies, profile management, and resource scaling.
Ensure reliability of session performance, user experience, HA configuration, and platform upgrades.
Drive optimization, automation, and incident reduction for Citrix infrastructure.
Implement automation using PowerShell, Ansible, or SDKs to eliminate repetitive tasks and improve operational reliability.
Apply SRE principles—SLIs/SLOs, observability, resilience engineering, error budgeting—to improve overall service performance.
Implement, tune, and maintain monitoring solutions such as SCOM, Prism Central, and SolarWinds.
Lead onsite/onshore teams, ensuring operational excellence, proper workload distribution, and skill development.
Provide technical escalation support and coordinate with vendors (Dell, Nutanix, Citrix).
Drive service improvement initiatives, process automation, and continuous optimization across infrastructure domains.
Ensure compliance with ITIL processes—Incident, Problem, Change, and Release Management.
Required Skills & Expertise:
8 to 10 years of deep hands-on experience with Nutanix, Windows, Citrix, and virtualization.
Strong knowledge of Windows Server, clustering, failover technologies, and VMware/Hyper-V virtualization.
Solid understanding of Citrix environments (XenApp/XenDesktop, StoreFront, NetScaler, ICA/HDX troubleshooting).
Experience with backup, DR, failover, and restore processes across VM estates.
Proficiency in scripting/automation using PowerShell, Ansible, or similar tools.
Strong troubleshooting ability across compute, storage, OS, virtualization, and Citrix layers.
Experience operating in large-scale enterprise environments with high availability and security standards.
Demonstrated track record of leading technical teams (onsite or blended model).
Must-Have Skills:
Advanced experience with at least one major hyperconverged platform: Nutanix (strongly preferred) or VxRail.
Strong Wintel cluster support experience.
Hands-on expertise with Citrix infrastructure operations.
Experience leading incident management, RCA, and service improvement.
Solid automation/scripting capabilities (PowerShell).
Strong virtualization experience (VMware ESXi/vCenter).
Good-to-Have Skills:
Certifications: Nutanix NCP/NCM, Dell EMC, MCSE, VCP, Citrix CCE/CCP.
Exposure to Azure cloud platforms.
Experience with CI/CD, Infrastructure as Code, or advanced SRE tooling.
Security baselining (CIS), audit compliance, golden image/template management.