Sustainable Talent

www.SustainableTalent.com

1 Job

16 Employees

About the Company

Sustainable Talent delivers in-demand talent on demand, providing AI-driven, human-centric workforce solutions to help companies scale effectively. We specialize in talent acquisition, offering strategic consulting, recruitment, and specialized services that align with our clients' business objectives. Whether you're looking for contingent workers, executive talent, or tech specialists, we ensure the right people are in place at the right time to drive success.

We partner with industry leaders like Amazon, Ford, and NVIDIA to fuel innovation through diverse and inclusive hiring practices. Whether you're scaling teams across 180+ countries or need flexible recruitment solutions, our services are tailored to meet your unique needs.
Let's innovate together!

#SustainableTalent #RPO #TechTalent #WBENC #WomenOwned #Inc5000

Listed Jobs

Company Name
Sustainable Talent
Job Title
Senior System Engineer
Job Description
**Job Title**
Senior System Engineer

**Role Summary**
Design, deploy, and maintain a high‑availability compute cluster supporting NVIDIA’s next‑generation GPU, AI/ML, and accelerated computing hardware. Lead system recovery, root‑cause analysis, and continuous improvement of infrastructure reliability, performance, and operational efficiency in a large‑scale on‑prem data center and lab environment.

**Expectations**
- Manage and expand a dense GPU‑clustered compute farm, ensuring uptime, performance, and safety.
- Deliver rapid remediation of hardware, network, storage, and thermal incidents.
- Scale new systems, perform qualification, benchmarking, and lifecycle management.
- Meet internal SLAs (PUE, MTTR, test throughput) and collaborate on capacity planning.

**Key Responsibilities**
- Partner with system architects, hardware, firmware, QA, and platform teams to develop and release products.
- Maintain racks, GPU nodes, interconnects, storage arrays, and supporting infrastructure (power, cooling, UPS).
- Monitor availability, conduct root‑cause analysis, and drive remediation initiatives.
- Deploy, qualify, and scale high‑density GPU clusters, rack‑scale systems, and liquid‑cooling environments.
- Coordinate inventory, asset lifecycle, configuration management, decommissioning, and refresh.
- Ensure lab and data‑center hygiene (cable management, ESD compliance, tool control).
- Troubleshoot cross‑platform issues (Windows, Linux, macOS) with firmware, OS, and platform infrastructure.
- Represent the infrastructure team in reviews and global NVIDIA coordination meetings.

**Required Skills**
- Experience in large‑scale datacenter or compute‑lab environments (compute‑dense, hyperscale).
- Proficient with DCIM tools (e.g., Nautobot), version control (Git, Perforce), and automation (shell, Python, Ansible).
- Strong networking fundamentals (TCP/IP, DNS, NFS, SSL/TLS, IPv6) and high‑bandwidth interconnects.
- Multi‑OS support: Windows, macOS, Linux, BIOS/firmware updates, driver deployments, system imaging.
- Physical hardware expertise: PCBs, GPUs, server/node deployments, rack integration, cooling/power, cable/fibre management.
- Excellent written and verbal communication; analytical problem‑solving; ownership mindset.

**Required Education & Certifications**
- Associate’s or Bachelor’s degree in Engineering, Computer Science, or related technical field (or equivalent experience).
- Certifications preferred: CCNA/CCNP, or similar networking/infrastructure credentials.
- Experience with HPC or GPU clusters (Slurm, Kubernetes, BCM) and private cloud stacks (OpenStack, VMware, Nutanix) is advantageous.
Hillsboro, United States
On site
Senior
06-11-2025