- Company Name: Hyperbolic
- Job Title: Head of Infrastructure
- Job Description:
**Job Title**
Head of Infrastructure
**Role Summary**
Lead the design, scaling, and reliability of a globally distributed GPU cloud, managing a cross‑functional infrastructure organization and aligning engineering efforts with product, security, and market objectives.
**Expectations**
- Own and execute a multi‑year infrastructure roadmap.
- Build a world‑class engineering team, set standards for excellence, and mentor senior staff and managers.
- Deliver high‑availability, secure, and cost‑efficient systems that support AI workloads at scale.
**Key Responsibilities**
- Architect distributed systems, networking, resource orchestration, and global capacity strategy.
- Design and maintain the peer‑to‑peer GPU marketplace, inference fabric, and core platform primitives.
- Oversee multi‑cloud, on‑prem, and edge topologies with GPU‑centric workloads.
- Lead incident response and resilience engineering, and own uptime targets (99.9–99.99%).
- Implement automation, IaC, GitOps, and observability (metrics, tracing, logging).
- Drive capacity planning, load forecasting, and cost optimization.
- Ensure security‑first infrastructure: isolation, IAM, hardening, and compliance.
- Collaborate with Product, Security, Platform, and GTM leaders to translate AI workloads into infrastructure solutions.
**Required Skills**
- 10+ years in infrastructure, systems engineering, or distributed systems; 5+ years in leadership.
- Deep knowledge of distributed systems, OS internals, networking, and resource orchestration.
- Hands‑on experience with Kubernetes, Nomad, SLURM, or custom schedulers at global scale.
- Proficiency in Go, Rust, Python, or equivalent for production code.
- Expertise in IaC, automation, GitOps, observability, and incident management.
- Strong judgment balancing velocity, reliability, cost, and security.
- Proven ability to mentor and grow engineering teams across infrastructure, platform, and SRE disciplines.
**Required Education & Certifications**
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field (preferred).
- Relevant certifications (e.g., Certified Kubernetes Administrator, AWS Certified Solutions Architect, or equivalent) are a plus.
San Francisco, United States
On-site
21-12-2025