xAI

x.ai

14 Jobs

2,602 Employees

About the Company

Understand the Universe

Listed Jobs

Company Name
xAI
Job Title
Software Engineer - Sandbox Service
Job Description
**Job Title:** Software Engineer – Sandbox Service

**Role Summary:**
Design, develop, and operate a secure, high‑performance sandbox platform that provides AI models with controlled access to compute environments (containers and virtual machines). Work spans the full stack, from cluster‑level job orchestration and resource scheduling to node‑level filesystem and networking optimization, supporting both training and product workloads.

**Expectations:**
- Deliver robust, scalable solutions in a fast‑moving, iterative environment.
- Maintain high reliability and performance standards.
- Communicate technical concepts clearly and collaborate effectively with cross‑functional teams.
- Demonstrate initiative and ownership of end‑to‑end feature development.

**Key Responsibilities:**
- Build and maintain sandbox infrastructure for provisioning containers/VMs on large clusters.
- Implement job orchestration, resource scheduling, and isolation mechanisms.
- Optimize filesystem performance and the network stack on compute nodes.
- Ensure security and isolation of model‑executed code (e.g., using cgroups, KVM, gVisor, QEMU).
- Support real‑time code execution for user queries and reinforcement‑learning training pipelines.
- Diagnose and resolve performance, reliability, and scalability issues.
- Contribute to documentation and knowledge sharing within the team.

**Required Skills:**
- Expert‑level programming in **Rust**, **C++**, or **Go** (one required; the others are a plus).
- Proficiency with **Python** for integration and scripting.
- Deep experience with **Linux** systems; Windows knowledge is a strong plus.
- Hands‑on experience with virtualization/container technologies (cgroups, KVM, gVisor, QEMU).
- Solid understanding of the **networking stack** (TCP/IP, sockets, firewalls, etc.).
- Experience designing or working on **distributed systems**.
- Strong problem‑solving ability and written/oral communication skills.

**Required Education & Certifications:**
- Bachelor’s (or higher) degree in Computer Science, Software Engineering, or a related technical field, **or** equivalent professional experience.
- No specific certifications required; demonstrated technical expertise is essential.
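The resource-scheduling responsibility above can be sketched as a first-fit-decreasing placement of sandbox jobs onto cluster nodes. This is a minimal illustrative model, not xAI's actual scheduler; all names (`Node`, `schedule`, the `sbx-*` job IDs) are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    cpus: int                      # free CPU cores
    mem_gb: int                    # free memory in GB
    jobs: list = field(default_factory=list)

def schedule(nodes, jobs):
    """First-fit-decreasing placement of sandbox jobs onto nodes.

    `jobs` is a list of (job_id, cpus, mem_gb) tuples. Returns a dict
    mapping job_id -> node name, or None for jobs that do not fit.
    """
    placement = {}
    # Place the largest jobs first to reduce fragmentation.
    for job_id, cpus, mem in sorted(jobs, key=lambda j: (j[1], j[2]), reverse=True):
        placement[job_id] = None
        for node in nodes:
            if node.cpus >= cpus and node.mem_gb >= mem:
                node.cpus -= cpus
                node.mem_gb -= mem
                node.jobs.append(job_id)
                placement[job_id] = node.name
                break
    return placement

nodes = [Node("node-a", cpus=8, mem_gb=32), Node("node-b", cpus=4, mem_gb=16)]
jobs = [("sbx-1", 2, 8), ("sbx-2", 6, 24), ("sbx-3", 4, 16)]
print(schedule(nodes, jobs))
```

A production scheduler would also enforce the isolation boundaries the posting mentions (cgroups limits, KVM/gVisor sandboxing) at placement time; this sketch covers only the bin-packing step.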
London, United Kingdom
On site
18-12-2025
Company Name
xAI
Job Title
Backend Engineer - Grok Imagine
Job Description
**Job Title:** Backend Engineer – Grok Imagine

**Role Summary:**
Design, develop, and scale backend services that power AI‑driven, media‑rich experiences for Grok users worldwide. Own the end‑to‑end lifecycle, from architecture and code to deployment and monitoring, ensuring high performance, reliability, and seamless real‑time interactions for millions of users.

**Expectations:**
- Deliver production‑ready solutions with precision and ownership.
- Demonstrate strong prioritization, initiative, and clear communication.
- Thrive in a small, flat organization that values engineering excellence.
- Maintain a passion for AI and media technologies while ensuring user focus and scalability.

**Key Responsibilities:**
- Architect scalable backend systems that support real‑time, multimodal media interactions.
- Build and optimize large‑scale data pipelines for ingesting, processing, and analyzing multimodal content.
- Collaborate with frontend engineers, AI researchers, and product teams to deliver media‑rich features.
- Ensure high performance, reliability, and security across all services.
- Own end‑to‑end development: system design, implementation, testing, deployment, production monitoring, and incident response.

**Required Skills:**
- Proficiency in Python or Rust, with clean, efficient, maintainable coding practices.
- Experience building backend services for consumer‑facing products at scale.
- Strong knowledge of distributed systems, database design (SQL/NoSQL), caching, and message‑queue architectures.
- Ability to design, implement, and maintain large‑scale data infrastructure for AI applications.
- Familiarity with AI/ML model‑serving pipelines and real‑time inference workloads.
- Excellent problem‑solving, debugging, and performance‑optimization skills.
- Strong verbal and written communication; ability to explain complex concepts succinctly.

**Required Education & Certifications:**
- Bachelor’s degree in Computer Science, Software Engineering, or a related technical field (advanced degree preferred).
- Certifications in relevant technologies (AWS, GCP, Azure, Kubernetes, etc.) are advantageous but not mandatory.
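The message-queue architecture this role calls for can be sketched with the standard-library `queue` and `threading` modules: tasks fan out to worker threads, and results are collected on an output queue. This is a hypothetical toy (the `media_pipeline` name and the placeholder "processing" are illustrative), not Grok Imagine's actual backend.

```python
import queue
import threading

def media_pipeline(items, num_workers=4):
    """Fan media-processing tasks out to worker threads via a queue.

    Each worker pulls an item, applies a placeholder transform, and
    pushes the result to an output queue. A None sentinel per worker
    signals shutdown.
    """
    tasks, results = queue.Queue(), queue.Queue()

    def worker():
        while True:
            item = tasks.get()
            if item is None:           # sentinel: shut this worker down
                tasks.task_done()
                return
            results.put(("processed", item))
            tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for item in items:
        tasks.put(item)
    for _ in threads:
        tasks.put(None)                # one sentinel per worker
    tasks.join()
    for t in threads:
        t.join()
    return sorted(results.queue)       # deterministic order for inspection

print(media_pipeline(["img-1", "img-2", "vid-1"]))
```

At Grok's scale the in-process queue would be replaced by a distributed broker and the placeholder transform by real media processing, but the fan-out/collect shape is the same.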
Palo Alto, United States
On site
06-01-2026
Company Name
xAI
Job Title
RDMA Engineer - Supercomputing
Job Description
**Job Title:** RDMA Engineer – Supercomputing

**Role Summary:**
Design, implement, and optimize RDMA‑based networking solutions for large GPU supercomputing clusters. Focus on minimizing latency and maximizing bandwidth using NVIDIA GPUDirect RDMA, Mellanox InfiniBand/RoCE, and containerized workloads.

**Expectations:**
- Deliver low‑latency, high‑throughput inter‑node communication for AI training and inference workloads.
- Integrate RDMA technologies into Kubernetes environments and HPC frameworks.
- Work closely with AI researchers and infrastructure teams to accelerate data pipelines and collective communications.

**Key Responsibilities:**
- Develop and tune RDMA communication stacks (GPUDirect RDMA, RoCEv2) on NVIDIA GPUs and Mellanox NICs.
- Optimize direct GPU‑to‑network memory access to reduce CPU overhead.
- Integrate RDMA solutions with Kubernetes networking, ensuring reliability across distributed compute and storage.
- Enhance HPC communication libraries (MPI, NCCL) for GPU‑accelerated workloads.
- Diagnose and resolve performance bottlenecks in high‑throughput, low‑latency environments.
- Collaborate with cross‑functional teams to support large‑scale parameter synchronization.

**Required Skills:**
- Hands‑on experience with NVIDIA RDMA technologies (GPUDirect RDMA, RoCE, InfiniBand) in HPC or AI supercomputing.
- Proficiency in Rust, C, or C++ for low‑level networking and systems optimization.
- Knowledge of the NVIDIA networking stack, Mellanox drivers, libibverbs, and NVPeerMemory.
- Experience optimizing distributed systems using MPI, NCCL, or similar GPU communication frameworks.
- Familiarity with Kubernetes networking and containerized RDMA deployment.
- Strong analytical and troubleshooting skills; ability to reduce latency and improve throughput.

**Required Education & Certifications:**
- Bachelor’s degree (or higher) in Computer Engineering, Computer Science, Electrical Engineering, or a related field.
- Relevant certifications in high‑performance networking, HPC, or systems engineering are a plus.
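As a back-of-the-envelope illustration of the latency/bandwidth trade-off this role optimizes, a simple linear cost model estimates transfer time as per-message startup latency plus size over line rate. The numbers (2 µs startup, 400 Gb/s link) are hypothetical defaults for a sketch, not measured figures for any xAI cluster.

```python
def transfer_time_us(msg_bytes, latency_us=2.0, bandwidth_gbps=400.0):
    """Linear cost model: T = latency + size / bandwidth.

    latency_us: assumed per-message startup cost (work-request posting,
    NIC processing); bandwidth_gbps: assumed link rate in Gb/s.
    """
    bytes_per_us = bandwidth_gbps * 1e9 / 8 / 1e6   # bytes per microsecond
    return latency_us + msg_bytes / bytes_per_us

def effective_bandwidth_gbps(msg_bytes, **kw):
    """Achieved rate: small messages are latency-bound, large ones near line rate."""
    t_us = transfer_time_us(msg_bytes, **kw)
    return msg_bytes * 8 / 1e9 / (t_us / 1e6)

# Small messages waste most of the link; large messages approach line rate,
# which is why batching and GPUDirect (skipping host-memory copies) matter.
for size in (4 * 1024, 1024 * 1024, 64 * 1024 * 1024):
    print(f"{size:>9} B -> {effective_bandwidth_gbps(size):6.1f} Gb/s")
```

The same reasoning drives NCCL's bucketing of small gradient tensors into larger collectives during parameter synchronization.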
Palo Alto, United States
Hybrid
25-01-2026
Company Name
xAI
Job Title
AI Engineer & Researcher - GPU Kernel
Job Description
**Job Title:** AI Engineer & Researcher – GPU Kernel

**Role Summary:**
Develop, optimize, and maintain low‑level CUDA kernels for state‑of‑the‑art AI inference and training workloads. Design GPU‑specific solutions that push performance limits, integrate kernels into deep‑learning frameworks, and continuously evaluate and improve GPU utilization.

**Expectations:**
- Deliver production‑ready CUDA kernels that meet or exceed performance benchmarks.
- Write both forward and backward passes with rigorous correctness checks and floating‑point error handling.
- Leverage profiling tools to identify and eliminate bottlenecks in single‑ and multi‑GPU contexts.
- Translate cutting‑edge research into efficient GPU implementations and propose novel optimizations.
- Communicate technical progress clearly to teammates and stakeholders.

**Key Responsibilities:**
- Design and implement high‑performance GEMM kernels using Tensor Cores or CUDA cores, via CUDA/CUTLASS or from scratch.
- Extend or author attention kernels, ensuring full forward/backward functionality and correctness.
- Optimize memory‑bound and compute‑bound operations, balancing register pressure, shared‑memory usage, and GPU occupancy.
- Profile, debug, and tune kernels with Nsight and related tools; remove identified bottlenecks.
- Integrate custom kernels into frameworks (e.g., JAX/XLA) using pybind and maintain interface compatibility.
- Document kernel performance, design decisions, and integration guidelines for internal knowledge sharing.

**Required Skills:**
- Proficiency in C/C++ with CUDA, including deep knowledge of GPU architecture, memory hierarchy, and parallel execution models.
- Experience with CUTLASS and Tensor Core programming.
- Strong background in implementing both forward and backward passes for deep‑learning kernels, with attention to numerical stability.
- Expertise in profiling (Nsight, nvprof) and performance optimization of GPU workloads.
- Familiarity with pybind for Python/C++ integration and working with JAX/XLA or similar frameworks.
- Solid understanding of floating‑point arithmetic, register allocation, shared memory, and GPU‑utilization strategies.

**Required Education & Certifications:**
- Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or a related field.
- Demonstrated experience in GPU programming and high‑performance computing, ideally within AI or deep‑learning contexts.
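The shared-memory tiling strategy at the heart of the GEMM work above can be illustrated in NumPy rather than CUDA: each output tile accumulates partial products over k-tiles, just as a CUDA thread block stages A/B tiles in shared memory before multiplying. A hypothetical sketch of the blocking scheme, not a performance implementation.

```python
import numpy as np

def tiled_matmul(a, b, tile=32):
    """Blocked GEMM mirroring shared-memory tiling in a CUDA kernel.

    For each (i, j) output tile, accumulate partial products over
    k-tiles; on a GPU, each p-iteration corresponds to loading one
    A tile and one B tile into shared memory.
    """
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    c = np.zeros((m, n), dtype=a.dtype)
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            for p in range(0, k, tile):
                # One "shared-memory" tile of A and of B per iteration.
                c[i:i+tile, j:j+tile] += (
                    a[i:i+tile, p:p+tile] @ b[p:p+tile, j:j+tile]
                )
    return c

rng = np.random.default_rng(0)
a, b = rng.standard_normal((64, 48)), rng.standard_normal((48, 80))
assert np.allclose(tiled_matmul(a, b, tile=16), a @ b)
```

In a real kernel the tile size is chosen to balance the occupancy constraints the posting mentions: larger tiles reuse more data per load but consume more registers and shared memory per block.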
Palo Alto, United States
Hybrid
26-01-2026