**Company Name:** Moonlite AI
**Job Title:** Senior Software Engineer, Storage Platform
**Job Description**
**Role Summary**
Design and implement high-performance storage platforms for AI infrastructure, supporting massive datasets and enterprise-scale data processing requirements.
**Expectations**
5+ years of software engineering experience, including proven production experience with storage platforms, distributed storage systems, or data infrastructure.
**Key Responsibilities**
- Design and build scalable storage orchestration systems for block, object, and file storage optimized for AI training datasets, model checkpoints, and large-scale data processing.
- Develop storage systems for Kubernetes and SLURM clusters that enable shared datasets, persistent storage, and high-throughput access for distributed training and batch workloads.
- Implement storage solutions with low-latency, high-throughput performance for AI training, simulations, and real-time data processing.
- Engineer robust data pipelines for ingestion, processing, and large-scale data movement.
- Build multi-tiered storage orchestration (NVMe, SSD, high-capacity storage) aligned with access patterns and workload needs.
- Implement enterprise-grade backup, snapshot, replication, and disaster recovery systems.
- Develop storage APIs/SDKs for integration with compute platforms and data systems.
- Design monitoring/optimization systems to track performance, capacity utilization, and access patterns.
**Required Skills**
- Kubernetes storage architecture (persistent volumes, storage classes, CSI drivers), container orchestration.
- Expertise in block/object/file storage, distributed systems, performance optimization.
- Expert-level proficiency in Python; experience with C/C++, Rust, or Go for critical components.
- Strong Linux systems programming (file systems, storage subsystems, kernel-level interfaces).
- Data pipeline engineering, ETL, and large-scale data processing systems.
- Platform/API design for multi-tenancy, data isolation, and reliability.
- Strong problem-solving skills for complex performance and scalability challenges in distributed environments.
**Required Education & Certifications**
Bachelor’s degree in Computer Science or related field; relevant certifications (e.g., cloud/storage technologies) preferred but not required.