- Company Name
- Sentiro Partners
- Job Title
- Machine Learning Researcher (Foundational Models) East Coast
- Job Description
**Job title**
Machine Learning Researcher – Pretraining Systems (Foundational Models)
**Role Summary**
Lead experimental and theoretical research on large‑scale model pretraining. Design, execute, and analyze pretraining runs (10B+ parameters), develop diagnostics, and optimize distributed training for efficiency and scalability. Bridge modeling, system engineering, and empirical science to uncover principles that enable efficient learning in massive models.
**Expectations**
- Deliver reproducible, quantifiable results on scaling, data mixtures, and training dynamics.
- Reduce pretraining cost while maintaining or improving validation performance.
- Publish actionable insights on scaling laws, mixture design, and system optimizations.
- Mentor junior researchers and collaborate cross‑functionally with engineering teams.
**Key Responsibilities**
- Conduct controlled ablations and large‑scale experiments on pretraining objectives and data mixtures.
- Build and maintain instrumentation for profiling loss surfaces, gradient flow, activation distributions, and inflection point prediction.
- Optimize distributed training pipelines: scheduling, sharding, checkpointing, and resource utilization on multi‑node GPU/TPU clusters.
- Design evaluation harnesses for emergent behaviors (reasoning, tool‑use, temporal consistency).
- Analyze and interpret results, quantify trade‑offs (e.g., tokenization choices, mixture composition), and communicate findings.
- Contribute to open‑source tool development and share insights with the broader research community.
**Required Skills**
- 2–5 years of machine learning research experience after the terminal degree (PhD preferred; high‑performing MA holders accepted).
- Proven experience designing/scaling pretraining runs (≥10B parameters) and distributed training systems.
- Deep familiarity with FSDP, DeepSpeed, Megatron‑LM, and JAX/TPU frameworks.
- Strong profiling and diagnostics expertise: gradient noise scale, loss curvature, tokenization effects.
- Data‑centric experimentation: dataset filtering, mixture sampling, quality assessment.
- Proficiency in Python (PyTorch/JAX); C++ (or equivalent) for system instrumentation.
- Quantitative mindset: rigorous statistical analysis, reproducibility, and metric‑driven hypothesis testing.
**Required Education & Certifications**
- PhD in Computer Science, Machine Learning, or related field (or equivalent high‑level experience).
- Optional certifications in distributed systems or high‑performance computing are welcome.
California, United States
On site
Junior
05-11-2025