- Company Name
- Cerebras Systems
- Job Title
- Applied Machine Learning Research Scientist
- Job Description
-
Role Summary:
Develop and optimize end‑to‑end machine learning pipelines for large language models (LLMs) on a wafer‑scale AI platform. Translate research ideas into production‑ready systems, focusing on training, fine‑tuning, reinforcement learning post‑training, evaluation, and performance tuning.
Expectations:
- Deliver high‑quality, maintainable code that scales to large datasets and distributed training.
- Achieve measurable improvements in model performance and training efficiency.
- Collaborate with researchers and senior engineers to translate research insights into scalable solutions.
- Maintain rigorous testing and evaluation standards across tasks and domains.
Key Responsibilities:
- Apply post‑training techniques (RLVR, RLHF, GRPO, etc.) to boost model performance.
- Build, maintain, and extend evaluation pipelines for multi‑domain LLM metrics.
- Debug end‑to‑end ML stack issues, including data ingestion, training jobs, precision handling, and model outputs.
- Design, implement, and scale pipelines for pretraining, fine‑tuning, alignment, and reinforcement learning stages.
- Manage large datasets: generation, filtering, synthetic augmentation, and data pipeline optimization.
- Optimize training and inference workflows for speed, efficiency, and reliability on wafer‑scale hardware.
- Contribute high‑quality, maintainable code to shared infrastructure and open‑source projects where applicable.
Required Skills:
- Strong Python programming skills, with hands‑on experience in PyTorch.
- Solid foundation in machine learning fundamentals and deep‑learning architectures (especially transformers).
- Ability to read, interpret, and implement concepts from contemporary ML research papers.
- Experience debugging and optimizing ML systems across data pipelines, training jobs, and model inference.
- Familiarity with distributed training frameworks (e.g., FSDP, Megatron).
- Experience working with large‑scale datasets and data pipelines.
- Knowledge of reinforcement learning concepts and post‑training methods.
Required Education & Certifications:
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
- 0–5 years of experience in machine learning systems, gained through internships, research, or industry roles.