- Company Name
- Skills Alliance
- Job Title
- Senior Machine Learning Engineer
- Job Description
-
**Role Summary**
Build, optimize, and deploy scalable machine‑learning pipelines for foundation models applied to large‑scale biological datasets. Design modular, production‑ready systems that run efficiently on distributed cloud and GPU clusters while collaborating closely with researchers and product teams.
**Expectations**
- Demonstrated ownership and a low‑ego mindset in fast‑moving engineering environments.
- Ability to translate research‑level code into reliable, production‑ready systems.
- Strong collaboration skills for working with researchers and product engineers to ship iterations quickly and continuously.
**Key Responsibilities**
- Design and implement large‑scale training and inference pipelines for modern architectures (Transformers, SSMs, diffusion models).
- Optimize throughput, latency, and resource utilization across distributed GPU clusters and cloud environments.
- Develop reusable, modular ML components that can be shared across teams.
- Convert prototype research code into robust production systems.
- Integrate MLOps tooling (Weights & Biases, Ray, Docker, etc.) and maintain CI/CD pipelines.
- Monitor, debug, and improve ML workflows in production.
- Document system architecture, best practices, and operational procedures.
**Required Skills**
- Strong Python programming and software engineering fundamentals.
- Deep experience with PyTorch, JAX, or TensorFlow.
- Proven track record of scaling ML workflows on cloud platforms, GPU clusters, or distributed training systems.
- Familiarity with modern model architectures (Transformers, SSMs, diffusion‑style models).
- Proficiency with MLOps tools: Weights & Biases, Ray, Docker, Kubernetes, CI/CD.
- Solid understanding of distributed systems, networking, and performance tuning.
- Excellent problem‑solving, ownership, and communication skills.
**Required Education & Certifications**
- Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, Data Science, or a related field.
- Optional certifications in cloud platforms (AWS, GCP, Azure) or ML frameworks.