- Company Name
- aion
- Job Title
- Machine Learning Engineer, Platform
- Job Description
-
Job title: Machine Learning Engineer, Platform
Role Summary: Develop and optimize large language models (LLMs) and transformer-based systems on an AI cloud platform. Own end‑to‑end LLMOps pipelines from data preparation to deployment, ensuring high model quality and production reliability.
Expectations:
• 4–6 years of experience in applied ML or ML engineering.
• Proven ability to fine‑tune modern LLMs (Llama, Mistral, Gemma, etc.) using both full fine‑tuning and PEFT methods (LoRA, QLoRA).
• Hands‑on experience with RLHF, experiment automation, and model quality monitoring.
Key Responsibilities:
• Design, implement, and maintain end‑to‑end LLMOps pipelines for training, fine‑tuning, and evaluation.
• Prepare, sanitize, and label training datasets; build synthetic data generation processes.
• Implement standard and custom evaluation metrics (e.g., BLEU, ROUGE, perplexity, accuracy) and automate hyperparameter tuning.
• Deploy models with vLLM; manage multi‑adapter LoRA serving, adapter hot‑swapping, and inference optimizations.
• Set up monitoring for model drift and performance, define retraining triggers, and manage model versioning and artifact lineage.
• Troubleshoot production issues, balance cost against performance, and document best practices.
Required Skills:
• Deep knowledge of transformer architectures (attention, MoE, Grouped‑Query Attention, FlashAttention).
• Strong Python skills and experience with PyTorch, Hugging Face Transformers, PEFT, TRL, Unsloth, and Axolotl.
• Expertise in fine‑tuning, PEFT, RLHF pipelines, and prompt engineering.
• Familiarity with GPU training (multi‑GPU, mixed precision, memory management).
• Experience building evaluation pipelines and integrating custom metrics.
• Basic deployment skills with vLLM and inference optimization techniques.
Required Education & Certifications:
• Bachelor’s or Master’s degree in Computer Science, Machine Learning, or a related field.
• Industry‑recognized certifications (e.g., AWS Certified Machine Learning – Specialty, GCP Professional Data Engineer) are a plus but not mandatory.