- Company Name
- Tubi
- Job Title
- Principal, ML Infrastructure Engineer
- Job Description
**Job title:** Principal, ML Infrastructure Engineer
**Role Summary**
Lead the design, architecture, and evolution of a high‑performance machine learning infrastructure platform. Drive technical strategy, mentor senior engineers, and collaborate with data science, ML engineering, and product teams to deliver scalable, reliable, and secure ML services that support millions of users.
**Expectations**
- Deliver technical leadership and vision for ML infrastructure.
- Maintain a 6‑12 month roadmap that aligns with organizational goals and industry trends.
- Mentor and empower engineers to adopt best practices and innovative solutions.
**Key Responsibilities**
- Define and champion long‑term ML infrastructure strategy and technology choices.
- Develop, communicate, and execute a 6‑12 month roadmap for the ML Infrastructure team.
- Lead architecture and design of complex, scalable, and secure ML systems.
- Design and build distributed, high‑throughput, low‑latency solutions using Scala and related technologies (distributed databases, message queues, stream processing).
- Drive performance optimization, scalability improvements, and infrastructure efficiency.
- Enforce engineering standards: code quality, testing, documentation, and security.
- Resolve critical technical challenges; debug and troubleshoot system performance issues.
- Lead end‑to‑end delivery of ML infrastructure projects, managing scope, timelines, and dependencies.
- Mentor engineers on project management, architecture, and technical best practices.
- Partner with data scientists, ML engineers, and product managers to translate requirements into infrastructure solutions.
- Communicate progress, risk, and technical concepts to stakeholders, including senior leadership.
- Promote knowledge sharing through documentation, talks, and mentorship initiatives.
**Required Skills**
- 10+ years in software engineering with extensive experience building large‑scale distributed systems.
- Expertise in architecture, design, and implementation of scalable, high‑performance distributed solutions.
- Significant background in ML infrastructure (platforms, pipelines, model serving).
- Proficient in Scala, Java, and Python.
- Deep understanding of databases, caching technologies, message brokers, and stream processing frameworks.
- Extensive experience with cloud platforms (AWS preferred).
- Proven ability to lead technical teams, mentor engineers, and influence cross‑functional collaboration.
- Strong troubleshooting, performance tuning, and debugging skills.
- Excellent written and verbal communication and stakeholder engagement skills.
**Required Education & Certifications**
- Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or a related field.
- Relevant cloud or architecture certifications (e.g., AWS Certified Solutions Architect) are a plus.