Job Specifications
About The Company
Red Hat is the world's leading provider of enterprise open source software solutions, renowned for its community-powered approach to delivering high-performing Linux, cloud, container, and Kubernetes technologies. Operating across more than 40 countries, Red Hat fosters an inclusive and innovative environment where associates are encouraged to contribute their ideas, solve complex problems, and make a significant impact in the technology landscape. Committed to open source principles, Red Hat supports a diverse workforce and emphasizes collaboration, transparency, and continuous learning to drive technological advancement and customer success.
About The Role
Red Hat's OpenShift AI team is seeking a Principal Machine Learning Engineer dedicated to advancing the safety, reliability, and ethical alignment of large language models (LLMs) and AI agents. In this strategic role, you will lead the development of large-scale evaluation platforms that enable automated, reproducible, and extensible assessment of AI systems across various domains. Your expertise will be instrumental in defining evaluation standards, designing innovative pipelines, and integrating safety metrics into every stage of the AI development lifecycle, from training and fine-tuning to deployment and monitoring. You will collaborate closely with cross-functional teams, including safety, research, product, and infrastructure, to translate complex evaluation goals into practical, system-level frameworks that keep AI systems trustworthy and aligned with human values. You will also mentor engineering teams, influence industry standards, and contribute to open-source projects that democratize trustworthy AI infrastructure, shaping the future of responsible AI innovation.
Qualifications
10+ years of experience in machine learning engineering, with at least 3 years focused on large-scale evaluation of transformer-based LLMs and/or agentic systems.
Proven experience in building evaluation platforms or frameworks that operate across training, deployment, and post-deployment environments.
Deep expertise in designing and implementing evaluation metrics such as factuality, hallucination detection, grounding, toxicity, and robustness.
Strong background in scalable platform engineering, including development of APIs, pipelines, and integrations used by multiple product teams.
Demonstrated ability to operationalize safety and alignment techniques into production evaluation systems.
Proficiency in Python, PyTorch, Hugging Face, and modern ML operations and deployment environments.
Experience in technical leadership, mentoring, architecture design, and establishing organization-wide best practices.
Advanced degree in Machine Learning, Computer Science, or related fields with a focus on evaluation, safety, or interpretability is preferred.
Responsibilities
Architect and lead the development of large-scale evaluation platforms for LLMs and AI agents, enabling comprehensive assessment of accuracy, safety, and performance.
Define organizational standards and metrics for evaluation, including hallucination detection, factuality, bias, robustness, interpretability, and alignment drift.
Develop platform components and APIs that facilitate seamless integration of evaluation processes into training, fine-tuning, deployment, and continuous monitoring workflows.
Design automated pipelines and benchmarks for adversarial testing, red-teaming, and stress testing of LLMs and retrieval-augmented generation (RAG) systems.
Lead initiatives in multi-dimensional evaluation, focusing on safety, grounding, and agent behavior metrics.
Collaborate with cross-functional stakeholders to translate abstract evaluation goals into practical, system-level frameworks.
Advance interpretability and observability tools to enable teams to understand, debug, and explain LLM behaviors in production environments.
Mentor engineers, promote best practices, and drive the adoption of evaluation-driven development methodologies.
Represent the team's evaluation-first approach in external forums, publications, and industry conferences, influencing the future direction of AI safety and evaluation standards.
Benefits
Comprehensive medical, dental, and vision coverage.
Flexible Spending Account (FSA) for healthcare and dependent care expenses.
Health Savings Account (HSA) for high-deductible medical plans.
Retirement plan with employer matching contributions.
Paid time off, holidays, and paid parental leave.
Leave benefits including disability, paid family medical leave, and military leave.
Additional perks such as employee stock purchase plans, family planning reimbursement, tuition reimbursement, transportation expense accounts, employee assistance programs, and more.
Equal Opportunity
Red Hat is an equal opportunity employer committed to creating a diverse and inclusive workplace. We do not discriminate based on race, color, religion, national origin, gender, ge