Braintrust

www.usebraintrust.com

4 Jobs

338 Employees

About the Company

Braintrust is revolutionizing hiring with Braintrust AIR, the world's first and only end-to-end AI recruiting platform. Trained with human insights and proprietary data, Braintrust AIR reduces time to hire from months to days by instantly matching you with pre-vetted, qualified candidates and conducting the first-round phone screen for you. Trusted by hundreds of Fortune 1000 enterprises including Nestlé, Porsche, Atlassian, Goldman Sachs, and Nike, Braintrust AIR is making talent acquisition professionals 100x more effective and saving companies hundreds of thousands of dollars in recruiting costs.

Listed Jobs

Company Name: Braintrust
Job Title: AI Evaluation Engineer
Job Description:

**Job Title:** AI Evaluation Engineer

**Role Summary:** Conduct advanced evaluation and annotation of large language models (LLMs) with a focus on code generation, debugging, and refactoring tasks. Design benchmark-style coding prompts, assess model outputs against rigorous criteria, document failure modes, and contribute to reinforcement learning (RL) workflows to enhance model reliability and reasoning.

**Expectations:**
- Deliver high‑quality, reproducible evaluation artifacts for internal and external LLMs.
- Maintain consistency and thoroughness in annotation according to detailed guidelines.
- Provide actionable feedback to model training and RL teams to drive continuous improvement.
- Act as a senior technical contributor, guiding junior evaluators and shaping evaluation standards.

**Key Responsibilities:**
- Create benchmark‑style coding prompts and reference solutions (e.g., SWE‑Bench‑like problems).
- Evaluate LLM outputs on code generation, refactoring, debugging, and implementation tasks.
- Identify, document, and analyze model failures, edge cases, and reasoning gaps.
- Conduct head‑to‑head evaluations between proprietary LLMs (e.g., Mistral‑based) and leading external models.
- Build and configure coding environments that support evaluation and RL pipelines.
- Apply structured evaluation criteria and produce clear, concise technical feedback.
- Collaborate with model developers to iterate on prompts and metrics based on evaluation findings.
- Mentor and lead junior annotators, ensuring adherence to evaluation guidelines.

**Required Skills:**
- 5+ years professional software development experience.
- Strong Python programming; proficiency in at least one additional language (Java, C++, etc.) is a plus.
- 1+ year experience in coding annotation or LLM evaluation for a major AI lab or infrastructure company (part‑time acceptable).
- Prior code‑reviewing experience and ability to produce structured evaluation reports.
- Fluent in English (written and spoken).
- Demonstrated ability to write clear, technical feedback and apply rigorous evaluation criteria.
- Team‑lead or mentoring experience considered a strong plus.

**Required Education & Certifications:**
- Bachelor’s degree (or equivalent) in Computer Science, Software Engineering, or related field.
- No specific certifications required.

Paris, France

Hybrid

15-01-2026

Company Name: Braintrust
Job Title: Senior AI Training Engineer - Python
Job Description:

**Senior AI Training Engineer**

**Role Summary:** Lead evaluation and improvement of state-of-the-art LLMs through rigorous coding task design, model output assessment, and reinforcement learning integration. Focus on enhancing model reliability, reasoning, and code quality through applied engineering and evaluation.

**Expectations:** 6-month contract with potential extension.

**Key Responsibilities:**
- Design high-quality coding benchmarks and reference answers (e.g., SWE-Bench-style tasks).
- Evaluate LLM outputs for code generation, refactoring, debugging, and implementation.
- Analyze model failures, edge cases, and reasoning gaps.
- Conduct comparative evaluations between internal (Mistral-based) and external LLMs.
- Configure coding environments for evaluation and reinforcement learning workflows.
- Follow structured annotation/evaluation guidelines with consistent accuracy.

**Required Skills:**
- 5+ years professional software development experience.
- Strong Python proficiency (required); knowledge of additional programming languages (bonus).
- Minimum 1 year of LLM evaluation or coding annotation experience (part-time acceptable) in frontier AI or infrastructure companies.
- Proven ability to apply structured evaluation criteria and provide lucid technical feedback.
- English fluency (written/verbal).

**Required Education & Certifications:** Not specified.

Company Name: Braintrust
Job Title: Senior Coding Annotator / LLM Evaluation Engineer (Contract)
Job Description:

Senior Coding Annotator / LLM Evaluation Engineer

**Role Summary**
Senior role at the intersection of software engineering, LLM evaluation, and applied AI. Responsible for designing coding benchmarks, evaluating model outputs, identifying failures, and supporting model improvement workflows.

**Expectations**
- 5+ years professional software development experience
- 1+ year coding annotation/LLM evaluation experience, preferably in frontier AI labs/infrastructure companies
- Proven ability to apply structured evaluation criteria and provide clear technical feedback
- Team lead or mentoring experience (preferred)
- Fluent in English (written/spoken)

**Key Responsibilities**
- Design and develop high-quality coding prompts/reference answers (e.g., SWE-Bench format)
- Evaluate LLM code generation, refactoring, debugging, and implementation against benchmarks
- Analyze model failures, edge cases, and reasoning gaps
- Conduct head-to-head evaluations of private (Mistral-based) and external models
- Configure coding environments for model evaluation and reinforcement learning workflows
- Maintain consistency in annotation/evaluation using detailed guidelines

**Required Skills**
- Strong Python programming (required)
- Knowledge of at least one additional programming language (optional)
- Experience building/configuring coding environments for evaluation tasks
- Prior code review or annotation experience (optional)

**Required Education & Certifications**
- Bachelor’s or Master’s degree in Computer Science or related field
- No specific certifications required

Company Name: Braintrust
Job Title: Writer / AI Annotator (Remote, freelance, 100+ openings)
Job Description:

Job title: Multimodal GenAI Evaluation Analyst

Role Summary: Perform detailed evaluation of AI outputs across text, image, video, and multimodal prompts. Assess correctness, coherence, completeness, style, cultural appropriateness, safety, and bias against complex guidelines to support fine‑tuning of large language, vision, and multimodal models.

Expectations: Deliver consistent, high‑quality annotations, meet turnaround benchmarks, and continuously refine evaluation criteria in collaboration with project managers and quality leads.

Key Responsibilities:
- Review and score AI‑generated content in multiple modalities (text, image captions, video descriptions, multimodal prompts).
- Identify errors, hallucinations, biases, and cultural misalignments.
- Provide clear, actionable written feedback, tags, and scores for each output.
- Escalate ambiguous cases and assist in updating evaluation guidelines.
- Collaborate with cross‑functional teams to maintain accuracy, reliability, and productivity targets.
- Adapt to evolving workflows, shifting project requirements, and rapid feedback cycles.

Required Skills:
- Strong critical reading, observational, and evaluative judgment across modalities.
- Clear articulation of nuanced assessments.
- Proficiency in English (CEFR B2 or above); additional language skills are a plus.
- Familiarity with large language models, generative AI, and multimodal AI systems.
- Experience with data annotation tools and quality management platforms.
- Knowledge of cultural and linguistic nuances, bias mitigation, and safety considerations.
- Detail‑oriented with consistent guideline adherence.
- Ability to work independently and in fast‑paced environments.

Required Education & Certifications:
- Bachelor's degree or equivalent in Computer Science, Linguistics, Data Science, or related field.
- Minimum of 1 year experience in data annotation, AI/ML evaluation, content moderation, or related domains.
- Exposure to LLM training data annotation, prompt engineering, or fine‑tuning workflows is advantageous.