- Company Name
- Braintrust
- Job Title
- Writer / AI Annotator (Remote, freelance, 100+ openings)
- Job Description
Job title: Multimodal GenAI Evaluation Analyst
Role Summary: Perform detailed evaluation of AI outputs across text, image, video, and multimodal prompts. Assess correctness, coherence, completeness, style, cultural appropriateness, safety, and bias against complex guidelines to support fine‑tuning of large language, vision, and multimodal models.
Expectations: Deliver consistent, high‑quality annotations, meet turnaround benchmarks, and continuously refine evaluation criteria in collaboration with project managers and quality leads.
Key Responsibilities:
- Review and score AI‑generated content in multiple modalities (text, image captions, video descriptions, multimodal prompts).
- Identify errors, hallucinations, biases, and cultural misalignments.
- Provide clear, actionable written feedback, tags, and scores for each output.
- Escalate ambiguous cases and assist in updating evaluation guidelines.
- Collaborate with cross‑functional teams to meet accuracy, reliability, and productivity targets.
- Adapt to evolving workflows, shifting project requirements, and rapid feedback cycles.
Required Skills:
- Strong critical reading, observational, and evaluative judgment across modalities.
- Ability to articulate nuanced assessments clearly.
- Proficiency in English (CEFR B2 or above); additional language skills are a plus.
- Familiarity with large language models, generative AI, and multimodal AI systems.
- Experience with data annotation tools and quality management platforms.
- Knowledge of cultural and linguistic nuances, bias mitigation, and safety considerations.
- Detail‑oriented, with consistent adherence to guidelines.
- Ability to work independently and in fast‑paced environments.
Required Education & Experience:
- Bachelor's degree or equivalent in Computer Science, Linguistics, Data Science, or related field.
- Minimum of 1 year of experience in data annotation, AI/ML evaluation, content moderation, or related domains.
- Exposure to LLM training data annotation, prompt engineering, or fine‑tuning workflows is advantageous.