- Company Name: Yousign
- Job Title: AI Productivity Engineer Intern
- Job Description:
**Job Title:** AI Productivity Engineer Intern
**Role Summary:**
Partner with the AI Productivity Lead to design, implement, and launch an end‑to‑end evaluation pipeline for AI productivity agents (Dust). The internship spans six months and focuses on defining quality criteria, curating benchmark datasets, building monitoring dashboards, and automating weekly evaluation runs to improve agent reliability and safety while keeping cost and latency under control.
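For illustration only, a minimal sketch of what one automated evaluation run over a versioned golden set might look like in Python; the dataset format, the `run_agent` stub, and the output layout are assumptions, not Yousign's actual stack:

```python
# Minimal sketch of one evaluation run over a versioned golden set.
# Dataset format, agent interface, and output layout are illustrative assumptions.
import json
import time
from pathlib import Path


def run_agent(prompt: str) -> dict:
    """Placeholder for a call to the agent under test (e.g. a Dust agent)."""
    return {"answer": f"stub answer to: {prompt}", "cost_usd": 0.001}


def evaluate(golden_set_path: str, output_dir: str, run_id: str) -> None:
    records = []
    for line in Path(golden_set_path).read_text().splitlines():
        case = json.loads(line)                      # e.g. {"prompt": ..., "expected": ...}
        start = time.perf_counter()
        result = run_agent(case["prompt"])
        records.append({
            **case,
            "answer": result["answer"],
            "latency_s": round(time.perf_counter() - start, 3),
            "cost_usd": result["cost_usd"],
        })
    out = Path(output_dir) / f"{run_id}.json"        # versioned: one file per run
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(records, indent=2))
```

Downstream scoring (for example LLM‑as‑Judge, sketched further below) and the dashboards would consume these per‑run files.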
**Expectations:**
- Deliver a repeatable, automated evaluation framework that integrates tightly with existing CI/CD workflows.
- Establish clear criteria and metrics for evaluating agent performance, including accuracy, relevance, safety, cost, and latency.
- Produce actionable dashboards and alerting mechanisms to detect regressions and track SLO compliance (see the sketch after this list).
- Hand over the pipeline and its documentation to enable ongoing maintenance in production.
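As a rough illustration of the regression and SLO checks mentioned above, the sketch below compares one run's metrics against fixed thresholds and against the previous run; the metric names, thresholds, and margin are placeholders, not actual SLOs:

```python
# Minimal sketch of a regression / SLO check over two evaluation runs.
# All metric names and thresholds are illustrative placeholders.

SLO_THRESHOLDS = {
    "accuracy": 0.85,      # minimum acceptable score
    "safety": 0.99,
    "p95_latency_s": 5.0,  # maximum acceptable value
    "cost_per_run_usd": 0.02,
}
LOWER_IS_BETTER = {"p95_latency_s", "cost_per_run_usd"}
REGRESSION_MARGIN = 0.02   # tolerated drift vs. the previous run


def check_run(current: dict, baseline: dict) -> list[str]:
    """Return human-readable alerts for SLO breaches and regressions."""
    alerts = []
    for metric, threshold in SLO_THRESHOLDS.items():
        value = current[metric]
        if metric in LOWER_IS_BETTER:
            if value > threshold:
                alerts.append(f"SLO breach: {metric}={value} exceeds {threshold}")
            if value > baseline[metric] * (1 + REGRESSION_MARGIN):
                alerts.append(f"Regression: {metric} worsened vs. last run")
        else:
            if value < threshold:
                alerts.append(f"SLO breach: {metric}={value} below {threshold}")
            if value < baseline[metric] - REGRESSION_MARGIN:
                alerts.append(f"Regression: {metric} dropped vs. last run")
    return alerts


if __name__ == "__main__":
    last_week = {"accuracy": 0.91, "safety": 1.0, "p95_latency_s": 3.2, "cost_per_run_usd": 0.015}
    this_week = {"accuracy": 0.87, "safety": 0.995, "p95_latency_s": 4.1, "cost_per_run_usd": 0.016}
    for alert in check_run(this_week, last_week):
        print(alert)  # in the real pipeline these would feed the alerting channel
```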
**Key Responsibilities:**
1. Research and benchmark evaluation methods (LLM‑as‑Judge, golden sets, pairwise comparison, human‑in‑the‑loop) and tools (Dust, LangSmith, OpenAI Evals, custom platforms).
2. Define quality parameters for different agent types.
3. Design a reproducible evaluation architecture covering datasets, orchestration, run storage, traceability, dashboard design, and alerts.
4. Specify and implement the evaluation platform in Python/TypeScript, integrating metrics extraction and LLM‑as‑Judge scoring (see the scoring sketch after this list).
5. Automate weekly evaluation campaigns, with versioned results and CI/CD integration.
6. Deploy the pipeline, dashboards, and alerts to production, and hand off operational responsibility.
7. Communicate findings, train peers, and maintain documentation.
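A minimal LLM‑as‑Judge scoring sketch for responsibility 4, assuming the OpenAI Python SDK (v1+); the judge model, rubric, and 1–5 scale are illustrative placeholders, and any of the tools listed above could be swapped in:

```python
"""Minimal LLM-as-Judge scoring sketch, assuming the OpenAI Python SDK (v1+).
The rubric, judge model, and score scale are illustrative placeholders."""
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

JUDGE_PROMPT = """You are grading an AI assistant's answer.
Question: {question}
Reference answer: {reference}
Candidate answer: {candidate}

Return JSON: {{"score": <integer 1-5>, "rationale": "<one sentence>"}}"""


def judge(question: str, reference: str, candidate: str) -> dict:
    """Score one candidate answer against a golden-set reference."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",          # placeholder judge model
        temperature=0,                # deterministic-ish grading
        response_format={"type": "json_object"},
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            question=question, reference=reference, candidate=candidate)}],
    )
    return json.loads(response.choices[0].message.content)


if __name__ == "__main__":
    result = judge(
        question="How do I send a document for signature?",
        reference="Upload the document, add signers, then click Send.",
        candidate="Upload your file, add the signers' emails and press Send.",
    )
    print(result["score"], result["rationale"])
```

In a real pipeline, the returned score and rationale would be stored alongside each run's latency and cost figures to drive the dashboards and alerts described earlier.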
**Required Skills:**
*Must‑have*
- Strong programming skills in Python or TypeScript, including experience computing metrics.
- Rigorous data handling: dataset preparation, annotation, reproducibility, and versioning.
- Proficiency in English (technical documentation, tool usage).
- Interest in generative AI and agent quality assessment.
*Nice‑to‑have*
- Experience with large language models (Claude, GPT, etc.) and automated evaluation pipelines.
- Familiarity with evaluation tools (LangSmith, OpenAI Evals, Langfuse) or observability platforms.
- Knowledge of CI/CD, dashboard creation, and alerting frameworks.
**Required Education & Certifications:**
- Current final‑year student at a French engineering school (or equivalent) seeking a capstone internship.
- No specific certifications required, but coursework in AI, machine learning, or data science is advantageous.
---