- Company Name: Qualitest acq
- Job Title: AI Data Scientist
- Job Description:
Job Title: AI Data Scientist
Role Summary: Lead end‑to‑end AI‑driven document understanding for genealogical/historical datasets, designing autonomous multi‑agent workflows, optimizing multi‑modal models, and ensuring high‑quality, bias‑aware outputs.
Expectations: Deliver scalable, production‑ready solutions on cloud platforms, conduct rigorous evaluation, communicate concepts to diverse stakeholders, and continuously improve model performance.
Key Responsibilities:
- Implement OCR/HTR, NER, relation extraction, coreference resolution, summarization, and knowledge graph construction on complex historical documents.
- Design and maintain multi‑agent pipelines (LangChain, LangGraph, CrewAI, AutoGen, etc.) for structured extraction and reasoning.
- Evaluate and benchmark multi‑modal LLMs (Gemini, Claude, GPT, Qwen) in zero‑shot/few‑shot scenarios; apply optimization techniques (vLLM, LoRA, QLoRA, quantization).
- Apply CV models (YOLO, Nougat, DONUT, OpenCV) for layout analysis and content segmentation.
- Build ensemble “LLM‑as‑a‑Judge” evaluation frameworks; monitor hallucination, drift, and bias using Arize Phoenix, DeepEval, and RAGAS.
- Accelerate development with AI coding assistants (Amazon Q, Cursor, Claude Code, Kiro).
- Collaborate with MLOps engineers to deploy datasets, models, and pipelines on AWS (S3, SageMaker, Bedrock, ECS, EKS) or GCP (Vertex AI, Gemini API).
- Present technical findings to non‑technical stakeholders and support cross‑functional teams.
Required Skills:
- Advanced knowledge of LLMs, transformer architectures, and multi‑modal modeling.
- Expertise in NLP pipelines (spaCy, NLTK, BERT, etc.) and knowledge graph tools (Neo4j).
- Experience with CV tasks for document analysis (YOLO, Nougat, DONUT, OpenCV).
- Proficiency in Python, PyTorch/TensorFlow, and AI libraries (transformers, LangChain, LangGraph).
- Familiarity with embedding generation, vector databases, and retrieval‑augmented generation.
- Demonstrated ability in inference optimization (vLLM, LoRA, QLoRA, model quantization).
- Cloud‑native deployment on AWS or GCP, including SageMaker, Bedrock, Vertex AI, or equivalent services.
- Strong communication skills for translating technical concepts to business stakeholders.
Required Education & Certifications:
- Master’s or PhD in Computer Science, Data Science, Statistics, Mathematics, Linguistics, Engineering, or related quantitative field.
- Certifications in cloud platforms (AWS Certified Machine Learning Specialty, GCP Professional Machine Learning Engineer) are a plus.