- Company Name: Argus Media
- Job Title: Data Scientist (GenAI)
- Job Description:
**Job Title:** Data Scientist (Generative AI)
**Role Summary:**
Design, build, and maintain AI‑ready datasets and generative AI pipelines that support large language models. Drive end‑to‑end data processing, prompt engineering, evaluation, and retrieval‑augmented generation for scalable GenAI applications. Collaborate across data science, engineering, and product teams to ensure high‑quality, production‑ready solutions.
**Expectations:**
- Deliver robust, scalable data pipelines and GenAI systems that meet performance and data‑quality standards.
- Apply state‑of‑the‑art algorithms, prompt‑engineering techniques, and LLM‑evaluation frameworks to optimize model outputs.
- Work effectively in a global, cross‑functional environment, providing clear documentation and maintaining code quality.
**Key Responsibilities:**
1. Design, develop, and maintain high‑quality AI‑ready datasets, ensuring integrity, usability, and scalability.
2. Perform data extraction, cleansing, and curation from diverse text and numeric sources; enrich metadata to enhance accessibility.
3. Implement and optimize algorithms and pipelines for feature engineering, data transformation, and model support (LLM, GenAI).
4. Build modular, scalable GenAI pipelines using LangChain, Hugging Face Transformers, and embedding models.
5. Apply advanced prompt‑engineering techniques to improve LLM performance for extraction, summarization, and generation tasks.
6. Conduct systematic evaluation of LLM outputs, using LLM‑as‑a‑judge methods and other metrics to assess quality, relevance, and alignment.
7. Develop and refine Retrieval‑Augmented Generation (RAG) systems, integrating vector databases, embedding models, and knowledge graphs.
8. Collaborate with global data science, engineering, and product stakeholders to integrate data solutions into broader initiatives.
9. Troubleshoot data‑related issues, ensuring rapid resolution and minimal operational impact.
10. Produce clean, well‑documented, production‑grade code, following version‑control and software‑engineering best practices.
**Required Skills:**
- **Programming & Libraries:** Python, TensorFlow or PyTorch, NLP libraries (spaCy, Hugging Face).
- **Generative AI Tools:** LangChain, Hugging Face Transformers, embedding models.
- **Prompt Engineering:** Prompt tuning, chaining, optimization.
- **LLM Evaluation:** Experience with LLM‑as‑a‑judge, output quality analysis.
- **RAG & Retrieval:** Vector‑database operations, RAG architecture.
- **Cloud & Deployment:** AWS, Google Cloud, Azure, Docker containerization.
- **Data Engineering:** Extraction, curation, metadata enrichment, AI‑ready dataset creation.
- **Soft Skills:** Strong communication, collaborative mindset, cross‑functional coordination.
**Required Education & Certifications:**
- Advanced degree (MSc or PhD) in Artificial Intelligence, Computer Science, Statistics, Mathematics, or a related field.
- Relevant certifications (e.g., TensorFlow Developer, AWS Certified Machine Learning) are a plus but not mandatory.
---