**Company Name**
Okara
**Job Title**
Data Scientist – Deep Learning / Generative AI
**Role Summary**
Work as part of a 20‑person AI team focused on turning retail data into actionable insights. Deliver end‑to‑end solutions, from research‑grade model design to production deployment, for dynamic pricing, in‑store assistance, and long‑term vision projects. The role offers two specialist tracks: (1) Deep Learning Modeling and (2) Generative AI / LLM Engineering.
**Expectations**
- Demonstrate deep expertise in advanced machine learning and applied data science.
- Own projects from problem analysis to production-ready pipelines.
- Collaborate closely with business stakeholders, ML engineers, and data engineers.
- Provide a portfolio of original, self‑written code (e.g., on GitHub or a similar platform).
- Contribute to shared knowledge, code reviews, and continuous improvement.
**Key Responsibilities**
1. **Model Development** – Design, train, evaluate, and fine‑tune complex neural networks (CNNs, RNNs, transformers) using TensorFlow, Keras, or PyTorch.
2. **Generative AI Prototyping** – Prototype and iterate on LLM‑based applications using LangChain, LangGraph, Hugging Face, and OpenAI APIs.
3. **Data & Feature Engineering** – Prepare high‑quality datasets, engineer features, and implement data pipelines with Pandas, NumPy, and similar tools.
4. **Deployment & Scaling** – Build reproducible, production‑ready pipelines (ZenML, Docker, CI/CD) and integrate models into internal tools.
5. **Experimentation & QA** – Conduct rigorous experiments, perform bias and fairness audits, and document results for stakeholders.
6. **Cross‑Functional Collaboration** – Translate business requirements into technical solutions; provide clear explanations of model assumptions and limitations.
**Required Skills**
- **Programming:** Advanced Python; familiarity with version control (Git).
- **Machine Learning:** TensorFlow, Keras, PyTorch, scikit‑learn, Hugging Face Transformers.
- **Generative AI:** LangChain/LangGraph, OpenAI API, core generative AI concepts.
- **Data Engineering:** Pandas, NumPy, SQL, ETL fundamentals.
- **Operations:** Jenkins/GitHub Actions, Docker, ZenML, CI/CD pipelines.
- **Statistical Foundations:** Probability, hypothesis testing, experimental design.
- **Communication:** Ability to explain complex models to non‑technical audiences.
- **Autonomy & Collaboration:** Self‑directed, yet comfortable working in paired or cross‑functional teams.
**Education, Experience & Certifications**
- Bachelor’s (or higher) in Computer Science, Statistics, Mathematics, Data Science, or related field.
- Demonstrated professional experience (typically 3+ years) in applied ML or AI development.
- Relevant certifications (e.g., TensorFlow Developer, AWS Certified Machine Learning, or equivalent) are advantageous but not mandatory.
---