- Company Name
- Hispanic Alliance for Career Enhancement
- Job Title
- Medicare Stars DI Data Scientist
- Job Description
Role Summary:
The Data Scientist designs, builds, and maintains unified data pipelines and agentic document‑analytics workflows within the Medicare Stars data ecosystem, and leads the integration of Snowflake, large language models (LLMs), and retrieval‑augmented generation (RAG) for advanced analytics and natural‑language query interfaces.
Expectations:
- Deliver production‑ready analytical solutions that meet compliance and governance standards.
- Collaborate cross‑functionally with product, ML/AI, BI, and engineering teams.
- Maintain clear documentation, monitoring, and cost controls for data and LLM services.
Key Responsibilities:
- Design and implement Snowflake pipelines combining structured, semi‑structured, and document data (JSON, Parquet, PDF, DOCX).
- Build large‑scale document ingestion, extraction, cleaning, chunking, embeddings, and vector store workflows.
- Create NLP‑enabled query interfaces that translate user prompts into analytic queries or retrieval flows with explainable results.
- Integrate Snowflake with LLMs (OpenAI or equivalent) via External Functions, Snowpark, and secure API patterns for summarization, QA, classification, and code generation.
- Develop RAG architectures using Snowflake‑stored embeddings and vector indexes (see the sketch after this list).
- Author Snowpark/Python/SQL transformations and Streams & Tasks, and orchestrate near‑real‑time and batch workloads.
- Implement Snowflake data modeling, governance, role‑based access, masking, lineage, and metadata for analytics compliance.
- Build monitoring, observability, and cost‑management dashboards for compute, storage, and API usage.
- Produce technical documentation, runbooks, and stakeholder‑friendly explanations of models/LLMs.
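For illustration only (not part of the posting): a minimal sketch of the in‑Snowflake RAG pattern these responsibilities describe, retrieving document chunks by vector similarity and grounding an LLM answer in them. It assumes a hypothetical DOC_CHUNKS table (CHUNK_TEXT plus an EMBEDDING VECTOR(FLOAT, 768) column populated by an upstream ingestion/chunking/embedding pipeline), Snowflake Cortex functions available in the account, and placeholder connection parameters and model names.

```python
# Illustrative RAG sketch (Snowpark for Python + Snowflake Cortex).
# Hypothetical assumptions: DOC_CHUNKS table with CHUNK_TEXT STRING and
# EMBEDDING VECTOR(FLOAT, 768) populated upstream; Cortex enabled; placeholder
# connection parameters and model names.
from snowflake.snowpark import Session

session = Session.builder.configs({
    "account": "<account>", "user": "<user>", "password": "<password>",
    "warehouse": "<warehouse>", "database": "<db>", "schema": "<schema>",
}).create()

def answer(question: str, k: int = 5) -> str:
    """Retrieve the k most similar chunks, then ground the LLM answer in them."""
    rows = session.sql(
        f"""
        SELECT chunk_text
        FROM doc_chunks
        ORDER BY VECTOR_COSINE_SIMILARITY(
            embedding,
            SNOWFLAKE.CORTEX.EMBED_TEXT_768('snowflake-arctic-embed-m', ?)
        ) DESC
        LIMIT {int(k)}
        """,
        params=[question],
    ).collect()
    context = "\n---\n".join(row["CHUNK_TEXT"] for row in rows)

    prompt = (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    result = session.sql(
        "SELECT SNOWFLAKE.CORTEX.COMPLETE('mistral-large', ?) AS answer",
        params=[prompt],
    ).collect()
    return result[0]["ANSWER"]
```

Keeping both retrieval and generation inside Snowflake is one way to meet the governance and compliance expectations above, since document text and embeddings never leave the governed platform; calling an external provider such as OpenAI through External Functions is the alternative pattern the posting also mentions.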
Required Skills:
- Statistical analysis in R, SAS, or Python.
- Data visualization with Tableau, Power BI, or D3.js.
- SQL and NoSQL database knowledge; strong Python programming; proficiency in Java or C++.
- Expertise in unified data pipelines, agentic document analytics, natural‑language query (NLQ) interfaces, Snowflake‑LLM integration, RAG, Snowpark transformations, data governance, monitoring, and documentation.
- Critical thinking, quantitative analysis, and attention to detail.
- Excellent communication, teamwork, and project leadership.
Required Education & Certifications:
- Bachelor’s degree in Statistics, Mathematics, Computer Science, or Engineering (Master’s/Ph.D. preferred).
- Professional certifications in data science, machine learning, or business analytics are advantageous.
Washington, DC, United States
On-site
30-12-2025