- Company Name: HyperFi
- Job Title: AI Systems & Data Engineer
- Job Description:
Role Summary: Design, build, and maintain production‑grade data pipelines and AI agent pipelines in Databricks, leveraging unstructured data to implement retrieval‑augmented generation (RAG) and LangChain‑based agents that feed into production services.
Expectations: 5–7 years of experience delivering ML/DL or data engineering solutions at scale; strong foundation in prompt engineering, context construction, and retrieval design; proven ability to translate prototypes into robust production systems; fluent in English and comfortable collaborating cross‑functionally.
Key Responsibilities:
- Architect and implement data ingestion pipelines in Databricks (Auto Loader, Delta Lake, Delta Live Tables) for unstructured data.
- Build RAG systems from scratch, integrating storage, indexing, and retrieval components.
- Develop and maintain agentic LLM pipelines using LangChain, LangGraph, and LangSmith, including routing and orchestration.
- Orchestrate PySpark and Databricks workflows to prepare inputs, track AI model outputs, and manage data lineage.
- Instrument evaluation metrics, telemetry, and observability to refine prompt strategies and model performance.
- Integrate AI components with product, frontend, and backend teams for seamless end‑to‑end user flows.
- Apply Databricks Unity Catalog for data governance and implement security controls via the Databricks AI Security Framework.
Required Skills:
- Python core programming with emphasis on testable, maintainable code.
- Databricks workspace, PySpark, Delta Lake, Delta Live Tables, and Databricks Workflows.
- LangChain, LangGraph, LangSmith for LLM orchestration.
- Knowledge of RAG architecture, retrieval mechanisms, vector search, and semantic chunking.
- Prompt engineering, model evaluation, cross‑model routing (e.g., Gemini), and feedback loops.
- Data lakehouse concepts, Unity Catalog governance, and DASF security practices.
- Cloud platform (GCP), GitHub Actions CI/CD, and PostgreSQL database skills.
Required Education & Certifications:
- Bachelor’s or higher degree in Computer Science, Data Engineering, or related field (or equivalent work experience).
San Francisco, United States
Hybrid
Mid-level
19-10-2025