- Company Name
- HyperFi
- Job Title
- AI Systems & Data Engineer
- Job Description
**Role Summary**
Design, build, and operate production-scale data pipelines in Databricks to ingest, normalize, and transform large volumes of unstructured data into AI‑ready lakehouse tables. Orchestrate time‑based workflows, enforce data governance, and integrate pipeline outputs into AI systems and APIs, all while optimizing Spark job performance and cost.
**Expectations**
- 5–7 years of experience developing ML, data, or AI production systems.
- Deep expertise in Python, PySpark, and Databricks (Auto Loader, Delta Live Tables, Workflows).
- Strong background in prompt engineering, context construction, and retrieval design (LangChain, LangGraph, LangSmith).
- Proficiency with Unity Catalog, Delta Lake, and Databricks Data Intelligence Platform for data governance and security.
- Ability to write testable, maintainable code and implement observability, model evaluation, and feedback loops.
- Excellent written and spoken English for cross‑team collaboration.
**Key Responsibilities**
1. Build and maintain Databricks pipelines in Python for batch and streaming ingestion of unstructured data.
2. Use Auto Loader, Delta Live Tables, and Workflows to create robust data pipelines.
3. Model and maintain AI‑ready lakehouse tables with Delta Lake and Unity Catalog.
4. Prepare retrieval and context datasets for Retrieval‑Augmented Generation (RAG) and agent systems.
5. Orchestrate time‑based workflows to coordinate data prep, validation, and AI handoff.
6. Enforce data quality, lineage, and access controls across all pipelines.
7. Optimize PySpark jobs for performance, reliability, and cost efficiency.
8. Integrate pipeline outputs into production AI systems and APIs.
9. Continuously monitor data freshness, schema drift, and pipeline health, and implement corrective actions.
**Required Skills**
- Python (primary language for all LLM and orchestration work)
- Databricks, PySpark, Delta Lake, Delta Live Tables, Auto Loader, Workflows
- LangChain, LangGraph, LangSmith (agent development)
- Unity Catalog, Databricks AI Security Framework (DASF)
- Prompt engineering, retrieval design, semantic chunking, vector search
- Data quality, lineage, and governance concepts
- Model evaluation, observability, and feedback loop implementation
- GitHub Actions, GCP cloud services
- Strong software engineering practices (testable, maintainable code)
- Excellent English communication
**Required Education & Certifications**
- Bachelor’s or Master’s degree in Computer Science, Engineering, Data Science, or a related field.
- Databricks Professional Program or equivalent Spark/Cloud certifications are a plus.
San Francisco, United States
On-site
Mid-level
21-01-2026