- Company Name: United Software Group Inc
- Job Title: ML Platform Engineer with Strong Python
- Job Description:
Job Title: ML Platform Engineer – Python & Feature Engineering
Role Summary:
Design, build, and maintain production‑grade feature pipelines for offline and real‑time ML workloads, ensuring training‑serving consistency and low‑latency feature retrieval for model scoring.
Expectations:
- 8+ years of hands‑on experience in ML feature engineering, feature stores, and data quality in cloud environments.
- 4+ years of demonstrated use of GenAI engineering tools (Cursor, Windsurf, GitHub Copilot, Databricks Assistant) for pipeline development, validation, automation, and documentation.
- 5+ years delivering tested, reliable, well‑documented feature pipelines that integrate seamlessly with downstream ML and decisioning systems.
Key Responsibilities:
- Architect and implement offline and online feature pipelines using Databricks, Spark/PySpark, and Delta Lake.
- Develop and maintain a feature store (Databricks FS, Feast, or equivalent) that supports both batch and streaming data.
- Ensure training‑serving alignment by managing feature versioning, lineage, and data quality.
- Build low‑latency retrieval layers for real‑time model scoring on streaming platforms (Kafka, Azure Event Hubs, etc.).
- Collaborate with data scientists, ML engineers, and Ops to support end‑to‑end ML workflows, including data preparation, model training, deployment, and monitoring.
- Automate pipeline workflows and documentation using GenAI tools.
- Monitor performance, troubleshoot, and optimize feature pipelines for scalability and reliability.
Required Skills:
- Python programming with Spark/PySpark.
- Databricks, Delta Lake, and feature store technologies (Databricks FS, Feast).
- Streaming platforms: Kafka, Azure Event Hubs, or similar.
- Cloud platforms (Azure, AWS, GCP).
- Data quality and governance best practices.
- GenAI engineering tools: Cursor, Windsurf, GitHub Copilot, Databricks Assistant.
- Strong understanding of ML lifecycle and training‑serving consistency.
- Experience with CI/CD for data pipelines.
Required Education & Certifications:
- Bachelor's degree (or higher) in Computer Science, Data Engineering, or related field.
- Certifications such as Databricks Certified Data Engineer Associate or equivalent Spark/Cloud credentials are preferred.