**Company Name:** ProSearch
**Job Title:** AI Data Engineer
**Job Description:**
**Role Summary:**
Design, develop, and maintain scalable, reliable data pipelines that feed AI/ML models with high‑quality, well‑structured datasets across discovery, imaging, clinical, and operational streams. Collaborate with data scientists, ML engineers, and product owners to deliver data assets that improve model performance and accelerate R&D.
**Expectations:**
- Remote work; full‑time.
- Minimum 4 years’ experience as a data engineer.
- Strong technical foundation, proactive problem‑solver, collaborative communicator.
**Key Responsibilities:**
- Build and optimize distributed data pipelines using Databricks, Spark, Delta Lake, Snowflake, and AWS (S3, Glue, EMR).
- Develop feature tables, analytical datasets, and automated workflows for MLOps and analytics workloads.
- Implement data transformations with dbt Core, ensuring documentation, testing, and semantic consistency.
- Enforce code quality with SQLFluff, YAML linters, and CI/CD (GitHub Actions).
- Resolve data quality issues (missing, duplicate, inconsistent data).
- Design and contribute to architecture patterns (data warehouse, data lake, lakehouse, data mesh).
- Build Python‑based pipelines to ingest structured tables, text, images, and other data types.
- Configure CI/CD and IaC (Terraform, CloudFormation) for pipeline deployment.
- Monitor, troubleshoot, and optimize data pipelines across the full lifecycle (exploration, production, recovery).
- Stay current on emerging data engineering and AI/ML practices, including Generative AI.
**Required Skills:**
- Advanced SQL (analytical and ETL).
- Python; R (beneficial).
- dbt Core – transformations, tests, documentation.
- Databricks, Spark, Delta Lake.
- Snowflake, AWS (S3, Glue, EMR).
- Git, Git-based workflows, GitHub Actions.
- Terraform or AWS CloudFormation.
- Docker, Kubernetes, AWS ECS (containerization).
- Big‑data processing, data modeling, schema design.
- Data quality engineering (validation, profiling).
- Familiarity with NoSQL databases and object storage.
**Required Education & Certifications:**
- Bachelor’s degree (or equivalent) in Computer Science, Data Engineering, Statistics, or related field.
- Professional certifications in data engineering or cloud platforms (AWS, Azure, GCP), or platform‑specific credentials (e.g., SnowPro, Databricks certifications), are a plus.
---