ProSearch

AI Data Engineer

Remote

United States

Junior

Freelance

17-11-2025

Skills

Python • Data Analysis • SQL • NoSQL • Big Data • Data Governance • Data Engineering • GitHub • CI/CD • Docker • Kubernetes • Monitoring • AWS CloudFormation • Test • Research • Architecture • Data Architecture • Machine Learning • PyTorch • TensorFlow • Programming • Databases • Git • Organization • Azure • AWS • Cloud Platforms • Analytics • Snowflake • Data Science • Hadoop • Spark • Databricks • PySpark • Terraform • GitHub Actions

Job Specifications

We have partnered with a leading technology research organization to hire an AI Data Engineer. In this role, you will build scalable data pipelines, partner closely with Data Scientists and ML Engineers, and ensure the organization’s AI/ML models are fueled by high-quality, well-structured data. This is a fully remote opportunity to contribute to impactful AI initiatives that support scientific innovation, clinical solutions, and operational excellence.

About the Role

As an AI Data Engineer, you will support data science model validation, analytics workloads, and machine learning operations by building high-quality feature tables, analytical datasets, and automated workflows. You’ll collaborate with senior data staff, product owners, and AI/ML scientists to deliver reliable data assets that enhance model performance and accelerate R&D innovation.

You will work across core data streams, including discovery, imaging, clinical, and operational, and contribute to the pipelines that power next-generation AI products in veterinary and animal health.

Top Required Skills

SQL (advanced)
Python
R

Nice-to-Have Skills

dbt Core
Databricks
Data analysis experience

Technology Stack

Python • Databricks • dbt Core • Hadoop • TensorFlow • PyTorch • PySpark • Snowflake • AWS

What You’ll Do

Build scalable, reliable, distributed data pipelines to support machine learning operations and analytics workloads.
Partner with data scientists, ML engineers, analysts, and data product owners to understand requirements and deliver high-quality solutions.
Work with modern cloud and ML stacks, including Databricks, Snowflake, AWS, and Azure.
Use Databricks (pipelines, workflows, asset bundles) to streamline engineering processes.
Apply dbt Core for transformations, documentation, testing, and semantic consistency.
Maintain code quality using SQL/YAML linters (SQLFluff) and enforce standards through GitHub Actions CI/CD.
Develop solutions for data quality issues such as missing, duplicate, and inconsistent data.
Contribute to data warehouse, data lake, data lakehouse, and data mesh architectural patterns.
Build pipelines in Python to integrate diverse data types: structured tables, text documents, images, and more.
Implement CI/CD systems and IaC tools like Terraform or AWS CloudFormation.
Support data systems across the full lifecycle: exploration, production, monitoring, disaster recovery, and optimization.
Stay current on advanced data engineering practices, including emerging technologies like Generative AI.

What You Bring

You have a relevant technical degree and at least four (4) years of Data Engineering experience.

You are experienced with:

Cloud platforms (preferably AWS)
Big data technologies: Spark, Databricks, Delta Lake
Git and Git-based workflows
dbt Core and modern data modeling
SQL and NoSQL databases
Cloud object storage (e.g., S3)
Containerization (Docker, Kubernetes, AWS ECS)
Building, testing, and maintaining fault-tolerant data pipelines
Data architecture concepts: warehouse, lake, lakehouse, mesh

You’re also eager to deepen your knowledge of AI/ML techniques, and it’s a plus if you have:

Experience developing APIs or web applications
Certifications in data engineering or AI/ML

Leveling Guide (Intermediate)

Build metadata and schemas based on logical models
Write scripts for physical data layout and load test data
Design and validate schemas
Use ER modeling tools for intermediate tasks
Adhere to data governance, naming conventions, and testing principles
Resolve moderately complex data problems
Provide SQL and Python scripts for tuning and validation
Write intermediate-level database programming scripts
Contribute independently to team projects and semantic layer enhancements
Suggest improvements to standards and processes
Take new perspectives on solving moderately complex problems

Why This Role Matters

Your work will directly impact:

The performance of AI/ML models
The accuracy, reliability, and timeliness of analytics
The innovation of new data streams from R&D pipelines
The quality and discoverability of curated datasets
How the organization advances clinical AI technologies

Join Us

If you are an analytical, collaborative, and forward-thinking AI Data Engineer looking for a remote opportunity that combines modern data engineering with applied machine learning, we encourage you to apply. Your expertise will help shape the next generation of AI-driven products and scientific innovation.

About the Company

We've all heard the phrase "it's not what you know, it's who you know" dozens of times. But when it comes to finding your next career opportunity, a temporary or temp-to-hire gig, or an IT contract assignment in Northern New England or Maine, it really couldn't be truer. Whether you are just starting your career, new to the area, trying to re-enter the job market after taking some time away, or simply looking for that next career opportunity, we are known as the recruiting and staffing firm to get you to work. Our relationsh...