- Company Name
- Clio
- Job Title
- Data Engineer
- Job Description
**Job Title:** Data Engineer
**Role Summary:**
Design, build, and maintain end‑to‑end data pipelines and data warehouses to enable real‑time and batch ingestion, transformation, and analytics. Work with cloud infrastructure, adopting industry best practices for data modeling, ETL/ELT, testing, and documentation. Collaborate with cross‑functional teams to deliver production‑ready data assets that power dashboards, reporting, and AI initiatives.
**Expectations:**
- Deliver clean, scalable data solutions in a fast‑moving environment.
- Produce production‑quality code while following rigorous testing, validation, and documentation standards.
- Prototype unproven technical approaches and iterate based on feedback.
- Communicate complex data concepts clearly to technical and non‑technical stakeholders.
**Key Responsibilities:**
- Implement real‑time and batch data ingestion using AWS services and modern data tools.
- Design and construct data transformation pipelines that consolidate heterogeneous sources into data lakes or warehouses.
- Build and optimize pipelines using Spark, Kafka, Airflow, and similar technologies.
- Deploy and maintain data models, schemas, and dashboards to support analytics and visualization.
- Enforce data integrity, lineage, and governance through proper testing and validation.
- Collaborate with software and product engineering teams on architecture, best practices, and tool selection.
- Develop proof‑of‑concept implementations for emerging use cases, including AI‑driven analytics.
- Mentor junior engineers and contribute to a culture of continuous improvement in data engineering practices.
**Required Skills:**
- Proficiency in Python, Scala, or Ruby, along with strong SQL skills.
- Hands‑on experience with AWS cloud services (e.g., S3, Redshift, Glue, EMR); knowledge of Terraform is preferred.
- Expertise in data engineering tools: Spark, Kafka, Airflow, and at least one data warehouse platform (Databricks, Snowflake, Redshift).
- Solid understanding of ETL/ELT processes, data modeling, and best‑practice pipeline design.
- Experience building production‑ready data pipelines that feed dashboards and visualizations.
- Excellent problem‑solving, collaboration, and communication abilities.
- Willingness to learn and apply emerging AI techniques to data workflows.
**Required Education & Certifications:**
- Bachelor’s degree in Computer Science, Engineering, Data Science, or a related technical field (or equivalent professional experience).
- No certifications are required, but AWS or data engineering certifications are advantageous.