Company Name: AppDirect
Job Title: Staff Data Engineer (Principal Data Engineer)
Job Description:
Role Summary: Lead the design, migration, and delivery of high‑quality data solutions within a modern, scalable data platform. Collaborate with engineering leadership, product owners, and stakeholders to transform legacy ETL processes into efficient streaming pipelines, empower self‑service analytics, and uphold stringent data governance and quality standards.
Expectations:
- Minimum 7 years of experience developing and maintaining end‑to‑end data pipelines.
- Minimum 5 years of experience on cloud platforms (AWS or Azure).
- Proven expertise in Databricks and Informatica Intelligent Data Management Cloud (IDMC).
- Advanced proficiency in at least one of Java, Scala, or Python, including experience with build tools (Gradle, Maven, SBT, Make).
- Expert‑level SQL knowledge.
- Hands‑on experience with distributed data processing engines (Apache Spark, Flink, Trino).
- Strong understanding of Data Lake concepts (Apache Iceberg, Delta Lake, Hudi).
- Deep knowledge of data serialization formats and schemas (JSON, Avro, Protobuf), with experience handling large, complex datasets.
- Real‑time data pipeline experience using Kafka or Pulsar.
- Containerization and orchestration skills with Docker and Kubernetes.
- Solid grasp of data quality, governance, traceability, and validation practices.
- Ability to align data solutions with business structure and operational/analytical use cases.
- Experience conducting code reviews and participating in technical design discussions.
- Excellent self‑management, multitasking, and independent delivery.
- Clear communication of complex concepts to both technical and non‑technical audiences.
Key Responsibilities:
- Partner with the Data Engineering Lead, Product Manager, and Technical Leads to build a modern, scalable data platform.
- Migrate existing batch ETL workflows to more efficient, continuous pipelines.
- Enable internal stakeholders with self‑service data solutions to improve operational efficiency.
- Drive data quality initiatives through robust governance and validation processes.
- Solve complex data challenges and spearhead proof‑of‑concept projects to evaluate emerging technologies.
- Author and maintain high‑quality documentation to support knowledge sharing.
- Deliver first‑class data solutions and processes using CI/CD, automation, and best‑practice data management principles.
Required Skills:
- Data pipeline architecture and implementation (ETL/ELT, streaming).
- Cloud data services (AWS, Azure).
- Databricks, Informatica Intelligent Data Management Cloud (IDMC).
- Programming: Java, Scala, Python; build tools (Gradle, Maven, SBT, Make).
- SQL (advanced).
- Distributed processing: Spark, Flink, Trino.
- Data Lake technologies: Iceberg, Delta Lake, Hudi.
- Data serialization: JSON, Avro, Protobuf.
- Real‑time streaming: Kafka, Pulsar.
- Containerization: Docker, Kubernetes.
- Data governance, quality, traceability, validation.
- Technical design review and code quality.
- Autonomous project and task management.
- Clear technical communication.
Required Education & Certifications:
- Bachelor’s or Master’s degree in Computer Science, Software Engineering, Data Engineering, or a related technical field (equivalent experience acceptable).
- Relevant certifications (e.g., AWS Certified Data Analytics, Azure Data Engineer Associate, Databricks Certified Data Engineer) are a plus.