Job Specifications
Role: Lead AWS Data Engineer
Location: Houston, TX (Onsite)
Duration: Long-Term Contract
Minimum Requirements:
Bachelor’s or Master’s degree in Computer Science, Information Systems, Engineering, or a related discipline, plus at least 12 years of hands-on data engineering experience, or a demonstrated equivalent combination of experience and education.
5+ years in a technical-lead or team-lead capacity delivering enterprise-grade solutions.
Deep expertise in AWS data and analytics services, e.g., S3, Glue, Redshift, Athena, EMR/Spark, Lambda, IAM, and Lake Formation.
Proficiency in Python/PySpark or Scala for data engineering, along with advanced SQL for warehousing and analytics workloads.
Demonstrated success designing and operating large-scale ELT/ETL pipelines, data lakes, and dimensional/columnar data warehouses.
Experience with workflow orchestration (e.g., Airflow, Step Functions) and modern DevOps practices, including CI/CD, automated testing, and infrastructure-as-code (e.g., Terraform or CloudFormation).
Experience with data lakehouse architecture and frameworks (e.g., Apache Iceberg).
Experience integrating with enterprise on-prem and SaaS systems (e.g., Oracle E-Business Suite, Salesforce, Workday).
Strong communication, stakeholder-management, and documentation skills; aptitude for translating business needs into technical roadmaps.
Preferred Qualifications:
Solid understanding of data modeling, data governance, security best practices (encryption, key management), and compliance requirements.
Experience working within similarly large, complex organizations.
Experience building integrations for enterprise back-office applications.
AWS Certified Data Analytics – Specialty or AWS Certified Solutions Architect certification (or equivalent); experience with other cloud platforms is a plus.
Proficiency in modern data storage formats and table management systems, with a strong understanding of Apache Iceberg for managing large-scale datasets and Parquet for efficient, columnar data storage.
In-depth knowledge of data cataloging, metadata management, and lineage tools (AWS Glue Data Catalog, Apache Atlas, Amundsen) to bolster data discovery and governance.
Knowledge of how machine learning models are developed, trained, and deployed, as well as the ability to design data pipelines that support these processes.
Experience migrating on-prem data sources onto AWS.
Experience building high-quality data products.