Job Specifications
Data Engineer - Databricks Greenfield Implementation. £800/day Inside IR35. 6-month rolling long-term contract. Hybrid, 2 days/week in Central London office.
My client is a top-tier energy trading firm. The Data team is responsible for providing business solutions aimed at extracting value from large amounts of data. It covers a broad range of activities such as collecting market data and building related analysis tools, processing real-time data streams, data governance, and data science.
This role will be part of a project team working on the adoption of Databricks, helping to assess its key features and implement working solutions in Azure.
Main responsibilities
Databricks implementation: Act as the Databricks expert for the proof-of-concept, evaluating Databricks capabilities against business requirements. Help define success criteria, design test scenarios, and work on the implementation of the identified use cases.
Design and Implement Lakehouse Architecture: Contribute your expertise to architect and build a modern data platform on Azure Databricks, leveraging Delta Lake and open standards. Ensure scalability, performance optimisation, and cost efficiency.
Hybrid Data Integration: Work on the implementation of solutions to integrate on-premises data sources with Azure Databricks. Address connectivity, security, and performance challenges in hybrid environments.
Streaming and Near Real-Time Data Processing: Work on the implementation of streaming pipelines using Databricks Structured Streaming and other available techniques (a minimal sketch follows this list). Evaluate and demonstrate near real-time capabilities for ingestion and transformation.
Data Transformation and Workflow Design: Create robust, scalable ingestion and transformation workflows using Databricks notebooks and Spark SQL. Incorporate observability, logging, and error handling.
Data Lineage and Governance: Implement lineage tracking and governance using Unity Catalog or integrated tools. Ensure compliance with organisational security and regulatory standards. Work in close collaboration with the Data Governance team.
Data Quality and Consistency: Demonstrate how to apply schema enforcement, validation, and error handling across data pipelines to maintain high-quality data.
Collaborate on Data Modelling: Work with analytics and data science teams to design schemas optimised for advanced analytics and AI workloads within Databricks.
Participate in Agile Delivery: Engage in agile ceremonies and contribute to iterative delivery of PoC and subsequent implementation phases.
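For illustration, the sketch below shows the kind of Structured Streaming pipeline referred to above: a Kafka-compatible feed (such as an Event Hubs Kafka endpoint) parsed against an expected schema and written to a Delta table with checkpointing. This is a minimal sketch only; the endpoint, topic, schema, paths, and table names are assumptions, and it presumes a Databricks environment where `spark` is already defined.

```python
# Minimal streaming-ingestion sketch. All endpoints, paths, and table names are
# hypothetical; auth options for Event Hubs are omitted for brevity. Assumes a
# Databricks runtime where `spark` and Delta Lake are available.
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

# Expected shape of the incoming market-data events (assumed schema).
event_schema = StructType([
    StructField("instrument", StringType()),
    StructField("price", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Read a near real-time feed via the Kafka source (Event Hubs exposes a Kafka endpoint).
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "my-eventhubs.servicebus.windows.net:9093")  # hypothetical
    .option("subscribe", "market-data")                                             # hypothetical topic
    .load()
)

# Parse the JSON payload against the expected schema and drop malformed rows.
parsed = (
    raw.select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
    .where(F.col("event_time").isNotNull())
)

# Write to a Delta table; the checkpoint gives restartable, exactly-once processing.
(
    parsed.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/market_data")  # hypothetical path
    .toTable("bronze.market_data")                                 # hypothetical table
)
```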
Required Skills and Experience
Azure Databricks Expertise: Proven experience designing and implementing solutions on Azure Databricks, including cluster configuration, workspace management, and optimisation for cost and performance. Ability to leverage Databricks features such as Delta Lake, Unity Catalog, and MLflow for data engineering workflows.
Lakehouse Architecture Design: Hands-on experience building and optimising lakehouse architectures using Databricks. Skilled in designing partitioning strategies, indexing, and compaction for performance at scale.
Hybrid Data Integration: Practical experience in integrating on-premises data sources with Azure-based platforms. Familiarity with secure connectivity patterns, data movement strategies, and performance considerations in hybrid environments.
Streaming Data: Knowledge of implementing real-time pipelines in Databricks using Structured Streaming and integrating with Kafka or Event Hubs.
Data Transformation & Lineage: Advanced skills in creating transformation pipelines using Databricks notebooks and Spark SQL. Experience implementing data lineage and observability within Databricks, leveraging tools such as Unity Catalog or integrating with external lineage solutions.
Distributed Data Processing: Deep understanding of Apache Spark within Databricks, including optimisation techniques for large-scale batch and streaming workloads.
SQL & Delta Lake: Strong SQL skills for data modelling and querying within Databricks, including experience with Delta Lake features like ACID transactions, schema enforcement, and time travel (illustrated in the sketch after this list).
Version Control & CI/CD: Familiarity with Git-based workflows and implementing CI/CD for Databricks using tools like Azure DevOps or GitHub Actions.
Cloud Storage & Security: Expertise in Azure Data Lake Storage (ADLS Gen2) and integration with Databricks. Strong understanding of identity management, access control (e.g., Azure RBAC, Unity Catalog), and compliance in cloud environments.
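As a brief illustration of the Delta Lake behaviours listed above, the sketch below shows schema enforcement rejecting a mismatched append, opt-in schema evolution, and a time-travel query. Table and column names are hypothetical, and a Databricks-provided `spark` session with Delta Lake is assumed.

```python
# Delta Lake sketch: schema enforcement and time travel. Names are hypothetical;
# assumes a `spark` session with Delta Lake (e.g. a Databricks notebook).
from pyspark.sql import Row

# Create a small Delta table (each write is an ACID transaction).
spark.createDataFrame(
    [Row(instrument="TTF-Front-Month", price=31.5)]
).write.format("delta").mode("overwrite").saveAsTable("demo.prices")

# Schema enforcement: appending a frame with an unexpected extra column fails
# unless schema evolution is requested explicitly.
bad = spark.createDataFrame([Row(instrument="NBP", price=80.1, venue="ICE")])
try:
    bad.write.format("delta").mode("append").saveAsTable("demo.prices")
except Exception as err:
    print(f"Rejected by schema enforcement: {err}")

# Opting in to schema evolution lets the same append succeed and adds the column.
bad.write.format("delta").mode("append").option("mergeSchema", "true").saveAsTable("demo.prices")

# Time travel: query the table as it was at an earlier version.
spark.sql("SELECT * FROM demo.prices VERSION AS OF 0").show()
```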
Desirable Skills and Experience
Advanced Databricks Features: Implementation and performance tuning of BI and AI workloads.
Query Engines: Exposure to federated query engines such as Trino and integration with Databricks.
Workflow Orchestration: Familiarity with orchestrating Databricks jobs using Airflow, Dagster, Azure Data Factory, or similar tools (see the example after this list).
Data Management Tools: Familiarity with data management tools, such as Microsoft Purview, and their integration with Databricks.
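As an example of the orchestration familiarity described above, the sketch below defines an Airflow DAG that submits a Databricks notebook run via the Databricks provider. The cluster spec, notebook path, and connection ID are assumptions, not prescriptions from this role; it presumes Airflow 2.x with the apache-airflow-providers-databricks package installed.

```python
# Hypothetical Airflow DAG submitting a Databricks notebook run once a day.
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator

with DAG(
    dag_id="databricks_ingest",          # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    run_notebook = DatabricksSubmitRunOperator(
        task_id="run_ingest_notebook",
        databricks_conn_id="databricks_default",            # assumed Airflow connection
        new_cluster={
            "spark_version": "14.3.x-scala2.12",            # example runtime
            "node_type_id": "Standard_DS3_v2",
            "num_workers": 2,
        },
        notebook_task={"notebook_path": "/Repos/data/ingest"},  # hypothetical notebook
    )
```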
About the Company
CommuniTech are an exciting name in Tech Recruitment, seamlessly connecting the client and candidate communities to deliver exceptional technical talent to tech-driven companies, ensuring that together they thrive, exceed, and achieve.
By striving to intertwine these communities, we get to know our clients and candidates better than ever before, providing recruitment solutions that deliver an individual experience tailored to your needs.