Job Specifications
Fractal is a strategic AI partner to Fortune 500 companies with a vision to power every human decision in the enterprise. Fractal is building a world where individual choices, freedom, and diversity are the greatest assets; an ecosystem where human imagination is at the heart of every decision, where no possibility is written off, only challenged to get better. We believe that a true Fractalite is one who empowers imagination with intelligence. Fractal has been featured as a Great Place to Work by The Economic Times in partnership with the Great Place to Work® Institute and recognized as a 'Cool Vendor' and a 'Vendor to Watch' by Gartner.
Please visit Fractal | Intelligence for Imagination for more information about Fractal.
Seeking a visionary and hands-on Principal Architect to lead large-scale, complex technical initiatives leveraging Databricks within the healthcare payer domain. This role is pivotal in driving data modernization, advanced analytics, and AI/ML solutions for our clients. You will serve as a strategic advisor, technical leader, and delivery expert across multiple engagements.

Responsibilities:
Design & Architecture of Scalable Data Platforms
Design, develop, and maintain large-scale data processing architectures on the Databricks Lakehouse Platform to support business needs such as sales forecasting, trade promotions, and supply chain optimization.
Architect multi-layer data models including Bronze (raw), Silver (cleaned), and Gold (curated) layers for various domains (e.g., Retail Execution, Digital Commerce, Logistics, Category Management).
Leverage Delta Lake, Unity Catalog, and advanced features of Databricks for governed data sharing, versioning, and reproducibility.
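For illustration, a minimal PySpark sketch of a Bronze-to-Silver promotion on Delta Lake, as it might run in a Databricks notebook where spark is predefined (all table names, paths, and columns such as raw_sales and order_id are hypothetical):

    # Minimal Bronze -> Silver promotion sketch; names and paths are illustrative.
    from pyspark.sql import functions as F

    # Bronze: land raw source files as-is to preserve source fidelity.
    bronze = spark.read.format("json").load("/mnt/lake/landing/raw_sales")
    bronze.write.format("delta").mode("append").saveAsTable("bronze.raw_sales")

    # Silver: deduplicate, enforce types, and drop records missing keys.
    silver = (spark.table("bronze.raw_sales")
              .dropDuplicates(["order_id"])
              .withColumn("order_ts", F.to_timestamp("order_ts"))
              .filter(F.col("order_id").isNotNull()))
    silver.write.format("delta").mode("overwrite").saveAsTable("silver.sales")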
Client & Business Stakeholder Engagement
Partner with business stakeholders to translate functional requirements into scalable technical solutions.
Conduct architecture workshops and solutioning sessions with enterprise IT and business teams to define data-driven use cases.
Collaborate with data engineers and data scientists to develop end-to-end pipelines using PySpark, SQL, DLT (Delta Live Tables), and Databricks Workflows.
Enable data ingestion from diverse sources such as ERP (SAP), POS data, Syndicated Data, CRM, E-commerce platforms, and third-party datasets.
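As a sketch of what such a pipeline could look like with Delta Live Tables and Auto Loader (the source path, table names, and expectation rule below are assumptions for illustration, not a client specification):

    # Minimal Delta Live Tables sketch; all names are illustrative.
    import dlt
    from pyspark.sql import functions as F

    @dlt.table(comment="Raw POS feed landed incrementally from cloud storage")
    def pos_bronze():
        return (spark.readStream.format("cloudFiles")
                .option("cloudFiles.format", "csv")
                .option("header", "true")
                .load("/mnt/landing/pos"))  # hypothetical landing path

    @dlt.table(comment="Cleaned POS records with a basic quality gate")
    @dlt.expect_or_drop("valid_store", "store_id IS NOT NULL")
    def pos_silver():
        return dlt.read_stream("pos_bronze").withColumn(
            "ingested_at", F.current_timestamp())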
Performance, Scalability, and Reliability
Optimize Spark jobs for performance tuning, cost efficiency, and scalability by configuring appropriate cluster sizing, caching, and query optimization techniques.
Implement monitoring and alerting using Databricks observability features, Ganglia, and cloud-native tools.
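For example, a tuning pass might combine adaptive query execution, selective caching, and explicit broadcast joins; a minimal sketch follows (table names are placeholders, and actual settings must be sized per workload):

    # Illustrative tuning knobs; values and table names are placeholders.
    from pyspark.sql.functions import broadcast

    spark.conf.set("spark.sql.adaptive.enabled", "true")  # adaptive query execution
    spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")

    # Cache a hot dimension table that is joined repeatedly within one job run.
    dim_store = spark.table("silver.dim_store").cache()
    dim_store.count()  # materialize the cache before reuse

    # Broadcast small dimensions explicitly to avoid shuffle joins.
    joined = spark.table("silver.sales").join(broadcast(dim_store), "store_id")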
Security, Compliance & Governance
Design secure architectures using Unity Catalog, role-based access control (RBAC), encryption, token-based access, and data lineage tools to meet compliance policies.
Establish data governance practices including Data Fitness Index, Quality Scores, SLA Monitoring, and Metadata Cataloging.
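A minimal sketch of role-based grants under Unity Catalog, issued from a notebook (the catalog, schema, table, and group names here are invented for illustration):

    # Hypothetical Unity Catalog grants; all principal and object names are made up.
    spark.sql("GRANT USE CATALOG ON CATALOG payer_prod TO `data_engineers`")
    spark.sql("GRANT USE SCHEMA ON SCHEMA payer_prod.gold TO `claims_analysts`")
    spark.sql("GRANT SELECT ON SCHEMA payer_prod.gold TO `claims_analysts`")
    # Keep access role-based: revoke any broad direct table grant.
    spark.sql("REVOKE SELECT ON TABLE payer_prod.silver.claims FROM `all_users`")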
Adoption of AI Copilots & Agentic Development
Utilize GitHub Copilot, Databricks Assistant, and other AI code agents for:
Writing PySpark, SQL, and Python code snippets for data engineering and ML tasks.
Generating documentation and test cases to accelerate pipeline development.
Interactive debugging and iterative code optimization within notebooks.
Advocate for agentic AI workflows that use specialized agents for:
Data profiling and schema inference.
Automated testing and validation.
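For instance, a profiling agent's first pass might amount to something like the following sketch (the table name bronze.raw_claims is hypothetical):

    # Quick profiling pass an agent might run; the table name is a placeholder.
    from pyspark.sql import functions as F

    df = spark.table("bronze.raw_claims")
    df.printSchema()  # review the inferred schema

    # Per-column null counts as a simple data-fitness signal.
    nulls = df.select([F.sum(F.col(c).isNull().cast("int")).alias(c)
                       for c in df.columns])
    nulls.show()

    df.summary("count", "min", "max").show()  # basic distribution profile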
Innovation and Continuous Learning
Stay abreast of emerging trends in Lakehouse architectures, Generative AI, and cloud-native tooling. Evaluate and pilot new features from Databricks releases and partner integrations for modern data stack improvements.

Requirements:
Bachelor's or Master's degree in Computer Science, Information Technology, or a related field.
12-18 years of hands-on experience in data engineering, with at least 5 years on Databricks architecture and Apache Spark.
Expertise in building high-throughput, low-latency ETL/ELT pipelines on Azure Databricks using PySpark, SQL, and Databricks-native features.
Familiarity with ingestion frameworks from structured/unstructured data sources including APIs, flat files, RDBMS, and cloud storage (Azure Data Lake Storage Gen2).
Experience designing Lakehouse architectures with bronze, silver, gold layering.
Expertise in optimizing Databricks performance using Delta Lake features such as OPTIMIZE, VACUUM, ZORDER, and Time Travel (see the sketch after this list).
Strong understanding of data modeling concepts, star/snowflake schemas, dimensional modeling, and modern cloud-based data warehousing.
Experience designing data marts using Databricks SQL warehouses and integrating them with BI tools (Power BI, Tableau, etc.).
Hands-on experience designing solutions using Workflows (Jobs), Delta Lake, Delta Live Tables (DLT), Unity Catalog, and MLflow.
Familiarity with Databricks REST APIs,
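As referenced above, a routine Delta maintenance pass could look like this sketch (the table name silver.sales, its columns, and the retention window are placeholders):

    # Illustrative Delta maintenance; table and column names are placeholders.
    spark.sql("OPTIMIZE silver.sales ZORDER BY (store_id, order_date)")  # compact files, co-locate keys
    spark.sql("VACUUM silver.sales RETAIN 168 HOURS")  # purge stale files (7-day default retention)
    # Time Travel: query a prior version for audit or rollback verification.
    previous = spark.sql("SELECT * FROM silver.sales VERSION AS OF 3")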