- Company Name
- Kraken Digital Asset Exchange
- Job Title
- Site Reliability Engineer - Data Platform
- Job Description
-
Job Title: Site Reliability Engineer – Data Platform
Role Summary
Senior SRE responsible for designing, deploying, and maintaining the data platform that supports crypto trading analytics, BI tools, and real‑time streaming services. The role ensures reliability, scalability, performance, and security across hybrid on‑premises and AWS cloud environments.
Expectations
- Deliver highly available and cost‑effective data infrastructure.
- Apply IaC best practices and automation to accelerate provisioning and reduce errors.
- Enforce data governance, compliance, and secure access controls.
- Resolve incidents proactively and support on‑call rotations.
Key Responsibilities
- Design and implement data lake governance, security, and compliance frameworks.
- Build and manage data ingestion pipelines, cataloging, and lineage tracking.
- Provision and maintain AWS and on‑prem infrastructure using Terraform, Terragrunt, and Atlantis.
- Develop shell and Python scripts for automation of deployments and operational tasks.
- Extend and maintain CI/CD pipelines for infrastructure and data services.
- Configure monitoring, alerting, and log aggregation for real‑time data streams.
- Implement RBAC and credential management across environments.
- Operate and optimize Kafka, Debezium, Flink, and Spark clusters for streaming analytics.
- Manage containerized workloads with Kubernetes (and Docker/Nomad as needed).
- Document architecture, processes, and best practices.
- Collaborate with data analysts, engineers, and AI/ML teams on infrastructure requirements.
Required Skills
- 5+ years of experience as a Site Reliability Engineer, Infrastructure Engineer, or Data SRE.
- Hands‑on experience with Kafka, Debezium, Flink, Apache Airflow, Apache Spark, and BI tooling.
- Proficiency in AWS multi‑tenant environments and hybrid deployment models.
- Strong IaC expertise: Terraform, Terragrunt, Atlantis.
- Containerization & orchestration: Kubernetes, Docker, Nomad.
- Scripting: Bash/Shell; programming: Python or JVM languages.
- Expertise in data access management, RBAC, and certificate handling.
Required Education & Certifications
- Bachelor’s degree in Computer Science, Engineering, or a related field.
- Relevant certifications preferred: AWS Certified DevOps Engineer - Professional, Certified Kubernetes Administrator (CKA), HashiCorp Certified: Terraform Associate.