- Company Name: Xsolla
- Job Title: Data Scientist
- Job Description:
**Role Summary**
Design, build, and optimize large-scale data pipelines and analytical models on Snowflake. Own end-to-end data workflows, from ingestion out of MySQL, BigQuery, Redis, Kafka, and GCP Storage through to production-ready machine-learning feature stores. Mentor junior engineers, enforce data governance, and collaborate with product, legal, security, and engineering teams to deliver reliable, high-performance data solutions that enable advanced analytics and recommendation systems.
**Expectations**
- 5+ years of experience in data science or data engineering, including at least 3 years of hands-on Spark work.
- Proven ownership of complex, scalable data infrastructure in a fast‑paced environment.
- Strong leadership and mentorship skills, with the ability to collaborate across technical and business stakeholders.
- Demonstrated ability to balance velocity with reliability and maintain high data quality and compliance standards.
**Key Responsibilities**
1. **Architecture & Development**
- Build and optimize ETL/ELT pipelines in Snowflake using Snowpark, Streams/Tasks, and Snowpipe (a minimal Snowpark sketch follows this list).
- Develop scalable data models and algorithms for 360-degree user views, churn prediction, and recommendation-engine inputs.
- Integrate data from MySQL, BigQuery, Redis, Kafka, GCP Storage, an API gateway, and other services.
- Implement CI/CD for data pipelines with Git, dbt, and automated testing.
2. **Leadership & Collaboration**
- Mentor junior data engineers on modeling, performance tuning, and Snowflake best practices.
- Partner with Data Science, ML, and Backend teams to productionize machine-learning features in Snowflake.
- Coordinate with Legal, Security, and Infrastructure to ensure data privacy, governance, and compliance.
3. **Performance & Scalability**
- Tune query and algorithm performance; design partitioning, clustering, and materialized views for fast query execution (see the clustering sketch after this list).
- Build dashboards and monitoring (Looker, Tableau, Snowsight) to track pipeline health, job success rates, and latency.
4. **Governance & Best Practices**
- Establish naming conventions, data lineage, and metadata standards across schemas.
- Lead code reviews, enforce documentation, and manage schema versioning.
- Contribute to evolving data mesh and streaming architecture visions.
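
To make the Architecture & Development bullet concrete, here is a minimal Snowpark sketch of an incremental ELT step: a stream on a raw events table feeds an aggregation that is appended to a curated table. The connection settings and every table and column name (`RAW_EVENTS`, `USER_DAILY_ACTIVITY`, `EVENT_TYPE`, and so on) are hypothetical placeholders, not Xsolla's actual schema.

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, count, lit, max as max_

# Hypothetical connection parameters; supply real credentials via config.
session = Session.builder.configs({
    "account": "<account>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
    "database": "ANALYTICS",
    "schema": "STAGING",
}).create()

# A stream records row-level changes on the raw table, so each run
# processes only new rows (incremental ELT rather than full reloads).
session.sql(
    "CREATE STREAM IF NOT EXISTS RAW_EVENTS_STREAM ON TABLE RAW_EVENTS"
).collect()

# Transform the pending change set with the Snowpark DataFrame API.
events = session.table("RAW_EVENTS_STREAM")
user_daily = (
    events
    .filter(col("EVENT_TYPE") == "purchase")
    .group_by(col("USER_ID"), col("EVENT_DATE"))
    .agg(
        count(lit(1)).alias("PURCHASES"),
        max_("EVENT_TS").alias("LAST_SEEN"),
    )
)

# Appending consumes the stream offset and lands the result in the
# curated table that churn and recommendation models read from.
user_daily.write.mode("append").save_as_table("CURATED.USER_DAILY_ACTIVITY")
```

In production this transform would typically be wrapped in a Snowflake Task gated on `SYSTEM$STREAM_HAS_DATA('RAW_EVENTS_STREAM')`, so it runs only when the stream has pending rows.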
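
For the Performance & Scalability bullet, the sketch below (reusing `session` from the sketch above) shows how a clustering key and a materialized view might be applied, plus a quick clustering health check. Object names are again placeholders, and materialized views assume a Snowflake edition that supports them.

```python
# Cluster the curated table on the columns most queries filter by, so
# Snowflake can prune micro-partitions instead of scanning everything.
session.sql("""
    ALTER TABLE CURATED.USER_DAILY_ACTIVITY
    CLUSTER BY (EVENT_DATE, USER_ID)
""").collect()

# Precompute a hot aggregate as a materialized view for dashboard reads.
session.sql("""
    CREATE MATERIALIZED VIEW IF NOT EXISTS CURATED.MV_DAILY_PURCHASES AS
    SELECT EVENT_DATE, SUM(PURCHASES) AS PURCHASES
    FROM CURATED.USER_DAILY_ACTIVITY
    GROUP BY EVENT_DATE
""").collect()

# SYSTEM$CLUSTERING_INFORMATION reports partition depth and overlap, a
# quick check on whether the clustering key still fits query patterns.
print(session.sql(
    "SELECT SYSTEM$CLUSTERING_INFORMATION('CURATED.USER_DAILY_ACTIVITY')"
).collect())
```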
**Required Skills**
- Advanced proficiency in SQL and Python for large‑scale ETL/ELT.
- Deep knowledge of Snowflake (Snowpark, Streams/Tasks, Snowpipe) and performance tuning.
- Experience with Spark, Kafka, and real‑time or batch ingestion on GCP or AWS.
- Familiarity with data modeling frameworks (Kimball, Data Vault, or hybrid).
- Expertise in data-pipeline orchestration tools (Airflow, Prefect, dbt); a minimal Airflow sketch follows this list.
- Understanding of feature stores (e.g., Feast, Tecton), data contracts, and data governance.
- Strong communication and mentoring skills, able to translate technical concepts to business stakeholders.
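
As one concrete reading of the orchestration requirement, here is a minimal Airflow DAG that chains ingestion, a dbt build of one model, and dbt tests. The `dag_id`, file paths, and model name are hypothetical, and the `schedule` argument assumes Airflow 2.4 or later.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {"retries": 2, "retry_delay": timedelta(minutes=5)}

with DAG(
    dag_id="snowflake_user_activity",   # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",
    catchup=False,
    default_args=default_args,
) as dag:
    # Run the Snowpark ingestion job sketched earlier (placeholder path).
    ingest = BashOperator(
        task_id="ingest_raw_events",
        bash_command="python /opt/pipelines/load_events.py",
    )
    # Build the downstream dbt model, then validate it with dbt tests.
    transform = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --project-dir /opt/dbt --select user_daily_activity",
    )
    test = BashOperator(
        task_id="dbt_test",
        bash_command="dbt test --project-dir /opt/dbt --select user_daily_activity",
    )

    ingest >> transform >> test  # ingest, model, then validate
```

A failed `dbt test` fails the DAG run, which keeps the velocity-versus-reliability balance called out under Expectations enforceable in CI.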
**Required Education & Certifications**
- Bachelor’s degree in Computer Science, Information Systems, or a related technical field (advanced degree preferred).
- Certifications in data engineering or cloud platforms (SnowPro, Google Cloud Data Engineering, AWS Big Data Specialty) are a plus.