- Company Name: Ideogram
- Job Title: Software Engineer, ML Data Infrastructure
- Job Description:
Role Summary
Design and operate scalable, reliable data infrastructure that powers large‑scale foundation models and generative media pipelines. Build and maintain distributed systems, data processing workflows, and storage solutions at petabyte scale, working closely with research scientists to translate data requirements into production‑grade systems.
Expectations
- Own end‑to‑end data infrastructure projects from scoping through to production delivery.
- Demonstrate proactive ownership, initiative, and a bias toward rapid, high‑quality action.
- Collaborate in a fast‑moving, ambiguous environment and communicate effectively across teams.
- Apply first‑principles thinking to solve complex technical problems and continuously improve processes.
Key Responsibilities
- Develop and maintain distributed data pipelines that ingest, transform, and store multimodal training data (an illustrative sketch follows this list).
- Optimize throughput and reliability at petabyte scale using GCP services, Kubernetes, Docker, and Terraform.
- Design storage architectures with Google Bigtable, BigQuery, Spanner, and Pub/Sub to support training workloads.
- Provision and manage TPU infrastructure and large‑scale storage systems for model training.
- Partner with research scientists to understand data requirements, perform data profiling, and develop data schemas.
- Lead performance tuning, fault tolerance, and monitoring for data services.
- Drive continuous integration/continuous deployment (CI/CD) of data infrastructure components.
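For illustration only, here is a minimal Apache Beam sketch of the kind of ingest‑transform‑store pipeline described above. The bucket paths, record fields, and filter logic are hypothetical placeholders, not Ideogram's actual infrastructure.

```python
# Minimal, hypothetical Beam pipeline: read raw JSONL records, keep valid
# multimodal samples, and write a cleaned copy back to object storage.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_record(line: str) -> dict:
    """Parse one JSON-encoded training record (schema is illustrative)."""
    return json.loads(line)


def run() -> None:
    # In production this would carry DataflowRunner flags, project, region, etc.
    options = PipelineOptions()
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadRaw" >> beam.io.ReadFromText("gs://example-bucket/raw/*.jsonl")
            | "Parse" >> beam.Map(parse_record)
            # Keep only records that have both a caption and an image URI.
            | "KeepValid" >> beam.Filter(lambda r: r.get("caption") and r.get("image_uri"))
            | "Serialize" >> beam.Map(json.dumps)
            | "WriteClean" >> beam.io.WriteToText("gs://example-bucket/clean/part")
        )


if __name__ == "__main__":
    run()
```

The same pipeline can be pointed at Dataflow on GCP by passing the appropriate runner and project flags through PipelineOptions, which is how a job like this typically reaches petabyte scale.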
Required Skills
- 2–5 years of experience building and shipping large‑scale distributed systems.
- Strong grasp of data structures, algorithms, and distributed system principles.
- Deep knowledge of database and storage architectures (NoSQL, OLAP, key‑value).
- Hands‑on experience with large‑scale data processing frameworks (e.g., Beam, Spark).
- Proficiency in Python and experience with Kubernetes, Docker, and Terraform.
- Familiarity with GCP services: Bigtable, BigQuery, Spanner, Pub/Sub, Vertex AI, and Cloud TPU (see the Pub/Sub sketch after this list).
- Ability to translate research specifications into production‑ready solutions.
- Demonstrated project ownership: scoping, execution, iteration, and delivery.
- Excellent problem‑solving, communication, and collaboration skills.
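As a small taste of the GCP tooling listed above, here is a hedged Python sketch of publishing an ingestion event to Pub/Sub using the standard client library; the project ID, topic name, and event schema are invented for illustration.

```python
# Hypothetical sketch: publish a metadata event that downstream data
# pipelines consume. Project, topic, and fields are placeholders.
import json

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("example-project", "training-data-events")

event = {"image_uri": "gs://example-bucket/img/0001.png", "status": "ingested"}
future = publisher.publish(topic_path, data=json.dumps(event).encode("utf-8"))

# result() blocks until the server acknowledges and returns the message ID.
print(f"Published message {future.result()}")
```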
Required Education & Certifications
- Bachelor’s degree in Computer Science, Engineering, or related technical field (or equivalent experience).
- Relevant certifications (e.g., Google Cloud Professional Data Engineer, GCP Associate Cloud Engineer) are a plus.