- Company Name
- Lavendo
- Job Title
- Senior AI/ML Specialist Solutions Architect (AI Infra & Cloud)
- Job Description
**Role Summary**
Design, architect, and optimize large‑scale AI training and inference pipelines on cloud and high‑performance computing infrastructure. Provide technical leadership, drive the transition from proof of concept to production, and influence product roadmaps while ensuring customer satisfaction and business value.
**Expectations**
- Build end‑to‑end AI solutions that scale across multi‑node, multi‑GPU environments.
- Lead customer engagements, deliver technical presentations and whitepapers, and mentor cross‑functional teams.
- Collaborate closely with engineering, product, and operations to prioritize feedback and shape future capabilities.
**Key Responsibilities**
- Architect distributed training/inference systems for large language and vision models.
- Scale ML pipelines from POC to high‑throughput production environments.
- Design performance‑optimized workloads using CUDA, NCCL, InfiniBand, and Kubernetes/Slurm orchestration.
- Create and deliver technical content (presentations, webinars, whitepapers).
- Mentor teams on MLOps best practices and infrastructure deployment strategies.
- Partner with product and engineering to translate customer needs into roadmap items.
- Advocate for architecture principles that enhance reliability, security, and cost efficiency.
**Required Skills**
- 5+ years in cloud technologies and infrastructure (AWS, GCP, Azure).
- Proven experience scaling AI workloads across multi‑node, multi‑GPU clusters.
- Deep expertise with PyTorch, JAX, TensorFlow, Hugging Face, and scikit‑learn.
- Strong knowledge of the NVIDIA HPC ecosystem: CUDA, NCCL, InfiniBand.
- Hands‑on experience with IaC (Terraform, Ansible), container orchestration and workload scheduling (Kubernetes, Slurm), and DevOps tools (Git, Docker, Helm).
- Experience with big data platforms (Spark, Kafka, Hadoop) and databases (SQL, NoSQL, vector).
- Programming fluency in Python; proficiency in Go, Java, C++.
- Excellent communication and stakeholder‑management skills.
- Legal authorization to work in the U.S. on a full‑time basis.
**Required Education & Certifications**
- Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or a related field.
- Certifications in cloud architecture (AWS Certified Solutions Architect, GCP Professional Cloud Architect, or Azure Solutions Architect), Kubernetes Administration, or MLOps are highly desirable.
San Francisco, United States
On-site
Senior
30-12-2025