Stability AI

1 Job

189 Employees

About the Company

Stability AI is the enterprise-ready creative partner for teams and creators, delivering professional-grade generative AI tools and solutions for media generation and editing across image, video, 3D, and audio to enable creative production at scale. Stability AI sparked the generative AI revolution with the release of Stable Diffusion in August 2022, putting generative technology in the hands of millions of creators globally and cementing its position as a leader in the field. Stable Diffusion models have since been downloaded more than 350 million times. The company has been recognized by Fortune as one of the 50 AI Innovators and by TIME as one of the Most Influential Companies, and Stable Audio was named to TIME’s Best Inventions list. In June 2024, Stability AI entered its next phase of growth with the appointment of a renowned leadership team: Sean Parker as Executive Chairman, Prem Akkaraju as CEO, and James Cameron as Board Member. For press inquiries, contact: press@stability.ai. For customer support, contact: support@stability.ai.

Listed Jobs

Company Name: Stability AI
Job Title: Senior Site Reliability Engineer
Job Description:

**Job Title:** Senior Site Reliability Engineer

**Role Summary:** Senior SRE responsible for designing, implementing, and maintaining highly available, resilient cloud infrastructure. Works cross‑functionally with engineering, IT, security, and product teams to embed SRE best practices, improve reliability, and automate operations across AWS and multi‑cloud environments.

**Expectations:**
- Lead SRE initiatives and enforce standards across the organization.
- Deliver scalable, secure infrastructure solutions.
- Mentor junior engineers and promote a culture of reliability.
- Own incident response, root‑cause analysis, and continuous improvement.

**Key Responsibilities:**
- Define and enforce SRE best practices and standards.
- Architect, deploy, and manage scalable systems in AWS (and other clouds) focusing on high availability.
- Implement infrastructure‑as‑code using Terraform.
- Build and maintain monitoring, logging, and alerting pipelines (Grafana, ELK or equivalents).
- Drive incident management, post‑mortems, and reliability improvements.
- Collaborate with development teams to enhance CI/CD pipelines.
- Scale storage, networking, and compute resources for demanding workloads.
- Utilize Kubernetes or comparable container orchestration platforms.
- Ensure cloud security compliance and best practices.

**Required Skills:**
- Strong experience with AWS (or other major cloud providers) architecture and services.
- Proficiency in Terraform (IaC) and automation/scripting (e.g., Python, Bash).
- Hands‑on experience with Kubernetes or similar container orchestration.
- Expertise in monitoring and observability tools (Grafana, ELK, Prometheus, etc.).
- Solid understanding of CI/CD processes and tools.
- Experience in incident response, root‑cause analysis, and reliability engineering.
- Knowledge of cloud security principles and implementations.
- Ability to mentor and guide junior engineers.

**Required Education & Certifications:**
- Bachelor’s degree in Computer Science, Engineering, Information Technology, or related field (or equivalent practical experience).
- Preferred certifications: AWS Certified Solutions Architect/DevOps Engineer, Certified Kubernetes Administrator (CKA), or comparable cloud/containers certifications.