- Company Name
- Dailymotion
- Job Title
- Site Reliability Engineer (All Genders)
- Job Description
**Job Title**
Site Reliability Engineer
**Role Summary**
Design, build, and maintain a scalable Kubernetes‑based Platform‑as‑a‑Service that supports Dailymotion’s video hosting and advertising platforms. Deliver high reliability, cost‑effective operations, and secure deployment pipelines while collaborating with cross‑functional teams in AWS, GCP, and on‑prem environments.
**Expectations**
- 5+ years in DevOps/SRE roles with deep experience in container orchestration and cloud infrastructure.
- Proven track record of building and managing CI/CD pipelines (Flux, Tekton, Jenkins).
- Ability to own end‑to‑end reliability, cost optimization, and security for large‑scale distributed applications.
- Strong communication skills in English and a proactive, collaborative mindset.
**Key Responsibilities**
1. Ensure platform availability, performance, and scalability through automated testing, monitoring, and incident response.
2. Lead on‑call duties: triage incidents, manage escalations, and conduct root‑cause analyses.
3. Optimize Kubernetes cluster usage and FinOps practices in AWS and GCP, including cost‑efficient provisioning (Karpenter, autoscaling).
4. Maintain and evolve the CD platform (FluxCD), Helm charts, and deployment workflows.
5. Provide MLOps and data‑architecture guidance (Airflow, GPUs, TPUs, Inferentia).
6. Write and update engineering documentation, runbooks, and troubleshooting guides.
7. Mentor peers, champion DevOps best practices, and stay current on emerging tools and trends.
**Required Skills**
- **Containerization & Orchestration:** Docker, Kubernetes, Helm, EKS, GKE.
- **IaC & Automation:** Terraform, CD pipelines (Flux, Tekton, Jenkins, Jx3).
- **Cloud Platforms:** AWS, GCP, including S3/Cloud Storage, IAM, VPC, and cost‑management tooling.
- **Observability & FinOps:** Prometheus, Grafana, Datadog, Looker, Costory.
- **Data & ML Ops:** Airflow, Dataflow, Kestra, GPU/TPU deployment, inferencing infrastructure.
- **Programming & Scripting:** Python, Bash; familiarity with Go or Rust is a plus.
- **Datastores & Caching:** Aerospike, MongoDB, Redis, Druid, MySQL, PostgreSQL.
- **Security & Policy:** cert‑manager, Kyverno, Microsoft Entra ID.
- **Soft Skills:** Strong collaboration, problem‑solving, and communication in a global team.
**Required Education & Certifications**
- Bachelor’s (or higher) degree in Computer Science, Engineering, or a related technical field.
- Professional cloud and Kubernetes certifications highly desirable (e.g., AWS Certified Solutions Architect, GCP Professional Cloud Architect, Certified Kubernetes Administrator).
Issy-les-Moulineaux, France
Remote
Mid level
02-12-2025