- Company Name: Avanciers Inc.
- Job Title: Site Reliability Engineer
- Job Description:
**Job Title**
Site Reliability Engineer (SRE)
**Role Summary**
Design, implement, and maintain highly available production environments for native server and data‑center systems. Lead incident response, automate operational workflows, and continuously enhance observability to meet stringent SLAs for engineering services.
**Expectations**
- 5–10 years in SRE, infrastructure, or related engineering roles.
- Strong experience in semiconductor or electronics software environments.
- Proficiency with bare‑metal management, automation, and cloud‑native observability tools.
- Commitment to rapid incident resolution, continuous improvement, and collaborative cross‑functional teamwork.
**Key Responsibilities**
1. **Service Reliability & Incident Management** – Monitor SLAs, run incident investigations, perform root‑cause analysis, and lead post‑mortem reviews.
2. **Observability & Monitoring** – Deploy Prometheus, Grafana, and the ELK Stack, and build custom KPI pipelines (Jenkins, Python, ELK) to track system health.
3. **Automation & Optimization** – Build scripts and CI/CD pipelines using Python, Go, Bash, and Jenkins; drive capacity planning, performance tuning, and infrastructure optimization.
4. **Daily Operations** – Respond to alerts, investigate outages, and participate in on‑call shift rotations and war‑room sessions.
5. **Collaboration & Documentation** – Work with software, hardware, and infrastructure teams; maintain procedures, configurations, and troubleshooting guidelines.
**Required Skills**
- Bare‑metal data‑center tools: IPMI, Redfish, KVM.
- Automation & scripting: Jenkins, Python, Go, Bash.
- Container & monitoring: Kubernetes, Prometheus, Grafana, ELK.
- Databases: MySQL.
- Preferred hardware knowledge: GPUs, Tegra platforms.
**Required Education & Certifications**
- Bachelor’s degree in Computer Science, Engineering, or related field.
- Certifications such as Certified Kubernetes Administrator (CKA), Red Hat Certified Engineer (RHCE), or equivalent are highly valued.
Santa Clara, United States
On-site
03-11-2025