Avanciers Inc.

Site Reliability Engineer

On site

Santa Clara, United States

Full Time

03-11-2025

Skills

Python, Go, Bash, MySQL, Incident Response, Kubernetes, Monitoring, Jenkins, Prometheus, Grafana, Server Management

Job Specifications

Avanciers is seeking a highly skilled Site Reliability Engineer for an exciting opportunity with one of our Fortune 500 clients.

Job Title: Site Reliability Engineer (SRE)

Location: Santa Clara, CA

Position Type: Full-time

Position Summary

We are seeking experienced Site Reliability Engineers (SREs) with strong expertise in native server environments and hands-on experience at semiconductor or electronic software companies.

The ideal candidate will have a solid background in bare-metal data center management, automation, and observability, ensuring high reliability and performance across production systems.

Key Responsibilities

1. Service Reliability & Incident Management

Guard and maintain Service Level Agreements (SLAs) for critical engineering services.
Implement and manage monitoring, alerting, and incident response mechanisms.
Conduct root cause analysis and post-mortems for SLA breaches and critical incidents.

2. Observability & Monitoring

Set up and maintain monitoring tools such as Prometheus, Grafana, and ELK Stack to track system health and KPIs.
Develop and maintain KPI pipelines using Jenkins, Python, and ELK.
Create custom alerts to enhance system observability and proactive incident prevention.

3. Automation & Optimization

Develop automation scripts and workflows using Python, Go, Bash, and Jenkins.
Support capacity planning, infrastructure optimization, and performance tuning.
Improve system efficiency and reliability through automation and operational best practices.

4. Day-to-Day Operations

Monitor system alerts, investigate issues, and ensure timely resolutions.
Participate in war room sessions during major incidents or outages.

5. Collaboration & Documentation

Collaborate closely with software, hardware, and infrastructure teams.
Maintain detailed documentation for procedures, configurations, and troubleshooting steps.

Required Technical Skills

Bare-metal data center machine management tools: IPMI, Redfish, KVM
Automation & Scripting: Jenkins, Python, Go, Bash
Infrastructure & Monitoring: Kubernetes, MySQL, Prometheus, Grafana, ELK
Preferred Hardware Exposure: GPUs, Tegra systems

Preferred Profile

5–10 years of hands-on experience as an SRE or Infrastructure Engineer.
Strong background in native server management and data center infrastructure.
Experience at semiconductor or electronic software companies is highly preferred.
Proven ability to maintain reliable, scalable, and automated environments.

About the Company

At Avanciers, we drive business transformation by delivering exceptional talent solutions and cutting-edge technology services. Since 2015, we've been a trusted partner to enterprises across North America and beyond, offering impactful services in Staffing, Salesforce Consulting, Google Cloud Solutions, UI/UX Design, and Web Development. As a woman-owned, diversity-driven organization and a certified Salesforce and Google Cloud Partner, our mission is to empower companies to scale, innovate, and deliver results faster.