- Company Name
- IQHector Technologies LLC.
- Job Title
- Site Reliability Engineering Manager
- Job Description
**Job Title:** Site Reliability Engineering Manager
**Role Summary:**
Lead and manage the Site Reliability Engineering (SRE) function for e‑commerce/retail customers, ensuring application reliability, performance, and availability. Act as the primary liaison between customers, Litmus7 leadership, and offshore support teams, driving incident response, monitoring, and continuous improvement initiatives.
**Expectations:**
- Demonstrated experience as an SRE Lead or senior SRE in e‑commerce/retail environments.
- Proven ability to manage 24x7 offshore support (India) and coordinate cross‑time‑zone teams.
- Strong communication skills to gather requirements, present reports, and lead P1 incident calls.
- Proactive “can‑do” attitude with the capacity to prioritize in fast‑moving settings.
**Key Responsibilities:**
- Oversee production application support (L1‑L3) for platforms such as Shopify, Blue Yonder, or similar.
- Design, implement, and maintain monitoring, logging, alerting, and dashboard solutions (e.g., New Relic, PagerDuty, AppDynamics, Splunk, Dynatrace, Datadog, CloudWatch, ELK, Prometheus).
- Define and enforce SRE metrics, SLAs/SLIs, and incident management processes per the ITIL framework.
- Lead high‑severity (P1) incident response, conduct root‑cause analysis, and communicate outcomes to customers and senior leadership.
- Develop and maintain SOPs, runbooks, and documentation; generate weekly/monthly service reports from ITSM tools (JIRA, ServiceNow, BMC Remedy).
- Collaborate with development, cloud architecture, infrastructure, and project management teams across regions.
- Guide offshore teams in night‑time coverage, knowledge transfer, and escalation handling.
**Required Skills:**
- SRE principles: metrics, logging, and availability, plus incident, change, and risk management.
- Hands‑on experience with monitoring tools (New Relic, PagerDuty preferred) and logging/alerting platforms.
- Proficiency in ITSM platforms (JIRA, ServiceNow, BMC Remedy) and reporting.
- Familiarity with API testing tools (Postman).
- Strong written and verbal communication; ability to translate technical concepts for non‑technical stakeholders.
- Leadership of distributed/offshore teams and cross‑functional collaboration.
- Ability to produce and present weekly and monthly service health reports (WSR/MSR).
**Required Education & Certifications:**
- Bachelor’s degree in Computer Science, Information Technology, Engineering, or related field (or equivalent experience).
- ITIL Foundation certification preferred.
- Relevant certifications in cloud or monitoring platforms (e.g., AWS Certified DevOps Engineer, New Relic Certified) are a plus.