**Company Name**
Protingent
**Job Title**
Site Reliability Engineer
**Role Summary**
A remote, contract-based Site Reliability Engineer (SRE) who designs, scales, and maintains observability platforms (Prometheus, Grafana, and Dynatrace). The role applies SRE maturity practices, defines SLOs and error budgets, automates toil, and partners with engineering, product, and operations teams to improve system reliability.
**Expectations**
- Deliver data‑driven insights that guide reliability improvements.
- Translate complex technical telemetry into actionable messages for diverse stakeholders.
- Lead automation initiatives that reduce manual toil and accelerate incident resolution.
**Key Responsibilities**
1. Design, scale, optimize, and manage Prometheus and Grafana environments for multi‑team use.
2. Craft advanced PromQL queries, dashboards, visualizations, and metric‑based calculations.
3. Configure and maintain Dynatrace dashboards, analytics, insights, and performance reports.
4. Analyze telemetry to identify meaningful metrics, drive actionable insights, and influence engineering decisions.
5. Apply and evolve an SRE Maturity Model to elevate observability, resilience, automation, and reliability across teams.
6. Define, implement, and monitor Service Level Objectives (SLOs) and error budgets for applications and services.
7. Collaborate with engineering, product, operations, and leadership to translate technical findings into clear, actionable communication.
8. Identify and reduce toil through automation, tooling enhancements, and process refinement.
9. Support incident analysis, reliability reviews, and continuous improvement initiatives.
**Required Skills**
- Strong knowledge of SRE principles, maturity models, and reliability roadmaps.
- Hands‑on experience with Prometheus, Grafana, and PromQL.
- Proficiency in Dynatrace, metric analysis, and observability practices.
- Excellent written and verbal communication skills.
- Analytical and problem‑solving mindset with a bias for action.
**Education & Certifications**
- Required: Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent industry experience.
- Preferred: Prometheus, Grafana, or Dynatrace certifications, or SRE‑specific training.
**Preferred Qualifications**
- Experience with Kubernetes, cloud platforms (AWS, GCP, Azure), or CI/CD pipelines.
- Automation engineering experience.
- Exposure to large‑scale distributed systems or high‑availability architectures.