Black Rock Solutions INC

SRE Manager

On-site

Miami, United States

$70/hour

Freelance

07-10-2025


Skills

Leadership, Python, Go, Incident Response, Docker, Kubernetes, Monitoring, Jenkins, Performance Testing, Architecture, AWS, Cloud Platforms, CI/CD Pipelines, Terraform, Prometheus, Grafana

Job Specifications

About The Opportunity

We are an established player in Financial Technology and Enterprise Cloud Infrastructure, delivering resilient, high-throughput systems that support mission-critical institutional workloads. We operate large-scale distributed services and are investing in reliability, observability, and automation to meet aggressive SLAs across the business.

Location: United States (On-site)

Role & Responsibilities

Lead and grow a high-performing Site Reliability Engineering team responsible for production availability, incident response, and operational excellence.
Define and own SLIs, SLOs, and SLA frameworks, along with a reliability roadmap; translate business requirements into measurable reliability targets.
Drive incident management and postmortem culture: lead major incidents, coordinate cross-functional response, and implement corrective actions to eliminate repeat failures.
Architect and implement observability, monitoring, and alerting solutions that provide actionable signals (metrics, logs, traces) and reduce MTTD/MTTR.
Improve platform scalability and resilience through automation, CI/CD pipelines, infrastructure-as-code, capacity planning, and performance testing.
Partner with Engineering, Security, and Product teams to influence architecture, develop robust runbooks, and build reliability into the development lifecycle.

Skills & Qualifications

Must-Have

Kubernetes
Docker
Prometheus
Grafana
Terraform
AWS

Preferred

Go
Python
Jenkins

Qualifications & Experience

Proven experience leading SRE/Platform teams in production; track record owning reliability for distributed systems.
Strong understanding of incident management, postmortem discipline, capacity planning, and on-call rotations.
Hands-on experience with cloud-native architectures, IaC, and CI/CD practices; able to both lead strategy and contribute technically.

Benefits & Culture Highlights

Opportunity to shape reliability for large-scale, mission-critical systems with measurable business impact.
Collaborative engineering culture that prioritizes automation, continuous improvement, and transparent postmortems.
On-site team environment focused on mentorship, career growth, and technical leadership.

We seek a strategic SRE leader who combines deep operational expertise with people leadership to drive measurable uptime and velocity improvements. If you are passionate about observability, incident prevention, and building reliable cloud platforms, we want to hear from you.

Skills: Kubernetes, Docker, Prometheus, Grafana, Terraform, AWS, CI/CD, cloud, reliability

About the Company

Black Rock Groups Inc | Elevating Human Resource Solutions in the U.S. At Black Rock Groups Inc, we specialize in providing top-tier human resource services tailored to meet the evolving needs of businesses across the United States. Our expertise spans talent acquisition, workforce management, employee engagement, compliance, and strategic HR consulting. We empower organizations by delivering customized HR solutions that drive efficiency, productivity, and long-term growth. Whether you're a startup looking to build a stron...