Gotham Technology Group

Senior Site Reliability Engineer

Hybrid

Iselin, United States

Senior

Freelance

02-03-2026

Skills

Communication, Python, CI/CD, Docker, Kubernetes, Monitoring, Ansible, Networking, Programming, Databases, Organization, Azure, AWS, CI/CD Pipelines, Terraform, Prometheus, Grafana, Infrastructure as Code

Job Specifications

Senior Site Reliability Engineer (SRE)

Duration: 6-Month Contract-to-Hire

Location: Hybrid in Iselin, NJ

Job Summary

We are seeking a hands-on Senior Site Reliability Engineer (SRE) to support the reliability, scalability, and performance of our production systems, with a strong focus on observability, automation, and operational excellence. This role partners closely with development, operations, and product teams to improve system health, automate workflows, and ensure resilient services.

Grafana expertise is a core requirement.

This role will own and support dashboards, metrics, and performance analysis used across the organization. You will also maintain and extend existing Python-based automation and data ingestion jobs that feed an infrastructure data warehouse used for reporting and operational insights.

This is a production-focused role emphasizing continuity, practical execution, and reliability over theory-heavy SRE practices.

Key Responsibilities

Design, implement, and support automated deployment, monitoring, and alerting solutions
Build, manage, and maintain scalable infrastructure using Infrastructure as Code (IaC) tools
Own and support Grafana dashboards, visualizations, and data sources for reliability and performance monitoring
Maintain and extend existing Python automation and data ingestion jobs integrating data from multiple systems into an infrastructure data warehouse
Improve observability through enhanced logging, metrics, monitoring, and alerting strategies
Diagnose and resolve production issues quickly while minimizing downtime and business impact
Partner with development and operations teams to improve system reliability, scalability, and efficiency
Create and maintain operational documentation, runbooks, and best practices
Support knowledge transfer and continuity for existing automation and monitoring solutions

Explicitly Out of Scope

Leading formal post-incident reviews
Primary ownership of on-call rotations

Qualifications

Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience
Proven experience in an SRE, Platform Engineer, or similar production-focused role
Strong hands-on experience with Grafana for monitoring, visualization, and operational analysis
Strong programming or scripting skills, particularly in Python
Experience with Infrastructure as Code tools such as Terraform, Ansible, or equivalent
Experience designing or supporting automated deployment and monitoring solutions
Familiarity with CI/CD pipelines and related tooling
Experience working in cloud environments such as AWS, Azure, or Google Cloud
Strong troubleshooting skills and ability to operate calmly in production environments
Excellent communication skills and ability to collaborate across technical teams
Comfortable working in a fast-paced, evolving environment

Preferred Qualifications

Experience with Prometheus or other time-series databases
Knowledge of containerization and orchestration technologies such as Docker and Kubernetes
Understanding of networking and security best practices
Experience with database administration, performance tuning, or optimization
Exposure to data pipelines or operational reporting platforms

Team & Transition Context

Current team member supporting this area remains through the end of April
Knowledge transfer and overlap time is strongly preferred
Goal is rapid onboarding to avoid transferring an unsustainable workload

Work Schedule

Hybrid A/B schedule
Week A: In office Monday–Wednesday, remote Thursday–Friday
Week B: Remote Monday–Wednesday, in office Thursday–Friday
Schedule alternates between Week A and Week B every five business days

About the Company

Gotham Technology Group, LLC is in the business of providing guidance and direction to IT professionals. With sales offices in Connecticut, New Jersey, and New York City, Gotham serves clients based throughout the Northeastern United States and delivers goods and services across the globe. Gotham has been Certified™ as a Great Place to Work four years in a row. View our company profile at https://www.greatplacetowork.com/certified-company/7025230