- Company Name
- TekDoors Inc.
- Job Title
- Senior SRE Engineer
- Job Description
**Role Summary**
Lead site reliability engineering for a cloud‑native environment, ensuring high availability, performance, and cost efficiency of production systems across Windows and Linux platforms. Drive automation, monitoring, and incident response while collaborating with development, security, and operations teams.
**Expectations**
- Own service reliability, incident lifecycle, and root‑cause analysis.
- Deliver continuous improvement through automation, capacity planning, and resilience engineering.
- Communicate status, risks, and post‑mortem findings to stakeholders.
**Key Responsibilities**
- Design, implement, and maintain monitoring, logging, and alerting solutions (Splunk, Datadog, Grafana).
- Configure and manage container platforms (Kubernetes, PCF) and infrastructure automation (Terraform, Ansible).
- Write and optimize automation scripts in PowerShell and Python; manage them under Git version control.
- Operate and tune NoSQL databases; handle backup, recovery, and patch management.
- Conduct network troubleshooting, security compliance checks, and capacity planning.
- Lead incident response, root‑cause analysis, and post‑mortem documentation.
- Mentor junior SRE staff and promote best practices across teams.
**Required Skills**
- Windows System Administration – advanced.
- Public Cloud (AWS and/or GCP) – hands‑on experience.
- PowerShell (advanced) and Python programming.
- Docker, Kubernetes, PCF, and NoSQL database management.
- Monitoring & Log Analysis tools (Splunk, Datadog, Grafana).
- Infrastructure automation (Terraform, Ansible, configuration management).
- Incident response, root‑cause analysis (RCA), and performance and capacity monitoring.
- Backup & recovery, patch management, security compliance.
- Git version control, script optimization.
**Required Education & Certifications**
- Bachelor’s degree in Computer Science, Information Technology, or related field (or equivalent experience).
- Relevant cloud and DevOps certifications preferred: AWS Certified Solutions Architect, GCP Professional Cloud Architect, Certified Kubernetes Administrator (CKA), or similar.