Company Name: Piper Maddox
Job Title: DevOps Engineer
Job Description:
Role Summary:
Design, provision, and operate hybrid cloud (AWS and Azure) and on‑prem infrastructure supporting an Energy Management System, using IaC (Terraform), configuration management (Ansible), CI/CD pipelines, Kubernetes, and an observability stack. Ensure security, compliance, cost efficiency, and reliability through SRE practices and incident response.
Expectations:
- Build scalable, secure, and automated production environments for data‑centric and ML workloads.
- Deliver observability, security, and cost controls across multi‑cloud deployments.
- Collaborate with backend, ML, and controls teams to define API, event, and data service requirements.
- Participate in on‑call rotation, root‑cause analysis, and post‑mortems.
Key Responsibilities:
- Provision and manage cloud/on‑prem environments with Terraform and Ansible.
- Develop and maintain CI/CD pipelines (GitHub Actions, Azure DevOps).
- Operate container workloads on Kubernetes (AKS/EKS), including cluster add‑ons, ingress, autoscaling, and upgrades.
- Implement an observability stack: Prometheus, Grafana, OpenSearch/ELK, OpenTelemetry.
- Enforce cloud security baselines (IAM, least privilege, network segmentation, TLS, key management) and secrets management (Vault, Secrets Manager, Key Vault).
- Automate backups, disaster recovery (DR) against defined RTO/RPO targets, blue/green or canary releases, and infrastructure testing (Terratest).
- Design and support scalable APIs, data services, and event pipelines; define SLOs/SLIs.
- Support audits and compliance (SOC 2) via policy‑as‑code, documentation, and monitoring.
- Handle on‑call and incident response: triage, RCA, post‑mortems, and continuous improvement.
Required Skills:
- IaC and configuration management (Terraform, Ansible).
- AWS and/or Azure architecture (VPC/VNet, subnets, routing, load balancers, DNS).
- Docker, Kubernetes (AKS/EKS), cluster operations.
- CI/CD (GitHub Actions, Azure DevOps) and pipeline automation.
- Observability: metrics, logs, traces (Prometheus, Grafana, OpenSearch/ELK, OpenTelemetry).
- Cloud security: IAM, least privilege, secrets, encryption, compliance basics.
- Linux systems administration.
- SRE principles: monitoring, alerting, incident response, cost optimization.
- Strong communication, self‑management, and multitasking.
Preferred Skills:
- .NET / C# build workflows, Python automation.
- Messaging/data engines: Kafka, RabbitMQ, PostgreSQL, SQL Server, Redis.
- Energy/industrial domain experience (EMS, SCADA, BMS/DERMS).
- Policy‑as‑code (OPA), infrastructure testing (Terratest), GitOps (Argo CD/Flux).
- Backup/DR strategy, multi‑account/subscription patterns, cross‑cloud networking.
Required Education & Certifications:
- Bachelor’s degree in Computer Science, Software/Systems Engineering, or related field (or equivalent experience).
- 5+ years of experience in DevOps/SRE/Platform Engineering with production systems.
- Certifications: none required, but industry credentials such as AWS Certified DevOps Engineer, Azure DevOps Engineer Expert, or Terraform Associate are an advantage.