- Company Name: STAFFWORXS
- Job Title: AI-Ops Engineer
- Job Description:
**Role Summary**
Design, implement, and manage AIOps platforms that automate IT operations, deliver business analytics, and enable continuous observability in a hybrid cloud environment.
**Expectations**
- Deliver end‑to‑end AIOps solutions that drive measurable performance improvements and operational cost savings.
- Collaborate cross‑functionally to align technology initiatives with business objectives.
- Stay current with emerging AI/ML and cloud technologies to continuously enhance platform capabilities.
**Key Responsibilities**
- Oversee and enhance AIOps platforms, integrating AI/ML models for predictive analytics, anomaly detection, and root cause analysis.
- Implement MLOps pipelines (model training, deployment, monitoring) using AWS services (Lambda, Glue, S3, Kinesis, SageMaker, CloudWatch).
- Build and maintain scalable hybrid cloud architectures, ensuring seamless integration between on‑prem and AWS environments.
- Administer IT operations tools (Quantum Metric, Dynatrace, ServiceNow CMDB, Riverbed) and integrate them for end‑to‑end observability.
- Design, develop, and expose REST/GraphQL APIs to support event‑driven and integrated system architectures.
- Deploy observability frameworks, identify performance trends, and recommend optimizations to maximize uptime.
- Drive automation using auto‑discovery, CMDB‑driven workflows, and infrastructure‑as‑code practices.
- Partner with Operations, Cloud, and Security teams to co‑deliver AIOps initiatives and ensure security compliance.
- Serve as an advocate for AIOps adoption, communicating ROI and delivering data‑backed insights to stakeholders.
- Track emerging trends in AI/ML, cloud, and AIOps, and adopt those that sustain innovation on the platform.
**Required Skills**
- Strong knowledge of AI/ML concepts and experience applying them to operational analytics.
- Proficiency with MLOps tools and pipelines (e.g., SageMaker, MLflow, Kubeflow).
- Hands‑on experience building and managing AWS infrastructure (Lambda, Glue, S3, Kinesis, CloudWatch).
- Experience integrating and optimizing IT operations tools (Dynatrace, ServiceNow CMDB, Riverbed, Quantum Metric).
- API design/development skills (REST/GraphQL) and experience with event‑driven architectures.
- Expertise in observability, monitoring, and performance tuning of hybrid cloud environments.
- Strong problem‑solving, data‑driven decision‑making, and communication abilities.
- Familiarity with DevOps practices, CI/CD, and infrastructure‑as‑code (Terraform, CloudFormation).
**Required Education & Certifications**
- Bachelor’s degree in Computer Science, Information Technology, Data Science, or a related field.
- AWS Certified Solutions Architect or equivalent cloud certification.
- Relevant AI/ML or data engineering certifications (e.g., AWS Certified Machine Learning - Specialty, Google Cloud Professional Machine Learning Engineer).