Job Specifications
Job Summary – Sr. DevOps Engineer – AI Platform
Design, implement, and manage scalable, resilient AWS infrastructure for AI platforms.
Architect and maintain Windows/Linux environments, integrating with cloud platforms.
Develop and maintain infrastructure-as-code using AWS CDK, CloudFormation, Terraform, and OpenTofu.
Manage configuration of Windows & Linux servers using Chef.
Build and optimize CI/CD pipelines with GitLab CI/CD for .NET applications.
Integrate and support AI services, including orchestration with Amazon Bedrock, Google Agentspace, and generative AI frameworks.
Enable and optimize AI/ML workflows for model training, inference, and deployment across AWS and GCP.
Automate model lifecycle management (training, deployment, monitoring) through CI/CD.
Collaborate with AI engineering teams to deliver scalable environments, APIs, and infrastructure for AI adoption.
Implement observability, security, data privacy, and cost-optimization strategies for AI workloads.
Enforce security best practices in infrastructure and deployment processes.
Troubleshoot and resolve infrastructure and deployment issues.
Implement and manage monitoring and logging solutions for proactive system visibility.
Contribute to development and documentation of DevOps standards and best practices.
Stay updated on cloud, DevOps, and security trends and technologies.
Provide mentorship and guidance to junior team members.
Collaborate closely with development teams, providing DevOps expertise and support.
Key Skills & Qualifications
5+ years in DevOps/SRE roles.
1+ years working with AI services & LLMs.
Extensive hands-on experience with AWS.
Strong background in Windows/Linux server administration and cloud integration.
Proven experience with AWS CDK, CloudFormation, and Terraform.
Strong CI/CD pipeline design and implementation (GitLab CI/CD).
Experience deploying .NET applications in cloud environments.
Scripting skills: PowerShell, Python, Ruby, Bash.
Experience with monitoring/logging tools (New Relic, CloudWatch).
Understanding of cloud security best practices.
Experience with Chef for configuration management.
Excellent troubleshooting, communication, and collaboration skills.
Experience with containerization (Docker, Kubernetes) is a plus.
AWS/GCP certifications are a plus.
Knowledge of AWS EC2 Auto Scaling, warm pools, Chocolatey, and Packer is preferred.
About the Company
At Largeton Group, we're passionate about empowering businesses to reach new heights through innovative staffing solutions and cutting-edge technology. For over a decade, we've been dedicated to delivering exceptional service and driving growth for our clients.
Founded in 2015, Largeton Group has evolved from a staffing agency to a comprehensive HR solutions provider. Our journey has been marked by a commitment to excellence, innovation, and customer satisfaction. Today, we're proud to be a trusted partner for businesses ...