- Company Name
- act digital
- Job Title
- Senior Cloud-Native Ops Engineer
- Job Description
Job Title: Senior Cloud-Native Ops Engineer
Role Summary: Design, build, and operate cloud-native platform services that enable data science, engineering, and AI/ML teams. Focus on scalable, automated AWS-based infrastructure, CI/CD pipelines, and model deployment and monitoring.
Expectations:
- Own end-to-end design and operation of AWS infrastructure for AI/ML workloads.
- Develop infrastructure-as-code with Terraform and maintain version control.
- Implement automated data pipelines with Airflow, Spark, and Python.
- Build and maintain CI/CD workflows using GitHub Actions, Jenkins, or AWS native tools.
- Provide incident response, root‑cause analysis, and user support; participate in on‑call rotations.
- Collaborate across international teams to improve system reliability, performance, and documentation.
Key Responsibilities:
1. Design and manage AWS services (Lambda, S3, Kinesis, API Gateway, ECS/EKS, networking, landing zones).
2. Create scalable platforms for model training, serving, versioning, monitoring, Jupyter notebooks, inference APIs, and Generative AI/LLM use cases.
3. Automate data ingestion and transformation pipelines using Airflow, Spark, and Python scripts.
4. Develop, test, and maintain Infrastructure as Code (IaC) using Terraform, including module reuse and governance.
5. Build CI/CD pipelines in GitHub Actions, Jenkins, or AWS CodePipeline for application and infrastructure deployments.
6. Monitor system health, conduct incident reviews, and drive continuous improvement in reliability and security.
7. Document architecture, operational procedures, and best practices for internal teams.
Required Skills:
- Extensive experience with AWS services (Lambda, S3, Kinesis, API Gateway, VPC, IAM, etc.).
- Strong proficiency in Python and the data science/ML ecosystem (NumPy, pandas, scikit‑learn, PyTorch/TensorFlow, etc.).
- Hands‑on experience with Docker, Kubernetes (EKS/AKS/GKE), and Terraform for IaC.
- Knowledge of API‑driven architectures and microservices.
- Practical experience with Airflow for workflow orchestration; familiarity with Spark for big data processing.
- Expertise in CI/CD tooling (GitHub Actions, Jenkins, AWS CodePipeline, CodeBuild, CodeDeploy, or equivalent).
- Understanding of Generative AI concepts and Large Language Models (LLMs).
- Ability to troubleshoot complex cloud and application incidents, perform root‑cause analysis, and implement preventative measures.
Optional (Nice to Have):
- Experience with MLflow or other model management platforms.
- Working knowledge of Amazon SageMaker for model training and deployment.
- Basic proficiency in Dutch.
Required Education & Certifications:
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field.
- Relevant certifications such as AWS Certified Solutions Architect – Associate or Professional, Terraform Associate, or equivalent are highly desirable.