Alibaba Cloud

www.alibabacloud.com

7 Jobs

4,580 Employees

About the Company

Established in September 2009, Alibaba Cloud develops highly scalable cloud computing and data management services that give businesses of all sizes, financial institutions, governments, and other organizations flexible, cost-effective solutions for their networking and information needs. A business of Alibaba Group, one of the world’s largest e-commerce companies, Alibaba Cloud operates the network that powers Alibaba Group’s extensive online and mobile commerce ecosystem and sells a comprehensive suite of cloud computing services to support sellers and other third parties participating in that ecosystem.


Follow us:
Twitter: www.twitter.com/alibaba_cloud
Facebook: https://www.facebook.com/alibabacloud/

Listed Jobs

Company Name
Alibaba Cloud
Job Title
Quality Assurance Engineer
Job Description
**Job Title:** Quality Assurance Engineer – Construction Testing & Commissioning

**Role Summary**

Lead on‑site supervision and technical testing of electrical and mechanical facilities for large‑scale data center projects. Plan, execute, and document all facility testing and commissioning activities to ensure compliance with design specifications, safety regulations, and performance targets while engaging stakeholders and escalating issues as needed.

**Expectations**

- Allocate at least 30% of work time to on‑site activities.
- Deliver comprehensive testing plans, execute tests, and close out commissioning on schedule and within budget.
- Communicate findings clearly to both technical and non‑technical audiences.
- Maintain rigorous documentation, reports, and audit trails.
- Resolve complex technical problems and coordinate inter‑disciplinary team efforts.

**Key Responsibilities**

1. Supervise and coordinate on‑site construction testing of electrical, mechanical, HVAC, fire, plumbing, and monitoring systems.
2. Develop and execute detailed testing and commissioning plans, including test scripts, qualification procedures, and acceptance criteria.
3. Collect, analyze, and document test results; produce concise reports and traceability matrices.
4. Ensure adherence to safety, quality, and regulatory standards; conduct inspections and verify corrective actions.
5. Escalate technical issues to project stakeholders; facilitate cross‑functional meetings and progress updates.
6. Collaborate with design, procurement, and construction teams to identify and mitigate risks.
7. Validate system performance against design specifications and operational requirements.

**Required Skills**

- Minimum 5 years of experience in facility testing and commissioning for large‑scale infrastructure projects.
- Strong knowledge of construction quality management and on‑site supervision.
- Proficient in creating and executing test plans, interpreting test results, and reporting.
- Excellent stakeholder communication and presentation skills.
- Analytical problem‑solving ability and strong documentation skills.
- Project management and coordination aptitude; ability to work cross‑functionally.
- Competence in electrical and mechanical systems: power, cooling, ventilation, fire‑fighting, plumbing, drainage, and monitoring.

**Required Education & Certifications**

- Bachelor’s degree in Electrical Engineering, Mechanical Engineering, or related field.
- Master’s degree or Professional Engineer (PE) license preferred.
- Any additional certifications (e.g., PMP, CPK, ASHRAE) advantageous.
Sunnyvale, United States
On site
Mid level
31-12-2025
Company Name
Alibaba Cloud
Job Title
Cloud Network SRE Engineer
Job Description
**Job Title:** Cloud Network SRE Engineer

**Role Summary:**

Assure the reliability, scalability, and performance of a large‑scale cloud networking platform. Design and maintain automated operations, monitor and troubleshoot incidents, and collaborate on architecture enhancements to keep business services available and secure.

**Expectations:**

* Deliver rapid resolution of network incidents within SLA windows.
* Drive automation and process standardization to improve operational efficiency.
* Actively research and apply emerging networking technologies to enhance stability.
* Communicate effectively through clear documentation and cross‑team collaboration.

**Key Responsibilities:**

1. Maintain and improve cloud networking stability, ensuring continuous user service.
2. Design, implement, and manage automated operations systems and tooling.
3. Monitor, alert, and troubleshoot network issues; respond quickly to incidents.
4. Participate in architecture reviews and performance optimization of network services.
5. Track industry trends in cloud networking to recommend innovative improvements.
6. Manage on‑call duties and service‑level incident resolution.

**Required Skills:**

* 3+ years of cloud computing or network operations experience.
* Proficiency in scripting/programming (Python, Golang, Java, or similar).
* Strong Linux system administration and command‑line proficiency.
* Experience with public cloud platforms (AliCloud, AWS, Azure) and their networking products.
* Knowledge of databases (MySQL, Redis) for troubleshooting.
* Solid understanding of network protocols, distributed systems, and performance tuning.
* Strong troubleshooting and incident response capabilities.
* Excellent written and verbal communication for documentation and collaboration.

**Required Education & Certifications:**

* Bachelor’s degree in Computer Science, Information Technology, or related field.
* Relevant certifications such as AWS Certified Solutions Architect, Azure Network Engineer Associate, or equivalent (preferred but not mandatory).
Sunnyvale, United States
On site
Junior
31-12-2025
Company Name
Alibaba Cloud
Job Title
Cloud Platform SRE
Job Description
Job Title: Cloud Platform SRE

Role Summary:

Ensure uninterrupted, highly available production environments for enterprise‑grade cloud services. Develop and enforce stability standards, manage incidents, automate reliability tooling, and support large‑scale customer events to maintain >99.99% uptime.

Expectations:

* 24/7 on‑call rotations with SLA‑compliant response.
* Rapid incident triage, root‑cause analysis, and post‑mortem reviews.
* Continuous improvement of stability metrics and automation pipelines.
* Collaboration with R&D for production readiness and critical peak‑period support.

Key Responsibilities:

* Daily operations, monitoring, and maintenance of applications, databases, and middleware.
* Incident response, cross‑team coordination, and root‑cause analysis.
* Design and enforce stability standards, metrics, and governance campaigns.
* Lead full‑stack disaster recovery, phased change rollouts, and emergency response drills (1‑5‑10 model).
* Build and maintain automated change‑management, monitoring, and alerting platforms.
* Support large‑scale events (e.g., Olympics, peak business periods) with technical and operational planning.
* Perform risk and vulnerability inspections, and conduct red/blue team exercises.
* Provide expertise in capacity planning, performance diagnostics, and system hardening.

Required Skills:

* 3+ years of SRE/DevOps experience in cloud environments.
* Strong knowledge of cloud infrastructure (AWS, Azure, GCP) and automation (Python, Bash).
* Proficiency with monitoring/alerting tools (Prometheus, Grafana, CloudWatch, etc.).
* Incident management, root‑cause analysis, and post‑mortem documentation.
* Expertise in change management, disaster recovery, and high‑availability design.
* Excellent communication, problem‑solving, and teamwork across distributed teams.
* Familiarity with container orchestration (Kubernetes) and AIOps practices.

Required Education & Certifications:

* Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent experience).
* Cloud & DevOps certifications such as AWS Certified Solutions Architect, Azure Solutions Architect, GCP Professional Cloud Architect, or CNCF Certified Kubernetes Administrator (CKA).
* Incident response or related professional certifications preferred.
Sunnyvale, United States
On site
19-01-2026
Company Name
Alibaba Cloud
Job Title
Cloud Infrastructure – Site Reliability Engineer (SRE)
Job Description
**Job Title:** Cloud Infrastructure – Site Reliability Engineer (SRE)

**Role Summary:**

Responsible for ensuring the stability, performance, and high‑availability of cloud‑native messaging middleware (e.g., Kafka, RocketMQ) on Kubernetes. Drive automation, incident response, and reliability engineering practices to deliver resilient, scalable services.

**Expectations:**

- Minimum 2+ years experience in distributed systems reliability or SRE.
- Proficient in Python, Go, or Java for tooling and automation.
- Hands‑on with Kubernetes deployment (Helm/Operator) and IaC (Terraform preferred).
- Ability to design, implement, and maintain HA architectures and automate operational workflows.

**Key Responsibilities:**

- Design and maintain high‑availability, performance‑tuned middleware architecture.
- Manage full lifecycle of containerized middleware on K8s (deployment, autoscaling, upgrades, resource optimization).
- Lead incident response, root‑cause analysis, and post‑mortem reviews using logs, tracing, and monitoring.
- Develop diagnostic and automation tools (Python/Go/Shell) for troubleshooting and disaster recovery.
- Implement chaos engineering, capacity planning, and failover strategies.
- Build and maintain IaC scripts and automation pipelines for deployment and monitoring.

**Required Skills:**

- Distributed systems reliability engineering (2+ years).
- Strong scripting/programming in Python, Go, or Java.
- Kubernetes orchestration (Helm, Operators) and container lifecycle management.
- Experience with messaging platforms Kafka and/or RocketMQ.
- Automation/IaC tools: Terraform, Shell scripting, CI/CD pipelines.
- Monitoring & tracing tools, incident management, root‑cause analysis.

**Required Education & Certifications:**

- Bachelor’s degree in Computer Science, Engineering, or related technical field (or equivalent practical experience).
- No specific certifications required; relevant SRE or cloud certifications are a plus.
Sunnyvale, United States
On site
Junior
19-01-2026