Cerebras Systems

www.cerebras.ai

2 Jobs

694 Employees

About the Company

Cerebras Systems is accelerating the future of generative AI.
We're a team of pioneering computer architects, deep learning researchers, and engineers building a new class of AI supercomputers from the ground up.

Our flagship system, Cerebras CS-3, is powered by the Wafer-Scale Engine-3, the world's largest and fastest AI processor. CS-3s are effortlessly clustered to create the largest AI supercomputers on Earth, while abstracting away the complexity of traditional distributed computing.

From sub-second inference speeds to breakthrough training performance, Cerebras makes it easier to build and deploy state-of-the-art AI, from proprietary enterprise models to open-source projects downloaded millions of times.

Here's what makes our platform different:
- Sub-second reasoning: Instant intelligence and real-time responsiveness, even at massive scale
- Blazing-fast inference: Up to 100x performance gains over traditional AI infrastructure
- Agentic AI in action: Models that can plan, act, and adapt autonomously
- Scalable infrastructure: Built to move from prototype to global deployment without friction

Cerebras solutions are available in the Cerebras Cloud or on-prem, serving leading enterprises, research labs, and government agencies worldwide.

Learn more: www.cerebras.ai
Join us: https://cerebras.net/careers/

Listed Jobs

Company Name: Cerebras Systems
Job Title: Security Operations Center Manager
Job Description:
Role Summary: Lead and manage day-to-day SOC operations for a 24/7 security environment. Own the operating rhythm, maintain investigation quality, manage escalation decisions, and drive continuous improvement.
Expectations: Deliver high-quality investigations, sustainable coverage across time zones, and decisive incident response. Serve as the technical escalation lead for high-severity events. Build and develop a competent SOC team.
Key Responsibilities:
- Design and mature a scalable 24/7 SOC model with resilient coverage.
- Balance workloads and sustain high-quality investigations.
- Enforce standards for investigation quality, incident management, documentation, and escalation.
- Lead response and serve as the technical escalation point during high-severity incidents.
- Identify and implement initiatives to improve SOC performance, tooling, and maturity.
- Hire, onboard, coach, and manage SOC analysts.
- Define, track, and analyze SOC performance metrics to drive improvements.
- Coordinate cross-functionally and oversee post-incident follow-through.
Required Skills:
- 5+ years in security operations, incident response, or security engineering.
- Proven leadership in SOC or shift operations.
- Experience building or operating a 24/7 SOC.
- Hands-on investigation across endpoint, identity, cloud, and network telemetry.
- Leadership of incident response for high-severity events.
- Working proficiency in one programming or scripting language.
- Sound operational judgment and the ability to make escalation decisions under pressure.
- Ability to define and use operational metrics.
- Experience hiring, developing, and managing technical teams.
- Strong written communication and documentation skills.
Preferred: Experience with centralized security data lakes, production detections, automation, and AI-driven investigation, as well as translating findings into durable program improvements.
Required Education & Certifications: Not specified.

Sunnyvale, United States

On site

Mid-level

17-02-2026

Company Name: Cerebras Systems
Job Title: Applied Machine Learning Research Scientist
Job Description:
Role Summary: Develop and optimize end-to-end machine learning pipelines for large language models (LLMs) on a wafer-scale AI platform. Translate research ideas into production-ready systems, focusing on training, fine-tuning, reinforcement learning post-training, evaluation, and performance tuning.
Expectations:
- Deliver high-quality, maintainable code that scales to large datasets and distributed training.
- Achieve measurable improvements in model performance and training efficiency.
- Collaborate with researchers and senior engineers to translate research insights into scalable solutions.
- Maintain rigorous testing and evaluation standards across tasks and domains.
Key Responsibilities:
- Apply post-training techniques (RLVR, RLHF, GRPO, etc.) to boost model performance.
- Build, maintain, and extend evaluation pipelines for multi-domain LLM metrics.
- Debug end-to-end ML stack issues, including data ingestion, training jobs, precision handling, and model outputs.
- Design, implement, and scale pipelines for pretraining, fine-tuning, alignment, and reinforcement learning stages.
- Manage large datasets: generation, filtering, synthetic augmentation, and data pipeline optimization.
- Optimize training and inference workflows for speed, efficiency, and reliability on wafer-scale hardware.
- Contribute high-quality, maintainable code to shared infrastructure and open-source projects where applicable.
Required Skills:
- Strong Python programming skills, with experience in PyTorch.
- Solid foundation in machine learning fundamentals and deep-learning architectures (especially transformers).
- Ability to read, interpret, and implement concepts from contemporary ML research papers.
- Experience debugging and optimizing ML systems across data pipelines, training jobs, and model inference.
- Familiarity with distributed training frameworks (e.g., FSDP, Megatron).
- Experience working with large-scale datasets and data pipelines.
- Knowledge of reinforcement learning concepts and post-training methods.
Required Education & Certifications:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- 0–5 years of experience (internships, research, or industry) in machine learning systems.

Sunnyvale, United States

On site

Fresher

05-03-2026