Together AI

together.ai

5 Jobs

247 Employees

About the Company

Together AI is a research-driven AI cloud infrastructure provider. Our purpose-built GPU cloud platform empowers AI engineers and researchers to train, fine-tune, and run frontier-class AI models. Our customers include leading SaaS companies such as Salesforce, Zoom, and Zomato, as well as pioneering AI startups like ElevenLabs, Hedra, and Cartesia. We advocate for open-source AI and believe that transparent AI systems will drive innovation and create the best outcomes for society.

Listed Jobs

Company Name
Together AI
Job Title
Senior Software Development Engineer in Test
Job Description
**Job Title:** Senior Software Development Engineer in Test

**Role Summary:** Lead the design, implementation, and maintenance of automated test frameworks using TypeScript, Golang, and Python. Partner with engineering and product stakeholders to define quality objectives, drive test strategy, and ensure high-quality releases.

**Expectations:**
- Independent, self-motivated execution of complex testing tasks.
- Consistent delivery of robust automation that increases test coverage and efficiency.
- Commitment to continuous improvement in quality and testing practices.

**Key Responsibilities:**
- Develop and enforce a sustainable test automation strategy across teams.
- Establish QA best practices, processes, and documentation aligned with roadmap and resource constraints.
- Define test strategies and plans in collaboration with engineering and product groups.
- Build and maintain test automation frameworks (e.g., Cypress) for functional, performance, and reliability testing.
- Write, maintain, and execute automated scripts, including regression and API tests.
- Conduct automated regression testing to validate software changes.
- Document test processes, findings, and results for reporting and knowledge sharing.
- Stay current on emerging testing tools, methodologies, and industry trends.

**Required Skills:**
- Proficiency in Golang, Python, or TypeScript.
- Strong background in software test automation and quality assurance.
- Experience with Cypress, REST API testing, or k6.
- Solid understanding of automation testing methodologies, tools, and best practices.
- Excellent problem-solving skills, attention to detail, and communication skills.
- Ability to collaborate effectively with cross-functional teams.

**Required Education & Certifications:**
- Bachelor's degree in Computer Science, Software Engineering, or a related field, or 5+ years of relevant industry experience.
San Francisco, United States
Hybrid
Senior
03-11-2025
Company Name
Together AI
Job Title
Machine Learning Operations (MLOps) Engineer
Job Description
**Job Title:** Machine Learning Operations (MLOps) Engineer

**Role Summary:** Design, develop, and maintain production-grade ML inference and fine-tuning pipelines for large language models (LLMs). Deliver scalable, automated systems that enable rapid deployment, evaluation, and operation of AI services for customers and internal teams.

**Expectations:**
- 5+ years of professional experience building ML training or inference systems at scale.
- Demonstrated expertise in deploying LLMs and optimizing their runtime performance.
- Proven knowledge of CI/CD, containerization, Kubernetes, and cloud infrastructure.

**Key Responsibilities:**
- Collaborate with engineering, research, and sales to deploy and operate inference pipelines for customers and internal use.
- Build, maintain, and document tools, services, and automation workflows for testing, monitoring, and scaling ML workloads.
- Analyze system performance, identify bottlenecks, and implement improvements to efficiency, reliability, and cost-effectiveness.
- Conduct design and code reviews to uphold code quality and best practices.
- Participate in an on-call rotation to respond to production incidents and outages.

**Required Skills:**
- Strong background in machine learning with a focus on state-of-the-art LLMs.
- Proficiency in Python and at least one additional language (e.g., Go).
- Experience with ML frameworks such as TensorFlow, PyTorch, or scikit-learn.
- Deep familiarity with DevOps practices: CI/CD pipelines, automated testing, Docker containerization, Kubernetes orchestration.
- Competence in cloud platforms: AWS, Google Cloud Platform, or Microsoft Azure.
- Ability to design, implement, and document robust production-grade APIs and services.

**Required Education & Certifications:**
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent industry experience.
- Cloud platform certifications (AWS, GCP, or Azure) are a plus but not mandatory.
San Francisco, United States
On site
Mid level
12-11-2025
Company Name
Together AI
Job Title
Machine Learning Engineer - Inference
Job Description
**Job Title:** Machine Learning Engineer - Inference

**Role Summary:** Design and optimize high-performance AI inference systems for large language models, collaborating with researchers and engineers to deliver scalable, production-ready solutions.

**Expectations:**
- 3+ years of experience writing production-quality code.
- Proficiency in Python, PyTorch, and high-performance system design.
- Strong understanding of low-level OS concepts (threading, memory, networking).

**Key Responsibilities:**
- Develop and optimize AI inference engine systems for reliability and scalability.
- Build runtime services for large-scale AI applications.
- Collaborate with cross-functional teams to turn research into production features.
- Conduct design and code reviews to maintain high quality standards.
- Create tools, documentation, and infrastructure for data ingestion and processing.

**Required Skills:**
- Python, PyTorch, and high-performance library/tooling development.
- Low-level OS expertise: multi-threading, memory management, networking.
- Prior experience with AI inference systems (e.g., TGI, vLLM) preferred.
- Knowledge of inference techniques (e.g., speculative decoding) and CUDA/Triton programming.
- Familiarity with Rust, Cython, or compilers is a bonus.

**Required Education & Certifications:** Not specified.
San Francisco, United States
Hybrid
Junior
14-12-2025
Company Name
Together AI
Job Title
LLM Inference Frameworks and Optimization Engineer
Job Description
**Job Title:** LLM Inference Frameworks and Optimization Engineer

**Role Summary:** Design, develop, and optimize large-scale, low-latency inference engines for text, image, and multimodal models. Focus on distributed parallelism, GPU/accelerator efficiency, and software-hardware co-design to deliver high-throughput, fault-tolerant AI deployment.

**Expectations:**
- Lead end-to-end development of inference pipelines for LLMs and vision models at scale.
- Demonstrate measurable improvements in latency, throughput, or cost per inference.
- Collaborate cross-functionally with hardware, research, and infrastructure teams.
- Deliver production-ready, maintainable code in Python/C++ with CUDA.
- Communicate technical trade-offs to stakeholders.

**Key Responsibilities:**
- Build fault-tolerant, high-concurrency distributed inference engines for multimodal generation.
- Engineer parallelism strategies (Mixture of Experts, tensor parallelism, pipeline parallelism).
- Apply CUDA graph, TensorRT/TensorRT-LLM, and PyTorch compilation (torch.compile) optimizations.
- Tune cache systems (e.g., Mooncake, PagedAttention).
- Analyze performance bottlenecks and co-optimize GPU/TPU/custom accelerator workloads.
- Integrate model execution plans into end-to-end serving pipelines.
- Maintain code quality, documentation, and automated testing.

**Required Skills:**
- 3+ years of deep-learning inference, distributed systems, or HPC experience.
- Proficiency in Python and C++/CUDA; familiarity with GPU programming (CUDA/Triton/TensorRT).
- Deep knowledge of transformer, large language, vision, and diffusion model optimization.
- Experience with LLM inference frameworks (TensorRT-LLM, vLLM, SGLang, TGI).
- Knowledge of model quantization, KV cache systems, and distributed scheduling.
- Strong analytical, problem-solving, and performance-driven mindset.
- Excellent collaboration and communication skills.

**Nice-to-Have:**
- RDMA/RoCE, distributed filesystems (HDFS, Ceph), Kubernetes experience.
- Contributions to open-source inference projects.

**Required Education & Certifications:**
- Bachelor's degree (or higher) in Computer Science, Electrical Engineering, or a related field.
- Certifications in GPU programming or distributed systems are a plus.
San Francisco, United States
On site
Junior
14-12-2025