Skills

Python, Dynamics, Test, Research, Training, Machine Learning, PyTorch, Computer Vision, Autonomy, AWS, Robotics

Job Specifications

At Toyota Research Institute (TRI), we’re on a mission to improve the quality of human life. We’re developing new tools and capabilities to amplify the human experience. To lead this transformative shift in mobility, we’ve built a world-class team advancing the state of the art in AI, robotics, driving, and material sciences.

The Mission

Make general-purpose robots a reality.

The Challenge

We envision a future where robots assist with household chores and cooking, aid the elderly in maintaining their independence, and enable people to spend more time on the activities they enjoy most. To achieve this, robots must be able to operate reliably in complex, unstructured environments. Our mission is to answer the question “What will it take to create truly general-purpose robots that can accomplish a wide variety of tasks in settings like human homes with minimal human supervision?” We believe that the answer lies in cultivating large-scale datasets of physical interaction from a variety of sources and building on the latest advances in machine learning to learn general-purpose robot behaviors from this data.

The Team

The Learning From Videos (LFV) team in the Robotics division focuses on the development of foundation models capable of leveraging large-scale multi-modal (RGB, depth, flow, semantics, bounding boxes, tactile, audio, etc.) data from multiple domains (driving, robotics, indoors, outdoors, etc.) to improve the performance of downstream tasks. This paradigm targets training scalability, since data from multiple modalities can be equally leveraged to learn useful data-driven priors (3D geometry, physics, dynamics, etc.) for world understanding. Our topics of interest include, but are not limited to, Video Generation, World Models, 4D Reconstruction, Multi-Modal Models, Multi-View Geometry, Data Augmentation, and Video-Language-Action models, with a primary focus on foundation models for embodied applications. We aim to make progress on some of the hardest scientific challenges around spatio-temporal reasoning and how it can lead to the deployment of autonomous agents in real-world unstructured environments.

The Opportunity

Our Learning From Videos (LFV) team is looking for a Computer Vision Research Scientist with expertise in Video Generation, Spatio-temporal Representation Learning, World Models, Foundation Models, Multi-Modal Learning, Vision-as-Inverse-Graphics (including Differentiable Rendering), or related fields, to improve dynamic scene understanding for robots. We are working on some of the hardest scientific challenges around the safe and effective usage of large robotic fleets, simulation, and prior knowledge (geometry, physics, domain knowledge, behavioral science), not only for automation but also for human augmentation.

As a Research Scientist, you will work with a team proposing, conducting, and transferring innovative research. You will use large amounts of sensory data (real and synthetic) to address open problems, train models at scale, publish at top academic venues, and test your ideas in the real world (including on our robots). You will also work closely with other teams at TRI to transfer and ship our most successful algorithms and models towards world-scale long-term autonomy and advanced assistance systems.

Responsibilities
Conduct high-reaching research that solves problems of high value and validates the solutions on well-established benchmarks and systems.
Push the boundaries of knowledge and the state of the art in ML areas, including simulation, perception, prediction, and planning for autonomous driving and robotics.
Partner with a multidisciplinary team including other research scientists and engineers across the CV team, TRI, Toyota, and our university partners.
Present results in verbal and written communications, internally, at top international venues, and via open-source contributions to the community.
Work closely with robotics and machine learning researchers and engineers to understand theoretical and practical needs.
Lead collaborations with our external research partners and mentor research interns.
Follow best practices for producing maintainable code, both for internal use and for open-sourcing to the scientific community.

Qualifications
PhD or equivalent years of experience in Machine Learning, Robotics, Computer Vision, or a related field.
Deep expertise in at least one key ML area among Computer Vision, Large-Scale Pre-Training, Multi-Modal Learning, World Models, and 4D Reconstruction.
Consistent record of publishing at high-impact conferences/journals (CVPR, ICLR, NeurIPS, RSS, ICRA, ICCV, ECCV, PAMI, IJCV, etc.) on the aforementioned topics.
Proficiency in scientific Python, Unix, and a common DL framework (preferably PyTorch). Experience with distributed learning (especially on AWS) for large-scale training of foundation models is a plus.
Ability to identify, propose, and lead new research efforts, working in collaboration with other researchers.

About the Company

At Toyota Research Institute (TRI), we're conducting research to amplify human ability, focusing on making our lives safer and more sustainable. Led by Dr. Gill Pratt, TRI's team of researchers develops technologies to advance automated driving, energy and materials, human-centered artificial intelligence, human interactive driving, large behavior models, and robotics. We're dedicated to building a world of "mobility for all" where everyone, regardless of age or ability, can live in harmony with technology to enjoy a better ...