Generative Models of Vision and Action.
- Material Type
- Thesis (Dissertation)
- Control Number
- 0017163752
- International Standard Book Number
- 9798342107396
- Dewey Decimal Classification Number
- 620
- Main Entry-Personal Name
- Gupta, Agrim.
- Publication, Distribution, etc. (Imprint)
- [S.l.] : Stanford University., 2024
- Publication, Distribution, etc. (Imprint)
- Ann Arbor : ProQuest Dissertations & Theses, 2024
- Physical Description
- 129 p.
- General Note
- Source: Dissertations Abstracts International, Volume: 86-04, Section: B.
- General Note
- Advisor: Li, Fei-Fei.
- Dissertation Note
- Thesis (Ph.D.)--Stanford University, 2024.
- Summary, Etc.
- Animals and humans display a remarkable ability to build internal representations of the world and to use them to simulate, evaluate, and select among different possible actions. This capability is learnt primarily from observation and without any supervision. Endowing autonomous agents with similar capabilities is a fundamental challenge in machine learning. In this thesis I will explore new algorithms that enable scalable representation learning from videos via prediction, generative models of visual data, and their applications to robotics. To begin, I will discuss the challenges associated with using predictive learning objectives to learn visual representations. I'll introduce a simple predictive learning architecture and objective that enables learning visual representations capable of solving a wide range of visual correspondence tasks in a zero-shot manner. Subsequently, I'll present a transformer-based approach for photorealistic video generation via diffusion modeling. Our approach jointly compresses images and videos within a unified latent space, enabling training and generation across modalities. Finally, I will illustrate the practical applications of generative models for robot learning. Our non-autoregressive, action-conditioned video generation model can act as a world model, enabling embodied agents to plan using visual model-predictive control. Furthermore, I'll showcase a generalist agent trained via next token prediction to learn from diverse robotic experiences across various robots and tasks.
- Subject Added Entry-Topical Term
- Robots.
- Subject Added Entry-Topical Term
- Success.
- Subject Added Entry-Topical Term
- Failure analysis.
- Subject Added Entry-Topical Term
- Video recordings.
- Subject Added Entry-Topical Term
- Semantics.
- Subject Added Entry-Topical Term
- Film studies.
- Subject Added Entry-Topical Term
- Logic.
- Subject Added Entry-Topical Term
- Robotics.
- Added Entry-Corporate Name
- Stanford University.
- Host Item Entry
- Dissertations Abstracts International. 86-04B.
- Electronic Location and Access
- This material is available after logging in.
- Control Number
- joongbu:657555