Active Visual Recognition in Open World.
Details
- Material Type
- Dissertation
- Control Number
- 0017161464
- International Standard Book Number
- 9798382757544
- Dewey Decimal Classification Number
- 621.3
- Main Entry-Personal Name
- Fan, Lei.
- Publication, Distribution, etc. (Imprint)
- [S.l.] : Northwestern University, 2024
- Publication, Distribution, etc. (Imprint)
- Ann Arbor : ProQuest Dissertations & Theses, 2024
- Physical Description
- 184 p.
- General Note
- Source: Dissertations Abstracts International, Volume: 85-11, Section: B.
- General Note
- Advisor: Wu, Ying.
- Dissertation Note
- Thesis (Ph.D.)--Northwestern University, 2024.
- Summary, Etc.
- Visual recognition, a core area of computer vision, aims to interpret semantic information from images and videos. With advancements in deep neural networks and substantial training data, this field has seen dramatic developments and diverse applications, ranging from augmented reality to autonomous robotics. Traditional research predominantly focuses on passive recognition, where pre-captured visual data are analyzed without interactive engagement from the system. This overlooks the potential of active visual recognition, where an agent actively alters its viewpoint to optimize recognition performance.
  Active visual recognition enables agents to actively gather visual information based on their own incentives. It offers a solution to overcome challenges such as visual occlusions, ambiguous viewpoints, poor lighting, and other undesired viewing conditions. By taking actions, agents can acquire novel and informative observations. Furthermore, active visual recognition is essential for various embodied AI applications, such as robotic grasping and navigation, enabling robots to move and explore their environments intelligently.
  Although different studies have explored active recognition behaviors and environmental modelings, these efforts have largely been confined to closed-world scenarios. In such settings, the agent's operational environment is restricted, lacking both expansiveness and dynamism, and the categories of objects it interacts with are limited to a predefined set. This dissertation aims to extend the boundaries of active visual recognition into more open, dynamic environments, thereby broadening the practical applicability of active recognition.
  In search of a more flexible and scalable solution to overcome the limitations of closed-world scenarios, we focus on three major challenges when developing active recognition in an open-world context. The first challenge is the training collapse issue in active recognition, where the agent fails to learn an effective recognition policy from an underdeveloped recognition module. The second challenge involves modeling visual uncertainty in active recognition, especially when the agent encounters unexpected observations during exploration. The third challenge is enabling the active recognition agent to handle novel object categories.
  In this dissertation, we model three open-world challenges from both theoretical and practical perspectives.
  To address the issue of training collapse, we propose integrating an additional adversarial policy that continually disturbs the recognition agent during training. This forms a competitive game, promoting active exploration and preventing the agent from converging on singular solutions. The reinforced adversary, rewarded when recognition fails, challenges the recognition agent by directing the camera towards difficult observations.
  For understanding uncertainty in active recognition, we first propose a novel method to assess recognition uncertainty in image-based recognition. We model two types of uncertainties from distinct sources using the Dempster-Shafer theory of evidence. We then deploy this uncertainty measurement method in active recognition agents and treat active recognition as a sequential evidence-gathering process. Additionally, to evaluate performance, we collect a dataset from an indoor simulator that includes various recognition challenges, such as distance, occlusion levels, and visibility.
  The third challenge involves enabling active recognition agents to identify newly discovered categories. The solution divides into two branches. The first explores the potential for continual learning of novel concepts with few training samples on the fly. The proposed agent integrates prototypes, a robust representation for limited training samples, into a reinforcement learning framework, encouraging the agent to move towards views that yield more discriminative features. Catastrophic forgetting during continual learning is mitigated through knowledge distillation. The second branch focuses on direct open-vocabulary active recognition, which ambitiously aims to recognize all nouns actively. We propose an open-vocabulary active recognition agent that combines current vision-language models with a semantic-agnostic policy to enable intelligent movement for open-vocabulary recognition.
- Subject Added Entry-Topical Term
- Electrical engineering.
- Subject Added Entry-Topical Term
- Computer science.
- Subject Added Entry-Topical Term
- Information technology.
- Index Term-Uncontrolled
- Active recognition
- Index Term-Uncontrolled
- Embodied AI
- Index Term-Uncontrolled
- Uncertainty estimation
- Index Term-Uncontrolled
- Visual recognition
- Added Entry-Corporate Name
- Northwestern University Electrical and Computer Engineering
- Host Item Entry
- Dissertations Abstracts International. 85-11B.
- Electronic Location and Access
- This material is available after logging in.
- Control Number
- joongbu:658288
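
The summary above mentions the Dempster-Shafer theory of evidence and describes active recognition as a sequential evidence-gathering process. As a rough illustration only, the sketch below applies the standard Dempster rule of combination to two views of the same object. The class names, the mass values, and the `combine` helper are hypothetical and are not taken from the dissertation, whose actual uncertainty model is not specified in this record.

```python
from typing import Dict, FrozenSet

# A mass function assigns belief mass to subsets of the frame of discernment.
Mass = Dict[FrozenSet[str], float]


def combine(m1: Mass, m2: Mass) -> Mass:
    """Fuse two mass functions with Dempster's rule of combination."""
    combined: Mass = {}
    conflict = 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            inter = a & b
            if inter:
                # Agreeing evidence accumulates on the intersection.
                combined[inter] = combined.get(inter, 0.0) + wa * wb
            else:
                # Disjoint hypotheses contribute to the conflict mass.
                conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("Total conflict: combination is undefined")
    # Renormalize so the remaining masses sum to one.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


# Hypothetical two-class frame; mass on the whole frame expresses ignorance.
FRAME = frozenset({"chair", "table"})
CHAIR = frozenset({"chair"})
TABLE = frozenset({"table"})

view_1 = {CHAIR: 0.5, FRAME: 0.5}               # ambiguous first view
view_2 = {CHAIR: 0.6, TABLE: 0.1, FRAME: 0.3}   # clearer second view

fused = combine(view_1, view_2)
for hypothesis, mass in fused.items():
    print(sorted(hypothesis), round(mass, 4))
```

In this toy case the mass left on the whole frame, which stands for "not yet known", shrinks after the second view is fused. That is one way to read a sequence of observations as accumulating evidence.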
MARC
■008250224s2024 us ||||||||||||||c||eng d
■001000017161464
■00520250211151400
■006m o d
■007cr#unu||||||||
■020 ▼a9798382757544
■035 ▼a(MiAaPQ)AAI31244241
■040 ▼aMiAaPQ▼cMiAaPQ
■0820 ▼a621.3
■1001 ▼aFan, Lei.
■24510▼aActive Visual Recognition in Open World.
■260 ▼a[S.l.]▼bNorthwestern University. ▼c2024
■260 1▼aAnn Arbor▼bProQuest Dissertations & Theses▼c2024
■300 ▼a184 p.
■500 ▼aSource: Dissertations Abstracts International, Volume: 85-11, Section: B.
■500 ▼aAdvisor: Wu, Ying.
■5021 ▼aThesis (Ph.D.)--Northwestern University, 2024.
■520 ▼aVisual recognition, a core area of computer vision, aims to interpret semantic information from images and videos. With advancements in deep neural networks and substantial training data, this field has seen dramatic developments and diverse applications, ranging from augmented reality to autonomous robotics. Traditional research predominantly focuses on passive recognition, where pre-captured visual data are analyzed without interactive engagement from the system. This overlooks the potential of active visual recognition, where an agent actively alters its viewpoint to optimize recognition performance. Active visual recognition enables agents to actively gather visual information based on their own incentives. It offers a solution to overcome challenges such as visual occlusions, ambiguous viewpoints, poor lighting, and other undesired viewing conditions. By taking actions, agents can acquire novel and informative observations. Furthermore, active visual recognition is essential for various embodied AI applications, such as robotic grasping and navigation, enabling robots to move and explore their environments intelligently. Although different studies have explored active recognition behaviors and environmental modelings, these efforts have largely been confined to closed-world scenarios. In such settings, the agent's operational environment is restricted, lacking both expansiveness and dynamism, and the categories of objects it interacts with are limited to a predefined set. This dissertation aims to extend the boundaries of active visual recognition into more open, dynamic environments, thereby broadening the practical applicability of active recognition. In search of a more flexible and scalable solution to overcome the limitations of closed-world scenarios, we focus on three major challenges when developing active recognition in an open-world context. The first challenge is the training collapse issue in active recognition, where the agent fails to learn an effective recognition policy from an underdeveloped recognition module. The second challenge involves modeling visual uncertainty in active recognition, especially when the agent encounters unexpected observations during exploration. The third challenge is enabling the active recognition agent to handle novel object categories. In this dissertation, we model three open-world challenges from both theoretical and practical perspectives. To address the issue of training collapse, we propose integrating an additional adversarial policy that continually disturbs the recognition agent during training. This forms a competitive game, promoting active exploration and preventing the agent from converging on singular solutions. The reinforced adversary, rewarded when recognition fails, challenges the recognition agent by directing the camera towards difficult observations. For understanding uncertainty in active recognition, we first propose a novel method to assess recognition uncertainty in image-based recognition. We model two types of uncertainties from distinct sources using the Dempster-Shafer theory of evidence. We then deploy this uncertainty measurement method in active recognition agents and treat active recognition as a sequential evidence-gathering process. Additionally, to evaluate performance, we collect a dataset from an indoor simulator that includes various recognition challenges, such as distance, occlusion levels, and visibility. The third challenge involves enabling active recognition agents to identify newly discovered categories. The solution divides into two branches. The first explores the potential for continual learning of novel concepts with few training samples on the fly. The proposed agent integrates prototypes, a robust representation for limited training samples, into a reinforcement learning framework, encouraging the agent to move towards views that yield more discriminative features. Catastrophic forgetting during continual learning is mitigated through knowledge distillation. The second branch focuses on direct open-vocabulary active recognition, which ambitiously aims to recognize all nouns actively. We propose an open-vocabulary active recognition agent that combines current vision-language models with a semantic-agnostic policy to enable intelligent movement for open-vocabulary recognition.
■590 ▼aSchool code: 0163.
■650 4▼aElectrical engineering.
■650 4▼aComputer science.
■650 4▼aInformation technology.
■653 ▼aActive recognition
■653 ▼aEmbodied AI
■653 ▼aUncertainty estimation
■653 ▼aVisual recognition
■690 ▼a0544
■690 ▼a0489
■690 ▼a0984
■690 ▼a0800
■71020▼aNorthwestern University▼bElectrical and Computer Engineering.
■7730 ▼tDissertations Abstracts International▼g85-11B.
■790 ▼a0163
■791 ▼aPh.D.
■792 ▼a2024
■793 ▼aEnglish
■85640▼uhttp://www.riss.kr/pdu/ddodLink.do?id=T17161464▼nKERIS▼zThe full text of this material is provided by the Korea Education and Research Information Service (KERIS).