Joongbu University Library

Detailed Information

Active Visual Recognition in Open World.

Material Type: Dissertation

Control Number: 0017161464

International Standard Book Number: 9798382757544

Dewey Decimal Classification Number: 621.3

Main Entry-Personal Name: Fan, Lei.

Publication, Distribution, etc. (Imprint): [S.l.] : Northwestern University, 2024

Publication, Distribution, etc. (Imprint): Ann Arbor : ProQuest Dissertations & Theses, 2024

Physical Description: 184 p.

General Note: Source: Dissertations Abstracts International, Volume: 85-11, Section: B.

General Note: Advisor: Wu, Ying.

Dissertation Note: Thesis (Ph.D.)--Northwestern University, 2024.

Summary, Etc.: Visual recognition, a core area of computer vision, aims to interpret semantic information from images and videos. With advances in deep neural networks and the availability of substantial training data, this field has seen dramatic progress and diverse applications, ranging from augmented reality to autonomous robotics. Traditional research predominantly focuses on passive recognition, in which pre-captured visual data are analyzed without any interactive engagement from the system. This overlooks the potential of active visual recognition, in which an agent actively alters its viewpoint to optimize recognition performance.

Active visual recognition enables agents to gather visual information according to their own incentives. It offers a way to overcome challenges such as visual occlusion, ambiguous viewpoints, poor lighting, and other undesirable viewing conditions: by taking actions, agents can acquire novel and informative observations. Furthermore, active visual recognition is essential for various embodied AI applications, such as robotic grasping and navigation, because it enables robots to move through and explore their environments intelligently.

Although prior studies have explored active recognition behaviors and environment modeling, these efforts have largely been confined to closed-world scenarios. In such settings, the agent's operational environment is restricted, lacking both expansiveness and dynamism, and the object categories it interacts with are limited to a predefined set. This dissertation aims to extend active visual recognition into more open, dynamic environments, thereby broadening its practical applicability.

In search of a more flexible and scalable solution to the limitations of closed-world scenarios, we focus on three major challenges in developing active recognition for an open-world context. The first is training collapse, where the agent fails to learn an effective recognition policy because its recognition module is still underdeveloped. The second is modeling visual uncertainty in active recognition, especially when the agent encounters unexpected observations during exploration. The third is enabling the active recognition agent to handle novel object categories. In this dissertation, we model these three open-world challenges from both theoretical and practical perspectives.

To address training collapse, we propose integrating an additional adversarial policy that continually disturbs the recognition agent during training. This forms a competitive game that promotes active exploration and prevents the agent from converging on singular solutions. The adversary, trained by reinforcement learning and rewarded whenever recognition fails, challenges the recognition agent by directing the camera toward difficult observations.

To understand uncertainty in active recognition, we first propose a novel method for assessing recognition uncertainty in image-based recognition, modeling two types of uncertainty from distinct sources using the Dempster-Shafer theory of evidence. We then deploy this uncertainty measure in active recognition agents and treat active recognition as a sequential evidence-gathering process. Additionally, to evaluate performance, we collect a dataset from an indoor simulator that covers various recognition challenges, such as distance, occlusion level, and visibility.

The third challenge involves enabling active recognition agents to identify newly discovered categories. Our solution is divided into two branches. The first explores continual, on-the-fly learning of novel concepts from few training samples. The proposed agent integrates prototypes, a robust representation for limited training samples, into a reinforcement learning framework, encouraging the agent to move toward views that yield more discriminative features; catastrophic forgetting during continual learning is mitigated through knowledge distillation. The second branch focuses on direct open-vocabulary active recognition, which ambitiously aims to recognize any noun actively. We propose an open-vocabulary active recognition agent that combines current vision-language models with a semantic-agnostic policy to enable intelligent movement for open-vocabulary recognition.

Subject Added Entry-Topical Term: Electrical engineering.

Subject Added Entry-Topical Term: Computer science.

Subject Added Entry-Topical Term: Information technology.

Index Term-Uncontrolled: Active recognition

Index Term-Uncontrolled: Embodied AI

Index Term-Uncontrolled: Uncertainty estimation

Index Term-Uncontrolled: Visual recognition

Added Entry-Corporate Name: Northwestern University. Electrical and Computer Engineering.

Host Item Entry: Dissertations Abstracts International. 85-11B.

Electronic Location and Access: This material is available after logging in.

Control Number: joongbu:658288

008250224s2024        us  ||||||||||||||c||eng  d
■001000017161464
■00520250211151400
■006m          o    d
■007cr#unu||||||||
■020    ▼a9798382757544
■035    ▼a(MiAaPQ)AAI31244241
■040    ▼aMiAaPQ▼cMiAaPQ
■0820  ▼a621.3
■1001  ▼aFan, Lei.
■24510▼aActive Visual Recognition in Open World.
■260    ▼a[S.l.]▼bNorthwestern University. ▼c2024
■260  1▼aAnn Arbor▼bProQuest Dissertations & Theses▼c2024
■300    ▼a184 p.
■500    ▼aSource: Dissertations Abstracts International, Volume: 85-11, Section: B.
■500    ▼aAdvisor: Wu, Ying.
■5021  ▼aThesis (Ph.D.)--Northwestern University, 2024.
■590    ▼aSchool code: 0163.
■650  4▼aElectrical engineering.
■650  4▼aComputer science.
■650  4▼aInformation technology.
■653    ▼aActive recognition
■653    ▼aEmbodied AI
■653    ▼aUncertainty estimation
■653    ▼aVisual recognition
■690    ▼a0544
■690    ▼a0489
■690    ▼a0984
■690    ▼a0800
■71020▼aNorthwestern University▼bElectrical and Computer Engineering.
■7730  ▼tDissertations Abstracts International▼g85-11B.
■790    ▼a0163
■791    ▼aPh.D.
■792    ▼a2024
■793    ▼aEnglish
■85640▼uhttp://www.riss.kr/pdu/ddodLink.do?id=T17161464▼nKERIS▼zThe full text of this material is provided by the Korea Education and Research Information Service (KERIS).

Holdings
Reg No.	Call No.	Location	Status	Loan Info
TQ0034606	T	Full-text material	Viewable/Printable	Viewable/Printable
