Active Visual Recognition in Open World.

Details

Material Type
Dissertation
Control Number  
0017161464
International Standard Book Number  
9798382757544
Dewey Decimal Classification Number  
621.3
Main Entry-Personal Name  
Fan, Lei.
Publication, Distribution, etc. (Imprint)
[S.l.] : Northwestern University, 2024
Publication, Distribution, etc. (Imprint)
Ann Arbor : ProQuest Dissertations & Theses, 2024
Physical Description  
184 p.
General Note  
Source: Dissertations Abstracts International, Volume: 85-11, Section: B.
General Note  
Advisor: Wu, Ying.
Dissertation Note  
Thesis (Ph.D.)--Northwestern University, 2024.
Summary, Etc.  
Visual recognition, a core area of computer vision, aims to interpret semantic information from images and videos. With advancements in deep neural networks and substantial training data, this field has seen dramatic developments and diverse applications, ranging from augmented reality to autonomous robotics. Traditional research predominantly focuses on passive recognition, where pre-captured visual data are analyzed without interactive engagement from the system. This overlooks the potential of active visual recognition, where an agent actively alters its viewpoint to optimize recognition performance.
Active visual recognition enables agents to actively gather visual information based on their own incentives. It offers a solution to overcome challenges such as visual occlusions, ambiguous viewpoints, poor lighting, and other undesired viewing conditions. By taking actions, agents can acquire novel and informative observations. Furthermore, active visual recognition is essential for various embodied AI applications, such as robotic grasping and navigation, enabling robots to move and explore their environments intelligently.
Although different studies have explored active recognition behaviors and environmental modelings, these efforts have largely been confined to closed-world scenarios. In such settings, the agent's operational environment is restricted, lacking both expansiveness and dynamism, and the categories of objects it interacts with are limited to a predefined set. This dissertation aims to extend the boundaries of active visual recognition into more open, dynamic environments, thereby broadening the practical applicability of active recognition.
In search of a more flexible and scalable solution to overcome the limitations of closed-world scenarios, we focus on three major challenges when developing active recognition in an open-world context. The first challenge is the training collapse issue in active recognition, where the agent fails to learn an effective recognition policy from an underdeveloped recognition module. The second challenge involves modeling visual uncertainty in active recognition, especially when the agent encounters unexpected observations during exploration. The third challenge is enabling the active recognition agent to handle novel object categories.
In this dissertation, we model three open-world challenges from both theoretical and practical perspectives.
To address the issue of training collapse, we propose integrating an additional adversarial policy that continually disturbs the recognition agent during training. This forms a competitive game, promoting active exploration and preventing the agent from converging on singular solutions. The reinforced adversary, rewarded when recognition fails, challenges the recognition agent by directing the camera towards difficult observations.
For understanding uncertainty in active recognition, we first propose a novel method to assess recognition uncertainty in image-based recognition. We model two types of uncertainties from distinct sources using the Dempster-Shafer theory of evidence. We then deploy this uncertainty measurement method in active recognition agents and treat active recognition as a sequential evidence-gathering process. Additionally, to evaluate performance, we collect a dataset from an indoor simulator that includes various recognition challenges, such as distance, occlusion levels, and visibility.
The third challenge involves enabling active recognition agents to identify newly discovered categories. The solution divides into two branches. The first explores the potential for continual learning of novel concepts with few training samples on the fly. The proposed agent integrates prototypes, a robust representation for limited training samples, into a reinforcement learning framework, encouraging the agent to move towards views that yield more discriminative features. Catastrophic forgetting during continual learning is mitigated through knowledge distillation. The second branch focuses on direct open-vocabulary active recognition, which ambitiously aims to recognize all nouns actively. We propose an open-vocabulary active recognition agent that combines current vision-language models with a semantic-agnostic policy to enable intelligent movement for open-vocabulary recognition.
Subject Added Entry-Topical Term  
Electrical engineering.
Subject Added Entry-Topical Term  
Computer science.
Subject Added Entry-Topical Term  
Information technology.
Index Term-Uncontrolled  
Active recognition
Index Term-Uncontrolled  
Embodied AI
Index Term-Uncontrolled  
Uncertainty estimation
Index Term-Uncontrolled  
Visual recognition
Added Entry-Corporate Name  
Northwestern University Electrical and Computer Engineering
Host Item Entry  
Dissertations Abstracts International. 85-11B.
Electronic Location and Access  
This material is available after logging in.
Control Number  
joongbu:658288

MARC

 008250224s2024        us  ||||||||||||||c||eng  d
■001000017161464
■00520250211151400
■006m          o    d                
■007cr#unu||||||||
■020    ▼a9798382757544
■035    ▼a(MiAaPQ)AAI31244241
■040    ▼aMiAaPQ▼cMiAaPQ
■0820  ▼a621.3
■1001  ▼aFan, Lei.
■24510▼aActive Visual Recognition in Open World.
■260    ▼a[S.l.]▼bNorthwestern University. ▼c2024
■260  1▼aAnn Arbor▼bProQuest Dissertations & Theses▼c2024
■300    ▼a184 p.
■500    ▼aSource: Dissertations Abstracts International, Volume: 85-11, Section: B.
■500    ▼aAdvisor: Wu, Ying.
■5021  ▼aThesis (Ph.D.)--Northwestern University, 2024.
■520    ▼aVisual recognition, a core area of computer vision, aims to interpret semantic information from images and videos. With advancements in deep neural networks and substantial training data, this field has seen dramatic developments and diverse applications, ranging from augmented reality to autonomous robotics. Traditional research predominantly focuses on passive recognition, where pre-captured visual data are analyzed without interactive engagement from the system. This overlooks the potential of active visual recognition, where an agent actively alters its viewpoint to optimize recognition performance. Active visual recognition enables agents to actively gather visual information based on their own incentives. It offers a solution to overcome challenges such as visual occlusions, ambiguous viewpoints, poor lighting, and other undesired viewing conditions. By taking actions, agents can acquire novel and informative observations. Furthermore, active visual recognition is essential for various embodied AI applications, such as robotic grasping and navigation, enabling robots to move and explore their environments intelligently. Although different studies have explored active recognition behaviors and environmental modelings, these efforts have largely been confined to closed-world scenarios. In such settings, the agent's operational environment is restricted, lacking both expansiveness and dynamism, and the categories of objects it interacts with are limited to a predefined set. This dissertation aims to extend the boundaries of active visual recognition into more open, dynamic environments, thereby broadening the practical applicability of active recognition. In search of a more flexible and scalable solution to overcome the limitations of closed-world scenarios, we focus on three major challenges when developing active recognition in an open-world context. The first challenge is the training collapse issue in active recognition, where the agent fails to learn an effective recognition policy from an underdeveloped recognition module. The second challenge involves modeling visual uncertainty in active recognition, especially when the agent encounters unexpected observations during exploration. The third challenge is enabling the active recognition agent to handle novel object categories. In this dissertation, we model three open-world challenges from both theoretical and practical perspectives. To address the issue of training collapse, we propose integrating an additional adversarial policy that continually disturbs the recognition agent during training. This forms a competitive game, promoting active exploration and preventing the agent from converging on singular solutions. The reinforced adversary, rewarded when recognition fails, challenges the recognition agent by directing the camera towards difficult observations. For understanding uncertainty in active recognition, we first propose a novel method to assess recognition uncertainty in image-based recognition. We model two types of uncertainties from distinct sources using the Dempster-Shafer theory of evidence. We then deploy this uncertainty measurement method in active recognition agents and treat active recognition as a sequential evidence-gathering process. Additionally, to evaluate performance, we collect a dataset from an indoor simulator that includes various recognition challenges, such as distance, occlusion levels, and visibility. The third challenge involves enabling active recognition agents to identify newly discovered categories. The solution divides into two branches. The first explores the potential for continual learning of novel concepts with few training samples on the fly. The proposed agent integrates prototypes, a robust representation for limited training samples, into a reinforcement learning framework, encouraging the agent to move towards views that yield more discriminative features. Catastrophic forgetting during continual learning is mitigated through knowledge distillation. The second branch focuses on direct open-vocabulary active recognition, which ambitiously aims to recognize all nouns actively. We propose an open-vocabulary active recognition agent that combines current vision-language models with a semantic-agnostic policy to enable intelligent movement for open-vocabulary recognition.
■590    ▼aSchool code: 0163.
■650  4▼aElectrical engineering.
■650  4▼aComputer science.
■650  4▼aInformation technology.
■653    ▼aActive recognition
■653    ▼aEmbodied AI
■653    ▼aUncertainty estimation
■653    ▼aVisual recognition
■690    ▼a0544
■690    ▼a0489
■690    ▼a0984
■690    ▼a0800
■71020▼aNorthwestern University▼bElectrical and Computer Engineering.
■7730  ▼tDissertations Abstracts International▼g85-11B.
■790    ▼a0163
■791    ▼aPh.D.
■792    ▼a2024
■793    ▼aEnglish
■85640▼uhttp://www.riss.kr/pdu/ddodLink.do?id=T17161464▼nKERIS▼zThe full text of this material is provided by the Korea Education and Research Information Service (KERIS).

    Holdings
    Reg No. | Call No. | Location | Status | Loan Info
    TQ0034606 | T | Full-text material | Viewable/printable | Viewable/printable