Unified Compositional Models for Visual Recognition

Detailed Information

Material Type  
 Thesis (dissertation)
Control Number  
0015492017
International Standard Book Number  
9781088315606
Dewey Decimal Classification Number  
004
Main Entry-Personal Name  
Tang, Wei.
Publication, Distribution, etc. (Imprint)  
[S.l.] : Northwestern University, 2019
Publication, Distribution, etc. (Imprint)  
Ann Arbor : ProQuest Dissertations & Theses, 2019
Physical Description  
100 p.
General Note  
Source: Dissertations Abstracts International, Volume: 81-05, Section: B.
General Note  
Advisor: Wu, Ying.
Dissertation Note  
Thesis (Ph.D.)--Northwestern University, 2019.
Restrictions on Access Note  
This item must not be sold to any third party vendors.
Summary, Etc.  
A core problem in many computer vision applications is visual recognition, including object classification, detection, and localization. Recent advances in artificial neural networks (a.k.a. "deep learning") have significantly pushed forward the state of the art in visual recognition. However, because they lack semantic structure modeling, most current deep learning approaches have no explicit mechanisms for visual inference and reasoning. As a result, they are unable to explain, interpret, and understand the relations among visual entities. Moreover, since deep learning fits a highly nonlinear function, it is data hungry and can overfit when the data are not big enough.

This thesis studies deep and unified computational modeling for visual compositionality, a new mechanism that unifies semantic structure modeling and deep learning into an effective learning framework for robust visual recognition. Visual compositionality refers to the decomposition of complex visual patterns into hierarchies of simpler ones. It not only provides much stronger pattern expression power, but also helps resolve ambiguities in smaller, lower-level visual patterns via larger, higher-level ones.

We first present a unified framework for compositional pattern modeling, inference, and learning. Represented by And-Or graphs (AOGs), it jointly models the compositional structure, parts, features, and composition/sub-configuration relationships. We show that the inference algorithm of the proposed framework is equivalent to a feedforward network; thus, all the parameters can be learned efficiently via highly scalable back-propagation (BP) in an end-to-end fashion. We validate the model on the task of handwritten digit recognition. By visualizing the processes of bottom-up composition and top-down parsing, we show that our model is fully interpretable, learning hierarchical compositions from visual primitives to visual patterns at increasingly higher levels. We apply this new compositional model to natural scene character recognition and generic object detection, and experimental results demonstrate its effectiveness.

We then introduce a novel deeply learned compositional model for human pose estimation (HPE). It exploits deep neural networks to learn the compositionality of human bodies, resulting in a novel network with a hierarchical compositional architecture and bottom-up/top-down inference stages. In addition, we propose a novel bone-based part representation that not only compactly encodes the orientations, scales, and shapes of parts, but also avoids their potentially large state spaces. With significantly lower complexity, our approach outperforms state-of-the-art methods on three benchmark datasets.

Finally, we study how features can be learned in a compositional fashion. The motivation is that HPE is inherently a homogeneous multi-task learning problem, with the localization of each body part as a different task. Recent HPE approaches universally learn a shared representation for all parts, from which their locations are linearly regressed. However, our statistical analysis indicates that not all parts are related to each other; such a sharing mechanism can therefore lead to negative transfer and deteriorate performance. To resolve this issue, we first propose a data-driven approach to group related parts based on how much information they share. A part-based branching network (PBN) is then introduced to learn representations specific to each part group, and a multi-stage version of this network repeatedly refines intermediate features and pose estimates. Ablation experiments indicate that learning part-specific features significantly improves the localization of occluded parts and thus benefits HPE. Our approach also outperforms all state-of-the-art methods on two benchmark datasets, with an outstanding advantage when occlusion occurs.
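The abstract notes that AOG inference is equivalent to a feedforward network: And-nodes compose their children, Or-nodes select among alternative sub-configurations. A minimal sketch of that idea, under the common max/sum scoring convention (the node names, filters, and toy graph below are illustrative, not taken from the thesis):

```python
import numpy as np

def terminal_score(features, w):
    """Score of a terminal (primitive) node: a linear filter response."""
    return float(np.dot(w, features))

def and_score(child_scores):
    """And-node: a composition, so the children's scores are combined (summed)."""
    return sum(child_scores)

def or_score(child_scores):
    """Or-node: alternative sub-configurations, so the best-scoring child wins."""
    return max(child_scores)

# Toy graph Or(And(t1, t2), And(t1, t3)) over a 2-D feature vector.
feats = np.array([1.0, 2.0])
t1 = terminal_score(feats, np.array([0.5, 0.0]))   # 0.5
t2 = terminal_score(feats, np.array([0.0, 1.0]))   # 2.0
t3 = terminal_score(feats, np.array([0.0, -1.0]))  # -2.0
root = or_score([and_score([t1, t2]), and_score([t1, t3])])
print(root)  # 2.5
```

Because every node is just a sum or a max of differentiable (or sub-differentiable) child scores, the whole graph is a feedforward computation and its parameters (the filters `w` here) can be trained end-to-end with back-propagation, as the abstract describes.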
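The bone-based part representation is described only at a high level in the abstract. One plausible reading, sketched here as an assumption: a "bone" is the vector from a parent joint to its child joint, which compactly carries the part's orientation and scale without enumerating a large discrete state space. The skeleton and joint values below are illustrative:

```python
import numpy as np

def bones_from_joints(joints, skeleton):
    """joints: (K, 2) array of 2-D joint positions.
    skeleton: list of (parent_idx, child_idx) pairs.
    Returns one 2-D bone vector (child minus parent) per skeleton edge."""
    joints = np.asarray(joints, dtype=float)
    return np.array([joints[c] - joints[p] for p, c in skeleton])

# Toy chain: shoulder -> elbow -> wrist.
joints = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]
skeleton = [(0, 1), (1, 2)]
bones = bones_from_joints(joints, skeleton)

lengths = np.linalg.norm(bones, axis=1)        # part scales
angles = np.arctan2(bones[:, 1], bones[:, 0])  # part orientations
print(bones.tolist())    # [[1.0, 0.0], [0.0, 2.0]]
print(lengths.tolist())  # [1.0, 2.0]
```

Each part is thus two continuous numbers rather than a quantized (location, orientation, scale) tuple, which is one way the "potentially large state spaces" mentioned in the abstract can be avoided.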
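The data-driven part grouping step can be pictured as follows. This is a hypothetical sketch, using absolute correlation of part coordinates as a cheap stand-in for the information-sharing measure (the thesis's actual statistic is not specified in the abstract), with a greedy threshold-based merge:

```python
import numpy as np

def group_parts(locations, threshold=0.5):
    """locations: (N, K) array, N poses by K part coordinates.
    Greedily groups parts whose |correlation| exceeds the threshold."""
    corr = np.abs(np.corrcoef(locations, rowvar=False))
    K = locations.shape[1]
    groups, assigned = [], set()
    for i in range(K):
        if i in assigned:
            continue
        group = [i]
        assigned.add(i)
        for j in range(i + 1, K):
            if j not in assigned and corr[i, j] > threshold:
                group.append(j)
                assigned.add(j)
        groups.append(group)
    return groups

# Toy data: parts 0 and 1 move together; part 2 is independent.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
parts = np.stack([x, x + 0.01 * rng.normal(size=200),
                  rng.normal(size=200)], axis=1)
print(group_parts(parts))  # [[0, 1], [2]]
```

Each resulting group would then get its own branch in the part-based branching network (PBN), so unrelated parts no longer share one representation and the negative transfer the abstract describes is avoided.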
Subject Added Entry-Topical Term  
Electrical engineering
Subject Added Entry-Topical Term  
Computer science
Added Entry-Corporate Name  
Northwestern University Electrical and Computer Engineering
Host Item Entry  
Dissertations Abstracts International. 81-05B.
Host Item Entry  
Dissertations Abstracts International
Electronic Location and Access  
This material can be viewed after logging in.
Control Number  
joongbu:565673

MARC

 008200131s2019                                          c    eng  d
■001000015492017
■00520200217181514
■020    ▼a9781088315606
■035    ▼a(MiAaPQ)AAI13899092
■040    ▼aMiAaPQ▼cMiAaPQ
■0820  ▼a004
■1001  ▼aTang, Wei.
■24510▼aUnified Compositional Models for Visual Recognition
■260    ▼a[S.l.]▼bNorthwestern University▼c2019
■260  1▼aAnn Arbor▼bProQuest Dissertations & Theses▼c2019
■300    ▼a100 p.
■500    ▼aSource: Dissertations Abstracts International, Volume: 81-05, Section: B.
■500    ▼aAdvisor: Wu, Ying.
■5021  ▼aThesis (Ph.D.)--Northwestern University, 2019.
■506    ▼aThis item must not be sold to any third party vendors.
■520    ▼a[Abstract as given above.]
■590    ▼aSchool code: 0163.
■650  4▼aElectrical engineering
■650  4▼aComputer science
■690    ▼a0544
■690    ▼a0984
■71020▼aNorthwestern University▼bElectrical and Computer Engineering.
■7730  ▼tDissertations Abstracts International▼g81-05B.
■773    ▼tDissertations Abstracts International
■790    ▼a0163
■791    ▼aPh.D.
■792    ▼a2019
■793    ▼aEnglish
■85640▼uhttp://www.riss.kr/pdu/ddodLink.do?id=T15492017▼nKERIS▼zThe full text of this material is provided by KERIS (Korea Education and Research Information Service).
■980    ▼a202002▼f2020

Holdings  
Registration No.: TQ0005700 / Call No.: T / Status: full text available to view and print