Grounding Language in Images and Videos.
- Material Type
- Dissertation
- Control Number
- 0017161970
- International Standard Book Number
- 9798382652443
- Dewey Decimal Classification Number
- 004
- Main Entry-Personal Name
- Sadhu, Arka.
- Publication, Distribution, etc. (Imprint)
- [S.l.] : University of Southern California, 2024
- Publication, Distribution, etc. (Imprint)
- Ann Arbor : ProQuest Dissertations & Theses, 2024
- Physical Description
- 244 p.
- General Note
- Source: Dissertations Abstracts International, Volume: 85-11, Section: A.
- General Note
- Advisor: Nevatia, Ramakant.
- Dissertation Note
- Thesis (Ph.D.)--University of Southern California, 2024.
- Summary, Etc.
- While machine learning research has traditionally explored image, video, and text understanding as separate fields, the surge in multi-modal content in today's digital landscape underscores the importance of computational models that adeptly navigate complex interactions between text, images, and videos. This dissertation addresses the challenge of grounding language in visual media: the task of associating linguistic symbols with perceptual experiences and actions. The overarching goal is to bridge the gap between language and vision as a means to a "deeper understanding" of images and videos, enabling models capable of reasoning over longer time horizons such as hour-long movies, collections of images, or even multiple videos. A pivotal contribution of my work is the use of Semantic Roles for images, videos, and text. Unlike previous works that primarily focused on recognizing single entities or generating holistic captions, Semantic Roles facilitate a fine-grained understanding of "who did what to whom" in a structured format. They retain the flexibility of free-form language phrases while remaining as comprehensive and complete as entity recognition, thus enriching a model's interpretive capabilities. This thesis introduces the vision-language tasks developed during my Ph.D.: grounding unseen words, spatio-temporal localization of entities in a video, video question answering, visual semantic role labeling in videos, reasoning across more than one image or video, and, finally, weakly supervised open-vocabulary object detection. Each task investigates a particular phenomenon of image or video understanding in isolation and is accompanied by dedicated datasets, model frameworks, and evaluation protocols robust to data priors. The resulting models can be used for downstream tasks such as extracting common-sense knowledge graphs from instructional videos, or to drive end-user applications such as retrieval, question answering, and captioning. By facilitating the deeper integration of language and vision, this dissertation represents a step forward toward machine learning models capable of a finer understanding of the world around us. (A minimal illustrative sketch of such a semantic-role frame follows this record.)
- Subject Added Entry-Topical Term
- Computer science.
- Subject Added Entry-Topical Term
- Computer engineering.
- Subject Added Entry-Topical Term
- Linguistics.
- Index Term-Uncontrolled
- Computer vision
- Index Term-Uncontrolled
- Image understanding
- Index Term-Uncontrolled
- Machine learning
- Index Term-Uncontrolled
- Natural language processing
- Index Term-Uncontrolled
- Video understanding
- Added Entry-Corporate Name
- University of Southern California Computer Science
- Host Item Entry
- Dissertations Abstracts International. 85-11A.
- Electronic Location and Access
- This material is available after login.
- Control Number
- joongbu:657024
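
The abstract's "who did what to whom" structure is essentially a semantic-role frame attached to a visual event. As a rough, hypothetical illustration only (the dissertation defines its own datasets and annotation formats, which are not reproduced here), such a frame might pair a verb with role-labeled free-form phrases, each optionally grounded to boxes in video frames:

```python
# Hypothetical sketch, not code or a format from the dissertation: one way to
# represent a grounded semantic-role frame ("who did what to whom") for a
# video event.
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RoleArgument:
    role: str    # PropBank-style label, e.g. "Arg0" (agent), "Arg1" (patient)
    phrase: str  # free-form language phrase filling the role
    # Per-frame boxes (frame_index, x1, y1, x2, y2), coordinates in [0, 1];
    # empty when the argument is mentioned but not visually grounded.
    boxes: List[Tuple[int, float, float, float, float]] = field(default_factory=list)


@dataclass
class EventFrame:
    verb: str       # the predicate, e.g. "throw"
    start_sec: float  # event start time within the video
    end_sec: float    # event end time within the video
    arguments: List[RoleArgument] = field(default_factory=list)


# Example event: "a man throws a ball to a dog in the park"
event = EventFrame(
    verb="throw",
    start_sec=3.2,
    end_sec=5.0,
    arguments=[
        RoleArgument("Arg0", "a man", boxes=[(96, 0.10, 0.20, 0.45, 0.90)]),
        RoleArgument("Arg1", "a ball", boxes=[(96, 0.48, 0.30, 0.55, 0.40)]),
        RoleArgument("Arg2", "to a dog"),
        RoleArgument("ArgM-LOC", "in the park"),
    ],
)

for arg in event.arguments:
    grounded = "grounded" if arg.boxes else "ungrounded"
    print(f"{event.verb} / {arg.role}: {arg.phrase} ({grounded})")
```

The role labels here follow PropBank convention; the box lists are what distinguish visual semantic role labeling from purely textual SRL, since each linguistic phrase is tied to a spatio-temporal region of the video rather than only to a span of text.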