Joongbu University Library


Content Information

Language Supervision for Computer Vision.

Material Type: Thesis/Dissertation

Control Number: 0017162812

International Standard Book Number: 9798382739380

Dewey Decimal Classification Number: 004

Main Entry-Personal Name: Desai, Karan P.

Publication, Distribution, etc. (Imprint): [S.l.] : University of Michigan, 2024

Publication, Distribution, etc. (Imprint): Ann Arbor : ProQuest Dissertations & Theses, 2024

Physical Description: 179 p.

General Note: Source: Dissertations Abstracts International, Volume: 85-12, Section: B.

General Note: Advisor: Johnson, Justin C.

Dissertation Note: Thesis (Ph.D.)--University of Michigan, 2024.

Summary, Etc.: Representation learning lies at the core of modern Artificial Intelligence. In computer vision, labeled image datasets like ImageNet have been the standard choice for representation learning. Despite being empirically successful, this approach is expensive to scale due to labeling costs. Moreover, the representation quality is limited by the size and diversity of datasets and their associated label ontologies. My research explores using natural language supervision for computer vision. Using natural language allows us to go beyond fixed label ontologies and scale up to more general sources such as internet data. Toward this goal, my dissertation explores four problems: (1) Learning representations: I propose one of the first methods for language-supervised visual learning that uses image captioning as the training objective, showing its efficacy compared to ImageNet-trained methods on downstream tasks like object detection and segmentation. (2) Scaling data: I explore social media as a rich source of high-quality image descriptions and curate a dataset of 12 million image-text pairs while ensuring responsible curation practices. (3) Understanding data: It is difficult to comprehend the diversity of visual concepts present in millions of image-text pairs. I posit that images and text naturally organize into a tree-like hierarchy and propose an approach for learning representations that capture this hierarchy using tools from hyperbolic geometry. (4) Transfer to downstream tasks: Large vision-language models show impressive zero-shot transfer capabilities on image-level tasks like classification and retrieval. However, their transferability to pixel-level tasks like object detection and segmentation has relied on expensive labeled mask annotations. I propose an object detector to efficiently transfer pre-trained vision models to segment and classify visual objects without any fine-tuning, unlike existing detectors that train using orders of magnitude more labeled masks to achieve high performance. In summary, my research affirms that using language supervision can drive the next leap of progress in computer vision and has immense utility in practical applications.

Subject Added Entry-Topical Term: Computer science.

Subject Added Entry-Topical Term: Computer engineering.

Index Term-Uncontrolled: Computer vision

Index Term-Uncontrolled: Representation learning

Index Term-Uncontrolled: Hyperbolic geometry

Index Term-Uncontrolled: Natural language

Added Entry-Corporate Name: University of Michigan Computer Science & Engineering

Host Item Entry: Dissertations Abstracts International. 85-12B.

Electronic Location and Access: This material is available after logging in.

Control Number: joongbu:657780


Holdings
Registration No.	Call No.	Location	Status	Lending Info
TQ0033998	T	Full-text material	Available to view/print	Available to view/print
