Joongbu University Library


Representation Learning for Music and Audio Intelligence.

Material Type: Dissertation

Control Number: 0017161301

International Standard Book Number: 9798383188774

Dewey Decimal Classification Number: 004

Main Entry-Personal Name: Chen, Ke.

Publication, Distribution, etc. (Imprint): [S.l.] : University of California, San Diego, 2024

Publication, Distribution, etc. (Imprint): Ann Arbor : ProQuest Dissertations & Theses, 2024

Physical Description: 122 p.

General Note: Source: Dissertations Abstracts International, Volume: 86-01, Section: B.

General Note: Advisor: Dubnov, Shlomo; Berg-Kirkpatrick, Taylor.

Dissertation Note: Thesis (Ph.D.)--University of California, San Diego, 2024.

Summary, Etc.: With recent breakthroughs in machine learning, the pursuit of efficient and effective feature representation has gradually taken center stage, igniting groundbreaking possibilities for various downstream applications. While significant progress has been made in the domains of natural language processing and computer vision, there arises an imperative need to construct a robust audio representation model that empowers advanced audio applications. In this dissertation, we begin from an initial design of an innovative audio transformer as the cornerstone, HTS-AT, that employs imperative designs to capture semantic and acoustic information of audio data. We present a step-by-step demonstration on how we unleash the power of HTS-AT to unlock a wide range of advanced audio downstream applications in audio understanding and audio generative AI. Specifically, we first adapt HTS-AT to audio event classification, assessing its prowess in comprehending the semantics of audio tracks. Subsequently, we leverage the audio embedding of HTS-AT into audio source separation, evaluating its capability to conceive the acoustic feature of audio. To embrace more applications in conjunction with other modalities, we propose a contrastive language-audio pretraining model (CLAP) that combines HTS-AT with the language understanding model to incorporate the shared information between audio and text representations. From all above explorations, we achieve the target of content creation by proposing MusicLDM, a latent diffusion model that leverages the embeddings of CLAP to perform the text-to-music generation. Throughout all designs, experiments, and application studies, we achieve successful adaptations and superior performance of different audio downstream tasks rising from a simple audio transformer. Besides, more potential applications in the field of audio content extraction and creation are awaiting, as we will touch upon our ongoing and forthcoming endeavors in addressing their challenges and realizing their full potential.

Subject Added Entry-Topical Term: Computer science.

Subject Added Entry-Topical Term: Music.

Subject Added Entry-Topical Term: Audiology.

Index Term-Uncontrolled: Audio signal processing

Index Term-Uncontrolled: Deep learning

Index Term-Uncontrolled: Music signal processing

Index Term-Uncontrolled: Representation learning

Added Entry-Corporate Name: University of California, San Diego. Computer Science and Engineering

Host Item Entry: Dissertations Abstracts International. 86-01B.

Electronic Location and Access: Available after logging in.

Control Number: joongbu:658677


Holdings
Accession Number	Call Number	Location	Circulation Status	Circulation Info
TQ0034995	T	Full-text material	Viewable / Printable	Viewable / Printable
