Representation Learning for Music and Audio Intelligence.
Material Type  
Dissertation
Control Number  
0017161301
International Standard Book Number  
9798383188774
Dewey Decimal Classification Number  
004
Main Entry-Personal Name  
Chen, Ke.
Publication, Distribution, etc. (Imprint)  
[S.l.] : University of California, San Diego., 2024
Publication, Distribution, etc. (Imprint)  
Ann Arbor : ProQuest Dissertations & Theses, 2024
Physical Description  
122 p.
General Note  
Source: Dissertations Abstracts International, Volume: 86-01, Section: B.
General Note  
Advisor: Dubnov, Shlomo; Berg-Kirkpatrick, Taylor.
Dissertation Note  
Thesis (Ph.D.)--University of California, San Diego, 2024.
Summary, Etc.  
With recent breakthroughs in machine learning, the pursuit of efficient and effective feature representation has gradually taken center stage, igniting groundbreaking possibilities for various downstream applications. While significant progress has been made in the domains of natural language processing and computer vision, there arises an imperative need to construct a robust audio representation model that empowers advanced audio applications.
In this dissertation, we begin from an initial design of an innovative audio transformer as the cornerstone, HTS-AT, that employs imperative designs to capture semantic and acoustic information of audio data. We present a step-by-step demonstration on how we unleash the power of HTS-AT to unlock a wide range of advanced audio downstream applications in audio understanding and audio generative AI. Specifically, we first adapt HTS-AT to audio event classification, assessing its prowess in comprehending the semantics of audio tracks. Subsequently, we leverage the audio embedding of HTS-AT into audio source separation, evaluating its capability to conceive the acoustic feature of audio. To embrace more applications in conjunction with other modalities, we propose a contrastive language-audio pretraining model (CLAP) that combines HTS-AT with the language understanding model to incorporate the shared information between audio and text representations. From all above explorations, we achieve the target of content creation by proposing MusicLDM, a latent diffusion model that leverages the embeddings of CLAP to perform the text-to-music generation.
Throughout all designs, experiments, and application studies, we achieve successful adaptations and superior performance of different audio downstream tasks rising from a simple audio transformer. Besides, more potential applications in the field of audio content extraction and creation are awaiting, as we will touch upon our ongoing and forthcoming endeavors in addressing their challenges and realizing their full potential.
Subject Added Entry-Topical Term  
Computer science.
Subject Added Entry-Topical Term  
Music.
Subject Added Entry-Topical Term  
Audiology.
Index Term-Uncontrolled  
Audio signal processing
Index Term-Uncontrolled  
Deep learning
Index Term-Uncontrolled  
Music signal processing
Index Term-Uncontrolled  
Representation learning
Added Entry-Corporate Name  
University of California, San Diego Computer Science and Engineering
Host Item Entry  
Dissertations Abstracts International. 86-01B.
Electronic Location and Access  
This resource is available after logging in.
Control Number  
joongbu:658677
Holdings
Registration Number  
TQ0034995
Call Number  
T
Location  
Full-text material
Circulation Status  
Available for viewing / available for printing
Circulation Info  
Available for viewing / available for printing