Representation Learning for Music and Audio Intelligence.
Material Type  
 Thesis (Dissertation)
Control Number  
0017161301
International Standard Book Number  
9798383188774
Dewey Decimal Classification Number  
004
Main Entry-Personal Name  
Chen, Ke.
Publication, Distribution, etc. (Imprint)  
[S.l.] : University of California, San Diego, 2024
Publication, Distribution, etc. (Imprint)  
Ann Arbor : ProQuest Dissertations & Theses, 2024
Physical Description  
122 p.
General Note  
Source: Dissertations Abstracts International, Volume: 86-01, Section: B.
General Note  
Advisor: Dubnov, Shlomo; Berg-Kirkpatrick, Taylor.
Dissertation Note  
Thesis (Ph.D.)--University of California, San Diego, 2024.
Summary, Etc.  
With recent breakthroughs in machine learning, the pursuit of efficient and effective feature representation has gradually taken center stage, igniting groundbreaking possibilities for various downstream applications. While significant progress has been made in the domains of natural language processing and computer vision, there arises an imperative need to construct a robust audio representation model that empowers advanced audio applications.

In this dissertation, we begin with the design of an innovative audio transformer, HTS-AT, as the cornerstone, which employs key design choices to capture the semantic and acoustic information of audio data. We present a step-by-step demonstration of how we unleash the power of HTS-AT to unlock a wide range of advanced downstream applications in audio understanding and audio generative AI. Specifically, we first adapt HTS-AT to audio event classification, assessing its prowess in comprehending the semantics of audio tracks. Subsequently, we leverage the audio embedding of HTS-AT in audio source separation, evaluating its capability to capture the acoustic features of audio. To embrace more applications in conjunction with other modalities, we propose a contrastive language-audio pretraining model (CLAP) that combines HTS-AT with a language understanding model to incorporate the shared information between audio and text representations. Building on all of the above explorations, we achieve the target of content creation by proposing MusicLDM, a latent diffusion model that leverages the embeddings of CLAP to perform text-to-music generation.

Throughout all designs, experiments, and application studies, we achieve successful adaptations and superior performance on different audio downstream tasks, all arising from a simple audio transformer. Moreover, further applications in the field of audio content extraction and creation await, as we touch upon our ongoing and forthcoming endeavors to address their challenges and realize their full potential.
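The contrastive language-audio pretraining (CLAP) described in the abstract aligns an audio encoder (HTS-AT) and a text encoder in a shared embedding space by training on paired audio-text data. As a rough illustrative sketch only, and not the dissertation's actual implementation, the following PyTorch code shows a symmetric contrastive (InfoNCE-style) objective over a batch of paired embeddings; the class name, dimensions, and the random placeholder embeddings are assumptions made for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastiveAudioTextLoss(nn.Module):
    # Symmetric contrastive loss over paired audio/text embeddings (CLAP-style sketch).
    def __init__(self, temperature: float = 0.07):
        super().__init__()
        # Learnable log inverse temperature, as in CLIP-style contrastive training.
        self.log_scale = nn.Parameter(torch.log(torch.tensor(1.0 / temperature)))

    def forward(self, audio_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # Normalize so the dot product is cosine similarity.
        audio_emb = F.normalize(audio_emb, dim=-1)
        text_emb = F.normalize(text_emb, dim=-1)

        # Pairwise similarity matrix of shape (batch, batch), scaled by the temperature.
        logits = self.log_scale.exp() * audio_emb @ text_emb.t()

        # The i-th audio clip is the positive pair of the i-th caption.
        targets = torch.arange(logits.size(0), device=logits.device)

        # Symmetric cross-entropy: audio-to-text and text-to-audio directions.
        loss_a2t = F.cross_entropy(logits, targets)
        loss_t2a = F.cross_entropy(logits.t(), targets)
        return 0.5 * (loss_a2t + loss_t2a)

if __name__ == "__main__":
    # Placeholder embeddings standing in for HTS-AT and text-encoder outputs.
    batch, dim = 8, 512
    audio_emb = torch.randn(batch, dim)
    text_emb = torch.randn(batch, dim)
    loss = ContrastiveAudioTextLoss()(audio_emb, text_emb)
    print(f"contrastive loss: {loss.item():.4f}")

In the full CLAP setting described in the abstract, the two embeddings would typically come from HTS-AT and a pretrained language understanding model, each followed by a projection into the shared embedding dimension.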
Subject Added Entry-Topical Term  
Computer science.
Subject Added Entry-Topical Term  
Music.
Subject Added Entry-Topical Term  
Audiology.
Index Term-Uncontrolled  
Audio signal processing
Index Term-Uncontrolled  
Deep learning
Index Term-Uncontrolled  
Music signal processing
Index Term-Uncontrolled  
Representation learning
Added Entry-Corporate Name  
University of California, San Diego. Computer Science and Engineering.
Host Item Entry  
Dissertations Abstracts International. 86-01B.
Electronic Location and Access  
This material is available after logging in.
Control Number  
joongbu:658677

Holdings  
Item Number: TQ0034995
Call Number: T
Availability: Online full-text material; available for viewing and printing
