Representation Learning for Music and Audio Intelligence.
- Material Type
- Thesis/Dissertation
- Control Number
- 0017161301
- International Standard Book Number
- 9798383188774
- Dewey Decimal Classification Number
- 004
- Main Entry-Personal Name
- Chen, Ke.
- Publication, Distribution, etc. (Imprint)
- [S.l.] : University of California, San Diego, 2024
- Publication, Distribution, etc. (Imprint)
- Ann Arbor : ProQuest Dissertations & Theses, 2024
- Physical Description
- 122 p.
- General Note
- Source: Dissertations Abstracts International, Volume: 86-01, Section: B.
- General Note
- Advisor: Dubnov, Shlomo; Berg-Kirkpatrick, Taylor.
- Dissertation Note
- Thesis (Ph.D.)--University of California, San Diego, 2024.
- Summary, Etc.
- With recent breakthroughs in machine learning, the pursuit of efficient and effective feature representations has gradually taken center stage, opening groundbreaking possibilities for various downstream applications. While significant progress has been made in natural language processing and computer vision, there remains a pressing need for a robust audio representation model that empowers advanced audio applications. In this dissertation, we begin with the design of an innovative audio transformer, HTS-AT, as the cornerstone, employing purpose-built architectural choices to capture the semantic and acoustic information of audio data. We then demonstrate, step by step, how HTS-AT can be leveraged to unlock a wide range of advanced downstream applications in audio understanding and audio generative AI. Specifically, we first adapt HTS-AT to audio event classification, assessing its ability to comprehend the semantics of audio tracks. Subsequently, we leverage HTS-AT's audio embeddings for audio source separation, evaluating its capacity to capture the acoustic features of audio. To embrace further applications in conjunction with other modalities, we propose a contrastive language-audio pretraining model (CLAP) that combines HTS-AT with a language understanding model to capture the shared information between audio and text representations. Building on all of these explorations, we reach the goal of content creation with MusicLDM, a latent diffusion model that leverages CLAP embeddings to perform text-to-music generation. Across all designs, experiments, and application studies, we achieve successful adaptation and superior performance on diverse audio downstream tasks, all arising from a single audio transformer. Further applications in audio content extraction and creation lie ahead, and we close by touching on our ongoing and forthcoming efforts to address their challenges and realize their full potential. (A minimal sketch of the contrastive objective described here follows the record below.)
- Subject Added Entry-Topical Term
- Computer science.
- Subject Added Entry-Topical Term
- Music.
- Subject Added Entry-Topical Term
- Audiology.
- Index Term-Uncontrolled
- Audio signal processing
- Index Term-Uncontrolled
- Deep learning
- Index Term-Uncontrolled
- Music signal processing
- Index Term-Uncontrolled
- Representation learning
- Added Entry-Corporate Name
- University of California, San Diego Computer Science and Engineering
- Host Item Entry
- Dissertations Abstracts International. 86-01B.
- Electronic Location and Access
- Available after login.
- Control Number
- joongbu:658677
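As context for the CLAP model described in the abstract above, the following is a minimal, illustrative PyTorch sketch of a CLIP-style symmetric contrastive loss between paired audio and text embeddings. The function name, tensor shapes, and temperature value are assumptions for illustration, not the dissertation's exact formulation; in the actual model the audio embeddings would come from HTS-AT and the text embeddings from a language encoder, both projected to a shared space.

```python
import torch
import torch.nn.functional as F

def clap_contrastive_loss(audio_emb: torch.Tensor,
                          text_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """Symmetric (CLIP-style) contrastive loss over a batch of paired
    audio/text embeddings, each of shape (batch, dim).

    Illustrative sketch only: names and hyperparameters are assumptions,
    not the dissertation's exact formulation.
    """
    # L2-normalize so the dot product becomes cosine similarity.
    audio_emb = F.normalize(audio_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Pairwise similarity logits, scaled by temperature: (batch, batch).
    logits = audio_emb @ text_emb.t() / temperature

    # The i-th audio clip matches the i-th caption in the batch.
    targets = torch.arange(logits.size(0), device=logits.device)

    # Average the audio-to-text and text-to-audio cross-entropies.
    loss_a2t = F.cross_entropy(logits, targets)
    loss_t2a = F.cross_entropy(logits.t(), targets)
    return (loss_a2t + loss_t2a) / 2

# Example: stand-in embeddings as they might come from an HTS-AT audio
# encoder and a text encoder, both projected to a shared 512-d space.
audio = torch.randn(8, 512)
text = torch.randn(8, 512)
print(clap_contrastive_loss(audio, text))
```

Training such an objective on paired audio-caption data pulls matching pairs together and pushes mismatched pairs apart in the shared embedding space, which is what allows CLAP embeddings to later serve as text conditioning for generation models such as MusicLDM.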