Self-Supervised Representation Learning for Molecular Property Predictions [electronic resource]
- Material Type
- Dissertation
- Control Number
- 0016932088
- International Standard Book Number
- 9798379703776
- Dewey Decimal Classification Number
- 004
- Main Entry-Personal Name
- Wang, Yuyang.
- Publication, Distribution, etc. (Imprint)
- [S.l.] : Carnegie Mellon University, 2023
- Publication, Distribution, etc. (Imprint)
- Ann Arbor : ProQuest Dissertations & Theses, 2023
- Physical Description
- 1 online resource (187 p.)
- General Note
- Source: Dissertations Abstracts International, Volume: 84-12, Section: B.
- General Note
- Advisor: Farimani, Amir Barati.
- Dissertation Note
- Thesis (Ph.D.)--Carnegie Mellon University, 2023.
- Restrictions on Access Note
- This item must not be sold to any third party vendors.
- Summary, Etc.
- Deep learning (DL) has been widely implemented in molecular modeling for property predictions. However, there are two major challenges in DL for molecules. (1) The chemical space of potentially active molecules is gigantic. (2) Labeled data on molecular properties are limited because simulations and experiments are expensive and time-consuming. DL models trained on such limited data in a supervised manner struggle to perform well on novel molecules. Recently, self-supervised learning (SSL) has gathered growing attention for learning representations from unlabeled data by deriving supervisory objectives from the data itself. Unlike supervised learning, SSL can leverage massive data without manually annotated labels, which holds the promise of learning generic molecular representations for various applications.
In this dissertation, we study self-supervised molecular representation learning that makes use of large unlabeled data for better molecular property predictions. The dissertation consists of three parts, in which we investigate SSL with different representations of molecules for different applications. In Part I, we introduce contrastive learning (CL) to learn representations from 2D molecular graphs with graph neural networks (GNNs). We further improve the CL framework via faulty negative mitigation with fingerprints as well as fragment-level contrasting between decomposed molecular motifs. A wide variety of property prediction tasks concerning small organic molecules, including physiology, biophysics, physical chemistry, and quantum mechanics, are investigated in this part. In Part II, we investigate SSL methods that leverage 3D molecular geometries. In particular, we propose denoising pre-training, which significantly improves the accuracy of molecular potential predictions with equivariant GNNs. Notably, our models pre-trained on small molecules demonstrate remarkable transferability, improving performance when fine-tuned on diverse molecular systems, including different elements, charged molecules, biomolecules, and larger systems. Lastly, in Part III, we investigate the development of structure-agnostic language models, especially Transformers, in chemical science. We propose chemical-aware tokenization and adapt masked language modeling for polymer property predictions. Moreover, we utilize the multimodalities of metal-organic frameworks (MOFs) by jointly training two branches of string representations encoded by Transformers and 3D geometric representations encoded by alignment. Overall, our research advances self-supervised molecular representation learning for improved prediction accuracy of various molecular properties, with potential implications for accelerating drug and material discovery.
- Subject Added Entry-Topical Term
- Computer science.
- Index Term-Uncontrolled
- Deep learning
- Index Term-Uncontrolled
- Molecular modeling
- Index Term-Uncontrolled
- Property prediction
- Index Term-Uncontrolled
- Self-supervised learning
- Index Term-Uncontrolled
- Metal-organic frameworks
- Index Term-Uncontrolled
- Contrastive learning
- Added Entry-Corporate Name
- Carnegie Mellon University Mechanical Engineering
- Host Item Entry
- Dissertations Abstracts International. 84-12B.
- Host Item Entry
- Dissertation Abstract International
- Electronic Location and Access
- This material is available after logging in.
- Control Number
- joongbu:643685