Large Language Models for Automatic Peer Review and Revision in Scientific Documents.
- Material Type
- Thesis/Dissertation
- Control Number
- 0017160152
- International Standard Book Number
- 9798381975161
- Dewey Decimal Classification Number
- 004
- Main Entry-Personal Name
- D'Arcy, Mike.
- Publication, Distribution, etc. (Imprint)
- [S.l.] : Northwestern University, 2024
- Publication, Distribution, etc. (Imprint)
- Ann Arbor : ProQuest Dissertations & Theses, 2024
- Physical Description
- 170 p.
- General Note
- Source: Dissertations Abstracts International, Volume: 85-10, Section: B.
- General Note
- Advisor: Downey, Douglas C.
- Dissertation Note
- Thesis (Ph.D.)--Northwestern University, 2024.
- Summary, Etc.
- In this dissertation, we seek to evaluate LLM capabilities for reviewing and revising scientific documents and to develop new methods to improve them. The capabilities of large language models (LLMs) have advanced dramatically in recent years, performing on par with humans in some tasks. However, the ability of models to comprehend and produce long, highly technical text, such as that of scientific papers, remains under-explored. We construct ARIES, a dataset of scientific paper drafts, their associated peer reviews, and the new drafts written after the reviews, and we link individual feedback comments to the specific edits that address them. Using ARIES, we study the ability of LLMs to edit scientific papers in response to feedback and to generate feedback comments. Our findings suggest that LLMs do show potential for generating feedback comments and edits for papers, but they still suffer from significant limitations when attempting to comprehend or produce nuanced and technical text, often exhibiting surface-level reasoning and producing generic outputs. When revising a document in response to feedback, LLMs often write edits by quoting or paraphrasing the given feedback (48% of the time, compared to 4% for humans) and tend to include less technical detail (38% of model edits vs. 53% of human edits had technical details). Similarly, when generating feedback comments for papers, baseline methods using GPT-4 were rated by users as producing generic or very generic comments more than half the time, and only 1.5 comments per paper were rated as good overall in the best baseline. We explore ways to mitigate these shortcomings and develop MARG-S, an approach for generating paper feedback using multiple specialized LLM instances that engage in internal discussion. We show that MARG-S substantially improves the ability of GPT-4 to generate specific and helpful feedback, reducing the rate of generic comments from 51% to 17% and generating 4.2 good comments per paper (a 2.8x improvement).
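- Note: the abstract's description of MARG-S (multiple specialized LLM instances engaging in internal discussion before producing feedback) can be illustrated with a minimal sketch. This is not the dissertation's actual method: the agent roles, prompts, round structure, and helper names (`ask`, `multi_agent_feedback`) below are hypothetical reconstructions of the general multi-agent idea; only the OpenAI client calls are real API.

```python
# Hypothetical sketch of a multi-agent review loop in the spirit of MARG-S.
# The real system's prompts, agent specializations, and communication
# protocol are defined in the dissertation; this only shows the shape.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
MODEL = "gpt-4"


def ask(system_prompt: str, user_prompt: str) -> str:
    """Send one chat turn to the model and return its reply text."""
    response = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    )
    return response.choices[0].message.content


def multi_agent_feedback(paper_chunks: list[str], n_rounds: int = 2) -> str:
    """Hypothetical worker/leader loop: each worker agent reads one chunk
    of the paper, workers refine their notes over a few discussion rounds,
    and a leader agent merges the discussion into specific comments."""
    # Each worker drafts initial concerns about its assigned chunk.
    notes = [
        ask("You are a reviewer specializing in one section of a paper.",
            f"Read this excerpt and note your concerns:\n\n{chunk}")
        for chunk in paper_chunks
    ]
    # Discussion rounds: workers see each other's notes and sharpen theirs.
    for _ in range(n_rounds):
        shared = "\n---\n".join(notes)
        notes = [
            ask("You are a reviewer refining your concerns after discussion.",
                f"Other reviewers said:\n{shared}\n\nYour notes:\n{note}\n\n"
                "Revise your notes to be more specific and technical.")
            for note in notes
        ]
    # Leader merges the final notes into deduplicated, actionable feedback.
    return ask("You are the lead reviewer.",
               "Merge these notes into a deduplicated list of specific, "
               "actionable feedback comments:\n\n" + "\n---\n".join(notes))
```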
- Subject Added Entry-Topical Term
- Computer science.
- Index Term-Uncontrolled
- Language modeling
- Index Term-Uncontrolled
- Machine learning
- Index Term-Uncontrolled
- Natural language processing
- Index Term-Uncontrolled
- Peer review
- Index Term-Uncontrolled
- Writing assistance
- Added Entry-Corporate Name
- Northwestern University Computer Science
- Host Item Entry
- Dissertations Abstracts International. 85-10B.
- Electronic Location and Access
- This material can be viewed after logging in.
- Control Number
- joongbu:655024