Data Analysis Tools for Statistical Non-Experts [electronic resource]
Material Type  
 Dissertation
Control Number  
0016934856
International Standard Book Number  
9798380333542
Dewey Decimal Classification Number  
004
Main Entry-Personal Name  
Jun, Eunice.
Publication, Distribution, etc. (Imprint)  
[S.l.] : University of Washington, 2023
Publication, Distribution, etc. (Imprint)  
Ann Arbor : ProQuest Dissertations & Theses, 2023
Physical Description  
1 online resource (244 p.)
General Note  
Source: Dissertations Abstracts International, Volume: 85-03, Section: B.
General Note  
Advisor: Heer, Jeffrey; Just, Rene.
Dissertation Note  
Thesis (Ph.D.)--University of Washington, 2023.
Restrictions on Access Note  
This item must not be sold to any third party vendors.
Summary, Etc.  
Data analysis is critical to science, public policy, and business. Despite their importance, statistical analyses are difficult to author, especially for researchers with expertise outside of statistics. Existing statistical tools, prioritizing mathematical expressivity and computational control, are low-level while researchers' motivating questions and hypotheses are high-level. The process of translating researchers' questions and hypotheses into low-level statistical code is error-prone.
This thesis views statistical analysis authoring as a sensemaking process that involves grappling with domain knowledge, statistics, and programming concerns. To this end, I develop a framework characterizing the cognitive and operational steps involved in translating research questions into statistical analysis code, a process I term hypothesis formalization. I also design, implement, and evaluate three new domain-specific languages (DSLs) and runtimes that embody hypothesis formalization. The DSLs leverage automated reasoning to compile high-level specifications of analysis intent into analysis code.
The first of these, Tea, is used to author Null Hypothesis Significance Tests. Analysts specify their study design, assumptions about data, and hypotheses in Tea's DSL. Tea represents statistical test selection as constraint satisfaction, so it compiles an analyst's specification into a system of constraints to identify a set of valid statistical tests. A benchmark comparison found that Tea's test selection is comparable to that of experts and better than a naive test selection approach.
I also introduce Tisane, a system for authoring generalized linear models with or without mixed effects. Analysts specify their domain knowledge in the form of a conceptual model, data collection details, and focus of analysis in Tisane's DSL. Internally, Tisane represents this conceptual model as a graph. Tisane traverses the graph to derive a space of statistical models based on causal reasoning recommendations. Then, in an interactive disambiguation process, Tisane involves analysts in narrowing the space of possible statistical models to one final output statistical modeling script. In case studies, we found that Tisane shifted researchers' focus from analysis details to their research questions and streamlined the analysis authoring process. To further improve the usability of the Tisane DSL, I conducted an exploratory elicitation study using Tisane as a probe, designed and implemented an improved version of Tisane as rTisane, and then evaluated rTisane in a controlled lab study. The summative evaluation demonstrated that rTisane's DSL helped analysts introspect on their implicit domain assumptions more deeply, stay true to their analysis intent, and produce statistical models that better fit the data. In all, these systems and evaluations provide evidence that conceptually focused DSLs coupled with automated reasoning can lower the barriers to valid analyses.
Subject Added Entry-Topical Term  
Computer science.
Subject Added Entry-Topical Term  
Statistics.
Subject Added Entry-Topical Term  
Cognitive psychology.
Subject Added Entry-Topical Term  
Computer engineering.
Index Term-Uncontrolled  
Data analysis
Index Term-Uncontrolled  
End-user software engineering
Index Term-Uncontrolled  
Human-computer interaction
Index Term-Uncontrolled  
Programming languages
Index Term-Uncontrolled  
Sensemaking
Index Term-Uncontrolled  
Statistical software
Added Entry-Corporate Name  
University of Washington Computer Science and Engineering
Host Item Entry  
Dissertations Abstracts International. 85-03B.
Host Item Entry  
Dissertation Abstracts International
Electronic Location and Access  
This resource is available after logging in.
Control Number  
joongbu:643018
Holdings Information
Registration Number  
TQ0028928
Call Number  
T
Location  
Full-text resource
Availability  
Viewable / printable
Loan Information  
Viewable / printable
