JOONGBU UNIVERSITY LIBRARY

Detailed Information

Data Analysis Tools for Statistical Non-Experts - [electronic resource]

Material Type: Dissertation

: 0016934856

Date and Time of Latest Transaction: 20240214101703

ISBN: 9798380333542

DDC: 004

Author: Jun, Eunice.

Title/Author: Data Analysis Tools for Statistical Non-Experts - [electronic resource]

Publish Info: [S.l.] : University of Washington, 2023

Publish Info: Ann Arbor : ProQuest Dissertations & Theses, 2023

Material Info: 1 online resource(244 p.)

General Note: Source: Dissertations Abstracts International, Volume: 85-03, Section: B.

General Note: Advisor: Heer, Jeffrey; Just, Rene.

Dissertation Note: Thesis (Ph.D.)--University of Washington, 2023.

Restrictions on Access Note: This item must not be sold to any third party vendors.

Abstracts/Etc: Data analysis is critical to science, public policy, and business. Despite their importance, statistical analyses are difficult to author, especially for researchers with expertise outside of statistics. Existing statistical tools, prioritizing mathematical expressivity and computational control, are low-level while researchers' motivating questions and hypotheses are high-level. The process of translating researchers' questions and hypotheses into low-level statistical code is error-prone. This thesis views statistical analysis authoring as a sensemaking process that involves grappling with domain knowledge, statistics, and programming concerns. To this end, I develop a framework characterizing the cognitive and operational steps involved in translating research questions into statistical analysis code, a process I term hypothesis formalization. I also design, implement, and evaluate three new domain-specific languages (DSLs) and runtimes that embody hypothesis formalization. The DSLs leverage automated reasoning to compile high-level specifications of analysis intent into analysis code. The first of these, Tea, is used to author Null Hypothesis Significance Tests. Analysts specify their study design, assumptions about data, and hypotheses in Tea's DSL. Tea represents statistical test selection as constraint satisfaction, so it compiles an analyst's specification into a system of constraints to identify a set of valid statistical tests. A benchmark comparison found that Tea's test selection is comparable to that of experts and better than a naive test selection approach. I also introduce Tisane, a system for authoring generalized linear models with or without mixed effects. Analysts specify their domain knowledge in the form of a conceptual model, data collection details, and focus of analysis in Tisane's DSL. Internally, Tisane represents this conceptual model as a graph. Tisane traverses the graph to derive a space of statistical models based on causal reasoning recommendations. Then, in an interactive disambiguation process, Tisane involves analysts in narrowing the space of possible statistical models to one final output statistical modeling script. In case studies, we found that Tisane shifted researchers' focus from analysis details to their research questions and streamlined the analysis authoring process. To further improve the usability of the Tisane DSL, I conducted an exploratory elicitation study using Tisane as a probe, designed and implemented an improved version of Tisane as rTisane, and then evaluated rTisane in a controlled lab study. The summative evaluation demonstrated that rTisane's DSL helped analysts introspect on their implicit domain assumptions more deeply, stay true to their analysis intent, and produce statistical models that better fit the data. In all, these systems and evaluations provide evidence that conceptually focused DSLs coupled with automated reasoning can lower the barriers to valid analyses.

Subject Added Entry-Topical Term: Computer science.

Subject Added Entry-Topical Term: Statistics.

Subject Added Entry-Topical Term: Cognitive psychology.

Subject Added Entry-Topical Term: Computer engineering.

Index Term-Uncontrolled: Data analysis

Index Term-Uncontrolled: End-user software engineering

Index Term-Uncontrolled: Human-computer interaction

Index Term-Uncontrolled: Programming languages

Index Term-Uncontrolled: Sensemaking

Index Term-Uncontrolled: Statistical software

Added Entry-Corporate Name: University of Washington Computer Science and Engineering

Host Item Entry: Dissertations Abstracts International. 85-03B.

Host Item Entry: Dissertation Abstract International

Electronic Location and Access: This material is available after logging in.

Holdings: 202402 2024

Control Number: joongbu:643018

008240221s2023        ulk                      00        kor
■001000016934856
■00520240214101703
■006m          o    d
■007cr#unu||||||||
■020    ▼a9798380333542
■035    ▼a(MiAaPQ)AAI30635922
■040    ▼aMiAaPQ▼cMiAaPQ
■0820  ▼a004
■1001  ▼aJun, Eunice.
■24510▼aData Analysis Tools for Statistical Non-Experts▼h[electronic resource]
■260    ▼a[S.l.]▼bUniversity of Washington.▼c2023
■260  1▼aAnn Arbor▼bProQuest Dissertations & Theses▼c2023
■300    ▼a1 online resource(244 p.)
■500    ▼aSource: Dissertations Abstracts International, Volume: 85-03, Section: B.
■500    ▼aAdvisor: Heer, Jeffrey;Just, Rene.
■5021  ▼aThesis (Ph.D.)--University of Washington, 2023.
■506    ▼aThis item must not be sold to any third party vendors.
■520    ▼aData analysis is critical to science, public policy, and business. Despite their importance, statistical analyses are difficult to author, especially for researchers with expertise outside of statistics. Existing statistical tools, prioritizing mathematical expressivity and computational control, are low-level while researchers' motivating questions and hypotheses are high-level. The process of translating researchers' questions and hypotheses into low-level statistical code is error-prone. This thesis views statistical analysis authoring as a sensemaking process that involves grappling with domain knowledge, statistics, and programming concerns. To this end, I develop a framework characterizing the cognitive and operational steps involved in translating research questions into statistical analysis code, a process I term hypothesis formalization. I also design, implement, and evaluate three new domain-specific languages (DSLs) and runtimes that embody hypothesis formalization. The DSLs leverage automated reasoning to compile high-level specifications of analysis intent into analysis code. The first of these, Tea, is used to author Null Hypothesis Significance Tests. Analysts specify their study design, assumptions about data, and hypotheses in Tea's DSL. Tea represents statistical test selection as constraint satisfaction, so it compiles an analyst's specification into a system of constraints to identify a set of valid statistical tests. A benchmark comparison found that Tea's test selection is comparable to that of experts and better than a naive test selection approach. I also introduce Tisane, a system for authoring generalized linear models with or without mixed effects. Analysts specify their domain knowledge in the form of a conceptual model, data collection details, and focus of analysis in Tisane's DSL. Internally, Tisane represents this conceptual model as a graph. Tisane traverses the graph to derive a space of statistical models based on causal reasoning recommendations. Then, in an interactive disambiguation process, Tisane involves analysts in narrowing the space of possible statistical models to one final output statistical modeling script. In case studies, we found that Tisane shifted researchers' focus from analysis details to their research questions and streamlined the analysis authoring process. To further improve the usability of the Tisane DSL, I conducted an exploratory elicitation study using Tisane as a probe, designed and implemented an improved version of Tisane as rTisane, and then evaluated rTisane in a controlled lab study. The summative evaluation demonstrated that rTisane's DSL helped analysts introspect on their implicit domain assumptions more deeply, stay true to their analysis intent, and produce statistical models that better fit the data. In all, these systems and evaluations provide evidence that conceptually focused DSLs coupled with automated reasoning can lower the barriers to valid analyses.
■590    ▼aSchool code: 0250.
■650  4▼aComputer science.
■650  4▼aStatistics.
■650  4▼aCognitive psychology.
■650  4▼aComputer engineering.
■653    ▼aData analysis
■653    ▼aEnd-user software engineering
■653    ▼aHuman-computer interaction
■653    ▼aProgramming languages
■653    ▼aSensemaking
■653    ▼aStatistical software
■690    ▼a0984
■690    ▼a0463
■690    ▼a0633
■690    ▼a0464
■71020▼aUniversity of Washington▼bComputer Science and Engineering.
■7730  ▼tDissertations Abstracts International▼g85-03B.
■773    ▼tDissertation Abstract International
■790    ▼a0250
■791    ▼aPh.D.
■792    ▼a2023
■793    ▼aEnglish
■85640▼uhttp://www.riss.kr/pdu/ddodLink.do?id=T16934856▼nKERIS▼zThe full text of this material is provided by the Korea Education and Research Information Service.
■980    ▼a202402▼f2024

Material
Reg No.	Call No.	Location	Status	Lend Info
TQ0028928	T	Full-text material	Available for viewing/printing	Available for viewing/printing