Towards Cloud-Scale Debugging.
- Material Type
- Dissertation
- Control Number
- 0017162178
- International Standard Book Number
- 9798382778426
- Dewey Decimal Classification Number
- 004
- Main Entry-Personal Name
- Dogga, Pradeep.
- Publication, Distribution, etc. (Imprint)
- [S.l.] : University of California, Los Angeles, 2024
- Publication, Distribution, etc. (Imprint)
- Ann Arbor : ProQuest Dissertations & Theses, 2024
- Physical Description
- 178 p.
- General Note
- Source: Dissertations Abstracts International, Volume: 85-12, Section: B.
- General Note
- Advisor: Netravali, Ravi Arun;Varghese, George.
- Dissertation Note
- Thesis (Ph.D.)--University of California, Los Angeles, 2024.
- Summary, Etc.
- Cloud computing is an integral part of today's world: it primarily enables individuals and enterprises to provision and manage resources such as compute and storage for their needs with the click of a button. A modular approach to software development has enabled cloud providers to rapidly evolve and deliver an increasing number of services to users, rendering clouds mission-critical. To ensure prompt serviceability of this Achilles' heel when incidents occur, cloud providers employ significant human resources. However, with the ever-increasing number of services offered by clouds and growing types of workloads, such as the recent proliferation of Machine Learning workloads, it is no longer viable for cloud providers to scale their human resources at this pace to ensure prompt serviceability of their clouds. In this dissertation, I present my work towards improving the serviceability of clouds by leveraging insights from my experience with real debugging workflows employed at the three largest clouds today. I present techniques from Machine Learning and Natural Language Processing to leverage the vast amount of historical debugging data in clouds to develop tools that provide assistance to their engineers. I present a 'Coarsening' framework that enables a transition towards a centralized debugging plane and discuss practical evaluations of tools built using this framework. I present Revelio, a tool that can generate debugging queries for engineers to execute over system-wide logged data, whose results are likely to hint at the root cause of an incident. To enable benchmarking of many techniques, I also built a distributed systems debugging testbed that can inject faults into services, interface with human users, and collect execution logs across the system.
I present AutoARTS, a tool that can tag a lengthy postmortem report of an incident in the cloud with all root causes from an extensive taxonomy and can also highlight key pieces of information from a postmortem for ease of analysis. I present PerfRCA, a tool that can scale causal discovery to production-scale telemetry to reason about performance degradations. I conclude with my vision for a centralized approach to automatically extract generalizable debugging assistance for engineers across a cloud.
- Subject Added Entry-Topical Term
- Computer science.
- Subject Added Entry-Topical Term
- Computer engineering.
- Index Term-Uncontrolled
- Cloud computing
- Index Term-Uncontrolled
- Computer networks
- Index Term-Uncontrolled
- Debugging
- Index Term-Uncontrolled
- Distributed systems
- Index Term-Uncontrolled
- Machine Learning
- Index Term-Uncontrolled
- Natural Language Processing
- Added Entry-Corporate Name
- University of California, Los Angeles Computer Science 0201
- Host Item Entry
- Dissertations Abstracts International. 85-12B.
- Electronic Location and Access
- This material is available after logging in.
- Control Number
- joongbu:657921