Clustering server properties and syntactic structures in state machines for hyperscale data center operations

DiVA Portal - Digital Scientific Archive

Abstract

In hyperscale data center operations, automation is applied in many ways because operations become very hard to scale otherwise. However, some areas related to understanding, grouping, and diagnosing error reports are still handled manually at Facebook today. This master’s thesis investigates how unsupervised clustering methods can be applied to server error reports, server properties, and historical data to speed up and improve the process of finding systematic issues and identifying their root causes. Using data representations that can embed both key-value data and historical event log data, the thesis shows that clustering algorithms, combined with representations that capture syntactic and semantic structures in the data, can be applied with good results in a real-world scenario.
