Comparing Expected and Real-Time Spotify Service Topology

University essay from KTH/Kommunikationssystem, CoS

Abstract: Spotify is a music streaming service that allows users to listen to their favourite music. Due to the rapid growth in the number of users, the amount of processing that must be provided by the company’s data centers is also growing. This growth in the data centers is necessary even though much of the music content is actually sourced by other users through a peer-to-peer model. Spotify’s backend (the infrastructure that Spotify operates to provide its music streaming service) consists of a number of different services, such as track search, storage, and others. As this infrastructure grows, some services may not behave as expected. It is therefore important, not only for Spotify’s operations team (also known as the Service Reliability Engineers (SRE) team) but also for developers, to understand exactly how the various services are actually communicating. The problem is challenging because of the scale of the backend network and its rate of growth. In addition, the company aims to grow and expects to expand both the number of users and the amount of content that is available. A steadily increasing feature set and support for additional platforms add to the complexity. Another major challenge is to create tools that are useful to the operations team by providing information in a readily comprehensible way, and ideally to integrate these tools into the team’s daily routine. The ultimate goal is to design, develop, implement, and evaluate a tool that helps the operations team (and developers) understand the behavior of the services deployed on Spotify’s backend network. The most critical function is to alert the operations staff when services are not operating as expected. Because different services are deployed on different servers, the communication between these services is reflected in the network communication between these servers.
In order to understand how the services behave when there are potentially many thousands of servers, we look for patterns in the topology of this communication rather than at the individual servers. This thesis describes tools that successfully extract these patterns in the topology and compare them to the expected behavior.
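
The idea of collapsing server-level traffic into service-level patterns and comparing them with the expected topology can be illustrated with a minimal sketch. This is not the tool described in the thesis: the function names, the host names, the `accesspoint` service, and the representation of the expected topology as a set of directed service pairs are all assumptions made for illustration.

```python
from collections import Counter


def observed_service_edges(flows, host_to_service):
    """Collapse (src_host, dst_host) flows into service-level edge counts.

    Hosts missing from the (hypothetical) host-to-service map are grouped
    under the label "unknown" so they still show up in the comparison.
    """
    edges = Counter()
    for src_host, dst_host in flows:
        src = host_to_service.get(src_host, "unknown")
        dst = host_to_service.get(dst_host, "unknown")
        edges[(src, dst)] += 1
    return edges


def compare_topology(expected_edges, observed_edges):
    """Return service-level edges that are unexpected or missing.

    expected_edges: set of (src_service, dst_service) pairs.
    observed_edges: mapping whose keys are (src_service, dst_service) pairs.
    """
    observed = set(observed_edges)
    return {
        "unexpected": sorted(observed - expected_edges),
        "missing": sorted(expected_edges - observed),
    }


# Hypothetical example data: four servers running three services.
host_to_service = {
    "ap1": "accesspoint",
    "search1": "search",
    "search2": "search",
    "storage1": "storage",
}
flows = [
    ("ap1", "search1"),
    ("ap1", "search2"),
    ("search1", "storage1"),
    ("storage1", "search2"),  # not part of the expected topology below
]
expected = {
    ("accesspoint", "search"),
    ("search", "storage"),
    ("accesspoint", "storage"),
}

report = compare_topology(expected, observed_service_edges(flows, host_to_service))
print(report)
```

Working at the service level keeps the comparison small even when the number of servers is large, since many server pairs collapse into a single service pair. An entry under "unexpected" or "missing" is the kind of deviation that could be surfaced to operations staff as an alert.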
