Design Spotify

What does it do?

Spotify is an app where users can perform the following tasks:

  • Create a user account and manage subscriptions

  • Search for music by movie, actor, singer, composer, album, etc.

  • Listen to music (music streaming)

  • Follow singers, composers, etc.

  • Listen to podcasts on topics of interest

  • Others

    • Create, edit, share playlist; add to favorites

    • Follow playlists created by other users or curators

    • Offline playback (for downloaded content)

    • Quality settings (e.g., high, medium, low)

    • Shuffle, repeat, and crossfade options

    • Get music recommendations

  • Music Discovery

    • Curated playlists by genre, mood, or activity

    • Charts and trending music

    • New releases and featured albums

  • Ads for non-premium or free users

Narrowing down the Scope

What is the bare minimum functionality we want to have?

  • User account

  • Song streaming (listening to songs)

  • Song search: the user searches for a song, selects one from the results, and listens to it
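As a toy sketch of this minimal flow (search, pick a result, get something playable), the Python below filters a hypothetical in-memory catalog by title, singer, or album and returns the chosen song's storage URL. The `Song` fields, the sample entries, and the `s3://songs-bucket/...` URLs are made up for illustration; a real system would query a search index and the metadata store instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Song:
    song_id: int
    title: str
    singer: str
    album: str
    audio_url: str  # where the audio file lives in object storage (hypothetical)


# Hypothetical sample catalog standing in for the song metadata store.
CATALOG = [
    Song(1, "Morning Light", "Singer X", "Album One", "s3://songs-bucket/1.mp3"),
    Song(2, "Night Drive", "Singer X", "Album Two", "s3://songs-bucket/2.mp3"),
    Song(3, "Rainfall", "Singer Y", "Album One", "s3://songs-bucket/3.mp3"),
]


def search(query: str, catalog: list[Song] = CATALOG) -> list[Song]:
    """Case-insensitive substring match on title, singer, or album."""
    q = query.lower()
    return [
        s for s in catalog
        if q in s.title.lower() or q in s.singer.lower() or q in s.album.lower()
    ]


def select_and_play(results: list[Song], song_id: int) -> str:
    """Pick one song from the results and return the URL the player should stream."""
    for song in results:
        if song.song_id == song_id:
            return song.audio_url
    raise KeyError(f"song {song_id} not in search results")


results = search("singer x")
print([s.title for s in results])    # ['Morning Light', 'Night Drive']
print(select_and_play(results, 2))   # s3://songs-bucket/2.mp3
```

Substring matching is only a stand-in here; at 100 million songs the search step would need a proper inverted index.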

Understanding the Scale

  • How many users overall?

    • 1 billion

    • At 1 KB of data per user: 1 billion × 1 KB ≈ 1 TB of user data

  • How many songs overall?

    • 100 million songs overall

    • At 5 MB per audio file, storing the songs takes 100 million × 5 MB = 0.5 PB

    • At 100 B of metadata per song, metadata takes 100 million × 100 B = 10 GB

    • With a replication factor of 3, storing the songs needs 3 × 0.5 PB = 1.5 PB

  • What is the total space requirement?

  • How many users are active per day?

  • How many songs does one user listen to per day?
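The storage estimates above can be checked with a few lines of arithmetic. This sketch assumes decimal units (1 KB = 1,000 bytes, 1 PB = 10^15 bytes) and, to give one possible answer to the total-space question, assumes user data and metadata are replicated 3 times like the songs. The notes only state replication for songs, so treat that total as an assumption.

```python
# Decimal units (an assumption; binary units would shift the numbers slightly).
KB, MB, GB, TB, PB = 10**3, 10**6, 10**9, 10**12, 10**15

USERS = 1_000_000_000
BYTES_PER_USER = 1 * KB
SONGS = 100_000_000
BYTES_PER_AUDIO_FILE = 5 * MB
BYTES_PER_SONG_METADATA = 100
REPLICAS = 3

user_data = USERS * BYTES_PER_USER            # 1 TB
audio = SONGS * BYTES_PER_AUDIO_FILE          # 0.5 PB
metadata = SONGS * BYTES_PER_SONG_METADATA    # 10 GB
replicated_audio = REPLICAS * audio           # 1.5 PB

# Assumption: user data and metadata are also kept with 3 replicas.
total = REPLICAS * (user_data + audio + metadata)

print(f"user data:        {user_data / TB:g} TB")
print(f"audio:            {audio / PB:g} PB")
print(f"metadata:         {metadata / GB:g} GB")
print(f"replicated audio: {replicated_audio / PB:g} PB")
print(f"total (3x all):   {total / PB:.4f} PB")
```

Under these assumptions the total comes to about 1.503 PB, almost all of it replicated audio; user data and metadata barely register.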

Designing System

Considerations and Components

  • To improve the user experience, stream each song in chunks instead of loading the whole file before playback starts

  • Store the audio files in an S3 bucket (or other cloud object storage) and the song metadata in a NoSQL database. Along with each song's metadata, store the location/URL of its audio file in cloud storage

  • Use a caching layer for frequently accessed data, such as the metadata of popular songs

  • Given the number of users, we will spin up multiple web servers

  • Use a load balancer to distribute requests across the servers

  • This is an I/O-heavy app rather than a compute-heavy one, so the load balancer should be configured accordingly (for example, if a server receives more reads than it can handle, the load balancer should route new requests to less-loaded servers)

  • This is a read-heavy application. Writes to the song catalog are handled separately on the backend; users are never involved in writing songs.

  • For fault tolerance, we keep 3 replicas of each song, so the data is distributed. We also place replicas close to the geographic regions where users are.
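For the streaming bullet, a minimal sketch is a generator that reads the audio file in fixed-size chunks from a given byte offset. The client can start playing after the first chunk, and seeking just restarts from a new offset. The 64 KiB chunk size and the name `stream_song` are assumptions for illustration; a real service would typically serve such chunks over something like HTTP range requests against the object store.

```python
import io
from typing import BinaryIO, Iterator

CHUNK_SIZE = 64 * 1024  # assumed chunk size; a real service would tune this


def stream_song(audio: BinaryIO, start: int = 0,
                chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield the audio file in chunks starting at byte `start`.

    Playback can begin as soon as the first chunk arrives, and a seek
    only needs a new `start` offset instead of re-downloading the file.
    """
    audio.seek(start)
    while True:
        chunk = audio.read(chunk_size)
        if not chunk:  # end of file
            break
        yield chunk


# Usage with an in-memory stand-in for a 150,000-byte audio file.
fake_file = io.BytesIO(b"\x00" * 150_000)
print([len(c) for c in stream_song(fake_file)])  # [65536, 65536, 18928]
```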
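The storage and caching bullets fit together as a cache-aside read path: look up a song's metadata (including its audio URL) in the cache first, and only on a miss read it from the metadata database and populate the cache. In this sketch a plain dict stands in for the NoSQL table and a small LRU cache built on `collections.OrderedDict` stands in for the caching layer; the class names, capacity, and sample records are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the least recently used entry


class SongMetadataService:
    """Cache-aside reads: try the cache, fall back to the database on a miss."""

    def __init__(self, db: dict, cache: LRUCache):
        self.db = db          # stand-in for the NoSQL metadata table
        self.cache = cache
        self.db_reads = 0     # counts cache misses, for illustration

    def get(self, song_id: int) -> dict:
        doc = self.cache.get(song_id)
        if doc is None:
            self.db_reads += 1
            doc = self.db[song_id]
            self.cache.put(song_id, doc)
        return doc


# Hypothetical metadata documents; each stores the audio file's location.
METADATA_DB = {
    101: {"title": "Song A", "singer": "Singer X", "audio_url": "s3://songs-bucket/101.mp3"},
    102: {"title": "Song B", "singer": "Singer Y", "audio_url": "s3://songs-bucket/102.mp3"},
}

service = SongMetadataService(METADATA_DB, LRUCache(capacity=1))
service.get(101)         # miss: read from the database
service.get(101)         # hit: served from the cache
print(service.db_reads)  # 1
```

Because reads vastly outnumber writes here, cache-aside keeps the database off the hot path for popular songs.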
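One way to read "smart enough" in the load-balancer bullets is least-loaded routing with a per-server capacity: send each new stream to the server with the lowest fraction of its capacity in use, skip servers that are full, and reject only when every server is saturated. This is one common strategy, not necessarily what Spotify uses; the `Server` fields and capacities below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    capacity: int    # hypothetical max concurrent streams this server can serve
    active: int = 0  # streams currently being served


class LeastLoadedBalancer:
    def __init__(self, servers: list[Server]):
        self.servers = servers

    def acquire(self) -> Server:
        """Route a new stream to the server with the lowest load ratio."""
        available = [s for s in self.servers if s.active < s.capacity]
        if not available:
            raise RuntimeError("all servers are at capacity")
        server = min(available, key=lambda s: s.active / s.capacity)
        server.active += 1
        return server

    def release(self, server: Server) -> None:
        """Call when a stream ends so the server can take new work."""
        server.active -= 1


lb = LeastLoadedBalancer([Server("web-1", capacity=2), Server("web-2", capacity=1)])
print([lb.acquire().name for _ in range(3)])  # ['web-1', 'web-2', 'web-1']
```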
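For the replica-placement bullet, serving each user from the nearest copy can be sketched as picking, among the regions that hold a replica of the song, the one with the lowest latency from the user's region. The region names and latency figures below are invented for illustration, not measurements.

```python
# Hypothetical round-trip latencies (ms) from a user's region to each storage region.
LATENCY_MS = {
    "eu": {"eu": 10, "us": 90, "ap": 160},
    "us": {"eu": 90, "us": 10, "ap": 120},
    "ap": {"eu": 160, "us": 120, "ap": 10},
}


def pick_replica(user_region: str, replica_regions: list[str]) -> str:
    """Return the replica region with the lowest latency for this user."""
    return min(replica_regions, key=lambda r: LATENCY_MS[user_region][r])


print(pick_replica("ap", ["eu", "us"]))  # us
```

If a song's 3 replicas are spread across regions, most users get a nearby copy, and the remaining replicas still cover a regional outage.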

Architecture Diagram

Resources
