Design Spotify
What does it do?
Spotify is an app where users can perform the following tasks:
Create a user account and manage subscriptions
Search for music by movie, actor, singer, composer, album, etc.
Listen to music (music streaming)
Follow singers, composers, etc.
Listen to podcasts on topics of interest
Others
Create, edit, and share playlists; add songs to favorites
Follow playlists created by other users or curators
Offline playback (for downloaded content)
Quality settings (e.g., high, medium, low)
Shuffle, repeat, and crossfade options
Get music recommendations
Music Discovery
Curated playlists by genre, mood, or activity
Charts and trending music
New releases and featured albums
Ads for non-premium or free users
Narrowing Down the Scope
What are the bare-minimum features we want to have?
User account
Streaming songs (listening to songs)
A user searches for a song, selects one from the results, and listens to it
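The search step in this flow could be backed by an inverted index that maps each word in a song's searchable fields (title, singer, album, and so on) to the songs containing it. The sketch below is a minimal in-memory illustration, not the design's confirmed approach: the catalog entries and field names are hypothetical, and at Spotify's scale this role would more likely be filled by a dedicated search engine.

```python
from collections import defaultdict

# Hypothetical sample catalog: song_id -> searchable text fields.
SAMPLE_CATALOG = {
    1: {"title": "Morning Light", "singer": "Asha", "album": "Dawn"},
    2: {"title": "Night Drive", "singer": "Asha", "album": "Roads"},
    3: {"title": "Morning Rain", "singer": "Ravi", "album": "Roads"},
}


def build_index(catalog: dict[int, dict[str, str]]) -> dict[str, set[int]]:
    """Map every lowercase token found in any field to the ids of songs containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for song_id, fields in catalog.items():
        for value in fields.values():
            for token in value.lower().split():
                index[token].add(song_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return ids of songs matching every query token (AND semantics)."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    # Copy the first posting set so intersecting does not mutate the index.
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


if __name__ == "__main__":
    idx = build_index(SAMPLE_CATALOG)
    print(search(idx, "asha"))
    print(search(idx, "roads morning"))
```

The user then picks one id from the results, and the client requests that song's audio stream.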
Understanding the Scale
How many users overall?
1 billion
Storing 1 KB of data per user: 1 billion × 1 KB ≈ 1 TB of user data
How many songs overall?
100 million songs
At 5 MB per audio file, the songs need 100 million × 5 MB = 0.5 PB
At 100 B of metadata per song, metadata totals 100 million × 100 B = 10 GB
With a replication factor of 3, storing the songs needs 3 × 0.5 PB = 1.5 PB
What is the total space requirement?
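The figures above can be combined into a total. The sketch below recomputes each estimate in decimal units (1 KB = 10^3 B, as the arithmetic above implies) and assumes, beyond what the numbers state explicitly, that the replication factor of 3 applies to user data and metadata as well as to audio files.

```python
# Decimal units, matching the arithmetic in the estimates above.
KB, MB, GB, TB, PB = 10**3, 10**6, 10**9, 10**12, 10**15

USERS = 1_000_000_000     # 1 billion users
USER_DATA = 1 * KB        # data stored per user
SONGS = 100_000_000       # 100 million songs
AUDIO_SIZE = 5 * MB       # size of one audio file
METADATA_SIZE = 100       # bytes of metadata per song
REPLICATION = 3           # assumption: applied to all data, not only audio


def storage_estimate() -> dict[str, int]:
    """Return back-of-envelope storage needs in bytes."""
    user_bytes = USERS * USER_DATA
    audio_bytes = SONGS * AUDIO_SIZE
    metadata_bytes = SONGS * METADATA_SIZE
    raw_total = user_bytes + audio_bytes + metadata_bytes
    return {
        "users": user_bytes,
        "audio": audio_bytes,
        "metadata": metadata_bytes,
        "audio_replicated": audio_bytes * REPLICATION,
        "total_replicated": raw_total * REPLICATION,
    }


if __name__ == "__main__":
    for name, size in storage_estimate().items():
        print(f"{name}: {size / TB:,.2f} TB")
```

Audio dominates: the replicated total comes to about 1,503 TB (roughly 1.5 PB), with user data and metadata together adding only about 3 TB.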
How many users are active per day?
How many songs does one user listen to per day?
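These two numbers are left open here, but once known they determine the daily streaming volume. The sketch below is a rough formula under a simplifying assumption: every play transfers the full 5 MB file, ignoring caching on the device, skipped songs, and quality settings. Function names are hypothetical, and the result is an average; peak bandwidth would be higher.

```python
MB = 10**6
AUDIO_SIZE = 5 * MB  # average audio file size, from the estimate above
SECONDS_PER_DAY = 86_400


def daily_egress_bytes(daily_active_users: int, songs_per_user_per_day: int) -> int:
    """Bytes streamed per day, assuming each play downloads one full audio file."""
    return daily_active_users * songs_per_user_per_day * AUDIO_SIZE


def average_bandwidth_bps(daily_bytes: int) -> float:
    """Average outbound bandwidth in bits per second, spread over 24 hours."""
    return daily_bytes * 8 / SECONDS_PER_DAY
```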
Designing the System
Considerations and Components
To improve the user experience, we stream each song in chunks instead of loading the whole file at once, so playback can start before the full file has arrived.
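A minimal sketch of chunked streaming: the server reads the audio file in fixed-size pieces and hands each one to the client as soon as it is read, rather than sending the whole file in one response. The 64 KiB chunk size is an assumption, and an in-memory buffer stands in for the real file; over HTTP, a client could fetch similar pieces with Range requests.

```python
import io
from typing import BinaryIO, Iterator

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size: 64 KiB


def stream_chunks(audio: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield the audio file piece by piece so playback can begin early."""
    while True:
        chunk = audio.read(chunk_size)
        if not chunk:  # empty bytes signals end of file
            break
        yield chunk


if __name__ == "__main__":
    fake_song = io.BytesIO(b"\x00" * 150_000)  # stand-in for a 150 KB audio file
    print([len(c) for c in stream_chunks(fake_song)])  # [65536, 65536, 18928]
```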
We will store the song audio files in an S3 bucket (or similar cloud object storage) and the song metadata in a NoSQL database. Along with each song's metadata, we also store the location (URL) of its audio file in cloud storage.
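A sketch of the metadata record described above, where each entry carries the object-storage URL of its audio file. A plain dict stands in for the NoSQL table, and the field names, helper functions, and `s3://example-bucket/...` URL are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SongMetadata:
    song_id: str
    title: str
    singer: str
    album: str
    audio_url: str  # where the audio file lives in object storage


# Hypothetical stand-in for the NoSQL metadata table, keyed by song_id.
metadata_store: dict[str, SongMetadata] = {}


def put_song(meta: SongMetadata) -> None:
    """Insert or overwrite a song's metadata record."""
    metadata_store[meta.song_id] = meta


def resolve_audio_url(song_id: str) -> str:
    """Look up a song's metadata and return the location of its audio file."""
    meta = metadata_store.get(song_id)
    if meta is None:
        raise KeyError(f"unknown song_id: {song_id}")
    return meta.audio_url


if __name__ == "__main__":
    put_song(SongMetadata("s1", "Morning Light", "Asha", "Dawn",
                          "s3://example-bucket/audio/s1.mp3"))
    print(resolve_audio_url("s1"))
```

Keeping only a URL in the metadata keeps each record small (on the order of the 100 B estimated earlier), while the large audio files stay in storage built for them.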
Use a caching layer for frequently requested data, so repeated reads do not all hit the database.
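One way to realize the caching layer is a cache-aside pattern with least-recently-used (LRU) eviction: reads check the cache first and fall back to the backing store on a miss. The notes do not say what to cache or which eviction policy to use, so LRU and the `loader` callback (standing in for, say, a metadata database read) are assumptions in this sketch.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class LRUCache:
    """Cache-aside LRU: serve hits from memory, load misses from the backing store."""

    def __init__(self, capacity: int, loader: Callable[[Hashable], Any]) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.loader = loader  # called on a miss, e.g. a database read
        self._data: OrderedDict = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> Any:
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._data[key]
        self.misses += 1
        value = self.loader(key)  # miss: read from the backing store
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry
        return value
```

Because the app is read-heavy and popular songs are requested far more often than others, even a small cache can absorb a large share of reads.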
Given the number of users, we will spin up multiple web servers.
A load balancer distributes incoming requests across these servers.
This is an I/O-heavy app rather than a compute-heavy one, so the load balancer should be chosen and configured accordingly: for example, if a server is receiving more reads than it can handle, the load balancer should route new requests to less-loaded servers.
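The notes do not name a balancing strategy. One common choice for I/O-bound workloads with long-lived connections, such as audio streams, is least-connections: send each new request to the server with the fewest active connections, skipping servers already at capacity. The sketch below illustrates that technique; the `Server` fields and per-server capacity are hypothetical, and a real load balancer would also run health checks.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    max_connections: int  # hypothetical per-server capacity
    active: int = 0


class LeastConnectionsBalancer:
    """Route each request to the server with the fewest active connections."""

    def __init__(self, servers: list[Server]) -> None:
        self.servers = servers

    def acquire(self) -> Server:
        """Pick a server for a new connection, ignoring servers at capacity."""
        available = [s for s in self.servers if s.active < s.max_connections]
        if not available:
            raise RuntimeError("all servers at capacity")
        server = min(available, key=lambda s: s.active)  # ties go to the first listed
        server.active += 1
        return server

    def release(self, server: Server) -> None:
        """Mark one of the server's connections as finished."""
        server.active = max(0, server.active - 1)
```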
This is a read-heavy application. Writes are handled separately in the backend; users never write song data.
For fault tolerance, we keep three replicas of the data, distributed across machines. We also place song replicas in geographic regions close to where the users are.
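Both goals in the paragraph above, surviving failures and serving users from nearby, meet at read time: the client should fetch from the closest replica that is currently healthy. The sketch below assumes three replica regions and uses a latency table whose region names and millisecond values are illustrative placeholders, not measurements.

```python
# Illustrative placeholder latencies (ms) from a user's region to a storage region.
LATENCY_MS = {
    ("eu", "eu"): 10, ("eu", "us"): 90, ("eu", "asia"): 160,
    ("us", "eu"): 90, ("us", "us"): 10, ("us", "asia"): 140,
    ("asia", "eu"): 160, ("asia", "us"): 140, ("asia", "asia"): 10,
}


def pick_replica(user_region: str, replica_regions: list[str], healthy: set[str]) -> str:
    """Choose the healthy replica region with the lowest latency for this user."""
    candidates = [r for r in replica_regions if r in healthy]
    if not candidates:
        raise RuntimeError("no healthy replica available")
    return min(candidates, key=lambda r: LATENCY_MS[(user_region, r)])


if __name__ == "__main__":
    replicas = ["eu", "us", "asia"]
    print(pick_replica("eu", replicas, {"eu", "us", "asia"}))  # eu
    print(pick_replica("eu", replicas, {"us", "asia"}))        # us (eu replica down)
```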
Architecture Diagram
Resources