Caching
Caching is a technique used to store copies of data in a temporary storage location so that future requests for that data can be served faster. It improves system performance by reducing data retrieval times and lowering the load on the underlying resources.
Caching works on the locality-of-reference principle: recently requested data, or data we expect to be requested again soon, is kept in memory, which is faster to retrieve.
It is like short-term memory: limited in space, but faster to access and holding the most recently used information.
We cannot use all of the RAM for storage, since RAM is needed for processing. The key decisions are therefore what to cache and for how long to cache it.
Where to Cache: decide whether caching should be client-side, server-side, or distributed (e.g., CDNs).
What to Cache: identify frequently accessed data or expensive computations that would benefit from caching.
TTL (Time-to-Live): decide how long to cache; set appropriate expiration times to balance freshness and performance.
How Much to Cache: allocate enough memory without overcommitting resources.
Eviction Policies: choose the logic for replacing old entries with new ones (LRU, LFU, FIFO, etc.) based on usage patterns.
Consistency: ensure cached data stays synchronized with the source, especially in distributed systems.
Cache Invalidation: decide when and how to invalidate or update cached data.
Monitoring and Metrics: track hit/miss rates and adjust the strategies above as needed.
Security: ensure sensitive data is not cached insecurely.
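To make the TTL decision above concrete, here is a minimal sketch of an expiring cache in Python. The class and method names are illustrative, not from any particular library: entries older than `ttl_seconds` are treated as misses and evicted on access.

```python
import time

class TTLCache:
    """Minimal TTL cache sketch: entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, stored_at)

    def set(self, key, value):
        self.store[key] = (value, time.monotonic())

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None  # cache miss
        value, stored_at = entry
        if time.monotonic() - stored_at > self.ttl:
            del self.store[key]  # expired: evict and report a miss
            return None
        return value
```

A short TTL keeps data fresh at the cost of more misses; a long TTL improves the hit rate but risks serving stale data.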
The cache size must not be unbounded; unlimited growth can lead to memory-exhaustion bugs. Limit the amount of data kept in the cache at any time, either by
setting a hard limit on the cache size, or
by defining an expiration policy that evicts old items from the cache at some point.
Cache Eviction Policy
LRU - Least Recently Used
LFU - Least Frequently Used
MRU - Most Recently Used
FIFO - First In First Out
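As an illustration of the first policy in this list, here is a minimal LRU cache sketch built on Python's `OrderedDict` (the class name is just for the example): when the cache is full, the entry that was accessed longest ago is evicted.

```python
from collections import OrderedDict

class LRUCache:
    """Bounded cache that evicts the least recently used entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()  # ordered oldest -> newest access

    def get(self, key):
        if key not in self.data:
            return None  # cache miss
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used
```

LFU would instead track access counts, MRU would evict from the newest end, and FIFO would evict in insertion order regardless of later accesses.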
Suitable Candidate For Caching
The function is called frequently
Its output does not change very often
It takes a long time to execute the function
It is important to know when to invalidate the cache and reload it with fresh data
Caching Should Be Faster Than Getting The Data From The Current Data Source
Caching results increases the application's memory footprint, so it is crucial to choose appropriate data structures and to cache only the attributes of the data that are actually needed.
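A function meeting the criteria above (called frequently, stable output, slow to compute) is a natural candidate for memoization. Python's standard-library `functools.lru_cache` provides exactly this, with a bounded size and LRU eviction:

```python
from functools import lru_cache

@lru_cache(maxsize=128)  # bounded cache with LRU eviction
def fib(n):
    """Naive recursive Fibonacci: exponential without caching, linear with it."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)
```

`fib.cache_info()` exposes hit/miss counts, which is a simple way to monitor whether the cache is earning its memory footprint.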
Frequently Accessed Data: data that is repeatedly requested, such as user profiles or product information.
Expensive Computations: results of complex calculations or resource-intensive operations.
Static Resources: images, stylesheets, and scripts that do not change often.
Database Query Results: common query results that remain unchanged for a period.
Configuration Settings: settings that are read frequently but rarely changed.
A cache can be used at every layer: hardware, OS, web browser, web application. Caches are often placed nearest to the UI to make responses as fast as possible.
In memory
In disk storage
CDN (Content Delivery Network)
Are cookies some type of caching?
Persistence: when the application restarts, an in-memory cache is lost, but on-disk storage persists. Redis provides an option to persist the cache.
Networked: an in-memory cache is not networked and hence is not available across machines in a distributed system with proxy servers. Redis is networked.
Scalability: disk caches and Redis-style caches allow replicas for fault tolerance and are not constrained by the application's memory, unlike an in-memory cache.
Speed: an in-memory cache is faster than a disk or networked cache because it avoids disk and network round trips.
Browser Cache
Browser caching involves a visitor's browser downloading your website's resources (e.g., HTML files, JavaScript files, and images) to their local drive. This speeds up page load times during subsequent visits and reduces bandwidth usage.
The process is governed by the browser's internal cache policy, which indicates which resources are to be cached and for how long.
Cache and cookies are different
The cache stores page resources locally across visits to decrease loading time.
Cookies store user data, such as session identifiers and browsing preferences, so the site can recognize the user.
Cached data is temporary and stored for fast retrieval; it needs to stay in sync with the main database. This syncing is governed by a write policy.
Write Through
The write to the DB happens through the cache: every time new data is written to the cache, it is also updated in the DB.
Advantage: there is no mismatch of data between the cache and storage.
Disadvantage: both the cache and the storage must be updated on every write, creating overhead instead of improving performance.
Write Back
The application writes to the cache, and the cache asynchronously flushes the values to the DB at set intervals.
This method swaps the advantage and disadvantage of write-through: writing to the cache is faster, but a crash before a flush can cause data loss and inconsistency.
Write Around
Write the data directly to storage and load the cache only when data is read
Advantages
Cache not overloaded with data that is not read immediately after write
Reduces the latency of the write-through method
Disadvantages
Reading recently written data causes a cache miss, so this policy is unsuitable for workloads that read data soon after writing it.
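The three write policies above can be sketched side by side. This is a toy illustration with hypothetical `Cache` and `DB` classes standing in for a real cache and datastore; a real write-back implementation would flush on a timer or on eviction rather than by an explicit call.

```python
class Cache:
    """Toy in-memory cache (illustrative stand-in)."""
    def __init__(self):
        self.data = {}

class DB:
    """Toy datastore (illustrative stand-in)."""
    def __init__(self):
        self.rows = {}

def write_through(cache, db, key, value):
    # Update cache and DB together: always consistent, but every write pays both costs.
    cache.data[key] = value
    db.rows[key] = value

def write_back(cache, db, key, value, dirty):
    # Update only the cache now; remember the key so it can be flushed later.
    cache.data[key] = value
    dirty.add(key)

def flush(cache, db, dirty):
    # Asynchronous flush step of write-back: push dirty keys to the DB.
    for key in dirty:
        db.rows[key] = cache.data[key]
    dirty.clear()

def write_around(cache, db, key, value):
    # Write straight to the DB; the cache fills only on a later read (first read misses).
    db.rows[key] = value
    cache.data.pop(key, None)  # drop any stale cached copy
```

The choice depends on the workload: write-through favors consistency, write-back favors write latency, and write-around avoids polluting the cache with data that may never be read.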
Reading about the different types of memory would be helpful.
Browser cache and cookies are, in most cases, a front-end developer's concern, and we can probably skip them for now. If you want to go deeper, refer to the section.