Memory Leak
What is it?
A memory leak happens when an application keeps holding on to memory it no longer needs, so that memory is never released back to the system and usage keeps growing over time
What does Memory Leak cost me?
It is almost impossible to scale up when there is a memory leak, because any added capacity is eventually consumed by the leak
A leak also hurts the experience of current users, since the application slows down or crashes as free memory runs out
Example:
In my current organization, we use a 5-node cluster for different services
One of the applications, deployed on 1 server, had a memory leak
Because of the leak, that application gradually ate up all of the node's RAM
Hence, the other applications were effectively working on only 4 nodes
And the leaking application needed a restart every time it exhausted the RAM
This hampered the current user experience, cost us unnecessarily for a bigger server, and created a scaling issue as well
Possible Memory Leak cases?
Some low-level C library is leaking
We can skip this case for now (it relates to how Python itself is implemented)
Python code has global lists or dicts that grow over time because objects are never removed after use
We need to figure out the source of the growth and rectify the leak
There are reference cycles in the app
Python's garbage collector takes care of these automatically
The application creates multiple instances of a very heavy package
This can be rectified by creating the instance globally, once, and reusing it everywhere
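The "growing global dict" case above can be sketched in a few lines. The request-processing function and cache names here are illustrative, not from any real codebase; the fix shown is one common option, bounding the cache so old entries are evicted instead of accumulating forever.

```python
from collections import OrderedDict

# The leak: a module-level cache that only ever grows.
_cache = {}

def process_leaky(request_id, payload):
    result = payload.upper()       # stand-in for real work
    _cache[request_id] = result    # entry is never removed -> leak
    return result

# One fix: bound the cache so the oldest entries are evicted automatically.
class BoundedCache:
    def __init__(self, max_items=1000):
        self._data = OrderedDict()
        self._max_items = max_items

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self._max_items:
            self._data.popitem(last=False)  # drop the oldest entry

    def get(self, key):
        return self._data.get(key)

cache = BoundedCache(max_items=2)
for i in range(5):
    cache.put(i, str(i))
print(len(cache._data))  # stays at 2, no matter how many items we add
```

In real code, `functools.lru_cache` or a TTL cache gives the same bounded behavior without writing the eviction logic by hand.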
How to identify Memory Leak?
The most common way to detect a memory leak is noticing that the server runs out of free memory
Track the server's available RAM for 2-3 days after the application has been deployed
Use a profiling tool that reports how much RAM each part of the application is using
Log the profiler's stats and observe which part of the application shows incremental memory usage
A few common profilers are:
mem_top: https://pypi.org/project/mem_top/
How to handle Memory Leak?
Manually dispose of resources that are no longer needed (but to which a reference is still held).
Nearly all languages include resource types that aren’t automatically freed.
We need to write specific code that tells the application that the resource’s work has finished (for example, closing files or connections)
Most languages are equipped with an automatic memory management system called a garbage collector, which frees up memory that the application doesn’t need.
In CPython, an object's memory is freed as soon as its reference count drops to zero; the cyclic garbage collector additionally frees reference cycles
Within an application, if a single instance of a package suffices, create it globally (for example, in a config module) and call that instance wherever it is required
Example: spaCy's en_core_web_lg is a fairly large model, close to 3GB in size; we were using 3 instances of it earlier. Once we identified the issue, we switched to a single, globally declared instance
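The single-instance pattern above can be sketched with `functools.lru_cache` as a lazy singleton loader. `HeavyModel` here is a hypothetical stand-in for any large resource such as a spaCy model; the point is that the expensive construction runs once and every caller shares the same object.

```python
from functools import lru_cache

class HeavyModel:
    """Stand-in for a large resource (e.g. a multi-GB NLP model)."""
    def __init__(self):
        self.weights = [0.0] * 1_000_000  # pretend this is huge

@lru_cache(maxsize=1)
def get_model():
    # The expensive load runs only on the first call;
    # every later call returns the cached instance.
    return HeavyModel()

a = get_model()
b = get_model()
print(a is b)  # True: both callers share one instance
```

Compared with constructing the model at import time, the lazy loader also keeps startup fast for code paths that never touch the model.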
Few More Concepts
To check an object's memory location, we can use hex(id(<value>)); it returns the address of the object
# If x and y hold the same small value, they point to the same memory location
x = 1
y = 1
hex(id(x))
hex(id(y))
Python uses a process called "interning": it stores only one copy of certain objects on the heap and makes different variables point to that same memory address when they use the same value
Interning applies only to small integers (-5 to 256) and some strings; it does not apply to other objects such as large integers, most strings, floats, lists, dictionaries, and tuples
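A small demonstration of the interning boundary in CPython. The values are built with `int(...)` at runtime so the compiler's constant folding doesn't share them behind the scenes; this behavior is a CPython implementation detail, not a language guarantee.

```python
# Small integers (-5 to 256) are cached by CPython, so equal values
# share one object.
x = int("100")
y = int("100")
print(x is y, hex(id(x)) == hex(id(y)))  # True True

# Larger integers are created fresh each time.
a = int("1000")
b = int("1000")
print(a == b, a is b)  # equal values, but typically distinct objects
```

This is why identity checks (`is`) on numbers or strings are unreliable; always compare values with `==`.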
Common Ways to Reduce the Space Complexity
to be updated....
References for Further Reading