Hash-table and Hash-map
What is it?
A hash function
is any function that can be used to map data of arbitrary size to fixed-size values
is a function that computes an index value that in turn holds the elements to be searched, inserted, removed, etc.
This makes it easy and fast to access data.
This key is the same as what referred to the key-value pairs for the hash table
The values returned by a hash function are called hash values, hash codes, digests, or hashes
The hash values are usually used to index a fixed-size table called a hash table. Use of a hash function to index a hash table is called hashing or scatter storage addressing
Hashing is the process of generating an index value using a hash function
The Hash table data structure stores elements in key-value pairs where
Key - unique integer that is used for indexing the values
Value - data that are associated with keys
The index generated using hashing is referred to as hashcode, used as a key in the hash table
Different objects can have the same hashcode, within the same hashcodes objects are arranged in a LinkedList. Thus, the best hashing function would be the one that gives all unique keys.
The time complexity of get and put operation is mentioned as O(1), but this is not always the case. The assumption is that key-value pairs are well distributed across the buckets. It means the hashcode implemented is good (almost unique key for each object)
Load Factor
In general, we need to decide the number of buckets initially itself so that it is better for hash function to generate the hashcodes.
When the total number of items in hashmap goes on increasing keeping the default initial capacity of hashmap 16, At one point of time, hashmap performance will start degrading and need to increase buckets for improving performance.
Load Factor is a measure, which decides when exactly to increase the hashmap capacity(buckets) to maintain get and put operation complexity of O(1)
Default load factor of Hashmap is 0.75f (i.e 75% of current map size)
After increasing the bucket numbers, the hashcode for all the older objects needs to be re-generated to keep the distribution uniform.
Hash Collision and Resolution
When one or more hash values compete with a single hash table slot, collisions occur.
In other words, when we use a hash function that generates a single key (hashcode) for multiple objects, we face the issue of hash collision
To resolve this, the next available empty slot is assigned to the current hash value. The most common methods are:
Chaining
If one slot gets more than 1 element (example, 92, 85, 50), then the slot (hashcode) is assigned a LinkedList. The head of the LinkedList is at the start of the slot and a new element in the same slot is placed at the head.
The time complexity of the get and put operation is O(elements in the LinkedList at the slot). That is slot is found in O(1), but then LinkedList needs to be looked in.
Probabilistic Hashing
This is memory-based hashing that implements caching
When a collision occurs, either the old record is replaced or the new record may be dropped
Although this scenario has a risk of losing data, it is still preferred due to its ease of implementation and high performance
Open Addressing
This technique depends on space usage and can be done with linear or quadratic probing techniques. As the name says, this technique tries to find an available slot to store the record. It can be done in one of the 3 ways –
Linear probing – Here, the next probe interval is fixed to 1. It supports best caching but miserably fails at clustering.
Quadratic probing – the probe distance is calculated based on the quadratic equation. This is considerably a better option as it balances clustering and caching.
Double hashing – Here, the probing interval is fixed for each record by a second hashing function. This technique has poor cache performance although it does not have any clustering issues.
Perfect Hashing
When the slots are uniquely mapped, there is a small chance of collision. However, it can be done where there is a lot of spare memory.
Coalesced Hashing Technique
Types of Hash Functions
Hash Function Confusion with Other Concepts
Difference between Dictionary and Hashtable
Dictionary is generic whereas Hashtable is not Generic. That is, we can add any type of object to Hashtable, but while retrieving we need to cast it to the required type.
So, a hashtable is not type-safe. But to the dictionary, while declaring itself we can specify the type of key and value, so there is no need to cast while retrieving
A hash table is one way to implement a dictionary. A typical one at that, but it's not by definition the only one. You could equally well implement a dictionary using a linked list or a search tree, it just wouldn't be as efficient (for some metric of efficiency)
Hashtable is thread-safe for use by multiple reader threads and a single writing thread.
In the dictionary, public static members are thread-safe, but any instance members are not guaranteed to be thread-safe
In dictionary request to non-existing key throws exception but in hashtable request to non-existing key returns null
Dictionary is potentially a bit faster for value types but in the hash-table, it is a bit slower (needs boxing/unboxing) for value types
In normal case, multiple values to a key is not possible both in hash table and dictionary, replaces the older value with the newer value. However, as read, in case of hash collision, we have the solutions
Check required: In a dictionary, it is possible to get the key from value, but not possible in the case of hashtable
Use cases of Hash Function
References for Further Reading
Why redis is so fast
Last updated
Was this helpful?