Hash-table and Hash-map

What is it?

  • A hash function

    • is any function that can be used to map data of arbitrary size to fixed-size values

    • is a function that computes an index value that in turn holds the elements to be searched, inserted, removed, etc.

      • This makes it easy and fast to access data.

      • This key is the same as what referred to the key-value pairs for the hash table

    • The values returned by a hash function are called hash values, hash codes, digests, or hashes

  • The hash values are usually used to index a fixed-size table called a hash table. Use of a hash function to index a hash table is called hashing or scatter storage addressing

  • Hashing is the process of generating an index value using a hash function

  • The Hash table data structure stores elements in key-value pairs where

    • Key - unique integer that is used for indexing the values

    • Value - data that are associated with keys

  • The index generated using hashing is referred to as hashcode, used as a key in the hash table

  • Different objects can have the same hashcode, within the same hashcodes objects are arranged in a LinkedList. Thus, the best hashing function would be the one that gives all unique keys.

  • The time complexity of get and put operation is mentioned as O(1), but this is not always the case. The assumption is that key-value pairs are well distributed across the buckets. It means the hashcode implemented is good (almost unique key for each object)

Load Factor

In general, we need to decide the number of buckets initially itself so that it is better for hash function to generate the hashcodes.

When the total number of items in hashmap goes on increasing keeping the default initial capacity of hashmap 16, At one point of time, hashmap performance will start degrading and need to increase buckets for improving performance.

Load Factor is a measure, which decides when exactly to increase the hashmap capacity(buckets) to maintain get and put operation complexity of O(1)

Default load factor of Hashmap is 0.75f (i.e 75% of current map size)

After increasing the bucket numbers, the hashcode for all the older objects needs to be re-generated to keep the distribution uniform.

Hash Collision and Resolution

  • When one or more hash values compete with a single hash table slot, collisions occur.

  • In other words, when we use a hash function that generates a single key (hashcode) for multiple objects, we face the issue of hash collision

To resolve this, the next available empty slot is assigned to the current hash value. The most common methods are:

Chaining

If one slot gets more than 1 element (example, 92, 85, 50), then the slot (hashcode) is assigned a LinkedList. The head of the LinkedList is at the start of the slot and a new element in the same slot is placed at the head.

The time complexity of the get and put operation is O(elements in the LinkedList at the slot). That is slot is found in O(1), but then LinkedList needs to be looked in.

Probabilistic Hashing

  • This is memory-based hashing that implements caching

  • When a collision occurs, either the old record is replaced or the new record may be dropped

  • Although this scenario has a risk of losing data, it is still preferred due to its ease of implementation and high performance

Open Addressing

This technique depends on space usage and can be done with linear or quadratic probing techniques. As the name says, this technique tries to find an available slot to store the record. It can be done in one of the 3 ways –

  • Linear probing – Here, the next probe interval is fixed to 1. It supports best caching but miserably fails at clustering.

  • Quadratic probing – the probe distance is calculated based on the quadratic equation. This is considerably a better option as it balances clustering and caching.

  • Double hashing – Here, the probing interval is fixed for each record by a second hashing function. This technique has poor cache performance although it does not have any clustering issues.

Perfect Hashing

When the slots are uniquely mapped, there is a small chance of collision. However, it can be done where there is a lot of spare memory.

Coalesced Hashing Technique

Types of Hash Functions

Hash Function Confusion with Other Concepts

Hash functions are related to (and often confused with) checksums, check digits, fingerprints, lossy compression, randomization functions, error-correcting codes, and ciphers. Although the concepts overlap to some extent, each one has its own uses and requirements and is designed and optimized differently. The hash functions differ from the concepts numbered mainly in terms of data integrity.

Difference between Dictionary and Hashtable

  • Dictionary is generic whereas Hashtable is not Generic. That is, we can add any type of object to Hashtable, but while retrieving we need to cast it to the required type.

    So, a hashtable is not type-safe. But to the dictionary, while declaring itself we can specify the type of key and value, so there is no need to cast while retrieving

  • A hash table is one way to implement a dictionary. A typical one at that, but it's not by definition the only one. You could equally well implement a dictionary using a linked list or a search tree, it just wouldn't be as efficient (for some metric of efficiency)

  • Hashtable is thread-safe for use by multiple reader threads and a single writing thread.

    In the dictionary, public static members are thread-safe, but any instance members are not guaranteed to be thread-safe

  • In dictionary request to non-existing key throws exception but in hashtable request to non-existing key returns null

  • Dictionary is potentially a bit faster for value types but in the hash-table, it is a bit slower (needs boxing/unboxing) for value types

  • In normal case, multiple values to a key is not possible both in hash table and dictionary, replaces the older value with the newer value. However, as read, in case of hash collision, we have the solutions

  • Check required: In a dictionary, it is possible to get the key from value, but not possible in the case of hashtable

Use cases of Hash Function

References for Further Reading

Hashing Collision and Resolution
Hashtable, dictionary, and set; well explained and detailed article

Why redis is so fast

Last updated

Was this helpful?