Java HashMap

What is a Hashmap and why do we need it?

The Java HashMap is a data structure that stores elements as <key, value> pairs and allows efficient access to them. It implements the Map interface, and each key must be unique. This means that if you insert <key = 5, value = 100> first and then insert <key = 5, value = 101> into the same map, only the latter value remains. This is illustrated in the code below. A HashMap offers average O(1) access and should be used when the order of accessing elements is not important.

package com.company;

import java.util.HashMap;
import java.util.Map;

public class Main {
    public static void main(String[] args) {
        Map<Integer, Integer> hashMap = new HashMap<>();
        hashMap.put(1, 1001);
        hashMap.put(2, 1002);
        hashMap.put(3, 1003);
        hashMap.put(4, 1005);
        hashMap.put(5, 100);
        // This prints 100
        System.out.println(hashMap.get(5));
        hashMap.put(5, 101);
        // This prints 101 because the previous value for key 5 was replaced
        System.out.println(hashMap.get(5));
    }
}

How are elements inserted into the HashMap?

Let us first define a few terms that will help us understand the HashMap better.

  • Key
    A key is an element in a hashmap that helps us uniquely identify entries in it. In the example above, {1, 2, 3, 4, 5} are the keys.
  • Hash
    As Wikipedia describes it, a hash function is a function that can be used to map keys of arbitrary size to fixed-size values. The values returned by a hash function are called hash values, hash codes, digests, or simply hashes.
  • Values
    Values are the elements that correspond to a key. In the example above, {1001, 1002, 1003, 1005, 100, 101} are the values.
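In Java, the hash for a key ultimately comes from the key's hashCode() method, which every object inherits from Object. A small illustration (the printed values are defined by the standard library, not by HashMap itself):

public class HashCodeDemo {
    public static void main(String[] args) {
        // Integer.hashCode() simply returns the int value itself
        System.out.println(Integer.valueOf(42).hashCode()); // 42
        // String.hashCode() combines all characters into a single int
        System.out.println("key".hashCode());
    }
}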

Now that these terms are clear, let us dive into how inserting an element into a HashMap works. Entries are stored in a table, which is simply an array of Nodes. A Node holds the hash, the key, the value, and a reference to the next Node. When an element has to be inserted into the HashMap, the key's hash is calculated, and that hash determines the index of the table slot where the entry is placed.
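Here is a simplified sketch of that index calculation. The real HashMap additionally XORs the hash with its upper bits to spread them before indexing; that step is omitted here, and indexFor is a name made up for illustration:

public class IndexDemo {
    // The table capacity is always a power of two, so
    // (hash & (capacity - 1)) is equivalent to (hash % capacity).
    static int indexFor(int hash, int capacity) {
        return hash & (capacity - 1);
    }

    public static void main(String[] args) {
        int capacity = 16;
        Integer key = 42;
        // Key 42 lands in slot 10 of a 16-slot table
        System.out.println(indexFor(key.hashCode(), capacity)); // 10
    }
}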

By now you might have some questions in your mind! What if two keys have the same hash or in other words what if there is a collision in the map? What is the size of the table?

Let us first address the issue of collision! When two keys hash to the same table index, their entries are chained together in a linked list. However, since Java 8, once a bucket's list grows beyond 8 entries (and the table has at least 64 slots), the list is converted into a self-balancing binary search tree, or as we better know them, a red-black tree. This is done to avoid linear search time in a lengthy linked list.
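To see a collision concretely, here is a small illustrative example: in a 16-slot table, any two keys whose hashes differ by a multiple of 16 land in the same bucket.

public class CollisionDemo {
    public static void main(String[] args) {
        int capacity = 16;
        // 1 and 17 differ by 16, so both map to bucket index 1
        System.out.println(1 & (capacity - 1));  // 1
        System.out.println(17 & (capacity - 1)); // 1
    }
}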

Now comes the size of the table. This part is pretty interesting! So why waste our time? Let us dive in right away!

The initial size of the table is 16 and the default load factor is 0.75. The load factor is a parameter that determines when we should increase the size of the table and redistribute the entries. This happens when the number of entries reaches the threshold value, which can be calculated by the equation,
Threshold = load factor * capacity

This threshold is set at 12 (16 * 0.75) initially. You can always change the load factor according to your needs, but that won't be necessary in most cases. Once the threshold is reached, the size of the table is doubled and the entries are redistributed into the new table.
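These numbers can be verified with a quick calculation (the variable names here are just for illustration):

public class ThresholdDemo {
    public static void main(String[] args) {
        int capacity = 16;
        float loadFactor = 0.75f;
        // Threshold = load factor * capacity
        System.out.println((int) (loadFactor * capacity)); // 12
        // After a resize, the capacity doubles and so does the threshold
        System.out.println((int) (loadFactor * capacity * 2)); // 24
    }
}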

Why was the table size doubled though?

This question struck me when I was exploring the HashMap, and there is a pretty interesting reason for this decision. Conceptually, the index of an entry is f(key) = hash % table_capacity, and every time the table is resized, this calculation must be redone for each and every node. Because the capacity is always a power of two, the makers of the HashMap came up with a pretty neat bitwise trick: the index is computed as hash & (capacity - 1), and when the capacity doubles, each entry's index either remains the same or increases by exactly the old capacity, depending on a single bit of the hash. This can be checked cheaply with bit manipulation rather than the costly modulo operation. This might seem a bit abstract at first, but this code will clear your confusion.

package com.company;

public class Main {
    public static void main(String[] args) {
        // Calculate the index for keys 1 to 64 with table sizes 16 and 32.
        // Notice that i % 32 is always either i % 16 or i % 16 + 16.
        for (int i = 1; i <= 64; i++) {
            System.out.println(i + " " + i % 16 + " " + i % 32);
        }
    }
}

Get and remove operations are quite trivial and you must have already understood them after reading this article! I am pretty sure you gained a good insight into the working of a hashmap 😀

Now go, get them! 😀
