Why accessing Java HashMap may cause infinite loop in concurrent environment

  sonic0002        2020-03-29 01:47:00

HashMap in Java is frequently used to handle key/value pairs. But it is not a good candidate for concurrent environments, where ConcurrentHashMap should be used instead. If used concurrently, a HashMap can exhibit all sorts of unexpected behavior, including sending the program into an infinite loop.
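For reference, here is a minimal sketch of the two standard JDK alternatives for concurrent use (the class name SafeMapsDemo is just for illustration):

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SafeMapsDemo {
    public static void main(String[] args) {
        // Preferred: designed for concurrency, no single global lock.
        Map<Integer, String> concurrent = new ConcurrentHashMap<Integer, String>();
        concurrent.put(1, "A");

        // Simpler alternative: every call synchronizes on one shared monitor.
        Map<Integer, String> synced =
            Collections.synchronizedMap(new HashMap<Integer, String>());
        synced.put(1, "B");
    }
}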

To understand how this can happen, let's first discuss how HashMap works internally. In this post we will use the implementation of HashMap before Java 8 as the example; Java 8 provides a different HashMap implementation which is more efficient but also more complicated. The rationale behind both is more or less the same, though.

As many of you may know, HashMap is a key/value pair store in which keys are placed according to their hash values. Internally, a HashMap combines two data structures: an array and linked lists. It holds an array called table with a fixed length, and each slot of the array points to a linked list storing all the entries that map to the same index. The index is calculated from the hash value of the key.
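As a rough illustration, the pre-Java-8 layout boils down to something like the following sketch (field names mirror the real HashMap.Entry, heavily trimmed here):

// Simplified model of the pre-Java-8 internals: a bucket array in which
// each slot holds the head of a singly linked list of entries.
class SimpleHashMap<K, V> {
    static class Entry<K, V> {
        final K key;
        V value;
        int hash;          // cached hash of the key
        Entry<K, V> next;  // next entry in the same bucket's list

        Entry(int hash, K key, V value, Entry<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    Entry<K, V>[] table;   // the bucket array; index derived from the hash
}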

To keep element lookup fast, the HashMap resizes itself dynamically once the number of key/value pairs grows beyond a threshold determined by the load factor and capacity of the HashMap (threshold = capacity × load factor). Resizing basically recreates the array at a larger size, rehashes all keys, and moves the entries from the old array to the new one.

The source code of resize in Java 7 is:

void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    // The table cannot grow any further; give up on future resizes.
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }

    Entry[] newTable = new Entry[newCapacity];
    transfer(newTable);                          // move all entries across
    table = newTable;                            // publish the new table
    threshold = (int)(newCapacity * loadFactor);
}

The transfer() method moves all entries from the old table to the new table:

void transfer(Entry[] newTable) {
    Entry[] src = table;
    int newCapacity = newTable.length;
    for (int j = 0; j < src.length; j++) {       // every bucket of the old table
        Entry<K,V> e = src[j];
        if (e != null) {
            src[j] = null;                       // help GC
            do {                                 // every entry in the bucket's list
                Entry<K,V> next = e.next;        // remember the rest of the list
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];            // insert e at the head of
                newTable[i] = e;                 // the new bucket's list
                e = next;
            } while (e != null);
        }
    }
}

How transfer() works: it loops through all buckets of the old table (the for loop) and, for each bucket, walks its linked list (the do/while loop) until it reaches the end. For each entry it recalculates the index in the new table, then inserts the entry at the head of that bucket's list: the current head becomes the entry's next element, and the entry itself becomes the new head. One consequence of this head insertion is that entries landing in the same new bucket come out in reverse order.

Let's take a simple example to elaborate this resize process. Assume the hash function simply takes the key modulo the table length (key % length), and that the table size is initially 2. There are three keys, 5, 7 and 3; after hashing, the index is 1 for all three, which means they fall into the same list. Now assume the load factor is 1, meaning the table resizes when the number of elements in the map equals the table length. The HashMap will therefore resize, doubling its new table length to 4, when key 3 is put into the map.

After rehashing, the index for keys 3 and 7 becomes 3, while the index for key 5 stays 1, so they end up in different linked lists afterwards.
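To see both effects in code, here is a small standalone sketch of this example. It replays transfer()'s head insertion using the simplified key % length index (the real implementation uses bit masking via indexFor()); note how keys 3 and 7 come out in reverse order:

public class ResizeDemo {
    static class Node {
        int key;
        Node next;
        Node(int key, Node next) { this.key = key; this.next = next; }
    }

    // Rehash every node from oldTable into a table of newLength,
    // inserting at the head of the target bucket just like transfer().
    static Node[] resize(Node[] oldTable, int newLength) {
        Node[] newTable = new Node[newLength];
        for (Node e : oldTable) {
            while (e != null) {
                Node next = e.next;
                int i = e.key % newLength;   // simplified indexFor()
                e.next = newTable[i];        // head insertion
                newTable[i] = e;
                e = next;
            }
        }
        return newTable;
    }

    public static void main(String[] args) {
        // Old table of length 2: bucket 1 holds 3 -> 7 -> 5
        // (most recently inserted entry first, due to head insertion).
        Node[] oldTable = new Node[2];
        oldTable[1] = new Node(3, new Node(7, new Node(5, null)));

        Node[] newTable = resize(oldTable, 4);
        for (int i = 0; i < newTable.length; i++) {
            StringBuilder sb = new StringBuilder("bucket " + i + ":");
            for (Node e = newTable[i]; e != null; e = e.next) {
                sb.append(" ").append(e.key);
            }
            System.out.println(sb);
        }
        // Key 5 stays in bucket 1; keys 3 and 7 move to bucket 3,
        // where the list is now 7 -> 3, the reverse of the old order.
    }
}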

Since resizing replaces the internal table and rearranges every entry, two threads touching the map at the same time can corrupt this shared state if no synchronization or lock is enforced.

Below is an example that can send HashMap access into an infinite loop in a concurrent environment.

import java.util.HashMap;

public class HashMapInfiniteLoop {
    private static HashMap<Integer, String> map = new HashMap<Integer, String>(2, 0.75f);

    public static void main(String[] args) {
        map.put(5, "C");

        new Thread("Thread1") {
            public void run() {
                map.put(7, "B");
                System.out.println(map);
            }
        }.start();
        new Thread("Thread2") {
            public void run() {
                map.put(3, "A");
                System.out.println(map);
            }
        }.start();
    }
}

As in the earlier example, the initial capacity is 2, but the load factor is now 0.75, so the resize threshold is (int)(2 * 0.75) = 1. This means the table resizes when the second element is put.

When this program runs, the two threads may be scheduled at the same time, each putting its key (7 and 3 respectively) and each triggering a resize. Now assume thread 1 has just executed the statement Entry<K,V> next = e.next; inside transfer() and is then suspended, while thread 2 runs its resize to completion.

At that point in thread 1, e points to the entry with key 3 and next points to the entry with key 7. Meanwhile thread 2 has finished rehashing and exits; because of the head insertion described above, in its new table the entry with key 7 now points to the entry with key 3.

Now execution switches back to thread 1. It executes e.next = newTable[i] and newTable[i] = e (key 3 becomes the head of the bucket in thread 1's own new table), then e = next, so e now points to the entry with key 7. The next loop iteration executes next = e.next, which makes next point back to the entry with key 3, because thread 2 set key(7).next = key(3).

The loop continues: key(7) is inserted at the head of the bucket (its next already points to key(3)), and e moves on to key(3). In the following iteration, e.next = newTable[i] causes key(3).next to point to key(7), while key(7).next is already pointing to key(3), which produces a cyclic linked list.
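Putting the whole interleaving together, here is a condensed trace of thread 1's remaining loop iterations (key(n) denotes the entry holding key n):

// State when thread 1 was suspended (it had just run "next = e.next"):
//   e = key(3), next = key(7)
// Thread 2 has finished its resize: key(7).next == key(3).
//
// Thread 1 resumes, still filling its OWN newTable:
//
// iteration 1 (continuing):
//   e.next = newTable[3];  // key(3).next = null (bucket still empty)
//   newTable[3] = e;       // bucket 3 head = key(3)
//   e = next;              // e = key(7)
// iteration 2:
//   next = e.next;         // next = key(3), thanks to thread 2's rehash
//   e.next = newTable[3];  // key(7).next = key(3)
//   newTable[3] = e;       // bucket 3 head = key(7)
//   e = next;              // e = key(3)
// iteration 3:
//   next = e.next;         // next = null (set back in iteration 1)
//   e.next = newTable[3];  // key(3).next = key(7)  -> CYCLE: 3 -> 7 -> 3
//   newTable[3] = e;       // bucket 3 head = key(3)
//   e = next;              // e = null, loop exits, cyclic list is published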

If a map.get(11) call is made now, an infinite loop happens: 11 % 4 = 3, so the lookup lands on the corrupted bucket and keeps walking the cycle.
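To see why the lookup spins rather than transfer() itself, here is the lookup loop, lightly simplified from Java 7's HashMap.getEntry(). It ends only when it finds a matching key or reaches a null next reference, and the cyclic list offers neither:

// Lightly simplified from java.util.HashMap.getEntry() in Java 7.
final Entry<K,V> getEntry(Object key) {
    int hash = (key == null) ? 0 : hash(key);
    // Walk the bucket's list: stop on a match or on a null next pointer.
    // With the cyclic list 3 -> 7 -> 3 at index 3 and no entry whose key
    // equals 11, neither exit condition is ever met.
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}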

Reference: https://www.toutiao.com/i6802089727966052867/

JAVA  HASHMAP  INFINITE LOOP 

       
