HashMap is one of the most frequently used collection types in Java; it stores key-value pairs. Ideally a hash table offers O(1) access time, but because of hash collisions, entries that land in the same bucket are stored in a linked list, or in a red-black tree once the bucket grows large enough, which makes the worst-case time complexity O(log n).
Although collections are built on data structures such as arrays and linked lists, unlike plain arrays they resize dynamically when there is not enough space to store new data. Resizing involves copying data from the old array to a new one, which is an expensive operation, so when the number of entries can be estimated in advance, an initial capacity is normally specified to avoid unnecessary resizing.
Before deciding on the correct initial capacity, one important question must be answered: when is resizing triggered in HashMap?
HashMap provides a constructor that takes an initial capacity value, and this constructor eventually delegates to another constructor with the default load factor of 0.75.
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " + loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}
Here threshold is the field used to determine when resizing should be triggered. Notice that the assignment does not take the initial capacity value directly; instead it calls another method, tableSizeFor(), to calculate the threshold.
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
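To see the rounding in action, the bit trick can be copied into a small standalone sketch (the class name is made up for illustration, and MAXIMUM_CAPACITY is inlined with the JDK's value of 1 << 30):

```java
public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30; // same value the JDK uses

    // Copy of HashMap.tableSizeFor(): returns the smallest power of two >= cap
    static int tableSizeFor(int cap) {
        int n = cap - 1;  // subtract 1 so exact powers of two are not doubled
        n |= n >>> 1;     // propagate the highest set bit downwards...
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;    // ...until every bit below it is also set
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(1000));  // 1024
        System.out.println(tableSizeFor(10000)); // 16384
        System.out.println(tableSizeFor(16));    // 16 (already a power of two)
    }
}
```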
The calculated value is always a power of two (2^N), which lets HashMap compute a bucket index with a cheap bit mask, (capacity - 1) & hash, and makes it easy to find the location of the old data in the new table after a resize.
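The power-of-two property can be sketched with plain bit arithmetic (the hash value below is an arbitrary example): with capacity 2^N, the bucket index is just hash & (capacity - 1), and after the table doubles, an entry either stays at its old index or moves exactly oldCap slots up, decided by a single bit of the hash.

```java
public class BucketIndexDemo {
    public static void main(String[] args) {
        int oldCap = 16;  // old table size, a power of two
        int hash = 22;    // arbitrary example hash: binary 1_0110

        int oldIndex = hash & (oldCap - 1);     // 22 & 15 = 6
        int newIndex = hash & (oldCap * 2 - 1); // 22 & 31 = 22

        // After doubling, the index changes only by the bit (hash & oldCap):
        System.out.println(oldIndex);                               // 6
        System.out.println(newIndex);                               // 22 = 6 + 16
        System.out.println(newIndex == oldIndex + (hash & oldCap)); // true
    }
}
```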
When 10000 is given as the initial capacity, we might expect resizing to be triggered after adding 10000 * 0.75 = 7500 entries. In fact, after tableSizeFor() runs, its threshold for resizing becomes 2^14 = 16384. Does this mean no resize will be triggered even after 10000 entries have been added to the HashMap?
And what if 1000 is the initial capacity? The threshold after calculation would be 2^10 = 1024, so will it likewise avoid resizing after 1000 entries are added?
The answer is no.
To understand this, another question needs to be answered: when is the table actually initialized? resize() exists to grow the table when the threshold is reached, but it has a second job: resize() also takes care of table initialization.
When a HashMap object is created through a constructor, the table is only initialized when the first value is put into the map.
final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length; // call resize()
    // ...
}
And the table is initialized inside the resize() method.
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1;
    }
    else if (oldThr > 0)
        newCap = oldThr; // ①
    else {
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        // ②
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr; // ③
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab; // ④
    // ...
}
When an initial capacity was specified but the table has not been initialized yet, resize() assigns oldThr to newCap, so the new table is created with a size equal to the initial threshold. The threshold is then recalculated as newCap * loadFactor (newCap * 0.75 by default). After this, the table is created and initialized.
So if the initial capacity is 1000, the actual threshold that triggers a resize is 1024 * 0.75 = 768, which means resizing will be triggered before 1000 entries have been added. If the initial capacity is 10000, the actual resize threshold is 16384 * 0.75 = 12288, so 10000 entries happen to fit without a resize.
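The arithmetic above can be checked directly (a plain calculation mirroring the threshold formula used in resize()):

```java
public class ThresholdCheck {
    public static void main(String[] args) {
        float loadFactor = 0.75f; // DEFAULT_LOAD_FACTOR

        int threshold1000  = (int) (1024  * loadFactor); // 768
        int threshold10000 = (int) (16384 * loadFactor); // 12288

        System.out.println(threshold1000 < 1000);    // true: resize fires before 1000 entries
        System.out.println(threshold10000 >= 10000); // true: 10000 entries fit without resizing
    }
}
```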
Hence, when specifying an initial capacity for a HashMap, the value passed should be calculated as expected number of entries / loadFactor to avoid unexpected resizing.
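A sketch of such a sizing helper (the class and method names here are hypothetical; Guava's Maps.newHashMapWithExpectedSize applies essentially the same formula):

```java
import java.util.HashMap;
import java.util.Map;

public class SizedMaps {
    // Hypothetical helper: pick an initial capacity large enough that `expected`
    // entries can be inserted without triggering a resize (default load factor 0.75).
    static <K, V> Map<K, V> withExpectedSize(int expected) {
        int capacity = (int) Math.ceil(expected / 0.75);
        return new HashMap<>(capacity);
    }

    public static void main(String[] args) {
        // For 1000 expected entries: ceil(1000 / 0.75) = 1334, which
        // tableSizeFor() rounds up to 2048, giving a threshold of
        // 2048 * 0.75 = 1536 >= 1000, so the map fills without a resize.
        Map<String, Integer> map = withExpectedSize(1000);
        System.out.println((int) Math.ceil(1000 / 0.75)); // 1334
    }
}
```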