Why does HashMap resize() again, when specifying a precise capacity?

Code speaks more than words, so:



final int size = 100;
Map<Integer, String> m = new HashMap<>(size);
for (int i = 0; i < size; i++) m.put(i, String.valueOf(i));


Why is the HashMap internally calling resize() 2 times? (I originally counted 21 calls; credit to Andreas for identifying that the JVM itself uses HashMaps internally, so 19 of those 21 calls came from JVM bootstrapping rather than from my code.)



Two resize() calls are still not acceptable for my application, so I need to optimize this.



As a new Java developer, my first intuitive guess at what "capacity" means in the HashMap constructor would be that it is the number of elements that I (the consumer of HashMap) am going to put into the Map. But this is not true.



If I want to optimize for CPU performance, then I need to know the internals of HashMap intimately enough to know exactly how sparse the HashMap bucket array needs to be. This is strange in my opinion. HashMap should implicitly do this for you. It is the whole point of encapsulation in OOP.



Note: I have confirmed that resize() is the bottleneck for my application's use case, which is why my goal is to reduce the number of calls to resize().



The question:



If I know the exact quantity of entries I am going to put into the map beforehand, what capacity do I choose to prevent any extra resize() operations? Something like size * 10? I would also like some background on why HashMap is designed this way.



Edit: I am getting asked a lot why this optimization is necessary. My application is spending a non-trivial amount of CPU time in HashMap.resize(). The HashMaps my application uses are initialized with a capacity equal to the number of elements that we put into them. Therefore, if we can reduce the resize() calls (by choosing a better initial capacity), then my application's performance improves.










  • That HashMap should only have to resize once, from an initial capacity of 128, to a final capacity of 256. How have you measured the number of times it has resized during your code execution?
    – Bobulous, 2 hours ago

  • It's still not clear how you've counted twenty-one calls to resize() during the execution of this specific chunk of code. Just compiling the code will surely not show you how many times the loop causes the actual HashMap object to be resized. What means have you used to count the actual number of times a resize occurs?
    – Bobulous, 2 hours ago

  • What is the problem with it being called 2 times? First = first creation; second = because of the 0.75 load factor. If you know how many elements you will add, change the load factor to 1.0 (or divide the number of elements by the load factor to get the initial capacity).
    – Carlos Heuberger, 2 hours ago

  • If you don't want it to resize twice (technically once), then either specify a larger initial size (~134), or create it with a higher load factor, which, BTW, is something you should not tweak without carefully considering the trade-offs described in its documentation.
    – Mark Rotteveel, 2 hours ago

  • And you don't need to know the internals intimately, you just need to read the API documentation. This behavior is part of its contract, it is not just an implementation detail.
    – Mark Rotteveel, 2 hours ago

Tags: java, optimization, data-structures, hashmap, cpu






5 Answers
The default Load Factor is 0.75, i.e. 3/4, which means that the internal hash table will be resized when 75 of the 100 values have been added.



FYI: resize() is only called twice. Once when the first value is added, and once when it gets to 75% full.



To prevent resizing, you need to ensure that the 100th value will not cause resizing, i.e. size <= capacity * 0.75 aka size <= capacity * 3/4 aka size * 4/3 <= capacity, so to be sure:



capacity = size * 4/3 + 1


With size = 100, that means capacity = 134.
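
Applied to the code in the question, a minimal sketch of this sizing (assuming the default load factor of 0.75, and the same java.util imports as in the question) would be:

final int size = 100;
final int capacity = size * 4 / 3 + 1;             // 134 for size = 100
Map<Integer, String> m = new HashMap<>(capacity);  // resize() now only runs once, to initialize the table
for (int i = 0; i < size; i++) m.put(i, String.valueOf(i));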






















  • Andreas, the resize() function is being called 21 times. Now, perhaps the function is exiting before actually performing the resize; I will dig into that a little more.
    – James Wierzba, 2 hours ago

  • @Bobulous Correct. It's a memory-saving optimization: tests have shown that collection objects are often created without values ever being added, so collections are now created with a minimal memory footprint, and the initial capacity isn't applied until the first value is added.
    – Andreas, 2 hours ago

  • @JamesWierzba The resize() is only called once or twice if you add 100 elements to a map initialized with a capacity of 100. If you create a small test program with nothing but the code in the question, beware that JVM startup creates other HashMap objects, so don't count resize() calls on all those other objects. Only count calls on the m instance. --- Tested on Java 10. What version are you running?
    – Andreas, 2 hours ago

  • @Andreas -- You are right! I confirmed that HashMap was called 19 times by whatever bootstrapping the JVM is doing, and 2 times by my code :)
    – James Wierzba, 2 hours ago

  • @JamesWierzba Yeah, I ran that test and saw lots of resize() calls, but noticed the call stack wasn't correct, so I put a trigger breakpoint on the first line of my code, which reduced the call count to 2.
    – Andreas, 2 hours ago






























When in doubt, read the documentation. The docs for HashMap explain the trade-offs of initial capacity and load factor quite well.


According to the documentation, if initCapacity = (maxEntries / loadFactor) + 1, then no rehash operations will occur when adding entries. In this case, maxEntries is 100 as you specify, and loadFactor is the default load factor of 0.75.


But aside from just setting an initial size to avoid a rehash (resize()), you should carefully read the documentation of HashMap in order to tune it properly, taking both initial capacity and load factor into account.


If you care about lookup cost more than space, then perhaps try a lower load factor like 0.5 or even lower. In that case you would create your hash map with both parameters, like this:


final float loadFactor = 0.5f;
final int maxEntries = 100;
final int initCapacity = (int) (maxEntries / loadFactor) + 1;
new HashMap<>(initCapacity, loadFactor);



(emphasis mine)




An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.

...

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.

























  • 0.75 is also 3/4, so (maxEntries / 0.75) + 1 is better written as maxEntries * 4 / 3 + 1 (no parens needed), to do it using int math (no cast needed).
    – Andreas, 2 hours ago

  • @Andreas On the other hand, Java itself uses a float for the calculation.
    – Mark Rotteveel, 2 hours ago

  • @MarkRotteveel Sure, but new HashMap<>(size * 4/3 + 1) is shorter/simpler than new HashMap<>((int) (size / 0.75 + 1)), since you don't have to cast.
    – Andreas, 2 hours ago

  • @xtratic Not entirely sure, but it seems you are confusing re-hash with re-size; these are different things.
    – Eugene, 1 hour ago






























This is easy to prove:



private static <K, V> void debugResize(Map<K, V> map, K key, V value) throws Throwable {

    Field table = map.getClass().getDeclaredField("table");
    AccessibleObject.setAccessible(new Field[] { table }, true);
    Object[] nodes = (Object[]) table.get(map);

    // first put: the table has not been allocated yet, so there is nothing to compare
    if (nodes == null) {
        map.put(key, value);
        return;
    }

    map.put(key, value);

    Field field = map.getClass().getDeclaredField("table");
    AccessibleObject.setAccessible(new Field[] { field }, true);
    int x = ((Object[]) field.get(map)).length;
    // if the table length changed during this put, a resize (doubling) happened
    if (nodes.length != x) {
        ++currentResizeCalls;
    }
}




And some usage:



static int currentResizeCalls = 0;

public static void main(String[] args) throws Throwable {

    int size = 100;
    Map<Integer, String> m = new HashMap<>(size);
    for (int i = 0; i < size; i++) {
        DeleteMe.debugResize(m, i, String.valueOf(i));
    }

    System.out.println(DeleteMe.currentResizeCalls);
}



I am only counting the cases where resize actually resizes the table, because the first call only initializes it; as the documentation stipulates:




Initializes or doubles table size





The second of your points is far more interesting. A HashMap defines capacity, but what is capacity? And this is not that obvious:



For HashMap, capacity is the number of buckets before a resize happens; for ConcurrentHashMap it's the number of entries before a resize is performed.


Thus, to avoid an internal call to resize, in the case of HashMap use the formula:



(int)(1.0 + (long)initialCapacity / LOAD_FACTOR)


But this is far from ideal: say you want 1024 entries without a resize; using that formula you get 1367 buckets, which internally are rounded up to a power of two, thus 2048 - well, a lot more than you asked for.



For CHM, specify the size directly. This is easy to prove using a single modification in the previous code:



 // use CHM instead of HashMap
Map<Integer, String> m = new ConcurrentHashMap<>(size);


This will result in zero resizes that actually double the array. But sometimes even the CHM internal code is confusing and requires a little patching.
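
As a minimal sketch of the two constructor semantics described above (the 0.75 used here is HashMap's default load factor, and java.util imports are assumed):

int entries = 1024;

// HashMap: the constructor argument is a bucket count, so it has to be
// over-provisioned relative to the entry count (and is then rounded up to a power of two).
Map<Integer, String> hashMap = new HashMap<>((int) (1.0 + entries / 0.75));

// ConcurrentHashMap: the constructor argument is treated as the expected number
// of entries, so it can be passed directly.
Map<Integer, String> concurrentMap = new ConcurrentHashMap<>(entries);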
















































    If you know the exact number of elements that will be added to a HashMap (with default load factor of 0.75) then I propose using this method to determine the ideal initial capacity which will hold all of those elements without the need for a resize, and without the waste of capacity that you can get under certain conditions if using the (admittedly simpler) formula s * 4 / 3 + 1.



    public static int calculateDefaultHashMapIdealCapacity(int elementCount) {
        // Ceiling of elementCount * 4 / 3, computed with floorDiv on the negated value.
        return -Math.floorDiv(-elementCount * 4, 3);
    }



    If you write code to compare this method ("two") with the simpler formula ("one") then you'll find that they give identical capacities in all but a few cases. (See the table below.)



    The first number is the capacity calculated by the method, and the number in parentheses is the real capacity that HashMap will actually allocate (because behind the scenes HashMap always uses an array capacity which is a power of two). In most cases the real (power-of-two) capacity ends up being identical for both methods. But in a few cases the simpler method gives a larger capacity than is really needed, and the alternative method gives a capacity which won't need to be resized but will hold the entire element count ("s"). So the alternative method avoids wasting memory in these fringe cases.



    Unless you have no need to ever worry about memory allocation, I'd use this alternative capacity calculation instead. (Be warned that I've just come up with this in five minutes, so I recommend you do your own tests to confirm that the numbers are correct.)



    s one two waste
    1 2(2) 2(2) 0
    2 3(4) 3(4) 0
    3 5(8) 4(4) 4
    4 6(8) 6(8) 0
    5 7(8) 7(8) 0
    6 9(16) 8(8) 8
    7 10(16) 10(16) 0
    8 11(16) 11(16) 0
    9 13(16) 12(16) 0
    10 14(16) 14(16) 0
    11 15(16) 15(16) 0
    12 17(32) 16(16) 16
    13 18(32) 18(32) 0
    14 19(32) 19(32) 0
    15 21(32) 20(32) 0
    16 22(32) 22(32) 0
    17 23(32) 23(32) 0
    18 25(32) 24(32) 0
    19 26(32) 26(32) 0
    20 27(32) 27(32) 0
    21 29(32) 28(32) 0
    22 30(32) 30(32) 0
    23 31(32) 31(32) 0
    24 33(64) 32(32) 32
    25 34(64) 34(64) 0
    26 35(64) 35(64) 0
    27 37(64) 36(64) 0
    28 38(64) 38(64) 0
    29 39(64) 39(64) 0
    30 41(64) 40(64) 0
    31 42(64) 42(64) 0
    32 43(64) 43(64) 0
    33 45(64) 44(64) 0
    34 46(64) 46(64) 0
    35 47(64) 47(64) 0
    36 49(64) 48(64) 0
    37 50(64) 50(64) 0
    38 51(64) 51(64) 0
    39 53(64) 52(64) 0
    40 54(64) 54(64) 0
    41 55(64) 55(64) 0
    42 57(64) 56(64) 0
    43 58(64) 58(64) 0
    44 59(64) 59(64) 0
    45 61(64) 60(64) 0
    46 62(64) 62(64) 0
    47 63(64) 63(64) 0
    48 65(128) 64(64) 64
    49 66(128) 66(128) 0
    50 67(128) 67(128) 0



























    • Or, you know, the actual formula used by CHM, which would apply to HashMap: long size = (long)(1.0 + (long)initialCapacity / loadFactor);
      – Eugene, 1 hour ago











    • Yes, but that's the same as the simple formula s * 4 / 3 for a default HashMap with load factor 0.75. And for element count 24 that formula will give 33 which causes a real capacity of 64. But a real capacity of 32 is able to hold 24 elements without a resize. So the alternative method, which results in real capacity 32, is surely preferable?
      – Bobulous
      1 hour ago






























    • Resizing is an essential part of how a hashmap works: it keeps the load factor low.


    • The load factor needs to be low because the hashmap's hashing function will inevitably start producing collisions as the buckets fill up. The collisions can start from the second entry itself if your entries happen to hash to an occupied bucket every time.


    However, in your particular case, collisions are not the problem; only the hashmap's resizing is.


    A hashmap is generally resized at a load factor of 0.75 (= 3/4 as a fraction). Using this information, you can set up a hashmap with a capacity of 4/3 times the count of the entries you need to store.




    Regarding your objection about breaking encapsulation:


    I agree with you; it is debatable.


    You can say that it would be better if capacity represented the count of entries up to which no resize happens, rather than the maximum count of entries that could be stored in the hashmap - and I tend to agree with you.


    But someone else could also argue the other side and ask why the hashmap is occupying more space than it was instructed to reserve.


    The solution to this problem lies in Java's domain: Java could provide two constructors which are explicit enough about what they will do, and then developers could take their pick when initializing their hashmaps.
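
    As a sketch of that idea (purely illustrative; this factory method does not exist in the JDK, and the formula assumes the default load factor of 0.75):

    // Hypothetical helper whose parameter means "expected number of entries",
    // not a bucket count. Name and formula are for illustration only.
    static <K, V> HashMap<K, V> hashMapForExpectedSize(int expectedEntries) {
        return new HashMap<>(expectedEntries * 4 / 3 + 1);
    }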






    share|improve this answer




















      Your Answer





      StackExchange.ifUsing("editor", function ()
      StackExchange.using("externalEditor", function ()
      StackExchange.using("snippets", function ()
      StackExchange.snippets.init();
      );
      );
      , "code-snippets");

      StackExchange.ready(function()
      var channelOptions =
      tags: "".split(" "),
      id: "1"
      ;
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function()
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled)
      StackExchange.using("snippets", function()
      createEditor();
      );

      else
      createEditor();

      );

      function createEditor()
      StackExchange.prepareEditor(
      heartbeatType: 'answer',
      convertImagesToLinks: true,
      noModals: false,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: 10,
      bindNavPrevention: true,
      postfix: "",
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      );



      );













       

      draft saved


      draft discarded


















      StackExchange.ready(
      function ()
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f52671362%2fwhy-does-hashmap-resize-again-when-specifying-a-precise-capacity%23new-answer', 'question_page');

      );

      Post as a guest






























      5 Answers
      5






      active

      oldest

      votes








      5 Answers
      5






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes








      up vote
      6
      down vote













      The default Load Factor is 0.75, i.e. 3/4, which means that the internal hash table will be resized when 75 of the 100 values have been added.



      FYI: resize() is only called twice. Once when the first value is added, and once when it gets to 75% full.



      To prevent resizing, you need to ensure that the 100th value will not cause resizing, i.e. size <= capacity * 0.75 aka size <= capacity * 3/4 aka size * 4/3 <= capacity, so to be sure:



      capacity = size * 4/3 + 1


      With size = 100, that means capacity = 134.






      share|improve this answer
















      • 1




        Andreas, the resize() function is being called 21 times. Now, perhaps the function is exiting before actually performing the resize, I will dig into that a little more.
        – James Wierzba
        2 hours ago






      • 2




        @Bobulous Correct. It's a memory saving optimization, since tests have shown that collection objects are often created without values ever being added, so collections are now created with minimal memory footprint, and the initial capacity isn't applied until first value is added.
        – Andreas
        2 hours ago







      • 2




        @JamesWierzba The resize() is only called once of twice, if you add 100 elements to a map initialized with a capacity of 100. If you create a small test program with nothing but the code in the question, beware that JVM startup creates other HashMap objects, so don't count resize() calls on all those other objects. Only count calls on the m instance. --- Tested on Java 10. What version are you running?
        – Andreas
        2 hours ago






      • 2




        @Andreas -- You are right! I confirmed that HashMap was called 19 times by whatever bootstrapping the JVM is doing, and 2 times by my code :)
        – James Wierzba
        2 hours ago






      • 1




        @JamesWierzba Yeah, I ran that test, and saw lots of resize(), but notice the call stack wasn't correct, so put trigger breakpoint on first line in my code, which reduced call count to 2.
        – Andreas
        2 hours ago














      up vote
      6
      down vote













      The default Load Factor is 0.75, i.e. 3/4, which means that the internal hash table will be resized when 75 of the 100 values have been added.



      FYI: resize() is only called twice. Once when the first value is added, and once when it gets to 75% full.



      To prevent resizing, you need to ensure that the 100th value will not cause resizing, i.e. size <= capacity * 0.75 aka size <= capacity * 3/4 aka size * 4/3 <= capacity, so to be sure:



      capacity = size * 4/3 + 1


      With size = 100, that means capacity = 134.






      share|improve this answer
















      • 1




        Andreas, the resize() function is being called 21 times. Now, perhaps the function is exiting before actually performing the resize, I will dig into that a little more.
        – James Wierzba
        2 hours ago






      • 2




        @Bobulous Correct. It's a memory saving optimization, since tests have shown that collection objects are often created without values ever being added, so collections are now created with minimal memory footprint, and the initial capacity isn't applied until first value is added.
        – Andreas
        2 hours ago







      • 2




        @JamesWierzba The resize() is only called once of twice, if you add 100 elements to a map initialized with a capacity of 100. If you create a small test program with nothing but the code in the question, beware that JVM startup creates other HashMap objects, so don't count resize() calls on all those other objects. Only count calls on the m instance. --- Tested on Java 10. What version are you running?
        – Andreas
        2 hours ago






      • 2




        @Andreas -- You are right! I confirmed that HashMap was called 19 times by whatever bootstrapping the JVM is doing, and 2 times by my code :)
        – James Wierzba
        2 hours ago






      • 1




        @JamesWierzba Yeah, I ran that test, and saw lots of resize(), but notice the call stack wasn't correct, so put trigger breakpoint on first line in my code, which reduced call count to 2.
        – Andreas
        2 hours ago












      up vote
      6
      down vote










      up vote
      6
      down vote









      The default Load Factor is 0.75, i.e. 3/4, which means that the internal hash table will be resized when 75 of the 100 values have been added.



      FYI: resize() is only called twice. Once when the first value is added, and once when it gets to 75% full.



      To prevent resizing, you need to ensure that the 100th value will not cause resizing, i.e. size <= capacity * 0.75 aka size <= capacity * 3/4 aka size * 4/3 <= capacity, so to be sure:



      capacity = size * 4/3 + 1


      With size = 100, that means capacity = 134.






      share|improve this answer












      The default Load Factor is 0.75, i.e. 3/4, which means that the internal hash table will be resized when 75 of the 100 values have been added.



      FYI: resize() is only called twice. Once when the first value is added, and once when it gets to 75% full.



      To prevent resizing, you need to ensure that the 100th value will not cause resizing, i.e. size <= capacity * 0.75 aka size <= capacity * 3/4 aka size * 4/3 <= capacity, so to be sure:



      capacity = size * 4/3 + 1


      With size = 100, that means capacity = 134.







      share|improve this answer












      share|improve this answer



      share|improve this answer










      answered 2 hours ago









      Andreas

      69.3k450106




      69.3k450106







      • 1




        Andreas, the resize() function is being called 21 times. Now, perhaps the function is exiting before actually performing the resize, I will dig into that a little more.
        – James Wierzba
        2 hours ago






      • 2




        @Bobulous Correct. It's a memory saving optimization, since tests have shown that collection objects are often created without values ever being added, so collections are now created with minimal memory footprint, and the initial capacity isn't applied until first value is added.
        – Andreas
        2 hours ago







      • 2




        @JamesWierzba The resize() is only called once of twice, if you add 100 elements to a map initialized with a capacity of 100. If you create a small test program with nothing but the code in the question, beware that JVM startup creates other HashMap objects, so don't count resize() calls on all those other objects. Only count calls on the m instance. --- Tested on Java 10. What version are you running?
        – Andreas
        2 hours ago






      • 2




        @Andreas -- You are right! I confirmed that HashMap was called 19 times by whatever bootstrapping the JVM is doing, and 2 times by my code :)
        – James Wierzba
        2 hours ago






      • 1




        @JamesWierzba Yeah, I ran that test, and saw lots of resize(), but notice the call stack wasn't correct, so put trigger breakpoint on first line in my code, which reduced call count to 2.
        – Andreas
        2 hours ago












      • 1




        Andreas, the resize() function is being called 21 times. Now, perhaps the function is exiting before actually performing the resize, I will dig into that a little more.
        – James Wierzba
        2 hours ago






      • 2




        @Bobulous Correct. It's a memory saving optimization, since tests have shown that collection objects are often created without values ever being added, so collections are now created with minimal memory footprint, and the initial capacity isn't applied until first value is added.
        – Andreas
        2 hours ago







      • 2




        @JamesWierzba The resize() is only called once of twice, if you add 100 elements to a map initialized with a capacity of 100. If you create a small test program with nothing but the code in the question, beware that JVM startup creates other HashMap objects, so don't count resize() calls on all those other objects. Only count calls on the m instance. --- Tested on Java 10. What version are you running?
        – Andreas
        2 hours ago






      • 2




        @Andreas -- You are right! I confirmed that HashMap was called 19 times by whatever bootstrapping the JVM is doing, and 2 times by my code :)
        – James Wierzba
        2 hours ago






      • 1




        @JamesWierzba Yeah, I ran that test, and saw lots of resize(), but notice the call stack wasn't correct, so put trigger breakpoint on first line in my code, which reduced call count to 2.
        – Andreas
        2 hours ago







      1




      1




      Andreas, the resize() function is being called 21 times. Now, perhaps the function is exiting before actually performing the resize, I will dig into that a little more.
      – James Wierzba
      2 hours ago




      Andreas, the resize() function is being called 21 times. Now, perhaps the function is exiting before actually performing the resize, I will dig into that a little more.
      – James Wierzba
      2 hours ago




      2




      2




      @Bobulous Correct. It's a memory saving optimization, since tests have shown that collection objects are often created without values ever being added, so collections are now created with minimal memory footprint, and the initial capacity isn't applied until first value is added.
      – Andreas
      2 hours ago





      @Bobulous Correct. It's a memory saving optimization, since tests have shown that collection objects are often created without values ever being added, so collections are now created with minimal memory footprint, and the initial capacity isn't applied until first value is added.
      – Andreas
      2 hours ago





      2




      2




      @JamesWierzba The resize() is only called once of twice, if you add 100 elements to a map initialized with a capacity of 100. If you create a small test program with nothing but the code in the question, beware that JVM startup creates other HashMap objects, so don't count resize() calls on all those other objects. Only count calls on the m instance. --- Tested on Java 10. What version are you running?
      – Andreas
      2 hours ago




      @JamesWierzba The resize() is only called once of twice, if you add 100 elements to a map initialized with a capacity of 100. If you create a small test program with nothing but the code in the question, beware that JVM startup creates other HashMap objects, so don't count resize() calls on all those other objects. Only count calls on the m instance. --- Tested on Java 10. What version are you running?
      – Andreas
      2 hours ago




      2




      2




      @Andreas -- You are right! I confirmed that HashMap was called 19 times by whatever bootstrapping the JVM is doing, and 2 times by my code :)
      – James Wierzba
      2 hours ago




      @Andreas -- You are right! I confirmed that HashMap was called 19 times by whatever bootstrapping the JVM is doing, and 2 times by my code :)
      – James Wierzba
      2 hours ago




      1




      1




      @JamesWierzba Yeah, I ran that test, and saw lots of resize(), but notice the call stack wasn't correct, so put trigger breakpoint on first line in my code, which reduced call count to 2.
      – Andreas
      2 hours ago




      @JamesWierzba Yeah, I ran that test, and saw lots of resize(), but notice the call stack wasn't correct, so put trigger breakpoint on first line in my code, which reduced call count to 2.
      – Andreas
      2 hours ago












      up vote
      3
      down vote













      When in doubt read the Documentation. The docs for HashMap explain the trade offs of initial capacity and load-factor quite well.



      According to the documentation if initCapacity = (maxEntries / loadFactor) + 1 then no rehash operations will occur upon adding entries. Where in this case, the maxEntries is 100 as you specify and the loadFactor would be the default load factor of .75.



      But aside from just setting an initial size to avoid a rehash (resize()) you should carefully read the documentation of HashMap in order to tune it properly, taking both initial capacity and load factor into account.



      If you care about lookup cost more than space, then perhaps try with a lower loadFactors like .5 or lower if you like. In that case you would create your hash map with both parameters like this:



      final float loadFactor = 0.5;
      final int maxEntries = 100;
      final int initCapacity = (int) maxEntries / loadFactor + 1;
      new HashMap<>(initCapacity, loadFactor);



      (emphasis mine)




      An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal datastructures are rebuilt) so that the hash table has approximately twice the number of buckets.

      ...

      As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.







      share|improve this answer


















      • 1




        0.75 is also 3/4, so (maxEntries / 0.75) + 1 is better written as maxEntries * 4 / 3 + 1 (no parens needed), to do it using int math (no cast needed).
        – Andreas
        2 hours ago






      • 2




        @Andreas On the other hand, Java itself uses a float for the calculation.
        – Mark Rotteveel
        2 hours ago






      • 1




        @MarkRotteveel Sure, but new HashMap<>(size * 4/3 + 1) is shorter/simpler than new HashMap<>((int) (size / 0.75 + 1)), since you don't have to cast.
        – Andreas
        2 hours ago










      • @xtratic not entirely sure, but it seems you are confusing re-hash with re-size, these are different things
        – Eugene
        1 hour ago














      up vote
      3
      down vote













      When in doubt read the Documentation. The docs for HashMap explain the trade offs of initial capacity and load-factor quite well.



      According to the documentation if initCapacity = (maxEntries / loadFactor) + 1 then no rehash operations will occur upon adding entries. Where in this case, the maxEntries is 100 as you specify and the loadFactor would be the default load factor of .75.



      But aside from just setting an initial size to avoid a rehash (resize()) you should carefully read the documentation of HashMap in order to tune it properly, taking both initial capacity and load factor into account.



      If you care about lookup cost more than space, then perhaps try with a lower loadFactors like .5 or lower if you like. In that case you would create your hash map with both parameters like this:



      final float loadFactor = 0.5;
      final int maxEntries = 100;
      final int initCapacity = (int) maxEntries / loadFactor + 1;
      new HashMap<>(initCapacity, loadFactor);



      (emphasis mine)




      An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal datastructures are rebuilt) so that the hash table has approximately twice the number of buckets.

      ...

      As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.







      share|improve this answer


















      • 1




        0.75 is also 3/4, so (maxEntries / 0.75) + 1 is better written as maxEntries * 4 / 3 + 1 (no parens needed), to do it using int math (no cast needed).
        – Andreas
        2 hours ago






      • 2




        @Andreas On the other hand, Java itself uses a float for the calculation.
        – Mark Rotteveel
        2 hours ago






      • 1




        @MarkRotteveel Sure, but new HashMap<>(size * 4/3 + 1) is shorter/simpler than new HashMap<>((int) (size / 0.75 + 1)), since you don't have to cast.
        – Andreas
        2 hours ago










      • @xtratic not entirely sure, but it seems you are confusing re-hash with re-size, these are different things
        – Eugene
        1 hour ago












      up vote
      3
      down vote










      up vote
      3
      down vote









      When in doubt read the Documentation. The docs for HashMap explain the trade offs of initial capacity and load-factor quite well.



      According to the documentation if initCapacity = (maxEntries / loadFactor) + 1 then no rehash operations will occur upon adding entries. Where in this case, the maxEntries is 100 as you specify and the loadFactor would be the default load factor of .75.



      But aside from just setting an initial size to avoid a rehash (resize()) you should carefully read the documentation of HashMap in order to tune it properly, taking both initial capacity and load factor into account.



      If you care about lookup cost more than space, then perhaps try with a lower loadFactors like .5 or lower if you like. In that case you would create your hash map with both parameters like this:



      final float loadFactor = 0.5;
      final int maxEntries = 100;
      final int initCapacity = (int) maxEntries / loadFactor + 1;
      new HashMap<>(initCapacity, loadFactor);



      (emphasis mine)




      An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal datastructures are rebuilt) so that the hash table has approximately twice the number of buckets.

      ...

      As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.







      share|improve this answer














      When in doubt read the Documentation. The docs for HashMap explain the trade offs of initial capacity and load-factor quite well.



      According to the documentation if initCapacity = (maxEntries / loadFactor) + 1 then no rehash operations will occur upon adding entries. Where in this case, the maxEntries is 100 as you specify and the loadFactor would be the default load factor of .75.



      But aside from just setting an initial size to avoid a rehash (resize()) you should carefully read the documentation of HashMap in order to tune it properly, taking both initial capacity and load factor into account.



      If you care about lookup cost more than space, then perhaps try with a lower loadFactors like .5 or lower if you like. In that case you would create your hash map with both parameters like this:



      final float loadFactor = 0.5;
      final int maxEntries = 100;
      final int initCapacity = (int) maxEntries / loadFactor + 1;
      new HashMap<>(initCapacity, loadFactor);



      (emphasis mine)




      An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal datastructures are rebuilt) so that the hash table has approximately twice the number of buckets.

      ...

      As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.








      share|improve this answer














      share|improve this answer



      share|improve this answer








      edited 2 hours ago

























      answered 2 hours ago









      xtratic

      1,4241718




      1,4241718







      • 1




        0.75 is also 3/4, so (maxEntries / 0.75) + 1 is better written as maxEntries * 4 / 3 + 1 (no parens needed), to do it using int math (no cast needed).
        – Andreas
        2 hours ago






      • 2




        @Andreas On the other hand, Java itself uses a float for the calculation.
        – Mark Rotteveel
        2 hours ago






      • 1




        @MarkRotteveel Sure, but new HashMap<>(size * 4/3 + 1) is shorter/simpler than new HashMap<>((int) (size / 0.75 + 1)), since you don't have to cast.
        – Andreas
        2 hours ago










      • @xtratic not entirely sure, but it seems you are confusing re-hash with re-size, these are different things
        – Eugene
        1 hour ago












      • 1




        0.75 is also 3/4, so (maxEntries / 0.75) + 1 is better written as maxEntries * 4 / 3 + 1 (no parens needed), to do it using int math (no cast needed).
        – Andreas
        2 hours ago






      • 2




        @Andreas On the other hand, Java itself uses a float for the calculation.
        – Mark Rotteveel
        2 hours ago






      • 1




        @MarkRotteveel Sure, but new HashMap<>(size * 4/3 + 1) is shorter/simpler than new HashMap<>((int) (size / 0.75 + 1)), since you don't have to cast.
        – Andreas
        2 hours ago










      • @xtratic not entirely sure, but it seems you are confusing re-hash with re-size, these are different things
        – Eugene
        1 hour ago







      1




      1




      0.75 is also 3/4, so (maxEntries / 0.75) + 1 is better written as maxEntries * 4 / 3 + 1 (no parens needed), to do it using int math (no cast needed).
      – Andreas
      2 hours ago




      0.75 is also 3/4, so (maxEntries / 0.75) + 1 is better written as maxEntries * 4 / 3 + 1 (no parens needed), to do it using int math (no cast needed).
      – Andreas
      2 hours ago




      2




      2




      @Andreas On the other hand, Java itself uses a float for the calculation.
      – Mark Rotteveel
      2 hours ago




      @Andreas On the other hand, Java itself uses a float for the calculation.
      – Mark Rotteveel
      2 hours ago




      1




      1




      @MarkRotteveel Sure, but new HashMap<>(size * 4/3 + 1) is shorter/simpler than new HashMap<>((int) (size / 0.75 + 1)), since you don't have to cast.
      – Andreas
      2 hours ago




      @MarkRotteveel Sure, but new HashMap<>(size * 4/3 + 1) is shorter/simpler than new HashMap<>((int) (size / 0.75 + 1)), since you don't have to cast.
      – Andreas
      2 hours ago












      @xtratic not entirely sure, but it seems you are confusing re-hash with re-size, these are different things
      – Eugene
      1 hour ago




      @xtratic not entirely sure, but it seems you are confusing re-hash with re-size, these are different things
      – Eugene
      1 hour ago










      up vote
      1
      down vote













      This is easy to prove:



// requires: import java.lang.reflect.AccessibleObject;
//           import java.lang.reflect.Field;
private static <K, V> void debugResize(Map<K, V> map, K key, V value) throws Throwable {

    Field table = map.getClass().getDeclaredField("table");
    AccessibleObject.setAccessible(new Field[] { table }, true);
    Object[] nodes = (Object[]) table.get(map);

    // first put, the table is not allocated yet
    if (nodes == null) {
        map.put(key, value);
        return;
    }

    map.put(key, value);

    Field field = map.getClass().getDeclaredField("table");
    AccessibleObject.setAccessible(new Field[] { field }, true);
    int x = ((Object[]) field.get(map)).length;
    if (nodes.length != x) {
        ++currentResizeCalls;
    }
}

      And some usage:



static int currentResizeCalls = 0;

public static void main(String[] args) throws Throwable {

    int size = 100;
    Map<Integer, String> m = new HashMap<>(size);
    for (int i = 0; i < size; i++) {
        DeleteMe.debugResize(m, i, String.valueOf(i));
    }

    System.out.println(DeleteMe.currentResizeCalls);
}


I am only counting the cases where resize actually grows the table, because the very first call merely initializes it; as the documentation stipulates:




      Initializes or doubles table size





The second of your points is far more interesting. A HashMap defines a capacity, but what exactly is that capacity? It is not as obvious as it seems:



For HashMap, capacity is the number of buckets before a resize happens; for ConcurrentHashMap, it's the number of entries you can add before a resize is performed.



Thus, to avoid triggering resize internally, in the case of HashMap use the formula:



      (int)(1.0 + (long)initialCapacity / LOAD_FACTOR)


But this is far from ideal: say you want 1024 entries without a resize; using that formula you get 1367 buckets, which are internally rounded up to a power of two, thus 2048 - well, a lot more than you asked for.
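
To see where that rounding comes from, here is a small sketch of the usual round-up-to-the-next-power-of-two bit trick (HashMap's tableSizeFor does essentially this in recent JDKs; the standalone class here is just for illustration):

public class TableSizeDemo {

    // Rounds a requested capacity up to the next power of two,
    // mirroring what HashMap does with the capacity you pass in.
    static int nextPowerOfTwo(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(nextPowerOfTwo(1367)); // 2048
        System.out.println(nextPowerOfTwo(1024)); // 1024
    }
}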



For CHM, specify the size directly. Easy to prove with a single modification to the previous code:



       // use CHM instead of HashMap
      Map<Integer, String> m = new ConcurrentHashMap<>(size);


This will result in zero resizes that actually double the array. But sometimes even the CHM internal code is confusing and requires a little patching.






edited 1 hour ago; answered 2 hours ago by Eugene

If you know the exact number of elements that will be added to a HashMap (with the default load factor of 0.75), then I propose using this method to determine the ideal initial capacity: it will hold all of those elements without the need for a resize, and without the waste of capacity that you can get under certain conditions when using the (admittedly simpler) formula s * 4 / 3 + 1.



public static int calculateDefaultHashMapIdealCapacity(int elementCount) {
    return -Math.floorDiv(-elementCount * 4, 3);
}


              If you write code to compare this method ("two") with the simpler formula ("one") then you'll find that they give identical capacities in all but a few cases. (See the table below.)



              The first number is the capacity calculated by the method, and the number in parentheses is the real capacity that HashMap will actually allocate (because behind the scenes HashMap always uses an array capacity which is a power of two). In most cases the real (power-of-two) capacity ends up being identical for both methods. But in a few cases the simpler method gives a larger capacity than is really needed, and the alternative method gives a capacity which won't need to be resized but will hold the entire element count ("s"). So the alternative method avoids wasting memory in these fringe cases.



              Unless you have no need to ever worry about memory allocation, I'd use this alternative capacity calculation instead. (Be warned that I've just come up with this in five minutes, so I recommend you do your own tests to confirm that the numbers are correct.)



              s one two waste
              1 2(2) 2(2) 0
              2 3(4) 3(4) 0
              3 5(8) 4(4) 4
              4 6(8) 6(8) 0
              5 7(8) 7(8) 0
              6 9(16) 8(8) 8
              7 10(16) 10(16) 0
              8 11(16) 11(16) 0
              9 13(16) 12(16) 0
              10 14(16) 14(16) 0
              11 15(16) 15(16) 0
              12 17(32) 16(16) 16
              13 18(32) 18(32) 0
              14 19(32) 19(32) 0
              15 21(32) 20(32) 0
              16 22(32) 22(32) 0
              17 23(32) 23(32) 0
              18 25(32) 24(32) 0
              19 26(32) 26(32) 0
              20 27(32) 27(32) 0
              21 29(32) 28(32) 0
              22 30(32) 30(32) 0
              23 31(32) 31(32) 0
              24 33(64) 32(32) 32
              25 34(64) 34(64) 0
              26 35(64) 35(64) 0
              27 37(64) 36(64) 0
              28 38(64) 38(64) 0
              29 39(64) 39(64) 0
              30 41(64) 40(64) 0
              31 42(64) 42(64) 0
              32 43(64) 43(64) 0
              33 45(64) 44(64) 0
              34 46(64) 46(64) 0
              35 47(64) 47(64) 0
              36 49(64) 48(64) 0
              37 50(64) 50(64) 0
              38 51(64) 51(64) 0
              39 53(64) 52(64) 0
              40 54(64) 54(64) 0
              41 55(64) 55(64) 0
              42 57(64) 56(64) 0
              43 58(64) 58(64) 0
              44 59(64) 59(64) 0
              45 61(64) 60(64) 0
              46 62(64) 62(64) 0
              47 63(64) 63(64) 0
              48 65(128) 64(64) 64
              49 66(128) 66(128) 0
              50 67(128) 67(128) 0
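
The answer above mentions writing code to compare the two formulas; here is a minimal sketch of such a comparison harness (assuming the default load factor of 0.75; this is not the answerer's original code, and the method names "one" and "two" simply follow the table headers):

public class CapacityComparison {

    // Formula "one": the simple s * 4 / 3 + 1 sizing discussed in the comments.
    static int one(int s) {
        return s * 4 / 3 + 1;
    }

    // Formula "two": ceiling of s * 4 / 3, as proposed in this answer.
    static int two(int s) {
        return -Math.floorDiv(-s * 4, 3);
    }

    // The power-of-two table length HashMap ends up allocating
    // for a requested capacity.
    static int realCapacity(int requested) {
        int highest = Integer.highestOneBit(requested);
        return (highest == requested) ? requested : highest << 1;
    }

    public static void main(String[] args) {
        System.out.println("s\tone\ttwo\twaste");
        for (int s = 1; s <= 50; s++) {
            int c1 = realCapacity(one(s));
            int c2 = realCapacity(two(s));
            System.out.printf("%d\t%d(%d)\t%d(%d)\t%d%n",
                    s, one(s), c1, two(s), c2, c1 - c2);
        }
    }
}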





edited 1 hour ago; answered 1 hour ago by Bobulous

              • or, you know that the actual formula used from CHM that would apply to HashMap? long size = (long)(1.0 + (long)initialCapacity / loadFactor);
                – Eugene
                1 hour ago











              • Yes, but that's the same as the simple formula s * 4 / 3 for a default HashMap with load factor 0.75. And for element count 24 that formula will give 33 which causes a real capacity of 64. But a real capacity of 32 is able to hold 24 elements without a resize. So the alternative method, which results in real capacity 32, is surely preferable?
                – Bobulous
                1 hour ago



























• Resizing is an essential part of a hashmap's operation, needed to keep the load factor low.


• The load factor needs to stay low because the hashmap's hash function will inevitably start producing collisions as the buckets fill up. Collisions can start from the second entry itself if your entries happen to hash to an occupied bucket every time.



However, in your particular case, collisions are not the problem; only the hashmap's resizing is.



A hashmap is generally resized at a load factor of 0.75 (= 3/4 as a fraction). Using this information, you can set up a hashmap with a capacity of 4/3 times the count of entries you need to store.




Regarding your objection that this breaks encapsulation:



I agree with you; it is debatable.



You could say that it would be better if capacity represented the count of entries up to which no resize happens, rather than the maximum possible count of entries that could be stored in the hashmap - and I tend to agree with you.



But someone else could argue the other side: why should the hashmap occupy more space than it was instructed to reserve?



The solution to this problem lies in Java's domain. Java could provide two constructors that are explicit about what they do, and developers could then take their pick when initializing their hashmaps. A sketch of what such an explicit factory might look like follows.
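
As an illustration of that idea, here is a hypothetical factory method (the class and method names, and the sizing math, are assumptions for this sketch - this is not an existing JDK constructor) whose argument is the expected entry count rather than a bucket count:

import java.util.HashMap;
import java.util.Map;

public final class Maps2 {

    private Maps2() {}

    // Hypothetical factory: the argument is the number of entries the caller
    // intends to put, not the bucket count. The capacity is chosen so that,
    // with the default load factor of 0.75, those entries should not trigger
    // a resize.
    public static <K, V> Map<K, V> newHashMapForEntries(int expectedEntries) {
        int capacity = -Math.floorDiv(-expectedEntries * 4, 3); // ceil(n * 4 / 3)
        return new HashMap<>(capacity);
    }

    public static void main(String[] args) {
        Map<Integer, String> m = newHashMapForEntries(100);
        for (int i = 0; i < 100; i++) {
            m.put(i, String.valueOf(i));
        }
    }
}

Third-party libraries take a similar approach; for example, Guava's Maps.newHashMapWithExpectedSize(int) sizes the map by expected entry count rather than by bucket count.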






answered 1 hour ago by displayName