Problem description
Since Java 8, HashMap has been modified slightly: a bucket is converted from a linked list to a balanced tree once it holds more than 8 entries (TREEIFY_THRESHOLD = 8). Is there a reason for choosing 8?
Would it affect performance if the threshold were 9?
Recommended answer
Using a balanced tree instead of a linked list is a tradeoff. With a list, a lookup in a bucket requires a linear scan, while a tree allows logarithmic-time access. When the list is small, the scan is fast and a tree provides no real benefit; at around 8 elements, the cost of scanning the list becomes significant enough that the tree provides a speed-up.
I suspect that the tree is intended for the exceptional case where key hashing is catastrophically broken (e.g. many keys collide). A linear lookup would then degrade performance severely, whereas a tree mitigates the loss somewhat, provided the keys are directly comparable.
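The broken-hash case can be reproduced with a small experiment. The sketch below is illustrative only: `CollidingKeys`, `BadKey`, and `fill` are hypothetical names, and the constant hash code is an assumption chosen to force every key into the same bucket. Because `BadKey` implements `Comparable`, a Java 8+ HashMap can order the colliding keys inside a tree bin.

```java
import java.util.HashMap;
import java.util.Map;

class CollidingKeys {
    // Hypothetical key type with a deliberately broken hashCode:
    // every instance lands in the same bucket.
    static final class BadKey implements Comparable<BadKey> {
        final int id;

        BadKey(int id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            return 42; // constant on purpose, to force collisions
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }

        // Comparable keys let a tree bin order entries that share a hash.
        @Override
        public int compareTo(BadKey other) {
            return Integer.compare(id, other.id);
        }
    }

    // Inserts n colliding keys, mapping each key to its own id.
    static Map<BadKey, Integer> fill(int n) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new BadKey(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        int n = 20_000;
        Map<BadKey, Integer> map = fill(n);

        // Time n lookups; the figure is machine-dependent and only indicative.
        long start = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += map.get(new BadKey(i));
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("sum=" + sum + ", lookups took ~" + elapsedMs + " ms");
    }
}
```

Removing `implements Comparable<BadKey>` and the `compareTo` method and re-running is one way to see how much the comparability of keys matters; timings vary by machine and JDK version, so treat them as a rough indication rather than a benchmark.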
Therefore, the exact threshold of 8 entries is probably not very significant. Assuming a good key distribution, the probability of a bin reaching 8 entries is about 0.00000006, so tree bins are very rarely used in that case. When the hash function fails catastrophically, the number of keys in a bucket is far greater than 8 anyway.
This comes at a space penalty, since a tree node must hold additional fields: four references to tree nodes and a boolean, in addition to the fields of a LinkedHashMap.Entry (see its source).
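To make the extra fields concrete, here is a simplified sketch of the two node shapes. It is not the JDK source: `NodeLayoutSketch` and its nested classes are stand-ins written for illustration, and the sketch skips the intermediate LinkedHashMap.Entry layer (and the fields it contributes) that the real TreeNode inherits from.

```java
class NodeLayoutSketch {
    // Simplified stand-in for a plain bin node: one link to the next node.
    static class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Simplified stand-in for a tree bin node: everything a plain node has,
    // plus four references to other tree nodes and a boolean colour flag.
    static class TreeNode<K, V> extends Node<K, V> {
        TreeNode<K, V> parent; // red-black tree links
        TreeNode<K, V> left;
        TreeNode<K, V> right;
        TreeNode<K, V> prev;   // previous node in the bin's list order
        boolean red;

        TreeNode(int hash, K key, V value, Node<K, V> next) {
            super(hash, key, value, next);
        }
    }

    public static void main(String[] args) {
        TreeNode<String, Integer> node = new TreeNode<>("a".hashCode(), "a", 1, null);
        System.out.println("tree node for key " + node.key + ", red=" + node.red);
    }
}
```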
From the comments in the HashMap class source:
Because TreeNodes are about twice the size of regular nodes, we use them only when bins contain enough nodes to warrant use (see TREEIFY_THRESHOLD). And when they become too small (due to removal or resizing) they are converted back to plain bins. In usages with well-distributed user hashCodes, tree bins are rarely used. Ideally, under random hashCodes, the frequency of nodes in bins follows a Poisson distribution (http://en.wikipedia.org/wiki/Poisson_distribution) with a parameter of about 0.5 on average for the default resizing threshold of 0.75, although with a large variance because of resizing granularity. Ignoring variance, the expected occurrences of list size k are (exp(-0.5) * pow(0.5, k) / factorial(k)).
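The formula in the comment can be evaluated directly. The following sketch tabulates it for k = 0 to 8; `PoissonBins` and `probability` are names made up for this example, and the mean of 0.5 is the value quoted in the comment above.

```java
class PoissonBins {
    // Probability that a bin holds exactly k nodes, assuming bin sizes follow
    // a Poisson distribution with mean lambda:
    // exp(-lambda) * pow(lambda, k) / factorial(k)
    static double probability(double lambda, int k) {
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) {
            factorial *= i;
        }
        return Math.exp(-lambda) * Math.pow(lambda, k) / factorial;
    }

    public static void main(String[] args) {
        // Print one row per bin size, using the mean from the source comment.
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, probability(0.5, k));
        }
    }
}
```

For k = 8 the value is roughly 6e-8, which is where the 0.00000006 figure comes from. Each extra entry divides the probability by a growing factor, so moving the threshold from 8 to 9 would change how often tree bins appear under well-distributed hashes only at an already negligible scale.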