
Scalable way to access every element of ConcurrentHashMap<Element, Boolean> exactly once

Problem Description

I have 32 machine threads and one ConcurrentHashMap<Key,Value> map, which contains a lot of keys. Key defines a public method visit(). I want to visit() every element of map exactly once, using the processing power I have available and possibly some sort of thread pooling.

Things I could try:

  • I could use the method map.keys(). The resulting Enumeration<Key> could be iterated over using nextElement(), but since a call to key.visit() is very brief, I won't manage to keep the threads busy. The Enumeration is inherently single-threaded.
  • I could use a synchronised HashSet<Key> instead, invoke the method toArray() and split the work on the array across all 32 threads (see the sketch after this list). I seriously doubt this solution, since toArray() will likely be a single-threaded bottleneck.
  • I could try to inherit from ConcurrentHashMap, get my hands on the instances of its inner Segment<K,V>, try to group them into 32 groups and work on each group separately. This sounds like a hardcore approach, though.
  • Or similar magic with Enumeration<Key>.
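
A minimal sketch of the second idea, in case it helps to see it concretely. The Key interface, the THREADS constant and the splitAndVisit method are illustrative names, not part of any existing API; only the toArray()-then-split pattern comes from the bullet above.

import java.util.*;
import java.util.concurrent.*;

public class SplitVisitSketch {

    // Stand-in for the real Key type from the question.
    interface Key { void visit(); }

    static final int THREADS = 32;

    static void splitAndVisit(Set<Key> keys) throws InterruptedException {
        Object[] snapshot = keys.toArray();  // single-threaded step: the suspected bottleneck
        ExecutorService pool = Executors.newFixedThreadPool(THREADS);
        int chunk = (snapshot.length + THREADS - 1) / THREADS;
        for (int t = 0; t < THREADS; t++) {
            final int from = t * chunk;
            final int to = Math.min(from + chunk, snapshot.length);
            pool.execute(() -> {
                for (int i = from; i < to; i++) {
                    ((Key) snapshot[i]).visit();  // each key visited exactly once
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}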

Ideally:

  • Ideally, ConcurrentHashMap<Key, Value> would define a method keysEnumerator(int approximatePosition), which could hand me an enumerator that skips approximately the first 1/32 of the elements, i.e. map.keysEnumerator(map.size()/32).

Am I missing anything obvious? Has anybody run into a similar problem before?

EDIT

I've had a go at profiling to see whether this problem is actually going to affect performance in practice. As I don't have access to the cluster at the moment, I used my laptop and tried to extrapolate the results to a bigger dataset. On my machine I can create a ConcurrentHashMap with 2 million keys, and it takes about 1 second to iterate over it invoking the visit() method on every key. The program is supposed to scale to 85 million keys (and beyond). The cluster's processor is slightly faster, but it should still take about 40 seconds to iterate over the entire map. Now a few words about the logic flow of the program. The logic presented is sequential, i.e. no thread is allowed to proceed to the next step until all the threads have finished the previous step:

  1. Create the hash map, create the keys and populate the hash map.
  2. Iterate over the entire hash map, visiting all the keys.
  3. Do some data shuffling, i.e. parallel insertions and deletions.
  4. Repeat steps 2 and 3 a few hundred times (a sketch of this driver loop follows).
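
To make the flow concrete, here is a minimal sketch of that driver loop, reusing the same illustrative Key interface as above; populate() and shuffle() are placeholders for the real setup and shuffling code.

import java.util.concurrent.ConcurrentHashMap;

public class DriverSketch {

    interface Key { void visit(); }

    public static void main(String[] args) {
        ConcurrentHashMap<Key, Boolean> map = new ConcurrentHashMap<>();
        populate(map);                    // step 1: single-threaded setup
        for (int round = 0; round < 100; round++) {
            for (Key k : map.keySet()) {  // step 2: the ~40 s bottleneck at 85 M keys
                k.visit();
            }
            shuffle(map);                 // step 3: parallel insertions and deletions
        }                                 // step 4: steps 2-3 repeat a few hundred times
    }

    static void populate(ConcurrentHashMap<Key, Boolean> m) { /* placeholder */ }
    static void shuffle(ConcurrentHashMap<Key, Boolean> m)  { /* placeholder */ }
}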

That logic flow means that a 40-second iteration is going to be repeated a few hundred times, say 100, which gives us a bit over an hour spent just visiting the nodes (100 × 40 s ≈ 67 minutes). With a set of 32 parallel iterators it could go down to just a few minutes, which is a significant performance improvement.

Now a few words on how ConcurrentHashMap works (or how I believe it works). Every ConcurrentHashMap consists of segments (16 by default). Every write to the hash map is synchronised on the relevant segment. So say we're trying to write two new keys k1 and k2 to the hash map, and that they resolve to the same segment, say s1. If we attempt to write them simultaneously, one of them will acquire the lock first and be added before the other. What is the chance of two elements resolving to the same segment? With a good hash function and 16 segments it is 1/16. A sketch of how a key is mapped to a segment follows.
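
For illustration, this is roughly how the pre-Java-8 ConcurrentHashMap picks a segment for a key. It is a simplified sketch, not the exact JDK code: the real implementation also spreads hashCode() with a secondary hash before taking the top bits.

class SegmentSelectionSketch {
    // segmentCount is a power of two, 16 by default.
    static int segmentFor(int hash, int segmentCount) {
        int sshift = Integer.numberOfTrailingZeros(segmentCount); // 4 for 16 segments
        int segmentShift = 32 - sshift;               // use the highest-order bits of the hash
        int segmentMask = segmentCount - 1;           // 0x0F for 16 segments
        return (hash >>> segmentShift) & segmentMask; // index into the segment array
    }
}

With a uniform hash every segment is equally likely, which is where the 1/16 figure above comes from.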

I believe that ConcurrentHashMap should have a method concurrentKeys(), which would return an array of Enumerations, one per segment. I have a few ideas on how to add it to ConcurrentHashMap through inheritance, and I'll let you know if I succeed. For now the solution seems to be to create an array of ConcurrentHashMaps and pre-hash every key so that it resolves to one member of that array. I'll share that code as well, once it's ready.

EDIT

This is the same problem in a different language:

Parallel Iterators

Recommended Answer

The solution I will eventually go for is an array of ConcurrentHashMaps instead of one ConcurrentHashMap. This is ad hoc, but seems relevant for my use case. I don't care about the second step being slow, as it doesn't affect my code's performance. The solution is:

Object Creation:

  1. Create an array of ConcurrentHashMaps of size t, where t is the number of threads.
  2. Create an array of Runnables, also of size t.

Array Population (single-threaded, not an issue):

  1. Create the keys and apply a pre-hash function, which returns an int in the range 0 ... t-1. In my case, simply modulo t.
  2. Put the key in the hash map by accessing the appropriate entry in the array. E.g. if the pre-hash resulted in index 4, go for hashArray[4].put(key).

Array Iteration (nicely multithreaded, performance gain):

  1. Assign every thread from the Runnables array the job of iterating over the hash map with the corresponding index. This should give a t-times-shorter iteration compared to single-threaded. See the sketch after this list.
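
A minimal sketch of the whole scheme, under stated assumptions: the hashArray name, the modulo pre-hash and the one-thread-per-partition iteration come from the steps above, while the Key interface and its id() method are illustrative stand-ins for the real key type.

import java.util.concurrent.ConcurrentHashMap;

public class PartitionedMapSketch {

    // Stand-in for the real key type; id() supplies the int that gets pre-hashed.
    interface Key { void visit(); int id(); }

    static final int T = 32;  // one partition per machine thread

    @SuppressWarnings("unchecked")
    static final ConcurrentHashMap<Key, Boolean>[] hashArray = new ConcurrentHashMap[T];
    static {
        for (int i = 0; i < T; i++) hashArray[i] = new ConcurrentHashMap<>();
    }

    // Pre-hash: map a key to one of the t partitions (simple modulo, as above).
    static int preHash(Key key) {
        return Math.floorMod(key.id(), T);
    }

    // Population: single-threaded, routed through the pre-hash.
    static void put(Key key) {
        hashArray[preHash(key)].put(key, Boolean.TRUE);
    }

    // Iteration: one thread per partition, each visiting only its own map's keys.
    static void visitAll() throws InterruptedException {
        Thread[] workers = new Thread[T];
        for (int i = 0; i < T; i++) {
            final int idx = i;
            workers[i] = new Thread(() -> {
                for (Key k : hashArray[idx].keySet()) k.visit();
            });
            workers[i].start();
        }
        for (Thread w : workers) w.join();  // barrier: wait for all partitions to finish
    }
}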

To see the proof-of-concept code (it has some dependencies on the project, so I can't post it here), head over to my project on github.

EDIT

Actually, implementing the above proof of concept for my system has proven to be time-consuming, bug-prone and grossly disappointing. Additionally, I've discovered I would miss many features of the standard-library ConcurrentHashMap. The solution I have been exploring recently, which looks much less ad hoc and much more promising, is to use Scala, which produces bytecode that is fully interoperable with Java. The proof of concept relies on the stunning library described in this paper, and AFAIK it is currently IMPOSSIBLE to achieve a corresponding solution in vanilla Java without writing thousands of lines of code, given the current state of the standard library and the corresponding third-party libraries.

import scala.collection.parallel.mutable.ParHashMap

// A mutable node; visit() bumps its value.
class Node(value: Int, id: Int) {
    var v = value
    var i = id
    override def toString: String = v.toString
}

object testParHashMap {
    // Receives a (key, value) pair from the parallel map and mutates the value.
    def visit(entry: (Int, Node)) {
        entry._2.v += 1
    }

    def main(args: Array[String]) {
        val hm = new ParHashMap[Int, Node]()
        for (i <- 1 to 10) {
            val node = new Node(0, i)
            hm.put(node.i, node)
        }

        println("========== BEFORE ==========")
        hm.foreach{ println }

        // The foreach itself is executed in parallel across the map's elements.
        hm.foreach{ visit }

        println("========== AFTER ==========")
        hm.foreach{ println }
    }
}
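
A note on running this snippet: it targets the Scala of that era (2.9/2.10), when parallel collections shipped with the standard library. On Scala 2.13 and later, ParHashMap lives in the separate scala-parallel-collections module, which has to be added as a dependency.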
