Question
I have a pretty large Hashmap (~250MB). Creating it takes about 50-55 seconds, so I decided to serialize it and save it to a file. Reading from the file takes about 16-17 seconds now.
The only problem is that lookups seem to be slower this way. I always thought that the hashmap is read from the file into memory, so the performance should be the same as when I create the hashmap myself, right? Here is the code I am using to read the hashmap from the file:
File file = new File("omaha.ser");
FileInputStream f = new FileInputStream(file);
ObjectInputStream s = new ObjectInputStream(new BufferedInputStream(f));
omahaMap = (HashMap<Long, Integer>) s.readObject();
s.close();
300 million lookups take about 3.1 seconds when I create the hashmap myself, and about 8.5 seconds when I read the same hashmap from file. Does anybody have an idea why? Am I overlooking something obvious?
I "measured" the time by just taking the time with System.nanotime(), so no proper benchmark method used. Here is the code:
public class HandEvaluationTest
{
    public static void Test()
    {
        HandEvaluation.populate5Card();
        HandEvaluation.populate9CardOmaha();

        Card[] player1cards = {new Card("4s"), new Card("2s"), new Card("8h"), new Card("4d")};
        Card[] player2cards = {new Card("As"), new Card("9s"), new Card("6c"), new Card("2h")};
        Card[] player3cards = {new Card("9h"), new Card("7h"), new Card("Kc"), new Card("Kh")};
        Card[] table = {new Card("2d"), new Card("2c"), new Card("3c"), new Card("5c"), new Card("4h")};

        int j = 0, k = 0, l = 0;
        long startTime = System.nanoTime();
        for (int p = 0; p < 100000000; p++) {
            j = HandEvaluation.handEval9Hash(player1cards, table);
            k = HandEvaluation.handEval9Hash(player2cards, table);
            l = HandEvaluation.handEval9Hash(player3cards, table);
        }
        long estimatedTime = System.nanoTime() - startTime;
        System.out.println("Time needed: " + estimatedTime * Math.pow(10, -6) + "ms");
        System.out.println("Handstrength Player 1: " + j);
        System.out.println("Handstrength Player 2: " + k);
        System.out.println("Handstrength Player 3: " + l);
    }
}
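Since I know the measurement is naive: one way this loop could be made less noisy is an untimed warm-up pass before the measured loop, so JIT compilation of the hot path is not charged to the measurement. A sketch of how Test() could be adjusted (the 1,000,000 warm-up count is an arbitrary choice):

    // Warm-up pass (untimed): let the JIT compile handEval9Hash first
    for (int p = 0; p < 1000000; p++) {
        HandEvaluation.handEval9Hash(player1cards, table);
    }
    // Only start timing after the warm-up
    long startTime = System.nanoTime();
    for (int p = 0; p < 100000000; p++) {
        j = HandEvaluation.handEval9Hash(player1cards, table);
        k = HandEvaluation.handEval9Hash(player2cards, table);
        l = HandEvaluation.handEval9Hash(player3cards, table);
    }
    long estimatedTime = System.nanoTime() - startTime;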
The big hashmap work is done in HandEvaluation.populate9CardOmaha(). The 5-card one is small. The code for the big one:
public static void populate9CardOmaha()
{
    // Check if the hashmap is already there - then just read it and exit
    File hashmap = new File("omaha.ser");
    if (hashmap.exists())
    {
        try
        {
            File file = new File("omaha.ser");
            FileInputStream f = new FileInputStream(file);
            ObjectInputStream s = new ObjectInputStream(new BufferedInputStream(f));
            omahaMap = (HashMap<Long, Integer>) s.readObject();
            s.close();
        }
        catch (IOException ioex) { ioex.printStackTrace(); }
        catch (ClassNotFoundException cnfex)
        {
            System.out.println("Class not found");
            cnfex.printStackTrace();
            return;
        }
        return;
    }

    // If it's not there, populate it yourself
    ... Code for populating hashmap ...

    // and then save it to file
    try
    {
        File file = new File("omaha.ser");
        FileOutputStream f = new FileOutputStream(file);
        ObjectOutputStream s = new ObjectOutputStream(new BufferedOutputStream(f));
        s.writeObject(omahaMap);
        s.close();
    }
    catch (IOException ioex) { ioex.printStackTrace(); }
}
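One side note on the snippet above: s.close() is skipped whenever readObject() throws, which leaks the file handle. A minimal Java 6-style variant of the read branch with a finally block (same field and file names as above) could look like this:

    ObjectInputStream s = null;
    try
    {
        s = new ObjectInputStream(new BufferedInputStream(new FileInputStream("omaha.ser")));
        omahaMap = (HashMap<Long, Integer>) s.readObject();
    }
    catch (IOException ioex) { ioex.printStackTrace(); }
    catch (ClassNotFoundException cnfex) { cnfex.printStackTrace(); }
    finally
    {
        // Close the stream even if deserialization failed
        if (s != null)
        {
            try { s.close(); } catch (IOException ignored) {}
        }
    }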
When I am populating it myself (= the file is not there), lookups in HandEvaluationTest.Test() take about 8 seconds instead of 3. Maybe it's just my very naive way of measuring the elapsed time?
Recommended Answer
This question was interesting, so I wrote my own test case to verify it. I found no difference in speed for a live lookup vs. one that was loaded from a serialized file. The program is available at the end of the post for anyone interested in running it.
- Monitored the method using JProfiler.
- The serialized file is comparable to yours: ~230 MB.
- Lookups in memory without any serialization cost 1210 ms.
- After serializing the map and reading it back in, the cost of the lookups remained the same (almost - 1224 ms).
- The profiler was tweaked to add minimal overhead in both scenarios.
- This was measured on a Java(TM) SE Runtime Environment (build 1.6.0_25-b06) / 4 CPUs running at 1.7 GHz / 4 GB RAM at 800 MHz.
Measuring is tricky. I myself noticed the 8 second lookup time that you described, but guess what else I noticed when that happened: GC activity. Your measurements are probably picking that up too. If you isolate the measurements of Map.get() alone, you'll see that the results are comparable.
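For instance, using the helper methods from the program below, you could time the lookup loop on its own and request a collection first, so that garbage left over from deserialization is not charged to the lookups (a rough sketch; System.gc() is only a hint to the JVM):

    Map<Long, Integer> map = deserialize();
    System.gc(); // hint: settle the heap before timing (not guaranteed to run)
    long start = System.nanoTime();
    lookupItems(10000000, map);
    long elapsed = System.nanoTime() - start;
    System.out.println("Lookup time: " + (elapsed / 1000000) + " ms");

The full test program follows.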
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

public class GenericTest
{
    public static void main(String... args)
    {
        // Call the methods as you please for a live vs. ser <-> de_ser run
    }

    private static Map<Long, Integer> generateHashMap()
    {
        Map<Long, Integer> map = new HashMap<Long, Integer>();
        final Random random = new Random();
        for (int counter = 0; counter < 10000000; counter++)
        {
            final int value = random.nextInt();
            final long key = random.nextLong();
            map.put(key, value);
        }
        return map;
    }

    private static void lookupItems(int n, Map<Long, Integer> map)
    {
        final Random random = new Random();
        for (int counter = 0; counter < n; counter++)
        {
            final long key = random.nextLong();
            final Integer value = map.get(key);
        }
    }

    private static void serialize(Map<Long, Integer> map)
    {
        try
        {
            File file = new File("temp/omaha.ser");
            FileOutputStream f = new FileOutputStream(file);
            ObjectOutputStream s = new ObjectOutputStream(new BufferedOutputStream(f));
            s.writeObject(map);
            s.close();
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }

    private static Map<Long, Integer> deserialize()
    {
        try
        {
            File file = new File("temp/omaha.ser");
            FileInputStream f = new FileInputStream(file);
            ObjectInputStream s = new ObjectInputStream(new BufferedInputStream(f));
            HashMap<Long, Integer> map = (HashMap<Long, Integer>) s.readObject();
            s.close();
            return map;
        }
        catch (Exception e)
        {
            throw new RuntimeException(e);
        }
    }
}
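If you want a concrete starting point, one possible body for the empty main() above, wiring up the live vs. serialize/deserialize comparison (note that serialize() writes to temp/omaha.ser, so the temp directory must exist; the lookup count of 10,000,000 is my arbitrary pick):

    public static void main(String... args)
    {
        // Live run: build the map and look it up immediately
        Map<Long, Integer> live = generateHashMap();
        long start = System.nanoTime();
        lookupItems(10000000, live);
        System.out.println("Live lookups: " + (System.nanoTime() - start) / 1000000 + " ms");

        // Round trip: serialize, read back, then look up again
        serialize(live);
        Map<Long, Integer> loaded = deserialize();
        start = System.nanoTime();
        lookupItems(10000000, loaded);
        System.out.println("Deserialized lookups: " + (System.nanoTime() - start) / 1000000 + " ms");
    }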