Problem description
In at least one implementation of the standard library, the first invocation of a std::uniform_int_distribution<>
does not return a random value, but rather the distribution's min value. That is, given the code:
default_random_engine engine( any_seed() );
uniform_int_distribution< int > distribution( smaller, larger );
auto x = distribution( engine );
assert( x == smaller );
...x will in fact be smaller for any values of any_seed(), smaller, or larger.
To play along at home, you can try a code sample that demonstrates this problem in gcc 4.8.1.
I trust this is not correct behavior? If it is correct behavior, why would a random distribution return this clearly non-random value?
Recommended answer
Explanation for the observed behavior
This is how uniform_int_distribution maps the random bits to numbers if the range of possible outcomes is smaller than the range of numbers the rng produces:
const __uctype __uerange = __urange + 1; // __urange can be zero
const __uctype __scaling = __urngrange / __uerange;
const __uctype __past = __uerange * __scaling;
do
__ret = __uctype(__urng()) - __urngmin;
while (__ret >= __past);
__ret /= __scaling;
where __urange is larger - smaller and __urngrange is the difference between the maximum and the minimum value the rng can return. (Code from bits/uniform_int_dist.h in libstdc++ 6.1.)
In our case, the rng default_random_engine is a minstd_rand0, which yields __scaling == 195225785 for the range [0,10] you tested with. Thus, if rng() < 195225785, the distribution will return 0.
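The scaling factor quoted above can be re-derived with a few lines of arithmetic. The following is only a sketch: scaling_for is a hypothetical helper name, and it assumes the engine is std::minstd_rand0 and that the downscaling branch from the excerpt applies (the requested range is smaller than the engine's range).

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper (not a library function): recomputes __scaling from the
// excerpt for std::minstd_rand0 and the closed range [smaller, larger].
std::uint64_t scaling_for(int smaller, int larger) {
    assert(smaller <= larger);
    // __urngrange: distance between the largest and smallest engine output.
    const std::uint64_t urngrange =
        std::uint64_t(std::minstd_rand0::max()) - std::uint64_t(std::minstd_rand0::min());
    // __uerange: number of distinct results requested, i.e. __urange + 1.
    const std::uint64_t uerange =
        static_cast<std::uint64_t>(static_cast<std::int64_t>(larger) - smaller) + 1;
    return urngrange / uerange;  // integer division, as in the excerpt
}
```

For the range [0,10], scaling_for(0, 10) evaluates 2147483645 / 11 with integer division, which gives the 195225785 mentioned above.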
The first number returned by minstd_rand0 is
(16807 * seed) % 2147483647
(where seed == 0 gets adjusted to 1, by the way). We can thus see that the first value produced by a minstd_rand0 seeded with a number smaller than 11615 will yield 0 with the uniform_int_distribution< int > distribution( 0, 10 ); you used. (Modulo off-by-one errors on my part. ;) )
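To check the seed threshold without depending on any particular standard library's distribution, the mapping from the libstdc++ excerpt can be re-implemented by hand on top of std::minstd_rand0, whose output sequence is fixed by the standard. This is a sketch under that assumption; first_draw_0_to_10 and first_seed_not_yielding_zero are hypothetical helper names, and the loop only mirrors the excerpt, it is not the library's actual code path.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper: first result for the range [0, 10], using the
// downscaling loop from the excerpt on a freshly seeded std::minstd_rand0.
int first_draw_0_to_10(std::uint32_t seed) {
    std::minstd_rand0 rng(seed);  // a seed of 0 is adjusted to 1 by the engine
    const std::uint64_t urngmin = std::minstd_rand0::min();
    const std::uint64_t urngrange = std::uint64_t(std::minstd_rand0::max()) - urngmin;
    const std::uint64_t uerange = 11;  // 10 - 0 + 1 possible results
    const std::uint64_t scaling = urngrange / uerange;
    const std::uint64_t past = uerange * scaling;
    std::uint64_t ret;
    do {
        ret = std::uint64_t(rng()) - urngmin;  // shift engine output to start at 0
    } while (ret >= past);                     // reject the incomplete last bucket
    return static_cast<int>(ret / scaling);
}

// Hypothetical helper: scans seeds upward from 0 and returns the first seed
// whose first draw is not 0.
std::uint32_t first_seed_not_yielding_zero() {
    std::uint32_t seed = 0;
    while (first_draw_0_to_10(seed) == 0) {
        ++seed;
    }
    return seed;
}
```

Under these assumptions the scan stops at seed 11616, so every seed from 0 through 11615 maps to 0, which matches the estimate above up to the off-by-one the answer allows for.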
You mentioned the problem going away for bigger seeds: As soon as the seeds get big enough to actually make the mod operation do something, we cannot simply assign a whole range of values to the same output by division, so the results will look better.
No, this is not incorrect behavior. You introduced significant bias into what is supposed to be a random 32-bit seed by always choosing it small. That bias showing up in the results is neither surprising nor evil. For random seeds, even your minstd_rand0 will yield a fairly uniformly random first value. (Though the sequence of numbers after that will not be of great statistical quality.)
Case 1: You want random numbers of high statistical quality.
For that, you use a better rng like mt19937 and seed its entire state space. For the Mersenne Twister, that's 624 32-bit integers. (For reference, here is my attempt to do this properly with some helpful suggestions in the answer.)
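One common way to approach this is sketched below. It is not the linked attempt; it assumes std::random_device provides usable entropy on your platform, and it goes through std::seed_seq, which mixes its input and is not guaranteed to preserve all of it. make_fully_seeded_mt19937 is a hypothetical helper name.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <functional>
#include <random>

// Hypothetical helper: feeds one 32-bit word per state element (624 for
// std::mt19937) into the engine instead of a single 32-bit seed.
std::mt19937 make_fully_seeded_mt19937() {
    std::random_device source;  // assumption: backed by a real entropy source
    std::array<std::uint32_t, std::mt19937::state_size> seed_data;
    std::generate(seed_data.begin(), seed_data.end(), std::ref(source));
    std::seed_seq seq(seed_data.begin(), seed_data.end());
    return std::mt19937(seq);
}
```

A call such as auto rng = make_fully_seeded_mt19937(); then gives an engine that can be passed to uniform_int_distribution as usual.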
Case 2: You really want to use those small seeds only.
We can still get decent results out of this. The problem is that pseudorandom number generators commonly depend "somewhat continuously" on their seed. To work around this, we discard enough numbers to let the initially similar sequences of output diverge. So if your seed must be small, you can initialize your rng like this:
std::mt19937 rng(smallSeed);
rng.discard(700000);
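If the engine is created in several places, the two lines can be wrapped in a small function. This is a sketch only: make_warmed_up_rng and first_draw are hypothetical names, and the default of 700000 discards is simply the value used above, not a derived bound.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper: seeds std::mt19937 with a small seed, then throws away
// the first `discards` outputs so that similar seeds have time to diverge.
std::mt19937 make_warmed_up_rng(std::uint32_t small_seed,
                                unsigned long long discards = 700000) {
    std::mt19937 rng(small_seed);
    rng.discard(discards);  // same effect as calling rng() `discards` times
    return rng;
}

// Hypothetical helper: one value in [smaller, larger] from a warmed-up engine.
int first_draw(std::uint32_t small_seed, int smaller, int larger) {
    assert(smaller <= larger);
    std::mt19937 rng = make_warmed_up_rng(small_seed);
    std::uniform_int_distribution<int> distribution(smaller, larger);
    return distribution(rng);
}
```

The result is still fully determined by the seed, so runs remain reproducible; only the starting point in the sequence moves.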
It is vital that you use a good rng like the Mersenne Twister for this. I do not know of any method to get even decent values out of a poorly seeded minstd_rand0; for example, see this train-wreck. Even if seeded properly, the statistical properties of a mt19937 are superior by far.
The concerns you sometimes hear about the large state space or slow generation usually do not matter outside the embedded world. According to boost and cacert.at, the MT is even way faster than minstd_rand0.
You still need to do the discard trick though, even if your results look good to the naked eye without it. It takes less than a millisecond on my system, and you don't seed rngs very often, so there is no reason not to.
Note that I am not able to give you a sharp estimate for the number of discards we need. I took that value from this answer, which links this paper for a rationale. I don't have the time to work through that right now.