問(wèn)題描述
我想從 [a,b] 之間的特定分布(例如均勻隨機(jī))中生成 N 個(gè)隨機(jī)數(shù),它們的總和為常數(shù) C.我嘗試了一些我能想到的解決方案,其中一些建議用于類似的線程,但它們中的大多數(shù)要么針對(duì)有限形式的問(wèn)題工作,要么我無(wú)法證明結(jié)果仍然遵循所需的分布.
I want to generate N random numbers drawn from a specif distribution (e.g uniform random) between [a,b] which sum to a constant C. I have tried a couple of solutions I could think of myself, and some proposed on similar threads but most of them either work for a limited form of problem or I can't prove the outcome still follows the desired distribution.
我嘗試過(guò)的:生成 N 個(gè)隨機(jī)數(shù),將它們?nèi)砍运鼈兊目偤筒⒊艘运璧某?shù).這似乎有效,但結(jié)果不遵循數(shù)字應(yīng)在 [a:b] 內(nèi)的規(guī)則.
What I have tried: Generage N random numbers, divide all of them by the sum of them and multiply by the desired constant. This seems to work but the result does not follow the rule that the numbers should be within [a:b].
生成 N-1 個(gè)隨機(jī)數(shù)加上 0 和所需的常數(shù) C 并對(duì)其進(jìn)行排序.然后計(jì)算每?jī)蓚€(gè)連續(xù)數(shù)字之間的差值,差值就是結(jié)果.這再次與 C 相加,但具有與上一個(gè)方法相同的問(wèn)題(范圍可以大于 [a:b].
Generage N-1 random numbers add 0 and desired constant C and sort them. Then calculate the difference between each two consecutive nubmers and the differences are the result. This again sums to C but have the same problem of last method(the range can be bigger than [a:b].
我還嘗試生成隨機(jī)數(shù),并始終以保持所需總和和范圍的方式跟蹤最小值和最大值,并得出以下代碼:
I also tried to generate random numbers and always keep track of min and max in a way that the desired sum and range are kept and come up with this code:
bool generate(function<int(int,int)> randomGenerator,int min,int max,int len,int sum,std::vector<int> &output){
/**
* Not possible to produce such a sequence
*/
if(min*len > sum)
return false;
if(max*len < sum)
return false;
int curSum = 0;
int left = sum - curSum;
int leftIndexes = len-1;
int curMax = left - leftIndexes*min;
int curMin = left - leftIndexes*max;
for(int i=0;i<len;i++){
int num = randomGenerator((curMin< min)?min:curMin,(curMax>max)?max:curMax);
output.push_back(num);
curSum += num;
left = sum - curSum;
leftIndexes--;
curMax = left - leftIndexes*min;
curMin = left - leftIndexes*max;
}
return true;
}
這似乎有效,但結(jié)果有時(shí)非常不準(zhǔn)確,我認(rèn)為它沒(méi)有遵循原始分布(例如均勻分布).例如:
This seems to work but the results are sometimes very skewed and I don't think it's following the original distribution (e.g. uniform). E.g:
//10 numbers within [1:10] which sum to 50:
generate(uniform,1,10,10,50,output);
//result:
2,7,2,5,2,10,5,8,4,5 => sum=50
//This looks reasonable for uniform, but let's change to
//10 numbers within [1:25] which sum to 50:
generate(uniform,1,25,10,50,output);
//result:
24,12,6,2,1,1,1,1,1,1 => sum= 50
注意輸出中有多少個(gè).這聽(tīng)起來(lái)可能合理,因?yàn)榉秶?但它們看起來(lái)真的不像均勻分布.我不確定即使有可能實(shí)現(xiàn)我想要的,也許限制使問(wèn)題無(wú)法解決.
Notice how many ones exist in the output. This might sound reasonable because the range is larger. But they really don't look like a uniform distribution. I am not sure even if it is possible to achieve what I want, maybe the constraints are making the problem not solvable.
推薦答案
如果您希望樣本服從均勻分布,則問(wèn)題簡(jiǎn)化為生成總和 = 1 的 N 個(gè)隨機(jī)數(shù).反過(guò)來(lái),這是一個(gè)特殊的Dirichlet 分布的情況,但也可以使用指數(shù)分布更容易地計(jì)算.方法如下:
In case you want the sample to follow a uniform distribution, the problem reduces to generate N random numbers with sum = 1. This, in turn, is a special case of the Dirichlet distribution but can also be computed more easily using the Exponential distribution. Here is how:
- 取一個(gè)統(tǒng)一的樣本 v1 ... vN,其中所有的 vi 都在 0 和 1 之間.
- 對(duì)于所有的 i,1<=i<=N,定義 ui := -ln vi(注意 ui> 0).
- 將 ui 標(biāo)準(zhǔn)化為 pi := ui/s 其中 s 是總和 u1+...+uN.
- Take a uniform sample v1 … vN with all vi between 0 and 1.
- For all i, 1<=i<=N, define ui := -ln vi (notice that ui > 0).
- Normalize the ui as pi := ui/s where s is the sum u1+...+uN.
p1..pN 是均勻分布的(在dim N-1 的單純形中),它們的和為1.
The p1..pN are uniformly distributed (in the simplex of dim N-1) and their sum is 1.
您現(xiàn)在可以將這些 pi 乘以您想要的常數(shù) C,然后通過(guò)對(duì)其他常數(shù) A 求和來(lái)轉(zhuǎn)換它們
You can now multiply these pi by the constant C you want and translate them by summing some other constant A like this
qi := A + pi*C.
qi := A + pi*C.
編輯 3
為了解決評(píng)論中提出的一些問(wèn)題,讓我添加以下內(nèi)容:
In order to address some issues raised in the comments, let me add the following:
- 為了確保最終的隨機(jī)序列落在區(qū)間 [a,b] 內(nèi),選擇上面的常數(shù) A 和 C 作為 A := a 和 C := ba,即取 qi =a + pi*(ba).由于 pi 在 (0,1) 范圍內(nèi),所有 qi 都將在 [a,b] 范圍內(nèi).
- 如果 vi 恰好為 0,則不能取(負(fù))對(duì)數(shù) -ln(vi) 因?yàn)?ln() 未定義為 0.概率此類事件的發(fā)生率極低.但是,為了確保不會(huì)發(fā)出錯(cuò)誤信號(hào),上面第 1 項(xiàng)中 v1 ... vN 的生成必須以特殊方式威脅任何 0 的出現(xiàn):將 -ln(0) 視為 +infinity(記住:ln(x) -> -infinity 當(dāng) x->0 時(shí)).因此總和 s = +infinity,這意味著 pi = 1 和所有其他 pj = 0.沒(méi)有這個(gè)約定,序列 (0...1...0) 永遠(yuǎn)不會(huì)被生成(非常感謝@Severin Pappadeux 的這個(gè)有趣的評(píng)論.)
- 正如@Neil Slater 在問(wèn)題所附的第四條評(píng)論中所解釋的那樣,從邏輯上講,不可能滿足原始框架的所有要求.因此,任何解決方案都必須將約束放寬到原始約束的適當(dāng)子集.@Behrooz 的其他評(píng)論似乎證實(shí)這在這種情況下就足夠了.
- To ensure that the final random sequence falls in the interval [a,b] choose the constants A and C above as A := a and C := b-a, i.e., take qi = a + pi*(b-a). Since pi is in the range (0,1) all qi will be in the range [a,b].
- One cannot take the (negative) logarithm -ln(vi) if vi happens to be 0 because ln() is not defined at 0. The probability of such an event is extremely low. However, in order to ensure that no error is signaled the generation of v1 ... vN in item 1 above must threat any occurrence of 0 in a special way: consider -ln(0) as +infinity (remember: ln(x) -> -infinity when x->0). Thus the sum s = +infinity, which means that pi = 1 and all other pj = 0. Without this convention the sequence (0...1...0) would never be generated (many thanks to @Severin Pappadeux for this interesting remark.)
- As explained in the 4th comment attached to the question by @Neil Slater it is logically impossible to fulfill all the requirements of the original framing. Therefore any solution must relax the constraints to a proper subset of the original ones. Other comments by @Behrooz seem to confirm that this would suffice in this case.
編輯 2
評(píng)論中又提出了一個(gè)問(wèn)題:
One more issue has been raised in the comments:
為什么重新調(diào)整統(tǒng)一樣本還不夠?
換句話說(shuō),我為什么要費(fèi)心取負(fù)對(duì)數(shù)?
原因是,如果我們只是重新縮放,那么結(jié)果樣本將不會(huì)均勻分布在 (0,1) 段(或 [a,b] 為最終樣本.)
The reason is that if we just rescale then the resulting sample won't distribute uniformly across the segment (0,1) (or [a,b] for the final sample.)
為了形象化,讓我們考慮 2D,即讓我們考慮 N=2 的情況.一個(gè)均勻樣本 (v1,v2) 對(duì)應(yīng)于正方形中具有原點(diǎn) (0,0) 和角點(diǎn) (1,1) 的隨機(jī)點(diǎn).現(xiàn)在,當(dāng)我們將這樣一個(gè)點(diǎn)除以總和 s=v1+v2 歸一化時(shí),我們所做的是將點(diǎn)投影到對(duì)角線上,如圖所示(請(qǐng)記住,對(duì)角線是 x + y = 1 線):
To visualize this let's think 2D, i.e., let's consider the case N=2. A uniform sample (v1,v2) corresponds to a random point in the square with origin (0,0) and corner (1,1). Now, when we normalize such a point dividing it by the sum s=v1+v2 what we are doing is projecting the point onto the diagonal as shown in the picture (keep in mind that the diagonal is the line x + y = 1):
但考慮到更靠近從 (0,0) 到 (1,1) 的主對(duì)角線的綠線比靠近 x 軸和 y 軸的橙色線長(zhǎng),投影往往會(huì)累積更多圍繞投影線(藍(lán)色)的中心,縮放樣本所在的位置.這表明簡(jiǎn)單的縮放不會(huì)在所描繪的對(duì)角線上產(chǎn)生統(tǒng)一的樣本.另一方面,可以從數(shù)學(xué)上證明負(fù)對(duì)數(shù)確實(shí)產(chǎn)生了所需的均勻性.因此,與其復(fù)制粘貼數(shù)學(xué)證明,不如邀請(qǐng)每個(gè)人實(shí)現(xiàn)這兩種算法,并檢查結(jié)果圖是否與此答案描述的一樣.
But given that green lines, which are closer to the principal diagonal from (0,0) to (1,1), are longer than orange ones, which are closer to the axes x and y, the projections tend to accumulate more around the center of the projection line (in blue), where the scaled sample lives. This shows that a simple scaling won't produce a uniform sample on the depicted diagonal. On the other hand, it can be proven mathematically that the negative logarithms do produce the desired uniformity. So, instead of copypasting a mathematical proof I would invite everyone to implement both algorithms and check that the resulting plots behave as this answer describes.
(注意:此處 是一篇關(guān)于這個(gè)有趣主題的博客文章,適用于石油和天然氣行業(yè))
(Note: here is a blog post on this interesting subject with an application to the Oil & Gas industry)
這篇關(guān)于生成一個(gè)范圍內(nèi)的N個(gè)隨機(jī)數(shù),其總和為常數(shù)的文章就介紹到這了,希望我們推薦的答案對(duì)大家有所幫助,也希望大家多多支持html5模板網(wǎng)!