Question
I am having difficulty understanding how to use Python's multiprocessing module.
I have a sum from 1 to n where n=10^10, which is too large to fit into a list, which seems to be the thrust of many examples online using multiprocessing.
Is there a way to "split up" the range into segments of a certain size and then perform the sum for each segment?
For example:
def sum_nums(low, high):
    result = 0
    for i in range(low, high + 1):
        result += i
    return result
And I want to compute sum_nums(1,10**10) by breaking it up into many sum_nums(1,1000) + sum_nums(1001,2000) + sum_nums(2001,3000)... and so on. I know there is a closed form n(n+1)/2, but pretend we don't know that.
Here is what I have tried:
import multiprocessing

def sum_nums(low, high):
    result = 0
    for i in range(low, high + 1):
        result += i
    return result

if __name__ == "__main__":
    n = 1000
    procs = 2
    sizeSegment = n // procs  # integer division, so the range bounds stay ints

    jobs = []
    for i in range(0, procs):
        process = multiprocessing.Process(target=sum_nums,
                                          args=(i * sizeSegment + 1, (i + 1) * sizeSegment))
        jobs.append(process)

    for j in jobs:
        j.start()
    for j in jobs:
        j.join()

    # where is the result?
Recommended Answer
First, the best way to get around the memory issue is to use an iterator/generator instead of a list:
def sum_nums(low, high):
    result = 0
    for i in xrange(low, high + 1):  # xrange avoids building a huge list in Python 2
        result += i
    return result
(In Python 3, range() produces an iterator, so this is only needed in Python 2.)
Now, where multiprocessing comes in is when you want to split up the processing across different processes or CPU cores. If you don't need to control the individual workers, then the easiest method is to use a process pool. This will let you map a function to the pool and get the output. You can alternatively use apply_async to submit jobs to the pool one at a time and get a delayed result, which you can retrieve with .get():
import multiprocessing
from multiprocessing import Pool
from time import time

def sum_nums(low, high):
    result = 0
    for i in xrange(low, high + 1):
        result += i
    return result

# map requires a function that takes a single argument
def sn((low, high)):
    return sum_nums(low, high)

if __name__ == '__main__':
    #t = time()
    # takes forever
    #print sum_nums(1, 10**10)
    #print '{} s'.format(time() - t)

    p = Pool(4)
    n = int(1e8)
    r = range(0, 10**10 + 1, n)
    results = []

    # using apply_async
    t = time()
    for arg in zip([x + 1 for x in r], r[1:]):
        results.append(p.apply_async(sum_nums, arg))

    # wait for the results
    print sum(res.get() for res in results)
    print '{} s'.format(time() - t)

    # using the process pool's map
    t = time()
    print sum(p.map(sn, zip([x + 1 for x in r], r[1:])))
    print '{} s'.format(time() - t)
On my machine, just calling sum_nums with 10**10 takes almost 9 minutes, but using a Pool(8) and n=int(1e8) reduces this to just over a minute.