Problem Description
I am trying to implement this multiprocessing tutorial for my own purposes. At first I thought it did not scale well, but when I made a reproducible example I found that if the list of items goes above 124, it seems to never return an answer. At x = 124 it runs in 0.4 seconds, but when I set it to x = 125 it never finishes. I am running Python 2.7 on Windows 7.
from multiprocessing import Lock, Process, Queue, current_process
import time

class Testclass(object):
    def __init__(self, x):
        self.x = x

def toyfunction(testclass):
    testclass.product = testclass.x * testclass.x
    return testclass

def worker(work_queue, done_queue):
    try:
        for testclass in iter(work_queue.get, 'STOP'):
            print(testclass.counter)
            newtestclass = toyfunction(testclass)
            done_queue.put(newtestclass)
    except:
        print('error')
    return True

def main(x):
    counter = 1
    database = []
    while counter <= x:
        database.append(Testclass(10))
        counter += 1
    print(counter)

    workers = 8
    work_queue = Queue()
    done_queue = Queue()
    processes = []

    start = time.clock()
    counter = 1
    for testclass in database:
        testclass.counter = counter
        work_queue.put(testclass)
        counter += 1
        print(counter)
    print('items loaded')

    for w in range(workers):
        p = Process(target=worker, args=(work_queue, done_queue))
        p.start()
        processes.append(p)
        work_queue.put('STOP')

    for p in processes:
        p.join()

    done_queue.put('STOP')

    newdatabase = []
    for testclass in iter(done_queue.get, 'STOP'):
        newdatabase.append(testclass)

    print(time.clock() - start)
    print("Done")
    return(newdatabase)

if __name__ == '__main__':
    database = main(124)
    database2 = main(125)
Recommended Answer
OK! From the docs:
Warning As mentioned above, if a child process has put items on a queue (and it has not used JoinableQueue.cancel_join_thread), then that process will not terminate until all buffered items have been flushed to the pipe. This means that if you try joining that process you may get a deadlock unless you are sure that all items which have been put on the queue have been consumed. Similarly, if the child process is non-daemonic then the parent process may hang on exit when it tries to join all its non-daemonic children. Note that a queue created using a manager does not have this issue. See Programming guidelines.
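
A minimal standalone demonstration of that warning (a sketch, not code from the question; the exact item count doesn't matter as long as it overflows the OS pipe buffer):

from multiprocessing import Process, Queue

def child(q):
    # Put far more data than a pipe buffer holds; the queue's feeder
    # thread cannot flush it until someone reads from the queue.
    for i in range(100000):
        q.put(i)

if __name__ == '__main__':
    q = Queue()
    p = Process(target=child, args=(q,))
    p.start()
    p.join()    # deadlock: the child cannot exit until its buffered
                # items are read, and nothing reads them until after
                # join() returns
    print(q.get())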
As I noted in a comment earlier, the code attempts to .join() processes before the done_queue Queue is drained. After changing the code in a funky way to be sure done_queue was drained before .join()'ing, the code worked fine for a million items.
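
A minimal sketch of one such rearrangement (hypothetical; the answer does not show its exact code), replacing the tail end of main() in the question. Since main() knows it put exactly x items on work_queue, it can pull exactly x results out of done_queue first and only join the workers afterwards:

    for w in range(workers):
        p = Process(target=worker, args=(work_queue, done_queue))
        p.start()
        processes.append(p)
        work_queue.put('STOP')

    # Drain done_queue *before* joining: exactly x results will arrive,
    # so every worker's queue buffer gets flushed and it can exit.
    newdatabase = []
    for n in range(x):
        newdatabase.append(done_queue.get())

    for p in processes:
        p.join()    # safe now: nothing is left buffered in the children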
So this is a case of pilot error, although quite obscure. As to why behavior depends on the number passed to main(x), it's unpredictable: it depends on how buffering is done internally. Such fun ;-)
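
Alternatively, per the last sentence of the quoted warning, a queue created through a manager lives in a separate server process and has no per-child buffer to flush, so the question's join-before-drain order would not deadlock. A sketch of that substitution:

from multiprocessing import Manager

manager = Manager()
work_queue = manager.Queue()    # replaces work_queue = Queue()
done_queue = manager.Queue()    # replaces done_queue = Queue()
# the rest of main() can stay exactly as written in the question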