Problem description
I recently set up a new machine to help cut run times for model fitting and data wrangling.
I ran some preliminary benchmarks and everything was mostly smooth, but I hit a snag when I tried enabling multi-process workers in scikit-learn.
I've reduced the error to a minimal example that is independent of my original code, since I enabled this feature without a problem on a different machine and in a VM.
I've also checked memory allocation to make sure my machine wasn't running out of available RAM. I have 16 GB of RAM, so there should be no issue, but I've left the output of the test below in case I missed something.
Given the traceback below, I can tell that my OS is killing the worker, but for the life of me I can't figure out why. As near as I can tell, my code will ONLY run when it uses a single CPU core.
I'm running Windows 10 on an AMD Ryzen 7 2700X with 16 GB of RAM.
import sklearn
import numpy as np
import tracemalloc
import time
from sklearn.model_selection import cross_val_score
from numpy.random import randn
from sklearn.linear_model import Ridge
##################### memory allocation snapshot
tracemalloc.start()
start_time = time.time()
snapshot1 = tracemalloc.take_snapshot()
###################### model
X = randn(815000, 100)
y = randn(815000, 1)
mod = Ridge()
sc = cross_val_score(mod, X, y,verbose =10, n_jobs=3)
################### Second memory allocation snapshot
snapshot2 = tracemalloc.take_snapshot()
top_stats = snapshot2.compare_to(snapshot1, 'lineno')
print("[ Top 5 ]")
for stat in top_stats[:5]:
    print(stat)
The expected result is straightforward: just the scores returned for the fitted model.
[Parallel(n_jobs=3)]: Using backend LokyBackend with 3 concurrent workers.
[Parallel(n_jobs=3)]: Done 3 out of 3 | elapsed: 0.2s remaining: 0.0s
---------------------------------------------------------------------------
TerminatedWorkerError Traceback (most recent call last)
<ipython-input-18-b2bdfd425f82> in <module>
16 y = randn(815000, 1)
17 mod = Ridge()
---> 18 sc = cross_val_score(mod, X, y,verbose =10, n_jobs=3)
..........
TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated.
This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker.
Memory output
[ Top 5 ]
<ipython-input-18-b2bdfd425f82>:15: size=622 MiB (+622 MiB), count=3 (+3), average=207 MiB
<ipython-input-18-b2bdfd425f82>:16: size=6367 KiB (+6367 KiB), count=3 (+3), average=2122 KiB
~\python37\lib\inspect.py:732: size=37.2 KiB (+26.2 KiB), count=596 (+419), average=64 B
~\python37\lib\site-packages\sklearn\externals\joblib\numpy_pickle.py:292: size=7072 B (+3808 B), count=13 (+7), average=544 B
~\python37\lib\pickle.py:549: size=5728 B (+3408 B), count=14 (+8), average=409 B
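The first two entries in the memory output can be sanity-checked by hand. The following is a minimal sketch, assuming NumPy's default float64 dtype (8 bytes per element). The worst-case figure assumes each of the three workers holds its own full copy of X, which joblib may well avoid by memory-mapping large arrays, so treat it as an upper bound rather than a measurement.

```python
# Back-of-the-envelope check of the array sizes in the example above.
# Assumption: randn returns float64 arrays, i.e. 8 bytes per element.
BYTES_PER_FLOAT64 = 8

n_rows, n_cols = 815_000, 100
n_workers = 3  # matches n_jobs=3 in the example

x_bytes = n_rows * n_cols * BYTES_PER_FLOAT64
y_bytes = n_rows * 1 * BYTES_PER_FLOAT64

x_mib = x_bytes / 1024**2
y_kib = y_bytes / 1024

# Pessimistic upper bound: every worker holds a private copy of X.
worst_case_gib = n_workers * x_bytes / 1024**3

print(f"X: {x_mib:.0f} MiB")   # tracemalloc reported 622 MiB
print(f"y: {y_kib:.0f} KiB")   # tracemalloc reported 6367 KiB
print(f"Worst case for {n_workers} copies of X: {worst_case_gib:.2f} GiB")
```

Even under that pessimistic assumption the total stays far below 16 GB, which is consistent with the suspicion that memory exhaustion is not the cause here.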
Recommended answer
I figured out that my SciPy installation was incompatible with the C++ redistributable version on my Windows 10 machine.
All I did was download the latest Visual Studio and install the C++ redistributable update listed in the "Individual components" section.
Once I had installed that, I restarted my computer and ran:
import scipy
scipy.test()
Once that was actually running, I tried my code block above and the error was fixed.
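Because the parallel workers are separate processes, a compiled extension that fails to load can take a worker down without a useful Python traceback. A quicker first check than the full test suite might be to import each package in a fresh interpreter and look at the exit code. This is only a sketch: `import_exit_code` is a hypothetical helper name, and the list of modules is an assumption about what is worth probing.

```python
import subprocess
import sys


def import_exit_code(module_name):
    """Import module_name in a fresh interpreter and return its exit code.

    Hypothetical helper: 0 means the import succeeded; anything else
    means the child interpreter failed or crashed during the import.
    """
    result = subprocess.run(
        [sys.executable, "-c", f"import {module_name}"],
        capture_output=True,
        text=True,
    )
    return result.returncode


# Assumed list of packages to probe; adjust to your own environment.
for name in ("numpy", "scipy", "scipy.linalg", "sklearn"):
    code = import_exit_code(name)
    status = "ok" if code == 0 else f"failed (exit code {code})"
    print(f"{name}: {status}")
```

A non-zero exit code here points at the installation rather than at your own code, which is the situation described in this answer.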
I think this boils down to running an old build of Windows 10 with a brand-new version of Python and SciPy.
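When a mismatch like this is suspected, it can help to record the OS build and package versions in one place before changing anything. The sketch below uses only the standard library; `collect_versions` is a hypothetical helper and the package list is an assumption.

```python
import importlib
import platform


def collect_versions(packages=("numpy", "scipy", "sklearn", "joblib")):
    """Return a dict of OS, Python, and package versions (hypothetical helper)."""
    report = {
        "os": platform.platform(),
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            module = importlib.import_module(name)
            report[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            # Covers both a missing package and a failed DLL load on Windows.
            report[name] = "not installed"
    return report


for key, value in collect_versions().items():
    print(f"{key}: {value}")
```

scikit-learn also provides `sklearn.show_versions()`, which prints similar information when the package itself imports cleanly.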
This took a LONG time to solve and debug. Hopefully it helps.