Question
The following test code segfaults for me on OSX 10.7.3, but not other machines:
from __future__ import print_function
import numpy as np
import multiprocessing as mp
import scipy.linalg

def f(a):
    print("about to call")

    ### these all cause crashes
    sign, x = np.linalg.slogdet(a)
    #x = np.linalg.det(a)
    #x = np.linalg.inv(a).sum()

    ### these are all fine
    #x = scipy.linalg.expm3(a).sum()
    #x = np.dot(a, a.T).sum()

    print("result:", x)
    return x

def call_proc(a):
    print("\ncalling with multiprocessing")
    p = mp.Process(target=f, args=(a,))
    p.start()
    p.join()

if __name__ == '__main__':
    import sys
    n = int(sys.argv[1]) if len(sys.argv) > 1 else 50

    a = np.random.normal(0, 2, (n, n))

    f(a)
    call_proc(a)
    call_proc(a)
Example output for one of the segfaulty ones:
$ python2.7 test.py
about to call
result: -4.96797718087
calling with multiprocessing
about to call
calling with multiprocessing
about to call
with an OSX "problem report" popping up complaining about a segfault like KERN_INVALID_ADDRESS at 0x0000000000000108; here's a full one.
If I run it with n <= 32, it runs fine; for any n >= 33, it crashes.
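To pin down that threshold, a small sweep over n works; this is a rough sketch, assuming the snippet above is saved as test.py (as in the session above):

import subprocess
import sys

# run test.py for a range of matrix sizes; on POSIX a segfault
# shows up as a negative return code (-SIGSEGV, typically -11)
for n in range(30, 36):
    ret = subprocess.call([sys.executable, "test.py", str(n)])
    print("n = %d -> exit code %r" % (n, ret))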
If I comment out the f(a) call that's done in the original process, both calls to call_proc are fine. It still segfaults if I call f on a different large array; if I call it on a different small array, or if I call f(large_array) and then pass off f(small_array) to a different process, it works fine. They don't actually need to be the same function; np.linalg.inv(large_array) followed by passing off to np.linalg.slogdet(different_large_array) also segfaults.
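For instance, a stripped-down version of that cross-function case looks like this (a sketch of what I described above, not the exact code I ran):

import numpy as np
import multiprocessing as mp

def child_slogdet(a):
    # this lapack_lite call crashes in the child on affected setups
    print(np.linalg.slogdet(a))

if __name__ == '__main__':
    large = np.random.normal(0, 2, (50, 50))
    different_large = np.random.normal(0, 2, (50, 50))

    np.linalg.inv(large)  # any large lapack_lite call in the parent is enough

    p = mp.Process(target=child_slogdet, args=(different_large,))
    p.start()
    p.join()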
All of the commented-out np.linalg things in f cause crashes; np.dot(a, a.T).sum() and scipy.linalg.expm3 work fine. As far as I can tell, the difference is that the former use numpy's lapack_lite and the latter don't.
This happens on my desktop with:
- python 2.6.7, numpy 1.5.1
- python 2.7.1, numpy 1.5.1, scipy 0.10.0
- python 3.2.2, numpy 1.6.1, scipy 0.10.1
The 2.6 and 2.7 are, I think, the default system installs; I installed the 3.2 versions manually from the source tarballs. All of those numpys are linked to the system Accelerate framework:
$ otool -L `python3.2 -c 'from numpy.core import _dotblas; print(_dotblas.__file__)'`
/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/numpy/core/_dotblas.so:
/System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate (compatibility version 1.0.0, current version 4.0.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 125.2.1)
I get the same behavior on another Mac with a similar setup.
But all of the options for f work on other machines running:
- OSX 10.6.8 with Python 2.6.1 and numpy 1.2.1 linked to Accelerate 4 and vecLib 268 (except it doesn't have scipy or slogdet)
- Debian 6 with Python 3.2.2, numpy 1.6.1, and scipy 0.10.1 linked to the system ATLAS
- Ubuntu 11.04 with Python 2.7.1, numpy 1.5.1, and scipy 0.8.0 linked to the system ATLAS
Am I doing something wrong here? What could possibly be causing this? I don't see how running a function on a numpy array that's getting pickled and unpickled can possibly cause it to later segfault in a different process.
Update: when I do a core dump, the backtrace is inside dispatch_group_async_f, the Grand Central Dispatch interface. Presumably this is a bug in the interactions between numpy/GCD and multiprocessing. I've reported this as a numpy bug, but if anyone has any ideas about workarounds or, for that matter, how to solve the bug, it'd be greatly appreciated. :)
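For anyone trying to reproduce this: on Python 3.3+, the standard-library faulthandler module can at least dump a Python-level traceback when the crash hits (a suggestion of mine, not something from the numpy report). Since fork-started children inherit the signal handlers, enabling it at the top of test.py covers the child processes too:

# add at the top of test.py (requires Python 3.3+)
import faulthandler
faulthandler.enable()  # dumps Python tracebacks of all threads on a fatal signal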
Answer
It turns out that the Accelerate framework used by default on OSX just doesn't support using BLAS calls on both sides of a fork. No real way to deal with this other than linking to a different BLAS, and it doesn't seem like something they're interested in fixing.
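One workaround worth sketching, beyond relinking to a different BLAS: on Python 3.4+, multiprocessing can use the "spawn" start method, which launches fresh interpreters instead of forking, so no BLAS state ever crosses a fork boundary. This is a general sketch under that assumption, not something from the original answer:

import multiprocessing as mp
import numpy as np

def f(a):
    print(np.linalg.slogdet(a))

if __name__ == '__main__':
    mp.set_start_method('spawn')  # children are fresh processes, not forks (Python 3.4+)
    a = np.random.normal(0, 2, (50, 50))
    np.linalg.slogdet(a)  # a BLAS call in the parent is now harmless
    p = mp.Process(target=f, args=(a,))
    p.start()
    p.join()

The trade-off is that spawned children re-import the main module and receive arguments by pickling, so startup is slower than with fork.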