Problem description
I am trying to use multiprocessing in my code for better performance.
However, I got the following error:
Traceback (most recent call last):
  File "D:\EpubBuilder\TinyEpub.py", line 49, in <module>
    e.epub2txt()
  File "D:\EpubBuilder\TinyEpub.py", line 43, in epub2txt
    tempread = self.get_text()
  File "D:\EpubBuilder\TinyEpub.py", line 29, in get_text
    txtlist = pool.map(self.char2text,charlist)
  File "C:\Python34\lib\multiprocessing\pool.py", line 260, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\Python34\lib\multiprocessing\pool.py", line 599, in get
    raise self._value
  File "C:\Python34\lib\multiprocessing\pool.py", line 383, in _handle_tasks
    put(task)
  File "C:\Python34\lib\multiprocessing\connection.py", line 206, in send
    self._send_bytes(ForkingPickler.dumps(obj))
  File "C:\Python34\lib\multiprocessing\reduction.py", line 50, in dumps
    cls(buf, protocol).dump(obj)
TypeError: cannot serialize '_io.BufferedReader' object
I tried it another way and got this error:
TypeError: cannot serialize '_io.TextIOWrapper' object
My code looks like this:
from multiprocessing import Pool

class Book(object):
    def __init__(self, arg):
        self.namelist = arg
    def format_char(self,char):
        char = char + "a"
        return char
    def format_book(self):
        self.tempread = ""
        charlist = [f.read() for f in self.namelist] #list of char
        with Pool() as pool:
            txtlist = pool.map(self.format_char,charlist)
        self.tempread = "".join(txtlist)
        return self.tempread

if __name__ == '__main__':
    import os
    b = Book([open(f) for f in os.listdir()])
    t = b.format_book()
    print(t)
I think the error is raised because I am not using the Pool in the main function.
Is my conjecture right? And how can I modify my code to fix the error?
Recommended answer
The issue is that you've got an unpicklable instance variable (namelist) in the Book instance. Because you're calling pool.map on an instance method, and you're running on Windows, the entire instance needs to be picklable in order for it to be passed to the child process. Book.namelist holds open file objects (_io.BufferedReader or _io.TextIOWrapper, depending on how the files were opened), which can't be pickled. You can fix this in a couple of ways. Based on the example code, it looks like you could just make format_char a top-level function:
from multiprocessing import Pool

# Top-level function: it is pickled by name, without any Book instance.
def format_char(char):
    char = char + "a"
    return char

class Book(object):
    def __init__(self, arg):
        self.namelist = arg
    def format_book(self):
        self.tempread = ""
        charlist = [f.read() for f in self.namelist] #list of char
        with Pool() as pool:
            txtlist = pool.map(format_char,charlist)
        self.tempread = "".join(txtlist)
        return self.tempread
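To see why the bound method is the problem, here is a minimal sketch that tries to pickle a bound method and a top-level function directly, without any pool. The names Holder, shout, can_pickle and demo are hypothetical and exist only for this illustration. Pickling a bound method pickles the instance it belongs to, so the open file is pulled in and the attempt is expected to fail, while the top-level function carries no instance state.

```python
import os
import pickle
import tempfile


class Holder:
    """Hypothetical stand-in for Book: keeps an open file on the instance."""

    def __init__(self, handle):
        self.handle = handle

    def shout(self, text):
        return text.upper()


def shout(text):
    """Top-level equivalent of Holder.shout; pickled by module and name."""
    return text.upper()


def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds, False if it raises."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


def demo():
    # Use a throwaway file so the example does not depend on the working directory.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path) as handle:
            holder = Holder(handle)
            # The bound method drags the whole instance (and its open file) along.
            bound_ok = can_pickle(holder.shout)
            # The module-level function has no instance attached to it.
            top_level_ok = can_pickle(shout)
    finally:
        os.remove(path)
    return bound_ok, top_level_ok


if __name__ == "__main__":
    print(demo())
```

This is the same serialization step that pool.map performs when it sends a task to a worker, which is why moving format_char out of the class avoids the error.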
However, if in reality you need format_char to be an instance method, you can use __getstate__/__setstate__ to make Book picklable by removing the namelist attribute from the instance state before pickling it:
from multiprocessing import Pool

class Book(object):
    def __init__(self, arg):
        self.namelist = arg
    def __getstate__(self):
        """ This is called before pickling. """
        state = self.__dict__.copy()
        del state['namelist']
        return state
    def __setstate__(self, state):
        """ This is called while unpickling. """
        self.__dict__.update(state)
    def format_char(self,char):
        char = char + "a"
        return char
    def format_book(self):
        self.tempread = ""
        charlist = [f.read() for f in self.namelist] #list of char
        with Pool() as pool:
            txtlist = pool.map(self.format_char,charlist)
        self.tempread = "".join(txtlist)
        return self.tempread
This is fine as long as you don't need to access namelist in the child process, because the unpickled copy of the instance no longer has that attribute.
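The effect of the __getstate__/__setstate__ pair can be checked without starting a pool. The sketch below pickles the bound method in the parent, unpickles it, and calls it, which is roughly what happens between the parent and a worker process. The round_trip helper is hypothetical, and resetting namelist to an empty list in __setstate__ is an assumption added here so that the copy has the attribute at all; the answer's version simply leaves it missing.

```python
import os
import pickle
import tempfile


class Book:
    def __init__(self, files):
        self.namelist = files  # open file objects, which cannot be pickled

    def __getstate__(self):
        # Copy the instance dict and drop the open files before pickling.
        state = self.__dict__.copy()
        del state["namelist"]
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # Assumption for this sketch: give the copy an empty list
        # instead of leaving the attribute undefined.
        self.namelist = []

    def format_char(self, char):
        return char + "a"


def round_trip():
    # Use a throwaway file so the example does not depend on the working directory.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path) as handle:
            book = Book([handle])
            # Pickle the bound method, as a pool would before sending a task.
            payload = pickle.dumps(book.format_char)
            # Unpickle it, as a worker would, and call it.
            worker_method = pickle.loads(payload)
            result = worker_method("x")
            files_in_copy = worker_method.__self__.namelist
            files_in_parent = len(book.namelist)
    finally:
        os.remove(path)
    return result, files_in_copy, files_in_parent


if __name__ == "__main__":
    print(round_trip())
```

The original instance in the parent keeps its files; only the copy that travels to the worker loses them.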