Problem description
How can I tell if a file is binary (non-text) in Python?
I am searching through a large set of files in Python and keep getting matches in binary files, which makes the output incredibly messy.
I know I could use grep -I, but I am doing more with the data than grep allows for.
In the past, I would have just searched for bytes greater than 0x7f, but UTF-8 and similar encodings make that impossible on modern systems. Ideally, the solution would be fast.
Recommended answer
You can also use the mimetypes module:
import mimetypes
...
mime, encoding = mimetypes.guess_type(file)  # guess_type returns a (type, encoding) tuple
It's fairly easy to compile a list of binary MIME types. For example, Apache is distributed with a mime.types file that you could parse into two lists, binary and text, and then check whether the guessed MIME type is in your text list or your binary list.
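As a rough illustration of that idea, here is a minimal sketch. The helper name is_probably_text and the small EXTRA_TEXT_TYPES set are hypothetical, chosen only for this example, and the set is hand-picked rather than parsed from Apache's mime.types. Keep in mind that mimetypes.guess_type looks only at the file name, not the contents, so a file with a missing or misleading extension will not be classified correctly.

```python
import mimetypes

# Hypothetical, hand-picked set of textual types that do not start with
# "text/". This is an assumption for illustration only; it is not derived
# from Apache's mime.types file.
EXTRA_TEXT_TYPES = {"application/json", "application/xml"}


def is_probably_text(path):
    """Guess from the file name whether a file is text.

    Returns True or False when the MIME type can be guessed,
    and None when the extension is not recognized.
    """
    mime, encoding = mimetypes.guess_type(path)
    if mime is None:
        return None  # unknown extension; the caller has to decide
    if encoding is not None:
        return False  # e.g. gzip-compressed data is binary on disk
    return mime.startswith("text/") or mime in EXTRA_TEXT_TYPES
```

A search loop could skip any file for which is_probably_text(path) returns False, and fall back to another check when it returns None.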