Problem description
I'm reading in a binary file (a jpg in this case), and need to find some values in that file. For those interested, the binary file is a jpg and I'm attempting to pick out its dimensions by looking for the binary structure as detailed here.
I need to find FFC0 in the binary data, skip ahead some number of bytes, and then read 4 bytes (this should give me the image dimensions).
What's a good way of searching for the value in the binary data? Is there an equivalent of 'find', or something like re?
Recommended answer
You could actually load the file into a string and search that string for the byte sequence 0xffc0 using the str.find() method. It works for any byte sequence.
The code to do this depends on a couple of things. If you open the file in binary mode and you're using Python 3 (both of which are probably best practice for this scenario), you'll need to search for a byte string (as opposed to a character string), which means you have to prefix the string with b.
    with open(filename, 'rb') as f:
        s = f.read()
    # Index of the first occurrence of the two bytes FF C0, or -1 if absent
    s.find(b'\xff\xc0')
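As a rough sketch of the step the question describes (find FFC0, skip ahead, read 4 bytes), the following assumes the standard baseline start-of-frame layout: the 2-byte marker is followed by a 2-byte segment length and a 1-byte sample precision, then a 2-byte height and a 2-byte width, all big-endian. The function name find_jpeg_dimensions and the sample bytes are made up for illustration. It takes the first FF C0 it finds, so it can be fooled if those two bytes happen to appear earlier in the file, and it does not look for other frame markers such as FF C2 used by progressive JPEGs.

```python
import struct

SOF0_MARKER = b'\xff\xc0'  # baseline start-of-frame marker


def find_jpeg_dimensions(data):
    """Return (width, height) read after the first FF C0, or None.

    Hypothetical helper for illustration; not a full JPEG parser.
    """
    pos = data.find(SOF0_MARKER)
    if pos == -1:
        return None
    # Skip marker (2 bytes), segment length (2 bytes), precision (1 byte).
    start = pos + 5
    if start + 4 > len(data):
        return None  # data ends before the dimension fields
    # Height comes first, then width, both big-endian unsigned 16-bit.
    height, width = struct.unpack('>HH', data[start:start + 4])
    return width, height


# Synthetic header fragment, not a complete JPEG file.
sample = (b'\xff\xd8'      # start of image
          b'\xff\xc0'      # start of frame
          b'\x00\x11'      # segment length
          b'\x08'          # sample precision
          b'\x01\xe0'      # height = 480
          b'\x02\x80')     # width = 640
print(find_jpeg_dimensions(sample))  # (640, 480)
```

With a real file, the data argument would be the bytes returned by f.read() on a file opened in 'rb' mode.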
If you open the file in text mode in Python 3, you'd have to search for a character string:
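    # latin-1 maps every byte to exactly one character; the default
    # encoding may fail to decode the bytes of a JPEG
    with open(filename, 'r', encoding='latin-1') as f:
        s = f.read()
    s.find('\xff\xc0')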
though there's no particular reason to do this. It doesn't get you any advantage over the previous way, and if you're on a platform that treats binary files and text files differently (e.g. Windows), there is a chance this will cause problems.
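One concrete way text mode can go wrong in Python 3 is decoding: a minimal sketch, assuming the file is decoded as UTF-8 (a common default, passed explicitly here so the result does not depend on the platform). The byte 0xFF is never valid in UTF-8, so reading JPEG-like bytes as text fails, while binary mode reads them unchanged. The temporary file and payload are stand-ins, not a real image.

```python
import os
import tempfile

payload = b'\xff\xd8\xff\xc0'  # stand-in bytes, not a complete JPEG

# Write the bytes to a temporary file so both modes read the same data.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name

try:
    with open(path, 'r', encoding='utf-8') as f:
        f.read()
    text_mode_ok = True
except UnicodeDecodeError:
    # 0xFF cannot start a UTF-8 sequence, so decoding fails.
    text_mode_ok = False

with open(path, 'rb') as f:
    binary_pos = f.read().find(b'\xff\xc0')

os.remove(path)
print(text_mode_ok, binary_pos)  # False 2
```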
Python 2 doesn't make the distinction between byte strings and character strings, so if you're using that version, it doesn't matter whether you include or exclude the b in b'\xff\xc0'. And if your platform treats binary files and text files identically (e.g. Mac or Linux), it doesn't matter whether you use 'r' or 'rb' as the file mode either. But I'd still recommend using something like the first code sample above just for forward compatibility: in case you ever do switch to Python 3, it's one less thing to fix.
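On the question's other point, something like re: in Python 3 the re module also accepts bytes patterns, so it can be used on data read in binary mode. This is only a sketch with made-up sample data; for a single fixed marker, find() is simpler, and re mainly helps when you want every occurrence or a more flexible pattern.

```python
import re

# Made-up data containing the marker twice.
data = b'\x00\xff\xc0\x01\x02\xff\xc0\x03'

# re.escape keeps any byte that is special in regex syntax literal.
marker = re.compile(re.escape(b'\xff\xc0'))

# Collect the starting offset of every match.
offsets = [m.start() for m in marker.finditer(data)]
print(offsets)  # [1, 5]
```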