Problem description
I need to convert the data stored in a pandas.DataFrame into a byte string, where each column can have a separate data type (integer or floating point). Here is a simple set of data:
import numpy as np
import pandas as pd

df = pd.DataFrame([10, 15, 20], dtype='u1', columns=['a'])
df['b'] = np.array([np.iinfo('u8').max, 230498234019, 32094812309], dtype='u8')
df['c'] = np.array([1.324e10, 3.14159, 234.1341], dtype='f8')
df looks like this:
    a                     b             c
0  10  18446744073709551615  1.324000e+10
1  15          230498234019  3.141590e+00
2  20           32094812309  2.341341e+02
The DataFrame knows the type of each column (df.dtypes), so I'd like to do something like this:
data_to_pack = [tuple(record) for _, record in df.iterrows()]
data_array = np.array(data_to_pack, dtype=zip(df.columns, df.dtypes))
data_bytes = data_array.tostring()
This typically works fine, but in this case, because of the maximum value stored in df['b'][0], the second line above (which converts the list of tuples to an np.array with a given set of types) causes the following error:
OverflowError: Python int too large to convert to C long
I believe the error originates in the first line, which extracts each record as a Series with a single data type (defaulting to float64). The float64 representation chosen for the maximum uint64 value is not directly convertible back to uint64.
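The suspected cause can be checked with a small sketch. It assumes the same three columns as above and only inspects the first row; the names first_row, row_dtype and as_float are illustrative and not part of the original question.

```python
import numpy as np
import pandas as pd

# Same data as in the question, built in one step.
df = pd.DataFrame({
    "a": np.array([10, 15, 20], dtype="u1"),
    "b": np.array([np.iinfo("u8").max, 230498234019, 32094812309], dtype="u8"),
    "c": np.array([1.324e10, 3.14159, 234.1341], dtype="f8"),
})

# iterrows() yields each row as a Series, which has a single dtype for all values.
_, first_row = next(df.iterrows())
row_dtype = first_row.dtype
print(row_dtype)

# The largest uint64 (2**64 - 1) has no exact float64 representation;
# the nearest float64 is 2**64, which is one past the uint64 range.
as_float = float(np.float64(np.iinfo("u8").max))
print(int(as_float) - int(np.iinfo("u8").max))
```

Because the rounded value lies outside the uint64 range, converting it back to an unsigned 64-bit integer cannot succeed losslessly, which is consistent with the overflow reported above.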
1) Since the DataFrame already knows the type of each column, is there a way to avoid creating a row of tuples as input to the typed numpy.array constructor? Or is there a better way than the one outlined above to preserve the type information in such a conversion?
2) Is there a way to go directly from a DataFrame to a byte string representing the data, using the type information for each column?
Recommended answer
You can use df.to_records() to convert your DataFrame to a numpy recarray, then call .tobytes() (named .tostring() in older NumPy releases, where it is a deprecated alias) to convert this to a string of bytes:
rec = df.to_records(index=False)
print(repr(rec))
# rec.array([(10, 18446744073709551615, 13240000000.0), (15, 230498234019, 3.14159),
#            (20, 32094812309, 234.1341)],
#           dtype=[('a', '|u1'), ('b', '<u8'), ('c', '<f8')])

# tobytes() replaces the deprecated tostring()
s = rec.tobytes()

# frombuffer() replaces the deprecated binary mode of fromstring()
rec2 = np.frombuffer(s, dtype=rec.dtype)
print(np.all(rec2 == rec))
# True
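As a rough, self-contained sketch of the full round trip, the following rebuilds a DataFrame from the bytes and checks that the column types survive. It assumes a recent NumPy and pandas; the names payload, record_size and restored are illustrative, and the use of pd.DataFrame.from_records is an addition that does not appear in the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": np.array([10, 15, 20], dtype="u1"),
    "b": np.array([np.iinfo("u8").max, 230498234019, 32094812309], dtype="u8"),
    "c": np.array([1.324e10, 3.14159, 234.1341], dtype="f8"),
})

# Column-wise conversion: each field keeps its own dtype, so no float64 upcast occurs.
rec = df.to_records(index=False)
payload = rec.tobytes()

# Bytes per record: 1 (u1) + 8 (u8) + 8 (f8), stored back to back.
record_size = rec.dtype.itemsize
print(record_size, len(payload))

# Decoding needs the same dtype that was used for encoding;
# the raw bytes carry no type information themselves.
restored = pd.DataFrame.from_records(np.frombuffer(payload, dtype=rec.dtype))
print(restored.dtypes)
```

Note that the byte string alone does not describe its own layout, so the dtype (or an equivalent description of the columns) has to be stored or agreed on separately by whoever reads the bytes back.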