Problem description
I have a series with some datetimes (as strings) and some nulls stored as the string 'nan':
import pandas as pd, numpy as np, datetime as dt
df = pd.DataFrame({'Date':['2014-10-20 10:44:31', '2014-10-23 09:33:46', 'nan', '2014-10-01 09:38:45']})
I'm trying to convert these to datetime:
df['Date'] = df['Date'].apply(lambda x: dt.datetime.strptime(x, '%Y-%m-%d %H:%M:%S'))
but I get the error:
time data 'nan' does not match format '%Y-%m-%d %H:%M:%S'
So I tried to turn these into actual nulls:
df.loc[df['Date'] == 'nan', 'Date'] = np.nan
and then repeated the conversion:
df['Date'] = df['Date'].apply(lambda x: dt.datetime.strptime(x, '%Y-%m-%d %H:%M:%S'))
but then I get the error:
must be string, not float
What is the quickest way to solve this problem?
Recommended answer
Just use to_datetime and set errors='coerce' so that values that cannot be parsed become NaT instead of raising:
In [321]:
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
df
Out[321]:
                 Date
0 2014-10-20 10:44:31
1 2014-10-23 09:33:46
2                 NaT
3 2014-10-01 09:38:45
In [322]:
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 1 columns):
Date 3 non-null datetime64[ns]
dtypes: datetime64[ns](1)
memory usage: 64.0 bytes
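As a minimal sketch of the same idea, the snippet below also passes an explicit format, which is optional, and counts how many values were coerced. The series name raw and the 'not a date' entry are illustrative assumptions, not part of the original data.

```python
import pandas as pd

# Illustrative data: one 'nan' string and one malformed value (assumed for this sketch).
raw = pd.Series(['2014-10-20 10:44:31', 'nan', 'not a date', '2014-10-01 09:38:45'])

# errors='coerce' turns anything that does not parse into NaT instead of raising.
parsed = pd.to_datetime(raw, format='%Y-%m-%d %H:%M:%S', errors='coerce')

# Count the values that could not be parsed.
n_missing = int(parsed.isna().sum())

print(parsed)
print(n_missing)
```

Counting the NaT values afterwards is a cheap way to check how much data was silently dropped by the coercion.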
The problem with calling strptime is that it raises an error if the string does not match the format or if the value is not a string at all.
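The two errors in the question correspond to these two cases: the text 'nan' is a string that does not match the format, while np.nan is a float rather than a string. A small sketch that shows which exception each input triggers (the helper name failure_kind is made up for illustration):

```python
import datetime as dt
import numpy as np

FMT = '%Y-%m-%d %H:%M:%S'

def failure_kind(value):
    """Return the name of the exception strptime raises for value, or None on success."""
    try:
        dt.datetime.strptime(value, FMT)
    except (ValueError, TypeError) as exc:
        return type(exc).__name__
    return None

kind_for_text = failure_kind('nan')                    # a string that does not match FMT
kind_for_float = failure_kind(np.nan)                  # np.nan is a float, not a string
kind_for_valid = failure_kind('2014-10-20 10:44:31')   # parses without error

print(kind_for_text, kind_for_float, kind_for_valid)
```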
If you wrapped the call like this, it would work:
In [324]:
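def func(x):
    try:
        return dt.datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
    except (ValueError, TypeError):
        # ValueError: the string does not match the format; TypeError: x is not a string (e.g. NaN)
        return pd.NaT
df['Date'].apply(func)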
Out[324]:
0 2014-10-20 10:44:31
1 2014-10-23 09:33:46
2 NaT
3 2014-10-01 09:38:45
Name: Date, dtype: datetime64[ns]
But it will be faster to use the built-in to_datetime rather than call apply, which essentially just loops over your series.
Timings
In [326]:
%timeit pd.to_datetime(df['Date'], errors='coerce')
%timeit df['Date'].apply(func)
10000 loops, best of 3: 65.8 μs per loop
10000 loops, best of 3: 186 μs per loop
We see here that using to_datetime is roughly 3X faster (65.8 μs versus 186 μs, about 2.8X) on this four-row sample.
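The timings above come from a four-row frame, so the ratio may differ on real data. A rough way to repeat the comparison on a larger series outside IPython is sketched below, using the standard timeit module; the 10,000-row size and the names big and parse_or_nat are assumptions chosen for illustration, and the printed numbers will vary by machine and pandas version.

```python
import datetime as dt
import timeit

import pandas as pd

FMT = '%Y-%m-%d %H:%M:%S'

def parse_or_nat(x):
    """Element-wise fallback: parse with strptime, return NaT on failure."""
    try:
        return dt.datetime.strptime(x, FMT)
    except (ValueError, TypeError):
        return pd.NaT

# Repeat the question's four values to build a 10,000-row series (size is an assumption).
values = ['2014-10-20 10:44:31', '2014-10-23 09:33:46', 'nan', '2014-10-01 09:38:45'] * 2500
big = pd.Series(values)

# Both approaches should mark the same rows as missing.
vectorized = pd.to_datetime(big, errors='coerce')
looped = big.apply(parse_or_nat)

# Time three runs of each approach.
t_vec = timeit.timeit(lambda: pd.to_datetime(big, errors='coerce'), number=3)
t_loop = timeit.timeit(lambda: big.apply(parse_or_nat), number=3)

print(f'to_datetime: {t_vec:.4f} s for 3 runs')
print(f'apply:       {t_loop:.4f} s for 3 runs')
```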