久久久久久久av_日韩在线中文_看一级毛片视频_日本精品二区_成人深夜福利视频_武道仙尊动漫在线观看

如何根據 pandas 滾動窗口中的多列查找重復項?

How to find duplicate based upon multiple columns in a rolling window in pandas?(如何根據 pandas 滾動窗口中的多列查找重復項?)
本文介紹了如何根據 pandas 滾動窗口中的多列查找重復項?的處理方法,對大家解決問題具有一定的參考價值,需要的朋友們下面隨著小編來一起學習吧!

問題描述

樣本數據

{"transaction": {"merchant": "merchantA", "amount": 20, "time": "2019-02-13T10:00:00.000Z"}}
{"transaction": {"merchant": "merchantB", "amount": 90, "time": "2019-02-13T11:00:01.000Z"}}
{"transaction": {"merchant": "merchantC", "amount": 90, "time": "2019-02-13T11:00:10.000Z"}}
{"transaction": {"merchant": "merchantD", "amount": 90, "time": "2019-02-13T11:00:20.000Z"}}
{"transaction": {"merchant": "merchantE", "amount": 90, "time": "2019-02-13T11:01:30.000Z"}}
{"transaction": {"merchant": "merchantE", "amount": 90, "time": "2019-02-13T11:02:30.000Z"}}
.
.

我有一些這樣的代碼

    df = pd.DataFrame()
for line in sys.stdin:
    data = json.loads(line)
    # df1 = pd.DataFrame(data["transaction"], index=[len(df.index)])
    df1 = pd.DataFrame(data["transaction"], index=[data['transaction']['time']])
    df1['time'] = pd.to_datetime(df1['time'])
    df = df.append(df1)
    # df['count'] = df.rolling('2min', on='time', min_periods=1)['amount'].count()

print(df)
print(len(df[df.merchant.eq(data['transaction']['merchant']) & df.amount.eq(data['transaction']['amount'])].index))

電流輸出

2019-02-13T10:00:00.000Z  merchantA      20 2019-02-13 10:00:00
2019-02-13T11:00:01.000Z  merchantB      90 2019-02-13 11:00:01
2019-02-13T11:00:10.000Z  merchantC      90 2019-02-13 11:00:10
2019-02-13T11:00:20.000Z  merchantD      90 2019-02-13 11:00:20
2019-02-13T11:01:30.000Z  merchantE      90 2019-02-13 11:01:30
2019-02-13T11:02:30.000Z  merchantE      90 2019-02-13 11:02:30

2

預期輸出

2019-02-13T10:00:00.000Z  merchantA      20 2019-02-13 10:00:00
2019-02-13T11:00:01.000Z  merchantB      90 2019-02-13 11:00:01
2019-02-13T11:00:10.000Z  merchantC      90 2019-02-13 11:00:10
2019-02-13T11:00:20.000Z  merchantD      90 2019-02-13 11:00:20
2019-02-13T11:01:30.000Z  merchantE      90 2019-02-13 11:01:30

由于數據正在流式傳輸.我想檢查重復記錄(其商家和金額值相同)是否在兩分鐘內到達,所以我將其丟棄并且不對其進行處理.將其打印為副本.

As the data is streaming. I want to check if a duplicate record(whose merchant and amount value are same) arrives withing two minutes so I discard it as and do no processing on it. print it as a duplicate.

我必須對索引壓縮或 groupby 做些什么嗎?但是然后如何等同于多列.或者兩列上有一些滾動條件,但找不到任何方法.

Do I have to do something with index zipping or groupby? but then how to equate of multiple columns. Or some rolling condition on two columns but can't find anything how to do it.

我在這里錯過了什么?

謝謝

編輯

#dup = df[df.duplicated(subset=['merchant', 'amount'], keep=False)]
     res = df.loc[(df.merchant == data['transaction']['merchant']) & (df.amount == data['transaction']['amount'])]
        # res['timediff'] = pd.to_timedelta((data['transaction']['time'] - res['time']), unit='T')
        res['timediff'] = (data['transaction']['time'] - res['time'])
        if len(res.index) >1:
           print(res)

所以我嘗試這樣的事情,如果結果小于 120 秒,我可以處理它.但生成的df目前以

so im trying something like this and if the result is less than 120 seconds i can process it. But the resulting df in currently in the form of

                      merchant  amount                time       concat          timediff
2019-02-13 11:03:00  merchantF      10 2019-02-13 11:03:00  merchantF10 -1 days +23:59:20
2019-02-13 11:02:20  merchantF      10 2019-02-13 11:02:20  merchantF10          00:00:00

2019-02-13 11:01:30  merchantE      10 2019-02-13 11:01:30  merchantE10 00:01:00
2019-02-13 11:02:00  merchantE      10 2019-02-13 11:02:00  merchantE10 00:00:30
2019-02-13 11:02:30  merchantE      10 2019-02-13 11:02:30  merchantE10 00:00:00

-1 天 +23:59:20 這種格式我覺得可以用絕對值代替?

-1 days +23:59:20 this format I think can be delt with taking Absolute value?

如何將時間轉換為可以與 120 秒比較的格式?pd.to_deltatime() 對我不起作用,或者我使用錯誤.

how can I convert the time in a format that I can compare it with 120 seconds? pd.to_deltatime() didn't work for me or maybe I'm using it wrong.

推薦答案

所以我讓它工作但不是滾動窗口,因為它不支持字符串類型.該功能也在 Pandas Repo 上報告和請求.

So i made it work but not with rolling windows as it doesn't support string type. the feature is reported and requested on Pandas Repo as well.

我的問題解決方案片段:

My solution snippet to the problem:

    if len(df.index) > 0:
        res = df.loc[(df.merchant == data['transaction']['merchant']) & (df.amount == data['transaction']['amount'])]
        res['timediff'] = (data['transaction']['time'] - res['time']).dt.total_seconds().abs() <= 120
        if res.timediff.any():
            continue
    df = df.append(df1)
print(df)

樣本數據:

{"transaction": {"merchant": "merchantA", "amount": 20, "time": "2019-02-13T10:00:00.000Z"}}
{"transaction": {"merchant": "merchantB", "amount": 90, "time": "2019-02-13T11:00:01.000Z"}}
{"transaction": {"merchant": "merchantC", "amount": 10, "time": "2019-02-13T11:00:10.000Z"}}
{"transaction": {"merchant": "merchantD", "amount": 10, "time": "2019-02-13T11:00:20.000Z"}}
{"transaction": {"merchant": "merchantE", "amount": 10, "time": "2019-02-13T11:01:30.000Z"}}
{"transaction": {"merchant": "merchantF", "amount": 10, "time": "2019-02-13T11:03:00.000Z"}}
{"transaction": {"merchant": "merchantE", "amount": 10, "time": "2019-02-13T11:02:00.000Z"}}
{"transaction": {"merchant": "merchantF", "amount": 10, "time": "2019-02-13T11:02:20.000Z"}}
{"transaction": {"merchant": "merchantE", "amount": 10, "time": "2019-02-13T11:02:30.000Z"}}
{"transaction": {"merchant": "merchantF", "amount": 10, "time": "2019-02-13T11:05:20.000Z"}}
{"transaction": {"merchant": "merchantE", "amount": 10, "time": "2019-02-13T11:00:30.000Z"}}

輸出:

                      merchant  amount                time
2019-02-13 10:00:00  merchantA      20 2019-02-13 10:00:00
2019-02-13 11:00:01  merchantB      90 2019-02-13 11:00:01
2019-02-13 11:00:10  merchantC      10 2019-02-13 11:00:10
2019-02-13 11:00:20  merchantD      10 2019-02-13 11:00:20
2019-02-13 11:01:30  merchantE      10 2019-02-13 11:01:30
2019-02-13 11:03:00  merchantF      10 2019-02-13 11:03:00
2019-02-13 11:05:20  merchantF      10 2019-02-13 11:05:20

這篇關于如何根據 pandas 滾動窗口中的多列查找重復項?的文章就介紹到這了,希望我們推薦的答案對大家有所幫助,也希望大家多多支持html5模板網!

【網站聲明】本站部分內容來源于互聯網,旨在幫助大家更快的解決問題,如果有圖片或者內容侵犯了您的權益,請聯系我們刪除處理,感謝您的支持!

相關文檔推薦

How to draw a rectangle around a region of interest in python(如何在python中的感興趣區域周圍繪制一個矩形)
How can I detect and track people using OpenCV?(如何使用 OpenCV 檢測和跟蹤人員?)
How to apply threshold within multiple rectangular bounding boxes in an image?(如何在圖像的多個矩形邊界框中應用閾值?)
How can I download a specific part of Coco Dataset?(如何下載 Coco Dataset 的特定部分?)
Detect image orientation angle based on text direction(根據文本方向檢測圖像方向角度)
Detect centre and angle of rectangles in an image using Opencv(使用 Opencv 檢測圖像中矩形的中心和角度)
主站蜘蛛池模板: 亚洲福利网站 | 午夜视频一区 | 毛片综合 | 久久福利电影 | 成人国产精品免费观看 | 国产特级毛片aaaaaa喷潮 | 日韩二区三区 | 国产精品久久久久久久久污网站 | 日韩精品久久久久 | 欧美精品一区二区三区一线天视频 | 日本黄色大片免费看 | 国产激情综合五月久久 | 欧美在线国产精品 | 新疆少妇videos高潮 | 犬夜叉在线观看 | 性国产xxxx乳高跟 | 亚洲一区二区三区免费在线观看 | 五月天国产视频 | 国产精品激情 | 中文字幕亚洲欧美日韩在线不卡 | 91人人澡人人爽 | 国产在线一级片 | 国产精品一区二区三区99 | 在线欧美亚洲 | 久久久久久国产 | 国产精品一区二区av | 在线免费观看视频黄 | www.亚洲国产精品 | 婷婷精品 | 午夜男人视频 | 亚洲久久 | 成人在线视频免费播放 | 亚洲一页| 午夜影院在线观看版 | 91精品一区 | 91动漫在线观看 | 国产精品视频在线观看 | 免费黄色录像视频 | 少妇黄色 | 国产成人精品一区二 | 麻豆精品国产91久久久久久 |