How to sum in pandas by unique index in several columns?

Question

I have a pandas DataFrame that records online activity, measured in "clicks", during a user session. There are as many as 50,000 unique users, and the DataFrame has around 1.5 million rows. Obviously, most users have multiple records.

The four columns are a unique user ID, the date the user began the service ("Registration"), the date the user used the service ("Session"), and the total number of clicks.

The organization of the dataframe is as follows:

User_ID    Registration  Session      clicks
2349876    2012-02-22    2014-04-24   2 
1987293    2011-02-01    2013-05-03   1 
2234214    2012-07-22    2014-01-22   7 
9874452    2010-12-22    2014-08-22   2 
...

(There is also an index above beginning with 0, but one could set User_ID as the index.)

I would like to aggregate the total number of clicks per user since the Registration date. The resulting DataFrame (or pandas Series) would list User_ID and the total number of clicks ("Total_Clicks"):

User_ID    Total_Clicks
2349876    722 
1987293    341
2234214    220 
9874452    1405 
...

How does one do this in pandas? Is this done by .agg()? Each User_ID needs to be summed individually.

As there are 1.5 million records, does this scale?

Recommended answer

IIUC you can use groupby, sum and reset_index:

print(df)
   User_ID Registration    Session  clicks
0  2349876   2012-02-22 2014-04-24       2
1  1987293   2011-02-01 2013-05-03       1
2  2234214   2012-07-22 2014-01-22       7
3  9874452   2010-12-22 2014-08-22       2

print(df.groupby('User_ID')['clicks'].sum().reset_index())
   User_ID  clicks
0  1987293       1
1  2234214       7
2  2349876       2
3  9874452       2
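
For reference, here is a minimal self-contained sketch of the same aggregation in Python 3. The second row for user 2349876 is an illustrative addition (it is not in the original sample) so that at least one user has several records, and the `Total_Clicks` column name follows the layout requested in the question. The `.agg("sum")` line is only there to show that `.agg()` gives the same result as `.sum()` here.

```python
import pandas as pd

# Sample data in the question's layout. The last row is an assumed extra
# record for user 2349876, added only so one user has multiple rows.
df = pd.DataFrame({
    "User_ID": [2349876, 1987293, 2234214, 9874452, 2349876],
    "Registration": ["2012-02-22", "2011-02-01", "2012-07-22",
                     "2010-12-22", "2012-02-22"],
    "Session": ["2014-04-24", "2013-05-03", "2014-01-22",
                "2014-08-22", "2014-05-01"],
    "clicks": [2, 1, 7, 2, 5],
})

# Sum clicks per user; reset_index(name=...) turns the Series into a
# two-column DataFrame and names the summed column.
total = df.groupby("User_ID")["clicks"].sum().reset_index(name="Total_Clicks")
print(total)

# The same aggregation written with .agg(), as asked in the question.
total_agg = (
    df.groupby("User_ID")["clicks"].agg("sum").reset_index(name="Total_Clicks")
)
```

By default `groupby` sorts the result by the group key, which is why the users come back in ascending User_ID order rather than in their original row order.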

If the first column, User_ID, is the index:

print(df)
        Registration    Session  clicks
User_ID                                
2349876   2012-02-22 2014-04-24       2
1987293   2011-02-01 2013-05-03       1
2234214   2012-07-22 2014-01-22       7
9874452   2010-12-22 2014-08-22       2

print(df.groupby(level=0)['clicks'].sum().reset_index())
   User_ID  clicks
0  1987293       1
1  2234214       7
2  2349876       2
3  9874452       2

Or:

print(df.groupby(df.index)['clicks'].sum().reset_index())
   User_ID  clicks
0  1987293       1
1  2234214       7
2  2349876       2
3  9874452       2
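
When User_ID is the index, the different ways of naming the grouping key should all produce the same sums. A small sketch to check this, again with an assumed extra record for user 2349876 so that the grouping actually combines rows:

```python
import pandas as pd

# User_ID as a named index; the duplicated 2349876 entry is illustrative.
df = pd.DataFrame(
    {"clicks": [2, 1, 7, 2, 5]},
    index=pd.Index(
        [2349876, 1987293, 2234214, 9874452, 2349876], name="User_ID"
    ),
)

by_level = df.groupby(level=0)["clicks"].sum()    # group by index position
by_name = df.groupby("User_ID")["clicks"].sum()   # group by index level name
by_index = df.groupby(df.index)["clicks"].sum()   # group by the index object

print(by_level)
```

Which spelling to use is mostly a matter of taste; `level=0` and the level name avoid passing the index object around explicitly.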

As Alexander pointed out, you need to filter the data before the groupby if any Session date is earlier than the Registration date for that User_ID:

print(df)
   User_ID Registration    Session  clicks
0  2349876   2012-02-22 2014-04-24       2
1  1987293   2011-02-01 2013-05-03       1
2  2234214   2012-07-22 2014-01-22       7
3  9874452   2010-12-22 2014-08-22       2

print(df[df.Session >= df.Registration].groupby('User_ID')['clicks'].sum().reset_index())
   User_ID  clicks
0  1987293       1
1  2234214       7
2  2349876       2
3  9874452       2

I changed the third row of the data to give a better sample:

print(df)
        Registration    Session  clicks
User_ID                                
2349876   2012-02-22 2014-04-24       2
1987293   2011-02-01 2013-05-03       1
2234214   2012-07-22 2012-01-22       7
9874452   2010-12-22 2014-08-22       2

print(df.Session >= df.Registration)
User_ID
2349876     True
1987293     True
2234214    False
9874452     True
dtype: bool
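
The comparison above only gives a chronological answer if both columns hold comparable date values. If they were read in as strings, it may be safer to parse them first. A minimal sketch, assuming the dates arrive as ISO-formatted strings (the original does not say what dtype the columns have):

```python
import pandas as pd

df = pd.DataFrame({
    "User_ID": [2349876, 1987293, 2234214, 9874452],
    "Registration": ["2012-02-22", "2011-02-01", "2012-07-22", "2010-12-22"],
    "Session": ["2014-04-24", "2013-05-03", "2012-01-22", "2014-08-22"],
    "clicks": [2, 1, 7, 2],
})

# Parse both columns so the comparison is chronological, not textual.
for col in ["Registration", "Session"]:
    df[col] = pd.to_datetime(df[col])

# Keep only sessions on or after the registration date, then sum per user.
mask = df["Session"] >= df["Registration"]
total = df[mask].groupby("User_ID")["clicks"].sum().reset_index()
print(total)
```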

print(df[df.Session >= df.Registration])
        Registration    Session  clicks
User_ID                                
2349876   2012-02-22 2014-04-24       2
1987293   2011-02-01 2013-05-03       1
9874452   2010-12-22 2014-08-22       2

df1 = df[df.Session >= df.Registration]
print(df1.groupby(df1.index)['clicks'].sum().reset_index())
   User_ID  clicks
0  1987293       1
1  2349876       2
2  9874452       2
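
On the question of scale: `groupby(...).sum()` is a vectorized operation rather than a Python-level loop over users, so it is reasonable to expect it to cope with 1.5 million rows. One way to check on your own machine is to run it on synthetic data of the same shape. The sketch below uses randomly generated user IDs and click counts, which are assumptions made purely for illustration and not the real data:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data: 1.5 million rows, up to 50,000 users.
rng = np.random.default_rng(0)
n_rows, n_users = 1_500_000, 50_000

df = pd.DataFrame({
    "User_ID": rng.integers(0, n_users, size=n_rows),
    "clicks": rng.integers(1, 10, size=n_rows),
})

# One pass over the data: a Series of total clicks indexed by User_ID.
total = df.groupby("User_ID")["clicks"].sum()
print(total.head())
```

Wrapping the `groupby` line in a timer (for example `time.perf_counter()`) gives a concrete number for your hardware.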
