Aggregate a Pandas Dataframe by week and month


Aggregate a Pandas Dataframe by week and month



The below Dataframe has information about launching of a program with only one column of dates:


indate
2016-12-19 12:16:00
2016-12-19 12:21:00
2016-12-20 12:32:00
2016-12-20 12:34:00
2016-12-20 12:40:00
2016-12-21 13:47:01
2016-12-21 14:27:01
2016-12-21 14:43:00
2016-12-21 15:02:00
2016-12-22 15:16:00
2016-12-22 15:22:00
2016-12-22 15:25:00
2016-12-22 15:22:00
2016-12-22 15:25:00
........



I'd like to aggregate to get number of launchings per day:


indate number of launchings
2016-12-19 2
2016-12-20 3
2016-12-21 4
2016-12-22 5
...



And then also get the week of the launch, the day of launch and the no. of launchings:


week day number of launchings
2016-12-19 - 2016-12-25 Mo 2
2016-12-19 - 2016-12-25 Tu 3
2016-12-19 - 2016-12-25 We 4
2016-12-19 - 2016-12-25 Th 5
2016-12-19 - 2016-12-25 Fr n
2016-12-19 - 2016-12-25 Su n
2016-12-19 - 2016-12-25 Sa n
2016-12-26 - 2017-01-01 Mo n
2016-12-26 - 2017-01-01 Tu n
2016-12-26 - 2017-01-01 We n
....



I didn't find any special methods in Pandas to do this.




1 Answer
1



Use resample by days with aggregate size first, then extract names of days by strftime and last for weeks use resample per weeks with transform first and last values:


resample


size


strftime


transform


first


last


df1 = df.resample('d', on='indate').size().reset_index(name='number of launchings')

df1['day'] = df1['indate'].dt.strftime('%a')
g = df1.resample('W', on='indate')['indate']
df1['week'] = g.transform('first').dt.strftime('%Y-%m-%d') + ' - ' +
g.transform('last').dt.strftime('%Y-%m-%d')



Another solution is use Grouper:


Grouper


df1 = (df.groupby(pd.Grouper(freq='d', key='indate'))
.size()
.reset_index(name='number of launchings'))

df1['day'] = df1['indate'].dt.strftime('%a')
g = df1.groupby(pd.Grouper(freq='W', key='indate'))['indate']
df1['week'] = (g.transform('first').dt.strftime('%Y-%m-%d') + ' - ' +
g.transform('last').dt.strftime('%Y-%m-%d'))
print (df1)


indate number of launchings day week
0 2016-12-19 2 Mon 2016-12-19 - 2016-12-25
1 2016-12-20 3 Tue 2016-12-19 - 2016-12-25
2 2016-12-21 4 Wed 2016-12-19 - 2016-12-25
3 2016-12-22 5 Thu 2016-12-19 - 2016-12-25
4 2016-12-23 1 Fri 2016-12-19 - 2016-12-25
5 2016-12-24 1 Sat 2016-12-19 - 2016-12-25
6 2016-12-25 1 Sun 2016-12-19 - 2016-12-25
7 2016-12-26 1 Mon 2016-12-26 - 2017-01-01
8 2016-12-27 1 Tue 2016-12-26 - 2017-01-01
9 2016-12-28 1 Wed 2016-12-26 - 2017-01-01
10 2016-12-29 1 Thu 2016-12-26 - 2017-01-01
11 2016-12-30 1 Fri 2016-12-26 - 2017-01-01
12 2016-12-31 1 Sat 2016-12-26 - 2017-01-01
13 2017-01-01 1 Sun 2016-12-26 - 2017-01-01



Sample data:


print (df)
indate
0 2016-12-19 12:16:00
1 2016-12-19 12:21:00
2 2016-12-20 12:32:00
3 2016-12-20 12:34:00
4 2016-12-20 12:40:00
5 2016-12-21 13:47:01
6 2016-12-21 14:27:01
7 2016-12-21 14:43:00
8 2016-12-21 15:02:00
9 2016-12-22 15:16:00
10 2016-12-22 15:22:00
11 2016-12-22 15:25:00
12 2016-12-22 15:22:00
13 2016-12-22 15:25:00
14 2016-12-23 12:16:00
15 2016-12-24 12:21:00
16 2016-12-25 12:32:00
17 2016-12-26 12:34:00
18 2016-12-27 12:40:00
19 2016-12-28 13:47:01
20 2016-12-29 14:27:01
21 2016-12-30 14:43:00
22 2016-12-31 15:02:00
23 2017-01-01 15:16:00





Thank you, kind man
– Сергей Шумилин
Jun 29 at 11:28





@СергейШумилин - You are welcome! Nice weekend!
– jezrael
Jun 29 at 11:29






By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Comments

Popular posts from this blog

paramiko-expect timeout is happening after executing the command

Export result set on Dbeaver to CSV

Opening a url is failing in Swift