Aggregate a Pandas Dataframe by week and month
The DataFrame below records launches of a program and has a single column of timestamps:
indate
2016-12-19 12:16:00
2016-12-19 12:21:00
2016-12-20 12:32:00
2016-12-20 12:34:00
2016-12-20 12:40:00
2016-12-21 13:47:01
2016-12-21 14:27:01
2016-12-21 14:43:00
2016-12-21 15:02:00
2016-12-22 15:16:00
2016-12-22 15:22:00
2016-12-22 15:25:00
2016-12-22 15:22:00
2016-12-22 15:25:00
........
I'd like to aggregate it to get the number of launchings per day:
indate number of launchings
2016-12-19 2
2016-12-20 3
2016-12-21 4
2016-12-22 5
...
Then I'd also like to get the week of the launch, the day of the week, and the number of launchings:
week day number of launchings
2016-12-19 - 2016-12-25 Mo 2
2016-12-19 - 2016-12-25 Tu 3
2016-12-19 - 2016-12-25 We 4
2016-12-19 - 2016-12-25 Th 5
2016-12-19 - 2016-12-25 Fr n
2016-12-19 - 2016-12-25 Sa n
2016-12-19 - 2016-12-25 Su n
2016-12-26 - 2017-01-01 Mo n
2016-12-26 - 2017-01-01 Tu n
2016-12-26 - 2017-01-01 We n
....
I didn't find any special methods in Pandas to do this.
1 Answer
Use resample by day with the aggregation size first, then extract the day names with strftime, and finally, for the weeks, resample per week and use transform to get the first and last values:
# daily counts via resample + size
df1 = df.resample('d', on='indate').size().reset_index(name='number of launchings')
# abbreviated day name, e.g. 'Mon'
df1['day'] = df1['indate'].dt.strftime('%a')
# first/last date within each calendar week, used to build the 'week' label
g = df1.resample('W', on='indate')['indate']
df1['week'] = (g.transform('first').dt.strftime('%Y-%m-%d') + ' - ' +
               g.transform('last').dt.strftime('%Y-%m-%d'))
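Both solutions assume indate already has datetime64 dtype; if it was read in as strings (e.g. from a CSV), convert it first:

# make sure the column is datetime, otherwise resample/Grouper will fail
df['indate'] = pd.to_datetime(df['indate'])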
Another solution is to use Grouper:
# same logic with groupby + Grouper instead of resample
df1 = (df.groupby(pd.Grouper(freq='d', key='indate'))
         .size()
         .reset_index(name='number of launchings'))
df1['day'] = df1['indate'].dt.strftime('%a')
g = df1.groupby(pd.Grouper(freq='W', key='indate'))['indate']
df1['week'] = (g.transform('first').dt.strftime('%Y-%m-%d') + ' - ' +
               g.transform('last').dt.strftime('%Y-%m-%d'))
print (df1)
indate number of launchings day week
0 2016-12-19 2 Mon 2016-12-19 - 2016-12-25
1 2016-12-20 3 Tue 2016-12-19 - 2016-12-25
2 2016-12-21 4 Wed 2016-12-19 - 2016-12-25
3 2016-12-22 5 Thu 2016-12-19 - 2016-12-25
4 2016-12-23 1 Fri 2016-12-19 - 2016-12-25
5 2016-12-24 1 Sat 2016-12-19 - 2016-12-25
6 2016-12-25 1 Sun 2016-12-19 - 2016-12-25
7 2016-12-26 1 Mon 2016-12-26 - 2017-01-01
8 2016-12-27 1 Tue 2016-12-26 - 2017-01-01
9 2016-12-28 1 Wed 2016-12-26 - 2017-01-01
10 2016-12-29 1 Thu 2016-12-26 - 2017-01-01
11 2016-12-30 1 Fri 2016-12-26 - 2017-01-01
12 2016-12-31 1 Sat 2016-12-26 - 2017-01-01
13 2017-01-01 1 Sun 2016-12-26 - 2017-01-01
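Note that strftime('%a') returns three-letter abbreviations ('Mon'), not the two-letter ones ('Mo') from the expected output. If you need those exactly, and as a possible alternative for the week label, a small sketch (working on df1 from above):

# slice to two-letter day names as in the question, e.g. 'Mo'
df1['day'] = df1['day'].str[:2]
# alternative week label: weekly Periods render as '2016-12-19/2016-12-25'
df1['week'] = df1['indate'].dt.to_period('W').astype(str).str.replace('/', ' - ')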
Sample data:
print (df)
indate
0 2016-12-19 12:16:00
1 2016-12-19 12:21:00
2 2016-12-20 12:32:00
3 2016-12-20 12:34:00
4 2016-12-20 12:40:00
5 2016-12-21 13:47:01
6 2016-12-21 14:27:01
7 2016-12-21 14:43:00
8 2016-12-21 15:02:00
9 2016-12-22 15:16:00
10 2016-12-22 15:22:00
11 2016-12-22 15:25:00
12 2016-12-22 15:22:00
13 2016-12-22 15:25:00
14 2016-12-23 12:16:00
15 2016-12-24 12:21:00
16 2016-12-25 12:32:00
17 2016-12-26 12:34:00
18 2016-12-27 12:40:00
19 2016-12-28 13:47:01
20 2016-12-29 14:27:01
21 2016-12-30 14:43:00
22 2016-12-31 15:02:00
23 2017-01-01 15:16:00
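For completeness, a minimal self-contained sketch to reproduce the daily counts (only the first rows of the sample are rebuilt here; append the remaining timestamps the same way):

import pandas as pd

# rebuild a piece of the sample data
df = pd.DataFrame({'indate': pd.to_datetime([
    '2016-12-19 12:16:00', '2016-12-19 12:21:00',
    '2016-12-20 12:32:00', '2016-12-20 12:34:00',
    '2016-12-20 12:40:00',
])})

# daily counts, same as the Grouper solution above
df1 = (df.groupby(pd.Grouper(freq='d', key='indate'))
         .size()
         .reset_index(name='number of launchings'))
print(df1)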
Thank you, kind man
– Сергей Шумилин
Jun 29 at 11:28
@СергейШумилин - You are welcome! Have a nice weekend!
– jezrael
Jun 29 at 11:29