Pandas: shift after N rows
I have a dataframe:
atm_id dNDCSessionTime sCardNumber nTrRequestCount
152865 0 2017-01-14 07:56:17 1 1
153052 0 2017-01-14 08:01:13 1 1
153053 0 2017-01-14 08:02:19 1 1
153054 0 2017-01-14 08:03:41 1 1
152804 0 2017-01-14 08:04:37 1 1
152805 0 2017-01-14 08:05:24 1 1
152806 1 2017-01-14 08:06:08 1 1
152807 1 2017-01-14 08:07:15 1 1
152808 1 2017-01-14 08:08:08 1 1
152866 1 2017-01-14 08:08:50 1 1
I need to add a new column, period, so that for every atm_id each group of 3 transactions (rows in the dataframe) gets the same period number.
Desired output:
atm_id dNDCSessionTime sCardNumber nTrRequestCount period
152865 0 2017-01-14 07:56:17 1 1 1
153052 0 2017-01-14 08:01:13 1 1 1
153053 0 2017-01-14 08:02:19 1 1 1
153054 0 2017-01-14 08:03:41 1 1 2
152804 0 2017-01-14 08:04:37 1 1 2
152805 0 2017-01-14 08:05:24 1 1 2
152806 1 2017-01-14 08:06:08 1 1 3
152807 1 2017-01-14 08:07:15 1 1 3
152808 1 2017-01-14 08:08:08 1 1 3
152866 1 2017-01-14 08:08:50 1 1 4
I tried to do this with
df['period'] = df.sort_values(['atm_id', 'dNDCSessionTime']).groupby('atm_id').shift(500)
but I got an error.
Your desired result is unclear. Why do you want 3x 1, 2x 2, 3x 3, 2x 4? – Nils Werner Jun 29 at 9:18
@NilsWerner thank you, you're right. It was my mistake. I've fixed that. – Petr Petrov Jun 29 at 9:35
1 Answer
If, as in your example, the number of rows per atm_id is a multiple of 3, then you can use iloc[::3] to select every third row.
# initiate the column
df['Period'] = None
# select every third row and assign 1 there in the column Period
# (positional assignment rather than chained df['Period'].iloc[::3] to avoid SettingWithCopyWarning)
df.iloc[::3, df.columns.get_loc('Period')] = 1
# use cumsum, ffill and astype to produce an incremental value every 3 rows
df['Period'] = df['Period'].cumsum().ffill().astype(int)
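To see why the cumsum/ffill/astype step produces the incremental values, here is a minimal sketch on a plain Series (my own illustration of the same idea, not taken from your data):
import numpy as np
import pandas as pd

s = pd.Series([1, np.nan, np.nan, 1, np.nan, np.nan])  # 1 marks the start of each 3-row block
print(s.cumsum())          # [1, NaN, NaN, 2, NaN, NaN] -> running count of block starts
print(s.cumsum().ffill())  # [1, 1, 1, 2, 2, 2]         -> counter propagated to every row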
Now, I assume you won't always have a number of rows that is a multiple of 3 within an atm_id, so you can go with groupby and apply. I added a row to your input with atm_id = 2 to make the point. If you do:
print (df.groupby('atm_id').apply(lambda x: x.iloc[::3]))
atm_id dNDCSessionTime sCardNumber nTrRequestCount Period
atm_id
0 152865 0 2017-01-14 07:56:17 1 1 None
153054 0 2017-01-14 08:03:41 1 1 None
1 152806 1 2017-01-14 08:06:08 1 1 None
152866 1 2017-01-14 08:08:50 1 1 None
2 152887 2 2017-01-12 07:56:17 1 1 None
you get every third row within the same atm_id. Different solutions are possible to use this information in df; mine is to go by index values, assuming they are unique in your data: create a mask with the original index values and use it with the same idea as above.
mask_index = df.groupby('atm_id').apply(lambda x: x.iloc[::3]).index.get_level_values(1)
# if you do print(mask_index), you get
# Int64Index([152865, 153054, 152806, 152866, 152887], dtype='int64')
# initiate the column
df['Period'] = None
# select every third row within the same atm_id and assign 1 there in the column Period
df.loc[mask_index, 'Period'] = 1
# use cumsum, ffill and astype to produce an incremental value every 3 rows
df['Period'] = df['Period'].cumsum().ffill().astype(int)
The result is:
atm_id dNDCSessionTime sCardNumber nTrRequestCount Period
152865 0 2017-01-14 07:56:17 1 1 1
153052 0 2017-01-14 08:01:13 1 1 1
153053 0 2017-01-14 08:02:19 1 1 1
153054 0 2017-01-14 08:03:41 1 1 2
152804 0 2017-01-14 08:04:37 1 1 2
152805 0 2017-01-14 08:05:24 1 1 2
152806 1 2017-01-14 08:06:08 1 1 3
152807 1 2017-01-14 08:07:15 1 1 3
152808 1 2017-01-14 08:08:08 1 1 3
152866 1 2017-01-14 08:08:50 1 1 4
152887 2 2017-01-12 07:56:17 1 1 5
where you can see the increment from 4 to 5 between the last two rows even though there are not 3 rows for Period=4, because the atm_id changes.
Note: if your indexes are not unique, a reset_index at the beginning and a set_index back at the end gives the same result. Also, I did not sort_values as you did, but that's not a problem.
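As a side note, if the frame is sorted so that rows of the same atm_id are contiguous (e.g. after the sort_values from the question), the same Period column can also be built without apply, using groupby and cumcount. This is only an alternative sketch, not the method used above:
# position of each row within its atm_id group: 0, 1, 2, ...
within = df.groupby('atm_id').cumcount()
# a new 3-row block starts at positions 0, 3, 6, ... of each group;
# numbering those block starts cumulatively over the whole frame gives the Period
df['Period'] = (within % 3 == 0).cumsum()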