Pandas: shiftafter N strings


Pandas: shiftafter N strings



I have dataframe


atm_id dNDCSessionTime sCardNumber nTrRequestCount
152865 0 2017-01-14 07:56:17 1 1
153052 0 2017-01-14 08:01:13 1 1
153053 0 2017-01-14 08:02:19 1 1
153054 0 2017-01-14 08:03:41 1 1
152804 0 2017-01-14 08:04:37 1 1
152805 0 2017-01-14 08:05:24 1 1
152806 1 2017-01-14 08:06:08 1 1
152807 1 2017-01-14 08:07:15 1 1
152808 1 2017-01-14 08:08:08 1 1
152866 1 2017-01-14 08:08:50 1 1



I need add new columns, that, means period. (For every atm_id every 3 transaction (strings in dataframe))



Desire output


atm_id dNDCSessionTime sCardNumber nTrRequestCount period
152865 0 2017-01-14 07:56:17 1 1 1
153052 0 2017-01-14 08:01:13 1 1 1
153053 0 2017-01-14 08:02:19 1 1 1
153054 0 2017-01-14 08:03:41 1 1 2
152804 0 2017-01-14 08:04:37 1 1 2
152805 0 2017-01-14 08:05:24 1 1 2
152806 1 2017-01-14 08:06:08 1 1 3
152807 1 2017-01-14 08:07:15 1 1 3
152808 1 2017-01-14 08:08:08 1 1 3
152866 1 2017-01-14 08:08:50 1 1 4



I try to do this with


df['period'] = df.sort_values(['atm_id', 'dNDCSessionTime']).groupby('atm_id').shift(500)



But I've got an error.





Your desired result is unclear. Why do you want 3x1, 2x2, 3x3, 2x4?
– Nils Werner
Jun 29 at 9:18



1


2


3


4





@NilsWerner thank you, you're right. It was my mistake. I've fix that
– Petr Petrov
Jun 29 at 9:35




1 Answer
1



If like in your example, the number of rows per 'atm_id' is a multiple of 3, then you can use iloc[::3] to select every 3 rows.


iloc[::3]


# initiate the column
df['Period'] = None
# select every three rows and assign 1 there in the column Period
df['Period'].iloc[::3] = 1
# use cumsum, ffill and astype to populate an incremental value every 3 rows
df['Period'] = df['Period'].cumsum().ffill().astype(int)



Now, I assume that you won't have always a number of rows multiple of 3 within an atm_id, so you can go by groupby and apply. I added a row to your input with atm_id = 2 to make the point. If you do:


atm_id


groupby


apply


atm_id = 2


print (df.groupby('atm_id').apply(lambda x: x.iloc[::3]))

atm_id dNDCSessionTime sCardNumber nTrRequestCount Period
atm_id
0 152865 0 2017-01-14 07:56:17 1 1 None
153054 0 2017-01-14 08:03:41 1 1 None
1 152806 1 2017-01-14 08:06:08 1 1 None
152866 1 2017-01-14 08:08:50 1 1 None
2 152887 2 2017-01-12 07:56:17 1 1 None



so you get every 3 rows within a same atm_id. Here different solutions are possible to use the above information in df, mine is to go by index numbers, assuming there are unique in your data. Create a mask with original index number and use it with the same idea than above.


atm_id


df


mask


mask_index = df.groupby('atm_id').apply(lambda x: x.iloc[::3]).index.get_level_values(1)
#if you do print (mask_index), you get
#Out[35]: Int64Index([152865, 153054, 152806, 152866, 152887], dtype='int64')

# initiate the column
df['Period'] = None
# select every three rows within a same atm_id and assign 1 there in the column Period
df.loc[mask_index, 'Period'] = 1
# use cumsum, ffill and astype to populate have an incremental value every 3 rows
df['Period'] = df['Period'].cumsum().ffill().astype(int)



the result is like


atm_id dNDCSessionTime sCardNumber nTrRequestCount Period
152865 0 2017-01-14 07:56:17 1 1 1
153052 0 2017-01-14 08:01:13 1 1 1
153053 0 2017-01-14 08:02:19 1 1 1
153054 0 2017-01-14 08:03:41 1 1 2
152804 0 2017-01-14 08:04:37 1 1 2
152805 0 2017-01-14 08:05:24 1 1 2
152806 1 2017-01-14 08:06:08 1 1 3
152807 1 2017-01-14 08:07:15 1 1 3
152808 1 2017-01-14 08:08:08 1 1 3
152866 1 2017-01-14 08:08:50 1 1 4
152887 2 2017-01-12 07:56:17 1 1 5



where you can see the increment from 4 to 5 between the last two rows while there is not 3 rows for Period=4 as you change atm_id


Period=4


atm_id



Note: if your indexes are not unique, a reset_index at the beginning and set_index back at the end gives the same result, also I did not sort_values as you, but it's not a problem.


reset_index


set_index


sort_values






By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Comments

Popular posts from this blog

paramiko-expect timeout is happening after executing the command

Opening a url is failing in Swift

Export result set on Dbeaver to CSV