Pandas: shift after N rows
I have a dataframe:
atm_id dNDCSessionTime sCardNumber nTrRequestCount
152865 0 2017-01-14 07:56:17 1 1
153052 0 2017-01-14 08:01:13 1 1
153053 0 2017-01-14 08:02:19 1 1
153054 0 2017-01-14 08:03:41 1 1
152804 0 2017-01-14 08:04:37 1 1
152805 0 2017-01-14 08:05:24 1 1
152806 1 2017-01-14 08:06:08 1 1
152807 1 2017-01-14 08:07:15 1 1
152808 1 2017-01-14 08:08:08 1 1
152866 1 2017-01-14 08:08:50 1 1
I need to add a new column, period, so that for every atm_id each group of 3 transactions (rows in the dataframe) gets the same period number.
Desired output:
atm_id dNDCSessionTime sCardNumber nTrRequestCount period
152865 0 2017-01-14 07:56:17 1 1 1
153052 0 2017-01-14 08:01:13 1 1 1
153053 0 2017-01-14 08:02:19 1 1 1
153054 0 2017-01-14 08:03:41 1 1 2
152804 0 2017-01-14 08:04:37 1 1 2
152805 0 2017-01-14 08:05:24 1 1 2
152806 1 2017-01-14 08:06:08 1 1 3
152807 1 2017-01-14 08:07:15 1 1 3
152808 1 2017-01-14 08:08:08 1 1 3
152866 1 2017-01-14 08:08:50 1 1 4
I tried to do this with
df['period'] = df.sort_values(['atm_id', 'dNDCSessionTime']).groupby('atm_id').shift(500)
but I got an error.
Your desired result is unclear. Why do you want 3x 1, 2x 2, 3x 3, 2x 4? – Nils Werner Jun 29 at 9:18
@NilsWerner thank you, you're right. It was my mistake. I've fixed that. – Petr Petrov Jun 29 at 9:35
1 Answer
If, as in your example, the number of rows per atm_id is a multiple of 3, then you can use iloc[::3] to select every third row.
# initiate the column
df['Period'] = None
# select every third row and assign 1 there in the column Period
# (positional assignment rather than chained df['Period'].iloc[::3] to avoid SettingWithCopyWarning)
df.iloc[::3, df.columns.get_loc('Period')] = 1
# use cumsum, ffill and astype to produce an incremental value every 3 rows
df['Period'] = df['Period'].cumsum().ffill().astype(int)
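To see why the cumsum/ffill/astype step produces the incremental values, here is a minimal sketch on a plain Series (my own illustration of the same idea, not taken from your data):
import numpy as np
import pandas as pd

s = pd.Series([1, np.nan, np.nan, 1, np.nan, np.nan])  # 1 marks the start of each 3-row block
print(s.cumsum())          # [1, NaN, NaN, 2, NaN, NaN] -> running count of block starts
print(s.cumsum().ffill())  # [1, 1, 1, 2, 2, 2]         -> counter propagated to every row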
Now, I assume you won't always have a number of rows that is a multiple of 3 within an atm_id, so you can go with groupby and apply. I added a row to your input with atm_id = 2 to make the point. If you do:
print (df.groupby('atm_id').apply(lambda x: x.iloc[::3]))
atm_id dNDCSessionTime sCardNumber nTrRequestCount Period
atm_id
0 152865 0 2017-01-14 07:56:17 1 1 None
153054 0 2017-01-14 08:03:41 1 1 None
1 152806 1 2017-01-14 08:06:08 1 1 None
152866 1 2017-01-14 08:08:50 1 1 None
2 152887 2 2017-01-12 07:56:17 1 1 None
you get every third row within the same atm_id. Different solutions are possible to use this information in df; mine is to go by index values, assuming they are unique in your data: create a mask with the original index values and use it with the same idea as above.
mask_index = df.groupby('atm_id').apply(lambda x: x.iloc[::3]).index.get_level_values(1)
# if you do print(mask_index), you get
# Int64Index([152865, 153054, 152806, 152866, 152887], dtype='int64')
# initiate the column
df['Period'] = None
# select every third row within the same atm_id and assign 1 there in the column Period
df.loc[mask_index, 'Period'] = 1
# use cumsum, ffill and astype to produce an incremental value every 3 rows
df['Period'] = df['Period'].cumsum().ffill().astype(int)
The result is:
atm_id dNDCSessionTime sCardNumber nTrRequestCount Period
152865 0 2017-01-14 07:56:17 1 1 1
153052 0 2017-01-14 08:01:13 1 1 1
153053 0 2017-01-14 08:02:19 1 1 1
153054 0 2017-01-14 08:03:41 1 1 2
152804 0 2017-01-14 08:04:37 1 1 2
152805 0 2017-01-14 08:05:24 1 1 2
152806 1 2017-01-14 08:06:08 1 1 3
152807 1 2017-01-14 08:07:15 1 1 3
152808 1 2017-01-14 08:08:08 1 1 3
152866 1 2017-01-14 08:08:50 1 1 4
152887 2 2017-01-12 07:56:17 1 1 5
where you can see the increment from 4 to 5 between the last two rows even though there are not 3 rows for Period=4, because the atm_id changes.
Note: if your indexes are not unique, a reset_index at the beginning and a set_index back at the end gives the same result. Also, I did not sort_values as you did, but that's not a problem.
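As a side note, if the frame is sorted so that rows of the same atm_id are contiguous (e.g. after the sort_values from the question), the same Period column can also be built without apply, using groupby and cumcount. This is only an alternative sketch, not the method used above:
# position of each row within its atm_id group: 0, 1, 2, ...
within = df.groupby('atm_id').cumcount()
# a new 3-row block starts at positions 0, 3, 6, ... of each group;
# numbering those block starts cumulatively over the whole frame gives the Period
df['Period'] = (within % 3 == 0).cumsum()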