Python Pandas max value of a column depending on shared condition


Python Pandas max value of a column depending on shared condition



I have a table that looks like this:



My table



I am trying to figure out what the "first_stop" and "last_stop" is for each "id" based on the "sequence". I have the first_stop done, but I can't figure out the last stop. So I need an additional column that says if it is the max sequence number for each ID. I have tried so many ways and they all end up failing or giving me some errors. I've wasted too much of my day trying to solve this and I know you all will be able to help me out super quick. Thanks for your help.


for index, row in df.iterrows():
first_mask = df['sequence'] == 0
last_mask = df['sequence'] == ???
df['first_stop'] = np.where(first_mask, 'Y', 'N')
df['last_stop'] = np.where(last_mask, 'Y', 'N')
print(df[['id', 'sequence', 'first_stop', 'last_stop']])



The sequence is the order of the stops, so I want the lowest sequence number (first_stop) for each ID, and the highest sequence number (last_stop) for each ID. The lowest sequence number is easy, it's always 0. But how do I calculate the highest sequence number for each ID?





what exactly is first stop and last stop? what is sequence and how is it supposed to relate to these values?
– R Balasubramanian
2 days ago





Sorry for the confusion, I added a little more information to try to clarify it a little more. I basically want to determine if each row is the highest "sequence" number for each "ID".
– Emac
2 days ago





if your data is always sorted such that all entries for an id are grouped together and the entries for sequences are in order, you can just check if the next row has the same id - if not, you are at the highest sequence for this ID, so last_mask at this point = Y. Make sure you check for bounds as well, so you do not try to access an n+1th element on your last row.
– R Balasubramanian
2 days ago


last_mask


Y


n+1




2 Answers
2



Given that your DataFrame appears to already be sorted by ['id', 'sequence'] you just need to shift as was pointed out in the comments to check for the last row where the id changes.


DataFrame


df['last_stop'] = df.id != df.id.shift(-1)



Outputs:


id sequence last_stop
0 2 0 False
1 2 1 True
2 3 0 False
3 3 1 False
4 3 2 False
5 3 3 True
6 4 0 False
7 4 1 True
8 5 0 False
9 5 1 True
10 6 0 True
11 7 0 True
12 8 0 False
13 8 1 True



If it is not sorted, you can sort before and apply the same logic. Otherwise a single transform will take care of it:


df['last_stop'] = df.groupby('id').sequence.transform(lambda x: x == max(x))





Even though I already solved it, I like your solution better (feels more pythonic), so thank you!
– Emac
2 days ago





My solution below seems faster, though more of a hacky method. Curious why that is?
– Emac
2 days ago





@Emac Which solution does yours seem faster than, the .shift() solution or the .transform()?
– ALollz
2 days ago


.shift()


.transform()





Sorry for the confusion, it seems faster than the .transform() solution, which I think looks cleaner and I prefer it overall.
– Emac
2 days ago





@Emac I'm not sure, I know others have more expertise in efficiency, so perhaps if you asked another question with some formatted data and timings it would be well received and answered.
– ALollz
yesterday



Alright, I solved it myself, so here is what I did. It's pretty simple and I feel stupid for not figuring it out earlier.



Let me know if anyone else has a better solution.



I created a row that counts the total stops:


df['stop_count'] = df.groupby('id')['id'].transform('count') - 1



Then I checked to see if the sequence number matches the stop_count:



last_mask = df['sequence'] == df['stop_count']


last_mask = df['sequence'] == df['stop_count']






By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Comments

Popular posts from this blog

paramiko-expect timeout is happening after executing the command

Export result set on Dbeaver to CSV

Opening a url is failing in Swift