Grouping by ID column and collapsing Boolean columns for summary

I'm trying to transform a data frame that has several Boolean (0/1) columns and repeated IDs into a new data frame with exactly one row per ID. Within each ID, each Boolean column should be combined across the rows (1 if any row has a 1). I also want to carry over the latest date for each ID.

Example input:

        ID S1 S2 S3 S4    Date
    1  ex1  1  0  0  0  4/7/12
    2  ex1  0  1  0  0  6/8/16
    3  ex2  0  0  1  0  5/5/15
    4  ex3  1  1  0  0 4/19/13
    5  ex3  0  1  0  1  6/7/15
    6  ex4  0  1  0  0  8/7/09
    7  ex5  1  1  1  0 6/12/17

Desired output:

     ID S1 S2 S3 S4    Date
    ex1  1  1  0  0  6/8/16
    ex2  0  0  1  0  5/5/15
    ex3  1  1  0  1  6/7/15
    ex4  0  1  0  0  8/7/09
    ex5  1  1  1  0 6/12/17
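Two details of this request are worth making concrete. For 0/1 flags, combining a group's values with a logical OR gives the same result as taking their max(), which is what the answers below rely on. The latest date, however, can only be found reliably after the Date strings are parsed: compared as text, "6/1/09" sorts after "12/1/10". A minimal base-R sketch; the vectors are made up for illustration, and the "%m/%d/%y" format assumes month/day/two-digit-year dates as in the example.

```r
# For 0/1 flags, max() over a group gives the same result as a logical OR
flags <- c(1, 0, 0)
or_via_max <- max(flags)                   # 1
or_via_any <- as.numeric(any(flags == 1))  # 1

# Taking the max of date *strings* compares them as text, not as dates
dates_chr <- c("6/1/09", "12/1/10")
latest_chr <- max(dates_chr)                            # "6/1/09" (text order)
latest <- max(as.Date(dates_chr, format = "%m/%d/%y"))  # 2010-12-01
```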

2 Answers

A simple summarization with dplyr:

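    library(dplyr)

    # Date must already be a Date (not an "m/d/y" string) for max() to return the latest date
    df <- df %>%
      group_by(ID) %>%
      summarize(S1 = max(S1), S2 = max(S2), S3 = max(S3), S4 = max(S4),
                Date = max(Date))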
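With dplyr 1.0 or later, across() avoids writing one max() call per column. This is a sketch under two assumptions: the S columns hold 0/1 values, and Date starts out as month/day/two-digit-year strings as in the question, so it is converted with as.Date() before taking the maximum. The data frame is retyped from the question's example input.

```r
library(dplyr)

# Example input retyped from the question
df <- data.frame(
  ID   = c("ex1", "ex1", "ex2", "ex3", "ex3", "ex4", "ex5"),
  S1   = c(1, 0, 0, 1, 0, 0, 1),
  S2   = c(0, 1, 0, 1, 1, 1, 1),
  S3   = c(0, 0, 1, 0, 0, 0, 1),
  S4   = c(0, 0, 0, 0, 1, 0, 0),
  Date = c("4/7/12", "6/8/16", "5/5/15", "4/19/13", "6/7/15", "8/7/09", "6/12/17"),
  stringsAsFactors = FALSE
)

result <- df %>%
  mutate(Date = as.Date(Date, format = "%m/%d/%y")) %>%  # parse before comparing
  group_by(ID) %>%
  summarise(across(S1:S4, max),  # max of 0/1 flags = logical OR per column
            Date = max(Date))    # latest date per ID
```

For these rows the result should match the desired output, with Date returned as a proper Date column.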

    library(data.table)
    setDT(df)
    df[, lapply(.SD, max), ID]
    #     ID S1 S2 S3 S4       Date
    # 1: ex1  1  1  0  0 2016-06-08
    # 2: ex2  0  0  1  0 2015-05-05
    # 3: ex3  1  1  0  1 2015-06-07
    # 4: ex4  0  1  0  0 2009-08-07
    # 5: ex5  1  1  1  0 2017-06-12

This also works:

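    library(dplyr)
    df %>% group_by(ID) %>% summarise_all(max)
    # summarise_all() is superseded in dplyr >= 1.0; the current equivalent is
    # df %>% group_by(ID) %>% summarise(across(everything(), max))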

Or in Base R:

    do.call(rbind, lapply(split(df, df$ID),
                          function(g) data.frame(lapply(g, max))))

Data used:

    df <- fread("
     a  ID S1 S2 S3 S4    Date
     1 ex1  1  0  0  0  4/7/12
     2 ex1  0  1  0  0  6/8/16
     3 ex2  0  0  1  0  5/5/15
     4 ex3  1  1  0  0 4/19/13
     5 ex3  0  1  0  1  6/7/15
     6 ex4  0  1  0  0  8/7/09
     7 ex5  1  1  1  0 6/12/17
    ")[, -1]                             # drop the row-number column "a"
    df[, Date := lubridate::mdy(Date)]  # parse "m/d/y" strings into Date values
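The question mentions Boolean columns. If S1 to S4 are stored as TRUE/FALSE rather than 0/1, max() still works but returns integers (max(c(TRUE, FALSE)) is 1L), whereas any() is the logical OR itself and keeps the columns logical. A base-R sketch of that variant, following the split()/lapply() approach of the base R answer above; the logical data frame and the helper collapse_group() are hypothetical.

```r
# Hypothetical logical version of the question's example data
df <- data.frame(
  ID = c("ex1", "ex1", "ex2", "ex3", "ex3", "ex4", "ex5"),
  S1 = c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE),
  S2 = c(FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE),
  S3 = c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE),
  S4 = c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE),
  Date = as.Date(c("4/7/12", "6/8/16", "5/5/15", "4/19/13",
                   "6/7/15", "8/7/09", "6/12/17"), format = "%m/%d/%y"),
  stringsAsFactors = FALSE
)

flag_cols <- c("S1", "S2", "S3", "S4")

# Hypothetical helper: collapse one ID group into a single row
collapse_group <- function(g) {
  data.frame(
    ID = g$ID[1],
    lapply(g[flag_cols], any),  # logical OR per flag column, stays TRUE/FALSE
    Date = max(g$Date),         # latest date in the group
    stringsAsFactors = FALSE
  )
}

result <- do.call(rbind, lapply(split(df, df$ID), collapse_group))
```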
