R - Vectorizing, a for loop, or an apply function?
I have a data frame with 6 columns. Column 1 holds dates, column 2 holds individuals, and columns 3 to 6 are used for the calculation.
date <- c(1, 1, 2, 2, 2, 3)
ind <- c("a", "a", "a", "b", "c", "c")
c <- c(5, 6, 5, 7, 8, 8)
d <- c(8, 8, 9, 9, 9, 9)
e <- c(8, 9, 11, 10, 9, 7)
f <- c(5, 6, 8, 5, 7, 4)
df <- data.frame(date, ind, c, d, e, f)
I want to perform a calculation like (c - e) + (d - f). (In real life these are coordinates and I'm calculating distances, but that's not the problem right now.)
I want to perform these calculations, stored in a new column (g), with a 1 day difference: use the values of columns c and d from day 1, and the values of columns e and f from day+1, for the same individual.
I'm not sure whether I should use a loop or an apply function. This is what I've tried so far, a vectorized operation with subsetting, based on this thread: "loop on rows of dataframe applying function if-statement"
df$g <- NA
df[!(df$date == (df$date + 1)), "g"] <- ((c - e) + (d - f))
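This works, but the calculation uses coordinates from the same row (c, d, e and f all come from the same row). I realize why this is: I don't state which row to take the coordinates from. c and d need to be taken from the row where date = date, and e and f from the row where date = (date+1). I realize that, but I can't get my head around how to do it.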
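A quick check, as a sketch in base R, of why the subsetting condition above has no effect: a date can never equal itself plus one, so the comparison is FALSE on every row and its negation selects all rows. The name `same_as_next` is only for illustration.

```r
date <- c(1, 1, 2, 2, 2, 3)

# Each element is compared with itself plus one, never with another row,
# so the result is FALSE everywhere
same_as_next <- date == (date + 1)
print(same_as_next)

# Negating the condition therefore selects every row of the data frame
print(all(!same_as_next))
```

So the assignment is applied to all rows, using values from the same row each time.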
Should I continue down this route? Use a loop? Use an apply function?
The dplyr package provides nice lag and lead functions.
> library(dplyr)
> df %>% mutate(g = c + d + lead(e,1) + lead(f,1))
  date ind c d  e f  g
1    1   a 5 8  8 5 28
2    1   a 6 8  9 6 33
3    2   a 5 9 11 8 29
4    2   b 7 9 10 5 32
5    2   c 8 9  9 7 28
6    3   c 8 9  7 4 NA
g is NA in the last row because there is no next date value.
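To make the shift concrete, here is a minimal base R sketch of what a one-step lead does to column e. The helper `lead_by_one` is hypothetical and only for illustration; it is not part of dplyr.

```r
e <- c(8, 9, 11, 10, 9, 7)

# Hypothetical helper: drop the first value and pad the end with NA,
# so position i holds the value that was at position i + 1
lead_by_one <- function(x) c(x[-1], NA)

shifted <- lead_by_one(e)
print(shifted)
```

The trailing NA is what propagates into g for the final row.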
Edit:
As others have mentioned, it looks like your example data has two rows with the same date for ind == "a". You may want to be careful about doing lead/lag in that situation.
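One way to spot such repeats, sketched in base R on the example's date and ind columns (the name `dup` is just for illustration):

```r
df <- data.frame(
  date = c(1, 1, 2, 2, 2, 3),
  ind  = c("a", "a", "a", "b", "c", "c")
)

# TRUE for every row whose (date, ind) pair already appeared in an earlier row
dup <- duplicated(df[, c("date", "ind")])
print(df[dup, ])
```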
If it makes sense to do so, aggregate them first, before doing the lead/lag.
df %>%
  group_by(date, ind) %>%
  summarise(c = mean(c), d = mean(d), e = mean(e), f = mean(f)) %>%
  ungroup %>%
  mutate(g = c + d + lead(e, 1) + lead(f, 1))
Which produces:
  date ind   c d    e   f    g
1    1   a 5.5 8  8.5 5.5 32.5
2    2   a 5.0 9 11.0 8.0 29.0
3    2   b 7.0 9 10.0 5.0 32.0
4    2   c 8.0 9  9.0 7.0 28.0
5    3   c 8.0 9  7.0 4.0   NA
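Note that a plain lead over the whole data frame pairs each row with whatever row follows it, which can belong to a different individual. If the pairing should stay within one individual and only apply when the next row really is date + 1, a possible sketch is to group by ind before shifting. It uses the question's formula (c - e) + (d - f), and assumes dplyr 1.0 or later for the `.groups` argument; the name `res` and the date check are my additions, not part of the answer above.

```r
library(dplyr)

df <- data.frame(
  date = c(1, 1, 2, 2, 2, 3),
  ind  = c("a", "a", "a", "b", "c", "c"),
  c = c(5, 6, 5, 7, 8, 8),
  d = c(8, 8, 9, 9, 9, 9),
  e = c(8, 9, 11, 10, 9, 7),
  f = c(5, 6, 8, 5, 7, 4)
)

res <- df %>%
  # Collapse repeated (date, ind) rows to their mean, as in the edit above
  group_by(date, ind) %>%
  summarise(c = mean(c), d = mean(d), e = mean(e), f = mean(f),
            .groups = "drop") %>%
  # Order by individual and date so lead() looks at that individual's next date
  arrange(ind, date) %>%
  group_by(ind) %>%
  # Only compute g when the following row is exactly one day later;
  # otherwise (or when there is no following row) g stays NA
  mutate(g = if_else(lead(date) == date + 1,
                     (c - lead(e)) + (d - lead(f)),
                     NA_real_)) %>%
  ungroup()

print(res)
```

With the example data, only individual a on date 1 and individual c on date 2 have a following day, so every other row gets NA.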