R - Determining if values of previous rows repeat in a dataframe


I have data organized like this:

    set.seed(12)
    ids <- matrix(replicate(1000, sample(letters[1:4], 2)), ncol = 2, byrow = TRUE)
    df <- data.frame(
      event = 1:100,
      id1 = ids[, 1],
      id2 = ids[, 2],
      grp = rep(1:10, each = 100),
      stringsAsFactors = FALSE)

    head(df, 10)
       event id1 id2 grp
    1      1   a   c   1
    2      2   d   a   1
    3      3   a   d   1
    4      4   a   b   1
    5      5   a   d   1
    6      6   b   c   1
    7      7   b   d   1
    8      8   b   d   1
    9      9   b   d   1
    10    10   c   a   1

There are pairs of ids (id1 & id2). Within a row they are never the same. There is also a variable called grp. There are 10 groups. Each group is considered a separate sample of data. The event variable goes from 1 to 100 in each group.

The first question I have is quite straightforward. Within each group, for each row, is the combination of the 2 ids (id1-id2) the same as the previous row, the reverse of the previous row, or neither of these 2 options? Obviously, if there is an a-c combination on row 100 of one group, I am not interested in whether it is reversed, the same or whatever on row 1 of the following group.

This is my temporary solution:

    # give each id pair an identifier:
    df$pair <- paste(pmin(df$id1, df$id2), pmax(df$id1, df$id2))

    # for each grp, work out using `lag` if the previous row contains the same
    # pair of ids, and if so, whether in the same or reversed order.
    # Note: this relies on dplyr's lag(); stats::lag() does not shift a plain vector.
    library(dplyr)

    df.sp <- split(df, df$grp)
    df$value <- unlist(lapply(df.sp, function(x)
      ifelse(x$pair != lag(x$pair), NA, ifelse(x$id1 == lag(x$id1), 1, 0))
    ))
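As a minimal sketch of what the pair identifier does (the toy vectors below are made up for illustration and are not taken from df): `pmin()` and `pmax()` compare the two ids element-wise, so both orderings of a pair end up with the same label.

```r
# Hypothetical toy ids, chosen only to illustrate the idea
id1 <- c("a", "d", "a", "c")
id2 <- c("c", "a", "d", "a")

# pmin()/pmax() work element-wise on character vectors, so each pair
# is written in sorted order regardless of which column an id came from
pair <- paste(pmin(id1, id2), pmax(id1, id2))
print(pair)
```

Here "d a" and "a d" both become "a d", which is why the order then has to be checked separately via id1.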

This gives:

    head(df, 10)
       event id1 id2 grp pair value
    1      1   a   c   1  a c    NA
    2      2   d   a   1  a d    NA
    3      3   a   d   1  a d     0
    4      4   a   b   1  a b    NA
    5      5   a   d   1  a d    NA
    6      6   b   c   1  b c    NA
    7      7   b   d   1  b d    NA
    8      8   b   d   1  b d     1
    9      9   b   d   1  b d     1
    10    10   c   a   1  a c    NA

This works, showing 0 for a reversal, 1 for a copy and NA for neither.

The more complex question I am interested in is the following. Within each group (grp), for each row, find whether the combination of the 2 ids (the pair) has occurred previously in that grp. If it did, return whether it was in the same order or the reversed order the immediate previous time it occurred.

That result would look like this:

       event id1 id2 grp pair value
    1      1   a   c   1  a c    NA
    2      2   d   a   1  a d    NA
    3      3   a   d   1  a d     0
    4      4   a   b   1  a b    NA
    5      5   a   d   1  a d     1
    6      6   b   c   1  b c    NA
    7      7   b   d   1  b d    NA
    8      8   b   d   1  b d     1
    9      9   b   d   1  b d     1
    10    10   c   a   1  a c     0

E.g. row 10 is returned as 0 because the combination a-c occurred previously, in reverse order (row 1). On row 5 a 1 is returned because a-d occurred in the same order on row 3.

You're almost there! Your second question is equivalent to your first question, but grouping by pair as well as by group. I converted your code to dplyr (though I appreciate the spirit behind keeping the question in base). I also removed the second ifelse, replacing it with a numeric conversion of a logical, which should be more performant (and which I find easier to read).
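To see why the two forms agree, here is a minimal sketch on a made-up logical vector (the variable names are only for illustration): `as.numeric()` maps TRUE to 1, FALSE to 0 and keeps NA, which is what the inner `ifelse(..., 1, 0)` produces.

```r
# Hypothetical comparison result: same order, reversed order, unknown
same_order <- c(TRUE, FALSE, NA)

from_ifelse <- ifelse(same_order, 1, 0)  # the original inner ifelse
from_cast   <- as.numeric(same_order)    # the replacement

print(from_ifelse)
print(from_cast)
```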

    library(dplyr)

    df %>% group_by(grp) %>%
        mutate(
            # order-independent label for the pair of ids
            pair = paste(pmin(id1, id2), pmax(id1, id2)),
            # compared with the immediately preceding row in the same grp
            prev_row = ifelse(pair != lag(pair), NA, as.numeric(id1 == lag(id1)))
        ) %>%
        # compared with the previous occurrence of the same pair in the same grp
        group_by(grp, pair) %>%
        mutate(prev_any = ifelse(pair != lag(pair), NA, as.numeric(id1 == lag(id1)))) %>%
        head(10)
    # Source: local data frame [10 x 7]
    # Groups: grp, pair [5]
    #
    #    event   id1   id2   grp  pair prev_row prev_any
    #    (int) (chr) (chr) (int) (chr)    (dbl)    (dbl)
    # 1      1     a     c     1   a c       NA       NA
    # 2      2     d     a     1   a d       NA       NA
    # 3      3     a     d     1   a d        0        0
    # 4      4     a     b     1   a b       NA       NA
    # 5      5     a     d     1   a d       NA        1
    # 6      6     b     c     1   b c       NA       NA
    # 7      7     b     d     1   b d       NA       NA
    # 8      8     b     d     1   b d        1        1
    # 9      9     b     d     1   b d        1        1
    # 10    10     c     a     1   a c       NA        0
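If you would rather stay in base R, the same grouped-lag idea can be sketched with `ave()`. This is only a sketch under assumptions: `shift1` is a made-up helper standing in for dplyr's `lag`, and `toy` is a hand-typed copy of the first 10 rows shown in the question rather than the full df.

```r
# Hypothetical helper: shift a vector down by one, padding the front with NA
shift1 <- function(x) c(NA, x)[seq_along(x)]

# Hand-typed copy of the first 10 rows from the question (all in grp 1)
toy <- data.frame(
  id1 = c("a", "d", "a", "a", "a", "b", "b", "b", "b", "c"),
  id2 = c("c", "a", "d", "b", "d", "c", "d", "d", "d", "a"),
  grp = 1,
  stringsAsFactors = FALSE
)
toy$pair <- paste(pmin(toy$id1, toy$id2), pmax(toy$id1, toy$id2))

# ave() applies shift1 within each grp/pair combination and returns the
# result in the original row order, i.e. id1 at the pair's previous occurrence
prev_id1 <- ave(toy$id1, toy$grp, toy$pair, FUN = shift1)

# 1 = same order as last time, 0 = reversed, NA = pair not seen before in grp
toy$prev_any <- as.numeric(toy$id1 == prev_id1)
print(toy)
```

Dropping `toy$pair` from the `ave()` call gives the previous row's id1 within the group instead; combined with a check that the pair labels match, that would correspond to the prev_row column.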
