dataframe - R : Calculate frequency of a transaction and increment for every transaction -


i have dataframe called cust

    > head(cust,15)     txn_date  cust_no   credit 1  2013-12-02 12345000  400.00 2  2013-12-02 12345000  300.00 3  2013-12-02 12345000  304.71 4  2013-12-02 12345000  475.00 5  2013-12-02 12345000  325.00 6  2013-12-02 34567890  1390.00 7  2013-12-02 34567890  100.00 8  2013-12-02 34567890  500.00 9  2013-12-02 23232323  5.00 10 2013-12-02 23232323  130.00 11 2013-12-02 23232323  5975.00 12 2013-12-02 23232323  3711.00 13 2013-12-02 14345422  12530.50 14 2013-12-02 14345422  3312.00 15 2013-12-02 98765432  370.00 

to calculate total amount, based on cust_no, have used within , applied cumsum it, below

newcust <-within(cust, {    runningtotal <- ave(cust$credit, cust$cust_no, fun = cumsum)}) 

and hence result this, can see credit getting incremented based on cust number

>head(newcust,15)    txn_date  cust_no   credit   runningtotal     1  2013-12-02 12345000  400.00   400.00 2  2013-12-02 12345000  300.00   700.00 3  2013-12-02 12345000  304.71   1004.71 4  2013-12-02 12345000  475.00   1479.71 5  2013-12-02 12345000  325.00   1804.71 6  2013-12-02 34567890  1390.00  1390.00 7  2013-12-02 34567890  100.00   1490.00 8  2013-12-02 34567890  500.00   1990.00 9  2013-12-02 23232323  5.00     5.00 10 2013-12-02 23232323  130.00   135.00 11 2013-12-02 23232323  5975.00  6110.00 12 2013-12-02 23232323  3711.00  9821.00 13 2013-12-02 14345422  12530.50 12530.50 14 2013-12-02 14345422  3312.00  15842.5 15 2013-12-02 98765432  370.00   370 

now question is, how find out number of transactions every cust_no, using above logic of within , length?? or maybe, other logic.

i not getting desired output shown below tried using aggregate() , apply..

   txn_date  cust_no   credit   frequency    1  2013-12-02 12345000  400.00   1 2  2013-12-02 12345000  300.00   2 3  2013-12-02 12345000  304.71   3 4  2013-12-02 12345000  475.00   4 5  2013-12-02 12345000  325.00   5 6  2013-12-02 34567890  1390.00  1 7  2013-12-02 34567890  100.00   2 8  2013-12-02 34567890  500.00   3 9  2013-12-02 23232323  5.00     1 10 2013-12-02 23232323  130.00   2 11 2013-12-02 23232323  5975.00  3 12 2013-12-02 23232323  3711.00  4 13 2013-12-02 14345422  12530.50 1 14 2013-12-02 14345422  3312.00  2 15 2013-12-02 98765432  370.00   1 

you can try

 within(cust, frequency <- ave(seq_along(cust_no), cust_no, fun=seq_along))  #     txn_date  cust_no   credit frequency  #1  2013-12-02 12345000   400.00         1  #2  2013-12-02 12345000   300.00         2  #3  2013-12-02 12345000   304.71         3  #4  2013-12-02 12345000   475.00         4  #5  2013-12-02 12345000   325.00         5  #6  2013-12-02 34567890  1390.00         1  #7  2013-12-02 34567890   100.00         2  #8  2013-12-02 34567890   500.00         3  #9  2013-12-02 23232323     5.00         1  #10 2013-12-02 23232323   130.00         2  #11 2013-12-02 23232323  5975.00         3  #12 2013-12-02 23232323  3711.00         4  #13 2013-12-02 14345422 12530.50         1  #14 2013-12-02 14345422  3312.00         2  #15 2013-12-02 98765432   370.00         1 

or using splitstackshape

 library(splitstackshape)  getanid(cust, 'cust_no') 

or using data.table

 library(data.table)  setdt(cust)[, frequency:=1:.n, by=cust_no] 

Comments

Popular posts from this blog

c++ - OpenMP unpredictable overhead -

ruby on rails - RuntimeError: Circular dependency detected while autoloading constant - ActiveAdmin.register Role -

javascript - Wordpress slider, not displayed 100% width -