dataframe - R : Calculate frequency of a transaction and increment for every transaction -
i have dataframe called cust
> head(cust,15) txn_date cust_no credit 1 2013-12-02 12345000 400.00 2 2013-12-02 12345000 300.00 3 2013-12-02 12345000 304.71 4 2013-12-02 12345000 475.00 5 2013-12-02 12345000 325.00 6 2013-12-02 34567890 1390.00 7 2013-12-02 34567890 100.00 8 2013-12-02 34567890 500.00 9 2013-12-02 23232323 5.00 10 2013-12-02 23232323 130.00 11 2013-12-02 23232323 5975.00 12 2013-12-02 23232323 3711.00 13 2013-12-02 14345422 12530.50 14 2013-12-02 14345422 3312.00 15 2013-12-02 98765432 370.00
to calculate total amount, based on cust_no, have used within
, applied cumsum
it, below
newcust <-within(cust, { runningtotal <- ave(cust$credit, cust$cust_no, fun = cumsum)})
and hence result this, can see credit getting incremented based on cust number
>head(newcust,15) txn_date cust_no credit runningtotal 1 2013-12-02 12345000 400.00 400.00 2 2013-12-02 12345000 300.00 700.00 3 2013-12-02 12345000 304.71 1004.71 4 2013-12-02 12345000 475.00 1479.71 5 2013-12-02 12345000 325.00 1804.71 6 2013-12-02 34567890 1390.00 1390.00 7 2013-12-02 34567890 100.00 1490.00 8 2013-12-02 34567890 500.00 1990.00 9 2013-12-02 23232323 5.00 5.00 10 2013-12-02 23232323 130.00 135.00 11 2013-12-02 23232323 5975.00 6110.00 12 2013-12-02 23232323 3711.00 9821.00 13 2013-12-02 14345422 12530.50 12530.50 14 2013-12-02 14345422 3312.00 15842.5 15 2013-12-02 98765432 370.00 370
now question is, how find out number of transactions every cust_no, using above logic of within
, length
?? or maybe, other logic.
i not getting desired output shown below tried using aggregate()
, apply
..
txn_date cust_no credit frequency 1 2013-12-02 12345000 400.00 1 2 2013-12-02 12345000 300.00 2 3 2013-12-02 12345000 304.71 3 4 2013-12-02 12345000 475.00 4 5 2013-12-02 12345000 325.00 5 6 2013-12-02 34567890 1390.00 1 7 2013-12-02 34567890 100.00 2 8 2013-12-02 34567890 500.00 3 9 2013-12-02 23232323 5.00 1 10 2013-12-02 23232323 130.00 2 11 2013-12-02 23232323 5975.00 3 12 2013-12-02 23232323 3711.00 4 13 2013-12-02 14345422 12530.50 1 14 2013-12-02 14345422 3312.00 2 15 2013-12-02 98765432 370.00 1
you can try
within(cust, frequency <- ave(seq_along(cust_no), cust_no, fun=seq_along)) # txn_date cust_no credit frequency #1 2013-12-02 12345000 400.00 1 #2 2013-12-02 12345000 300.00 2 #3 2013-12-02 12345000 304.71 3 #4 2013-12-02 12345000 475.00 4 #5 2013-12-02 12345000 325.00 5 #6 2013-12-02 34567890 1390.00 1 #7 2013-12-02 34567890 100.00 2 #8 2013-12-02 34567890 500.00 3 #9 2013-12-02 23232323 5.00 1 #10 2013-12-02 23232323 130.00 2 #11 2013-12-02 23232323 5975.00 3 #12 2013-12-02 23232323 3711.00 4 #13 2013-12-02 14345422 12530.50 1 #14 2013-12-02 14345422 3312.00 2 #15 2013-12-02 98765432 370.00 1
or using splitstackshape
library(splitstackshape) getanid(cust, 'cust_no')
or using data.table
library(data.table) setdt(cust)[, frequency:=1:.n, by=cust_no]
Comments
Post a Comment