merge - Merging code values of categories to dataset in R -


i have political donation dataset holds industry categories in alphanumeric codes. separate text document has listing how these alphanumeric codes translate industry name, sector name, , category-of-industry name.

for instance, "a1200", crop production industry, in agribusiness sector, in sugar cane industry category. i'd know how pair alphanumeric codes respective sector, industry, , category values in separate columns.

right now, code values dataset in

    catcode catname     catorder    industry             sector           a1200   sugar cane  a01         crop production  agribusiness 

and industry donation dataset:

business name    amount donated    year   category sarah farms      1000              2010   a1200 

the category dataset 444 rows , donation set 1m rows. how feel out donation dataset looks this. category common name

    catcode catname     catorder    industry             sector          business name    amount donated    year   category     a1200   sugar cane  a01         crop production  agribusiness     sarah farms      1000              2010   a1200 

i'm little new these forums, if there better way ask question, please let me know. thank help!

if speed matter, may want use data.table or dplyr. here, modified sample data bit provide ideas.

df1 <- data.frame(catcode = c("a1200", "b1500", "c1800"),                   catname = c("sugar", "salty", "butter"),                   catorder = c("cane a01", "cane a01", "cane a01"),                   industry = c("crop production", "crop production", "crop production"),                   sector = c("agribusiness", "agribusiness", "agribusiness"),                   stringsasfactors = false)  #  catcode catname catorder        industry       sector #1   a1200   sugar cane a01 crop production agribusiness #2   b1500   salty cane a01 crop production agribusiness #3   c1800  butter cane a01 crop production agribusiness  df2 <- data.frame(businessname = c("sarah farms", "ben farms"),                   amountdonated = c(100, 200),                   year = c(2010, 2010),                   category = c("a1200", "b1500"),                   stringsasfactors = false)  #  businessname amountdonated year category #1  sarah farms           100 2010    a1200 #2    ben farms           200 2010    b1500  library(dplyr) library(data.table)  # 1) dplyr option # catcode c1800 dropped since not exist in both data frames. inner_join(df1, df2, = c("catcode" = "category"))  #      catcode catname catorder        industry       sector businessname amountdonated year #1   a1200   sugar cane a01 crop production agribusiness  sarah farms           100 2010 #2   b1500   salty cane a01 crop production agribusiness    ben farms           200 2010  # catcide c1800 remains left_join(df1, df2, = c("catcode" = "category"))  #      catcode catname catorder        industry       sector businessname amountdonated year #1   a1200   sugar cane a01 crop production agribusiness  sarah farms           100 2010 #2   b1500   salty cane a01 crop production agribusiness    ben farms           200 2010 #3   c1800  butter cane a01 crop production agribusiness         <na>            na   na  # 2) data.table option # convert data.frame data.table setdt(df1) setdt(df2)  #set columns merge setkey(df1, "catcode") setkey(df2, "category")  df1[df2]  #   catcode catname catorder        industry       sector businessname amountdonated year #1:   a1200   sugar cane a01 crop production agribusiness  sarah farms           100 2010 #2:   b1500   salty cane a01 crop production agribusiness    ben farms           200 2010  df2[df1] #   businessname amountdonated year category catname catorder        industry       sector #1:  sarah farms           100 2010    a1200   sugar cane a01 crop production agribusiness #2:    ben farms           200 2010    b1500   salty cane a01 crop production agribusiness #3:           na            na   na    c1800  butter cane a01 crop production agribusiness 

Comments

Popular posts from this blog

ruby on rails - RuntimeError: Circular dependency detected while autoloading constant - ActiveAdmin.register Role -

c++ - OpenMP unpredictable overhead -

javascript - Wordpress slider, not displayed 100% width -