merge - Merging code values of categories to dataset in R -
i have political donation dataset holds industry categories in alphanumeric codes. separate text document has listing how these alphanumeric codes translate industry name, sector name, , category-of-industry name.
for instance, "a1200", crop production industry, in agribusiness sector, in sugar cane industry category. i'd know how pair alphanumeric codes respective sector, industry, , category values in separate columns.
right now, code values dataset in
catcode catname catorder industry sector a1200 sugar cane a01 crop production agribusiness
and industry donation dataset:
business name amount donated year category sarah farms 1000 2010 a1200
the category dataset 444 rows , donation set 1m rows. how feel out donation dataset looks this. category common name
catcode catname catorder industry sector business name amount donated year category a1200 sugar cane a01 crop production agribusiness sarah farms 1000 2010 a1200
i'm little new these forums, if there better way ask question, please let me know. thank help!
if speed matter, may want use data.table
or dplyr
. here, modified sample data bit provide ideas.
df1 <- data.frame(catcode = c("a1200", "b1500", "c1800"), catname = c("sugar", "salty", "butter"), catorder = c("cane a01", "cane a01", "cane a01"), industry = c("crop production", "crop production", "crop production"), sector = c("agribusiness", "agribusiness", "agribusiness"), stringsasfactors = false) # catcode catname catorder industry sector #1 a1200 sugar cane a01 crop production agribusiness #2 b1500 salty cane a01 crop production agribusiness #3 c1800 butter cane a01 crop production agribusiness df2 <- data.frame(businessname = c("sarah farms", "ben farms"), amountdonated = c(100, 200), year = c(2010, 2010), category = c("a1200", "b1500"), stringsasfactors = false) # businessname amountdonated year category #1 sarah farms 100 2010 a1200 #2 ben farms 200 2010 b1500 library(dplyr) library(data.table) # 1) dplyr option # catcode c1800 dropped since not exist in both data frames. inner_join(df1, df2, = c("catcode" = "category")) # catcode catname catorder industry sector businessname amountdonated year #1 a1200 sugar cane a01 crop production agribusiness sarah farms 100 2010 #2 b1500 salty cane a01 crop production agribusiness ben farms 200 2010 # catcide c1800 remains left_join(df1, df2, = c("catcode" = "category")) # catcode catname catorder industry sector businessname amountdonated year #1 a1200 sugar cane a01 crop production agribusiness sarah farms 100 2010 #2 b1500 salty cane a01 crop production agribusiness ben farms 200 2010 #3 c1800 butter cane a01 crop production agribusiness <na> na na # 2) data.table option # convert data.frame data.table setdt(df1) setdt(df2) #set columns merge setkey(df1, "catcode") setkey(df2, "category") df1[df2] # catcode catname catorder industry sector businessname amountdonated year #1: a1200 sugar cane a01 crop production agribusiness sarah farms 100 2010 #2: b1500 salty cane a01 crop production agribusiness ben farms 200 2010 df2[df1] # businessname amountdonated year category catname catorder industry sector #1: sarah farms 100 2010 a1200 sugar cane a01 crop production agribusiness #2: ben farms 200 2010 b1500 salty cane a01 crop production agribusiness #3: na na na c1800 butter cane a01 crop production agribusiness
Comments
Post a Comment