r - Minbucket not working when producing trees with CHAID package


I have been trying to ensure that a classification tree obtained with the CHAID algorithm, as implemented in the CHAID package, has terminal nodes (leaves) with at least minbucket observations each. According to the description of the CHAID procedure, this can be done by specifying the chaid_control function:

chaid_control(alpha2 = 0.05, alpha3 = -1, alpha4 = 0.05,
              minsplit = 20, minbucket = 7, minprob = 0.01,
              stump = FALSE, maxheight = -1)

This is similar to how trees are controlled in the rpart package.

Nevertheless, setting the minbucket parameter seems to have no influence on the final shape of the resulting tree. Here is an example:

library("CHAID")
set.seed(290875)
usvotes <- USvote[sample(1:nrow(USvote), 1000), ]
chaid(vote3 ~ ., data = usvotes)

Model formula:
vote3 ~ gender + ager + empstat + educr + marstat

Fitted party:
[1] root
|   [2] marstat in married
|   |   [3] educr in <HS, HS, >HS: Gore (n = 311, err = 49.5%)
|   |   [4] educr in College, Post Coll: Bush (n = 249, err = 35.3%)
|   [5] marstat in widowed, divorced, never married
|   |   [6] gender in male: Gore (n = 159, err = 47.8%)
|   |   [7] gender in female
|   |   |   [8] ager in 18-24, 25-34, 35-44, 45-54: Gore (n = 127, err = 22.0%)
|   |   |   [9] ager in 55-64, 65+: Gore (n = 115, err = 40.9%)

Number of inner nodes:    4
Number of terminal nodes: 5

The terminal nodes 3, 4, 6, 8, and 9 consist of 311, 249, 159, 127, and 115 observations, respectively. Normally, to constrain the minimal number of observations per terminal node, one should proceed as follows:

ctrl <- chaid_control(minbucket = 200) 

Nevertheless, invoking

chaid(vote3 ~ ., data = usvotes, control = ctrl) 

yields the same tree as before (instead of a tree whose terminal nodes contain at least 200 observations).

I am not sure whether I am making a mistake or whether something is missing in the implementation of the CHAID procedure...

The minimum number of observations in each terminal node is controlled by minbucket and minprob. The former gives the absolute number of observations, the latter the relative frequency (relative to the sample size of the current node). Internally, the minimum of both quantities is used in each node. This is counterintuitive to me, as I would have expected the maximum to be used, but I didn't check whether the original CHAID algorithm is described that way.
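To see why minbucket = 200 had no effect in the example, the rule as described above can be worked through with plain arithmetic. This is only a sketch: effective_min is a hypothetical helper, not a function from the CHAID package, and it ignores any rounding the package may apply internally.

```r
# Hypothetical helper mirroring the rule described above:
# the smaller of the absolute bound (minbucket) and the
# relative bound (minprob times the size of the current node).
# This is NOT the package's own code.
effective_min <- function(n_node, minbucket, minprob) {
  min(minbucket, minprob * n_node)
}

n_root <- 1000  # size of the sampled data set in the example

# Defaults: the absolute bound of 7 is the smaller one.
effective_min(n_root, minbucket = 7, minprob = 0.01)    # 7

# minbucket = 200 with the default minprob = 0.01:
# the relative bound 0.01 * 1000 = 10 wins, so 200 is never enforced.
effective_min(n_root, minbucket = 200, minprob = 0.01)  # 10

# minprob = 1 makes the relative bound equal to the node size,
# so minbucket becomes the binding constraint.
effective_min(n_root, minbucket = 200, minprob = 1)     # 200
```

Under this reading, the relative bound only shrinks further in smaller child nodes, which is consistent with the tree staying unchanged.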

If you want to make sure that minbucket controls the minimum node size, set minbucket = 200, minprob = 1.
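Applied to the example from the question, the suggestion might look like the following sketch. It assumes the CHAID package (with its partykit dependency) is installed, and it assumes that predict(..., type = "node") from partykit is available for the fitted tree to tabulate terminal node sizes. The resulting tree is not shown here because it was not part of the original post.

```r
# Assumes the CHAID package and its partykit dependency are installed.
library("CHAID")

set.seed(290875)
usvotes <- USvote[sample(1:nrow(USvote), 1000), ]

# minprob = 1 sets the relative bound to the full node size, so the
# minimum of the two bounds is governed by minbucket (per the answer above).
ctrl <- chaid_control(minbucket = 200, minprob = 1)
fit <- chaid(vote3 ~ ., data = usvotes, control = ctrl)

# Print the tree and count the observations in each terminal node
# to check whether the constraint is respected.
print(fit)
table(predict(fit, type = "node"))
```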

