R - minbucket not working when producing trees with the CHAID package
I have been trying to ensure that a classification tree obtained using the CHAID algorithm implemented in the CHAID package produces a tree whose terminal nodes (leaves) contain at least minbucket observations. According to the description of the CHAID procedure, this can be done by specifying it in the chaid_control function:
chaid_control(alpha2 = 0.05, alpha3 = -1, alpha4 = 0.05, minsplit = 20, minbucket = 7, minprob = 0.01, stump = FALSE, maxheight = -1)
This is similar to how tree growth is controlled in the rpart package.
Nevertheless, setting the minbucket parameter seems to have no influence on the final shape of the resulting tree. Here is an example:
library("CHAID")
set.seed(290875)
USvoteS <- USvote[sample(1:nrow(USvote), 1000), ]
chaid(vote3 ~ ., data = USvoteS)

Model formula:
vote3 ~ gender + ager + empstat + educr + marstat

Fitted party:
[1] root
|   [2] marstat in married
|   |   [3] educr in <HS, HS, >HS: Gore (n = 311, err = 49.5%)
|   |   [4] educr in College, Post Coll: Bush (n = 249, err = 35.3%)
|   [5] marstat in widowed, divorced, never married
|   |   [6] gender in male: Gore (n = 159, err = 47.8%)
|   |   [7] gender in female
|   |   |   [8] ager in 18-24, 25-34, 35-44, 45-54: Gore (n = 127, err = 22.0%)
|   |   |   [9] ager in 55-64, 65+: Gore (n = 115, err = 40.9%)

Number of inner nodes:    4
Number of terminal nodes: 5
The terminal nodes 3, 4, 6, 8, and 9 consist of 311, 249, 159, 127, and 115 observations, respectively. Normally, in order to constrain the minimal number of observations per terminal node, one should proceed as follows:
ctrl <- chaid_control(minbucket = 200)
Nevertheless, invoking
chaid(vote3 ~ ., data = usvotes, control = ctrl)
yields the same tree as before (instead of a tree whose terminal nodes contain at least 200 observations).
I am not sure whether I am making a mistake or whether something is missing in the implementation of the CHAID procedure...
The minimum number of observations in each terminal node is controlled by minbucket and minprob. The former gives the absolute number of observations, the latter the relative frequency (relative to the sample size of the current node). Internally, the minimum of both quantities is used in each node. This is counterintuitive to me, as I would have expected the maximum to be used, but I didn't check whether the original CHAID algorithm is described in this way.
If you want to make sure that minbucket controls the minimum node size, set minbucket = 200, minprob = 1.
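The rule described above can be sketched in a few lines of base R. This is only an illustration of the "minimum of both quantities" logic as stated in this answer, not the package's actual code: the helper name effective_min_size is hypothetical, and any rounding the package may apply to minprob times the node size is ignored here.

```r
# Hypothetical helper (not part of the CHAID package): the node-size
# threshold implied by the rule "minimum of minbucket and minprob * n".
effective_min_size <- function(n_node, minbucket = 7, minprob = 0.01) {
  min(minbucket, minprob * n_node)
}

n_root <- 1000  # size of the USvoteS sample at the root node

# minbucket = 200 with the default minprob = 0.01:
# the relative bound (0.01 * 1000) is smaller, so it decides
effective_min_size(n_root, minbucket = 200)

# minbucket = 200 with minprob = 1:
# the relative bound equals the node size, so minbucket decides
effective_min_size(n_root, minbucket = 200, minprob = 1)
```

Under these assumptions the first call gives a threshold far below 200, which would explain why the tree in the question did not change, while the second call gives 200 for any node with at least 200 observations.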