cv.GAMBoost {GAMBoost}    R Documentation

Cross-validation for GAMBoost fits

Description

Performs K-fold cross-validation for GAMBoost to determine the optimal number of boosting steps.

Usage

cv.GAMBoost(x=NULL,y,x.linear=NULL,maxstepno=500,
            K=10,type=c("loglik","error","L2"),pred.cutoff=0.5,
            just.criterion=FALSE,trace=FALSE,parallel=FALSE,
            upload.x=TRUE,folds=NULL,...) 

Arguments

x n * p matrix of covariates with potentially non-linear influence. If this is not given (and argument x.linear is employed), a generalized linear model is fitted.
y response vector of length n.
x.linear optional n * q matrix of covariates with linear influence.
maxstepno maximum number of boosting steps to evaluate.
K number of folds to be used for cross-validation.
type, pred.cutoff goodness-of-fit criterion: log-likelihood ("loglik"), error rate for binary response data ("error"), or squared error for other response types ("L2"). For binary response data with the "error" criterion, pred.cutoff specifies the cutoff on the predicted probability that is used to classify an observation as 1 rather than 0.
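To make the three criteria concrete, the following base-R sketch computes each of them for observed responses y and predictions mu. The function criterion_value() is hypothetical and not part of GAMBoost; it assumes a Bernoulli log-likelihood for "loglik" and classification as 1 when the predicted probability exceeds pred.cutoff. The package's internal definitions (for example, whether values are summed or averaged, or whether the comparison is > or >=) may differ.

```r
# Hypothetical helper, NOT part of GAMBoost: evaluates one of the criteria
# selectable via 'type' for observed responses y and predictions mu.
criterion_value <- function(y, mu, type = c("loglik", "error", "L2"),
                            pred.cutoff = 0.5) {
  type <- match.arg(type)
  switch(type,
    # Bernoulli log-likelihood (assumes y in {0,1}, mu = predicted probabilities)
    loglik = sum(y * log(mu) + (1 - y) * log(1 - mu)),
    # Misclassification rate: predict class 1 when mu exceeds the cutoff
    error  = mean(as.numeric(mu > pred.cutoff) != y),
    # Mean squared error, intended for non-binary responses
    L2     = mean((y - mu)^2)
  )
}

# Small binary example with predicted probabilities
y  <- c(1, 0, 1, 1, 0)
mu <- c(0.8, 0.3, 0.4, 0.9, 0.6)
criterion_value(y, mu, "error")                      # 2 of 5 misclassified
criterion_value(y, mu, "error", pred.cutoff = 0.35)  # lower cutoff changes the rate
criterion_value(c(1, 2, 3), c(1.5, 2, 2), "L2")      # continuous response
```

Lowering pred.cutoff in this example moves the third observation (mu = 0.4) into class 1, which is why the error rate depends on the chosen cutoff.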
just.criterion logical value indicating whether a list with the goodness-of-fit information should be returned (TRUE) or a GAMBoost fit with the optimal number of steps (FALSE).
trace logical value indicating whether information on progress should be printed.
parallel logical value indicating whether the cross-validation folds should be evaluated in parallel on a compute cluster. This requires the snowfall package.
upload.x logical value indicating whether x and x.linear should be uploaded to the compute cluster for parallel computation. For large data sets, uploading them only once beforehand (using sfExport(x,x.linear) from the snowfall package) and setting upload.x=FALSE can save considerable time.
folds if not NULL, a list of length K, each element being a vector with the indices of the observations in that fold. Useful for employing the same folds in repeated runs.
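One way to build a folds object of the required form is sketched below in base R. The helper make_folds() is hypothetical and not a GAMBoost function; it randomly assigns observations to K folds of (nearly) equal size. Fixing the seed and reusing the resulting list keeps the folds identical across repeated runs.

```r
# Hypothetical helper, NOT part of GAMBoost: randomly split observations 1..n
# into K folds and return them as a list of index vectors.
make_folds <- function(n, K = 10) {
  # Balanced fold labels 1..K, randomly permuted over the n observations
  fold.id <- sample(rep(seq_len(K), length.out = n))
  # Element k holds the indices of the observations assigned to fold k
  lapply(seq_len(K), function(k) which(fold.id == k))
}

set.seed(42)
folds <- make_folds(100, K = 10)
# The same list could then be reused in repeated calls, e.g.
# cv.GAMBoost(x, y, K = 10, folds = folds, ...)
```

Because every observation receives exactly one label, the folds are disjoint and together cover all n observations.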
... further arguments passed to the calls to GAMBoost.

Value

If just.criterion=FALSE, a GAMBoost fit with the optimal number of boosting steps; otherwise a list with the following components:

criterion vector with the goodness-of-fit criterion for boosting steps 1, ..., maxstepno.
se vector with standard error estimates for the goodness-of-fit criterion in each boosting step.
selected index of the optimal boosting step.
folds list of length K, where the elements are vectors of the indices of observations in the respective folds.
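The relationship between the returned components can be illustrated with a short base-R sketch. It assumes a hypothetical K by maxstepno matrix fold.crit holding a criterion for which smaller is better (such as the error rate) on each fold after each boosting step, and summarises it by the mean across folds, the standard error of that mean, and the step with the smallest mean. The exact aggregation and standard error formula used by GAMBoost are not stated on this page and may differ; for the log-likelihood the best step would be the maximum rather than the minimum.

```r
# Hypothetical sketch, NOT GAMBoost's implementation.
# fold.crit[k, s]: criterion on fold k after s boosting steps (smaller = better)
summarise_cv <- function(fold.crit) {
  K <- nrow(fold.crit)
  criterion <- colMeans(fold.crit)               # mean over folds per step
  se <- apply(fold.crit, 2, sd) / sqrt(K)        # standard error of that mean
  list(criterion = criterion, se = se,
       selected = which.min(criterion))          # step with the lowest mean
}

# Toy example: K = 3 folds, 3 boosting steps
fold.crit <- rbind(c(0.30, 0.20, 0.25),
                   c(0.40, 0.30, 0.35),
                   c(0.20, 0.10, 0.15))
res <- summarise_cv(fold.crit)
res$selected
```

In this toy matrix the fold means are 0.30, 0.20 and 0.25, so step 2 is selected.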

Author(s)

Harald Binder binderh@fdm.uni-freiburg.de

See Also

GAMBoost

Examples

## Not run: 
library(GAMBoost)

##  Generate some data (seed fixed for reproducibility)
set.seed(1)
x <- matrix(runif(100*8,min=-1,max=1),100,8)
eta <- -0.5 + 2*x[,1] + 2*x[,3]^2
y <- rbinom(100,1,binomial()$linkinv(eta))

##  Fit the model with smooth components

gb1 <- GAMBoost(x,y,penalty=400,stepno=100,trace=TRUE,family=binomial()) 

##  10-fold cross-validation with the misclassification rate as criterion

gb1.crit <- cv.GAMBoost(x,y,penalty=400,maxstepno=100,trace=TRUE,
                        family=binomial(),
                        K=10,type="error",just.criterion=TRUE)

##  Compare the number of boosting steps selected by AIC and by the
##  cross-validated error rate

which.min(gb1$AIC)          
which.min(gb1.crit$criterion)
## End(Not run)


[Package GAMBoost version 1.1 Index]