configBB.VarSelMisc {galgo}    R Documentation

Creates and configures all objects needed for a "variable selection" problem

Description

Creates and configures all objects needed for a "variable selection" problem. It configures the Gene, Chromosome, Niche, World, Galgo and BigBang objects.

Usage

configBB.VarSelMisc(
        file=NULL, 
        data=NULL, 
        strata=NULL, 
        train=rep(2/3,333), 
        test=1-train, 

        main="project",

        test.error=c(0,1),
        train.error=c("kfolds","splits","loocv","resubstitution"),
        train.Ksets=-1, # -1 : automatic detection : max(min(round(13-n/11),n),3) n=samples, n <=50: n/4,  n<=100, n/10, else 3
        train.splitFactor=2/3, 
        fitnessFunc=NULL,

        scale=FALSE, 

        geneFunc=runifInt,
        chromosomeSize=5, 
        populationSize=-1, 
        niches=1, 
        worlds=1,
        immigration=c(rep(0,18),.5,1), 
        crossoverPoints=round(chromosomeSize/2,0), 
        offspringScaleFactor=1,
        offspringMeanFactor=0.85,
        offspringPowerFactor=2,
        elitism=c(rep(1,9),.5),
        goalFitness=0.90, 
        galgoVerbose=20, 
        maxGenerations=200, 
        minGenerations=10, 
        galgoUserData=NULL, # additional user data for galgo

        maxBigBangs=1000, 
        maxSolutions=500, 
        onlySolutions=FALSE, 
        collectMode="bigbang", 
        bigbangVerbose=1, 
        saveFile="?.Rdata", 
        saveFrequency=50,
        saveVariable="bigbang",
        callBackFuncGALGO=function(...) 1,
        callBackFuncBB=plot,
        callEnhancerFunc=function(chr, parent) NULL,
        saveGeneBreaks=NULL,
        geneNames=NULL,
        sampleNames=NULL,
        bigbangUserData=NULL # additional user data for bigbang
        )
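
Several defaults in the signature above are R expressions rather than literal values. The following minimal sketch simply evaluates them in plain R (no galgo objects involved) to show what they expand to, assuming chromosomeSize keeps its default of 5. Note that R rounds halves to the even digit, so round(5/2, 0) gives 2.

```r
# Defaults copied from the Usage signature, evaluated with chromosomeSize = 5.
chromosomeSize  <- 5
train           <- rep(2/3, 333)
test            <- 1 - train
immigration     <- c(rep(0, 18), .5, 1)
elitism         <- c(rep(1, 9), .5)
crossoverPoints <- round(chromosomeSize / 2, 0)

length(train)        # 333 train/test proportion pairs
range(train + test)  # each pair of proportions sums to 1
length(immigration)  # 20
length(elitism)      # 10
crossoverPoints      # 2 (round half to even)
```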

Arguments

file The file containing the data. The first row should contain the sample names and the first column the variable names (genes). The second row must contain the class or stratum of every sample if strata is not provided. The strata are used to balance the train-test sets across the different strata. If there is only one stratum, use the same value for all samples.
data If a file is not provided, data is a data matrix or data frame with samples in columns and genes in rows (with its respective colnames and rownames set). If data is provided, strata must be specified.
strata If a file is not provided, specifies the classes or strata of the data. If the file is provided and strata is specified, the second row of the file is treated as data. The strata are used to balance the train-test sets across the different strata. If there is only one stratum, use the same value for all samples.
train A vector of the proportions of random samples to be used as training sets. The number of sets is determined by the length of train. The sum train+test should never be greater than 1. All sets are randomly chosen with the same proportion of samples per class as the original sample set.
test A vector of the proportions of random samples to be used as testing sets. The number of sets is determined by the length of train. All sets are randomly chosen with the same proportion of samples per class as the original sample set.
main A string or ID related to your project. It is used in all plots and helps to distinguish results from different studies.
test.error Vector of two weights specifying how the fitness function is evaluated to compute the test error. The first value is the weight of the training error and the second the weight of the test error. The default is c(0,1), which considers only the test error. The values should sum to 1.
train.error Specifies how the training set is divided to compute the error in the training set (in the evolve method of the Galgo object). "kfolds" divides the training set into K (train.Ksets) folds. "splits" computes K (train.Ksets) random splits. "loocv" (leave-one-out cross-validation) uses K equal to the number of training samples. "resubstitution" does no folding at all; it is faster and is provided for quick overviews.
train.Ksets The number of training set folds/splits. A negative value means automatic detection: max(min(round(13-n/11),n),3), where n is the number of samples.
train.splitFactor When train.error=="splits", specifies the proportion of samples used in splitting the training set.
fitnessFunc Specifies the function used to compute the accuracy. The required prototype is function(chr, parent, tr, te, result), where chr is the chromosome to be evaluated and parent is the BigBang object, in which all its variables are exposed. The fitness function commonly uses parent$data$data, which has been transposed. tr is the vector of samples (rows) that MUST be used for training and te the samples that must be used for testing.
scale If TRUE, all rows are scaled to zero mean and unit variance.
geneFunc Specifies the function that mutates genes. The default is an integer uniform distribution function (runifInt).
chromosomeSize Specify the chromosome size (the number of variables/genes to be included in a model). Defaults to 5. See Gene and Chromosome objects.
populationSize Specifies the number of chromosomes per niche. The default is min(20,20+(2000-nrow(data))/400), selected by the value -1 shown in Usage. See Chromosome and Niche objects.
niches Specifies the number of niches. Defaults to 1. See Niche, World and Galgo objects.
worlds Specify the number of worlds. Defaults to 1. See World and Galgo objects.
immigration Specifies the migration criteria. Defaults to c(rep(0,18),.5,1).
crossoverPoints Specify the active positions for crossover operator. Defaults to a single point in the middle of the chromosome. See Niche object.
offspringScaleFactor Scale factor for offspring generation. Defaults to 1. See Niche object.
offspringMeanFactor Mean factor for offspring generation. Defaults to 0.85. See Niche object.
offspringPowerFactor Power factor for offspring generation. Defaults to 2. See Niche object.
elitism Elitism probability/flag/vector. Defaults to c(1,1,1,1,1,1,1,1,1,0.5) (elitism present for 9 generations followed by a 50% chance, then repeated). See Niche object.
goalFitness Specify the desired fitness value (fraction of correct classification). Defaults to 0.90. See Galgo object.
galgoVerbose Verbose parameter for the Galgo object. Defaults to 20. See Galgo object.
maxGenerations Maximum number of generations. Defaults to 200. See Galgo object.
minGenerations Minimum number of generations. Defaults to 10. See Galgo object.
galgoUserData Additional user data for the Galgo object. See Galgo object.
maxBigBangs Maximum number of bigbang cycles. Defaults to 1000. See BigBang object.
maxSolutions Maximum number of solutions collected. Defaults to 500. See BigBang object.
onlySolutions Save only when a solution is reached. Defaults to FALSE (all the information is kept, and a filter can be applied afterwards). See BigBang object.
collectMode Information to collect. Defaults to "bigbang". See BigBang object.
bigbangVerbose Verbose flag for BigBang object. Defaults to 1. See BigBang object.
saveFile File name where the data is saved. The Usage signature shows "?.Rdata" as the default; NULL implies that the name is a concatenation of classification.method, method-specific parameters, file and ".Rdata". See BigBang object.
saveFrequency How often the "current" solutions are saved. Defaults to 50. See BigBang object.
saveVariable Internal R variable name of the saved file. Defaults to "bigbang". See BigBang object.
callBackFuncGALGO callBackFunc for Galgo object. See Galgo object.
callBackFuncBB callBackFunc for BigBang object. See BigBang object.
callEnhancerFunc callEnhancerFunc for BigBang object. See BigBang object.
saveGeneBreaks saveGeneBreaks vector for BigBang object. Defaults to NULL which means to be computed automatically (recommended). See BigBang object.
geneNames The gene (variable) names if they differ from the first column in file or rownames(data).
sampleNames The sample names if they differ from first row in file or colnames(data).
bigbangUserData Additional user data for BigBang object (stored in $data variable in BigBang object returned).
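
The automatic rule quoted for train.Ksets can be evaluated directly. The sketch below wraps the documented expression in a helper, autoKsets, a hypothetical name used only for illustration and not a galgo function, and tabulates the resulting number of folds/splits for a few sample counts.

```r
# Automatic fold count as quoted for train.Ksets (n = number of samples).
# autoKsets is an illustrative helper, not part of galgo.
autoKsets <- function(n) max(min(round(13 - n / 11), n), 3)

n <- c(2, 20, 50, 100, 200)
ksets <- sapply(n, autoKsets)
data.frame(n = n, Ksets = ksets)   # Ksets: 3, 11, 8, 4, 3
```

The rule never returns fewer than 3 sets, and the count shrinks as the number of samples grows.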

Details

Wrapper function. It configures all objects from the given parameters.
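
As an illustration of the fitnessFunc prototype function(chr, parent, tr, te, result), here is a minimal sketch of a nearest-centroid fitness function that returns the fraction of test samples classified correctly. It rests on assumptions that this page does not confirm: that as.numeric(chr) yields the selected variable indices, that parent$data$data holds samples in rows, and that the class labels are available as parent$data$strata (a hypothetical field name). The role of result is not described here, so the sketch ignores it. The function is exercised on a mock parent list rather than on a real BigBang object.

```r
# Hypothetical fitness function following the documented prototype.
# Assumptions: as.numeric(chr) gives column indices; labels are stored in
# parent$data$strata (illustrative name); 'result' is unused.
centroidFitness <- function(chr, parent, tr, te, result) {
  vars   <- as.numeric(chr)
  x      <- parent$data$data[, vars, drop = FALSE]   # samples in rows
  labels <- as.character(parent$data$strata)
  classes <- unique(labels[tr])
  # One centroid per class, computed from training samples only
  centroids <- do.call(rbind, lapply(classes, function(k) {
    colMeans(x[tr[labels[tr] == k], , drop = FALSE])
  }))
  # Assign every test sample to the closest centroid (squared distance)
  predicted <- vapply(te, function(i) {
    xi <- matrix(x[i, ], nrow(centroids), ncol(centroids), byrow = TRUE)
    classes[which.min(rowSums((centroids - xi)^2))]
  }, character(1))
  mean(predicted == labels[te])   # fraction classified correctly
}

# Mock "parent" with two well-separated classes (20 samples, 4 variables)
mock <- list(data = list(
  data   = rbind(matrix(seq(0, 1, length.out = 40), 10, 4),
                 matrix(seq(10, 11, length.out = 40), 10, 4)),
  strata = rep(c("A", "B"), each = 10)))
acc <- centroidFitness(c(1, 3), mock, tr = c(1:7, 11:17),
                       te = c(8:10, 18:20), result = NULL)
acc
```

Only the samples in tr are used to build the centroids and only those in te are scored, in line with the requirement stated for fitnessFunc.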

Value

A ready-to-use BigBang object.
*** TO DO: EXPLAIN THE STRUCTURE OF "DATA" ***

Author(s)

Victor Trevino

See Also

BigBang.

Examples

bb <- configBB.VarSelMisc(...)
bb
blast(bb)
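
The following sketch prepares toy inputs in the layout described for data and strata (genes in rows, samples in columns, one stratum label per sample) and illustrates what a stratified train/test draw with a proportion of 2/3 looks like. The helper stratifiedSample is hypothetical and only demonstrates the idea of keeping the same proportion of samples per class; it is not galgo's internal sampling code.

```r
# Toy input in the documented layout: genes in rows, samples in columns.
set.seed(42)
data <- matrix(rnorm(6 * 12), nrow = 6,
               dimnames = list(paste0("gene", 1:6), paste0("s", 1:12)))
strata <- rep(c("A", "B"), times = c(9, 3))   # one label per sample (column)
stopifnot(length(strata) == ncol(data))

# Illustrative stratified draw: take the same proportion from every stratum.
stratifiedSample <- function(strata, prop) {
  idx <- split(seq_along(strata), strata)
  picked <- lapply(idx, function(i) {
    i[sample.int(length(i), round(prop * length(i)))]
  })
  sort(unlist(picked, use.names = FALSE))
}

trainIdx <- stratifiedSample(strata, 2/3)
testIdx  <- setdiff(seq_along(strata), trainIdx)
table(strata[trainIdx])   # A: 6, B: 2
```

Objects prepared this way correspond to the data= and strata= arguments of configBB.VarSelMisc.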

[Package galgo version 1.0-10 Index]