R: Creates and configures all objects needed for a “variable selection for classification” problem

configBB.VarSel {galgo}

R Documentation

Creates and configures all objects needed for a “variable selection for classification” problem

Description

Creates and configures all objects needed for a “variable selection for classification” problem. It configures Gene, Chromosome, Niche, World, Galgo and BigBang objects.

Usage

configBB.VarSel(
        file=NULL, 
        data=NULL, 
        classes=NULL, 
        train=rep(2/3,333), 
        test=1-train, 

        main="project",

        classification.method=c("knn","mlhd","svm","nearcent","rpart","nnet","user"),
        classification.test.error=c(0,1),
        classification.train.error=c("kfolds","splits","loocv","resubstitution"),
        classification.train.Ksets=-1, # -1 : max(min(round(13-n/11),n),3) n=samples, n <=50: n/4,  n<=100, n/10, else 3
        classification.train.splitFactor=2/3, 
        classification.rutines=c("C","R"),
        classification.userFitnessFunc=NULL,

        scale=(classification.method[1] 

        knn.k=3,
        knn.l=1,
        knn.distance=c("euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski", "pearson", "kendall", "spearman", "absolutepearson","absolutekendall", "absolutespearman"),

        nearcent.method=c("mean","median"),

        svm.kernel=c("radial","polynomial","linear","sigmoid"),
        svm.type=c("C-classification", "nu-classification", "one-classification"),
        svm.nu=0.5,
        svm.degree=4,
        svm.cost=1,

        nnet.size=2,
        nnet.decay=5e-4,
        nnet.skip=TRUE,
        nnet.rang=0.1,

        geneFunc=runifInt,
        chromosomeSize=5, 
        populationSize=-1, 
        niches=2, 
        worlds=1,
        immigration=c(rep(0,18),.5,1), 
        crossoverPoints=round(chromosomeSize/2,0), 
        offspringScaleFactor=1,
        offspringMeanFactor=0.85,
        offspringPowerFactor=2,
        elitism=c(rep(1,9),.5),
        goalFitness=0.90, 
        galgoVerbose=20, 
        maxGenerations=200, 
        minGenerations=10, 
        galgoUserData=NULL, # additional user data for galgo

        maxBigBangs=1000, 
        maxSolutions=1000, 
        onlySolutions=FALSE, 
        collectMode="bigbang", 
        bigbangVerbose=1, 
        saveFile="bigbang.Rdata", 
        saveFrequency=50,
        saveVariable="bigbang",
        callBackFuncGALGO=function(...) 1,
        callBackFuncBB=plot,
        callEnhancerFunc=function(chr, parent) NULL,
        saveGeneBreaks=NULL,
        geneNames=NULL,
        sampleNames=NULL,
        bigbangUserData=NULL # additional user data for bigbang
        )
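The automatic value used when `classification.train.Ksets` is negative can be checked by hand. The following is a minimal sketch of the documented formula `max(min(round(13-n/11),n),3)`; the helper name `auto_ksets` is hypothetical and not part of galgo, and the package's internal computation may differ in detail.

```r
# Hypothetical helper (not part of galgo): number of folds/splits given by the
# documented rule max(min(round(13 - n/11), n), 3), where n = number of samples.
auto_ksets <- function(n) {
  max(min(round(13 - n / 11), n), 3)
}

# Number of sets for a few sample sizes.
sapply(c(10, 22, 50, 110, 200), auto_ksets)
```

Under this rule, small data sets get close to one fold per sample, while large data sets fall back to the minimum of 3.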

Arguments

`file`	The file containing the data. First row should be sample names. First column should be variable names (genes). Second row must be the class for every sample if `classes` is not provided.
`data`	If a file is not provided, `data` is a data matrix or data frame with samples in columns and genes in rows (with its respective colnames and rownames set). If `data` is provided, `classes` must be specified.
`classes`	If a file is not provided, specifies the classes for the data. If `file` is provided and `classes` is specified, the second row of the file is treated as data.
`train`	A vector of the proportions of random samples to be used as training sets. The number of sets is determined by the length of `train`. `train+test` should never be greater than 1. All sets are randomly chosen with the same proportion of samples per class as the original sample set.
`test`	A vector of the proportions of random samples to be used as testing sets. The number of sets is determined by the length of `train`. All sets are randomly chosen with the same proportion of samples per class as the original sample set.
`main`	A string or ID related to your project that will be used in all plots and would help you to distinguish results from different studies.
`classification.method`	The method to be used for classification. The methods currently available in this package are `"knn"`, `"mlhd"`, `"svm"`, `"nearcent"` (nearest centroid), `"rpart"` (recursive partitioning trees), and `"nnet"` (neural networks; experimental, not recommended). `"user"` is for classification problems in which the user provides a specific function (see `classification.userFitnessFunc`).
`classification.test.error`	Vector of two weights specifying how the fitness function is evaluated to compute the test error. The first value is the weight of the training error and the second is the weight of the test error. The default is `c(0,1)`, which considers only the test error. These values should sum to 1.
`classification.train.error`	Specifies how the training set is divided to compute the error in the training set (in the `evolve` method of the `Galgo` object). The fitness function actually computes `1-error`, where `error` is always the proportion of samples that were incorrectly classified. `"kfolds"` (k-fold cross-validation) computes `K` non-overlapping sets (`classification.train.Ksets`), attempting to preserve class proportions. `"splits"` computes `K` (`classification.train.Ksets`) random splits. `"loocv"` (leave-one-out cross-validation) uses `K` equal to the number of training samples. `"resubstitution"` does no folding at all; it is faster and is provided for quick overviews.
`classification.train.Ksets`	The number of training set folds/splits. Negative means automatic detection (n=samples, max(min(round(13-n/11),n),3)).
`classification.train.splitFactor`	When `classification.train.error=="splits"`, specifies the proportion of samples used in splitting the training set.
`classification.rutines`	For most of the methods, both `R` and `C` code are provided. `C` code is preferred for performance reasons; however, finding mistakes is easier in `R`. In addition, the example code can be used as a guide for new user fitness functions. `"rpart"` has no `C` code. `"svm"` has only some improvements obtained by removing redundant checks.
`classification.userFitnessFunc`	For `classification.method == "user"`, specifies the function used to compute the accuracy and class prediction. The required prototype is `function(chr, parent, tr, te, result)`. `chr` is the chromosome to be evaluated; a conversion using `as.numeric` is commonly needed to extract the exact values from the chromosome. `parent` is the `BigBang` object, in which all variables are exposed. The fitness function commonly uses `parent$data$data`, which has been transposed. `tr` is the vector of samples (rows) that MUST be used for training and `te` the samples that must be used for testing. They can correspond to training and test sets in the evolution or in any other context (such as the computation of the confusion matrix or the forward selection). The fitness function should return its result in one of two formats, selected by the `result` parameter. When `result` is `0` (zero), the predicted class for each test sample is required (as an integer, not as a factor); otherwise, the number of correctly classified samples from the test vector is expected.
`scale`	`TRUE` instructs the function to scale all rows to zero mean and unit variance.
`knn.k`	For KNN method, `knn.k` is the number of nearest neighbours to consider.
`knn.l`	For KNN method, `knn.l` is the number
`knn.distance`	The distance to be used in KNN method. Possible values are `"euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski", "pearson", "kendall", "spearman", "absolutepearson","absolutekendall", "absolutespearman"` (see `dist` method).
`nearcent.method`	For the nearest centroid method, `nearcent.method` specifies the method for computing the centroid (`"mean"` or `"median"`).
`svm.kernel`	For the SVM (support vector machines) method, specifies the kernel: `"radial"`, `"polynomial"`, `"linear"` or `"sigmoid"` (see the `svm` method in the `e1071` package).
`svm.type`	For the SVM method, specifies the type of classification.
`svm.nu`	For the SVM method with `nu-classification`, specifies the `nu` value.
`svm.degree`	For the SVM method with the `polynomial` kernel, specifies the degree value.
`svm.cost`	For the SVM method, specifies the `C` value (cost).
`nnet.`	Parameters for neural networks classification. See `nnet` package.
`geneFunc`	The function that provides random values for genes. The default is runifInt, which generates a random integer value with a uniform distribution.
`chromosomeSize`	Specify the chromosome size (the number of variables/genes to be included in a model). Defaults to 5. See `Gene` and `Chromosome` objects.
`populationSize`	Specify the number of chromosomes per niche. Defaults to min(20,20+(2000-nrow(data))/400). See `Chromosome` and `Niche` objects.
`niches`	Specify the number of niches. Defaults to 2. See `Niche`, `World` and `Galgo` objects.
`worlds`	Specify the number of worlds. Defaults to 1. See `World` and `Galgo` objects.
`immigration`	Specify the migration criteria.
`crossoverPoints`	Specify the active positions for crossover operator. Defaults to a single point in the middle of the chromosome. See `Niche` object.
`offspringScaleFactor`	Scale factor for offspring generation. Defaults to 1. See `Niche` object.
`offspringMeanFactor`	Mean factor for offspring generation. Defaults to 0.85. See `Niche` object.
`offspringPowerFactor`	Power factor for offspring generation. Defaults to 2. See `Niche` object.
`elitism`	Elitism probability/flag/vector. Defaults to c(1,1,1,1,1,1,1,1,1,0.5) (elitism present for 9 generations followed by a 50% chance, then repeated). See `Niche` object.
`goalFitness`	Specify the desired fitness value (fraction of correct classification). Defaults to 0.90. See `Galgo` object.
`galgoVerbose`	`verbose` parameter for `Galgo` object.
`maxGenerations`	Maximum number of generations. Defaults to 200. See `Galgo` object.
`minGenerations`	Minimum number of generations. Defaults to 10. See `Galgo` object.
`galgoUserData`	Additional user data for the `Galgo` object. See `Galgo` object.
`maxBigBangs`	Maximum number of bigbang cycles. Defaults to 1000. See `BigBang` object.
`maxSolutions`	Maximum number of solutions collected. Defaults to 1000. See `BigBang` object.
`onlySolutions`	Save only when a solution is reached. Defaults to FALSE (so that all the information is kept and a filter can be applied afterwards). See `BigBang` object.
`collectMode`	Information to collect. Defaults to `"bigbang"`. See `BigBang` object.
`bigbangVerbose`	Verbose flag for `BigBang` object. Defaults to 1. See `BigBang` object.
`saveFile`	File name where the data is saved. Defaults to `NULL` which implies the name is a concatenation of `classification.method`, method specific parameters, `file` and `".Rdata"`. See `BigBang` object.
`saveFrequency`	How often the “current” solutions are saved. Defaults to 50. See `BigBang` object.
`saveVariable`	Internal `R` variable name of the saved file. Defaults to `"bigbang"`. See `BigBang` object.
`callBackFuncGALGO`	`callBackFunc` for `Galgo` object. See `Galgo` object.
`callBackFuncBB`	`callBackFunc` for `BigBang` object. See `BigBang` object.
`callEnhancerFunc`	`callEnhancerFunc` for `BigBang` object. See `BigBang` object.
`saveGeneBreaks`	`saveGeneBreaks` vector for `BigBang` object. Defaults to `NULL` which means to be computed automatically (recommended). See `BigBang` object.
`geneNames`	The gene (variable) names if they differ from the first column in `file` or `rownames(data)`.
`sampleNames`	The sample names if they differ from first row in `file` or `colnames(data)`.
`bigbangUserData`	Additional user data for `BigBang` object (stored in `$data` variable in `BigBang` object returned).
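To illustrate the prototype described for `classification.userFitnessFunc`, here is a minimal sketch of a nearest-centroid fitness function written in base R. It assumes that `parent$data$data` holds samples in rows and genes in columns, and that the class labels are available as integers in `parent$data$classes`; the latter name is an assumption, not something this page confirms. The `parent` object below is a plain list standing in for a real `BigBang` object, and `chr` is a plain numeric vector standing in for a chromosome.

```r
# Sketch of a user fitness function with the documented prototype.
# Assumptions: parent$data$data has samples in rows and genes in columns;
# parent$data$classes (hypothetical name) has one integer label per sample.
nearcent_fitness <- function(chr, parent, tr, te, result) {
  genes   <- as.numeric(chr)                      # gene (column) indexes in the model
  x       <- parent$data$data[, genes, drop = FALSE]
  classes <- parent$data$classes
  labels  <- sort(unique(classes[tr]))

  # One centroid per class, computed from the training samples only.
  centroids <- do.call(rbind, lapply(labels, function(cl) {
    colMeans(x[tr[classes[tr] == cl], , drop = FALSE])
  }))

  # Assign each test sample to the class of the closest centroid.
  predicted <- unname(apply(x[te, , drop = FALSE], 1, function(s) {
    d <- rowSums(sweep(centroids, 2, s)^2)        # squared Euclidean distances
    labels[which.min(d)]
  }))

  if (result == 0) {
    as.integer(predicted)                         # predicted classes for te
  } else {
    sum(predicted == classes[te])                 # number correctly classified
  }
}

# Mock data standing in for a BigBang object.
set.seed(42)
n <- 40
classes <- rep(1:2, each = n / 2)
x <- matrix(rnorm(n * 10), nrow = n)
x[classes == 2, 1:3] <- x[classes == 2, 1:3] + 3  # genes 1-3 separate the classes
parent <- list(data = list(data = x, classes = classes))
tr <- c(1:15, 21:35)
te <- c(16:20, 36:40)

pred    <- nearcent_fitness(c(1, 2, 3), parent, tr, te, result = 0)
correct <- nearcent_fitness(c(1, 2, 3), parent, tr, te, result = 1)
```

Only the samples in `tr` are used to build the centroids, which matches the requirement that `tr` MUST be used as the training set.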

Details

Wrapper function that configures all objects from the given parameters.
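The `train` and `test` arguments describe random sets that keep the class proportions of the original samples. The sketch below shows one way such a stratified split could be drawn in base R; `stratified_split` is a hypothetical helper written for illustration and is not the routine galgo uses internally.

```r
# Hypothetical helper (not part of galgo): draw one training/test split that
# keeps roughly the same class proportions as the full sample set.
stratified_split <- function(classes, train = 2/3, test = 1 - train) {
  stopifnot(train + test <= 1 + 1e-8)
  tr <- integer(0)
  te <- integer(0)
  for (cl in unique(classes)) {
    idx  <- which(classes == cl)
    idx  <- idx[sample.int(length(idx))]          # shuffle within the class
    n_tr <- round(length(idx) * train)
    n_te <- min(round(length(idx) * test), length(idx) - n_tr)
    tr <- c(tr, idx[seq_len(n_tr)])
    te <- c(te, idx[n_tr + seq_len(n_te)])
  }
  list(train = sort(tr), test = sort(te))
}

set.seed(1)
classes <- rep(c("A", "B"), times = c(30, 15))

# One split with the default proportions.
sp <- stratified_split(classes)
table(classes[sp$train])
table(classes[sp$test])

# A vector such as train = rep(2/3, 3) would request three independent splits.
splits <- lapply(rep(2/3, 3), function(p) stratified_split(classes, train = p))
```

With the default `train=rep(2/3,333)`, 333 such splits would be requested, each using two thirds of the samples for training and the rest for testing.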

Value

A ready-to-use `BigBang` object.
*** TO DO: EXPLAIN THE STRUCTURE OF "DATA" ***

Author(s)

Victor Trevino

Examples

bb <- configBB.VarSel(...)
bb
blast(bb)
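A fuller sketch of the call, using synthetic data in place of a real data set, might look as follows. The data layout (genes in rows, samples in columns, one class label per sample) follows the `data` and `classes` arguments above; the specific values chosen for the other arguments are illustrative assumptions, as is passing `classes` as a character vector. The galgo call is only attempted when the package is installed.

```r
# Synthetic expression-like data: genes in rows, samples in columns,
# as described for the `data` argument.
set.seed(7)
n_genes   <- 50
n_samples <- 30
classes <- rep(c("tumor", "normal"), each = n_samples / 2)
data <- matrix(rnorm(n_genes * n_samples), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0("sample", seq_len(n_samples))))
# Make the first five genes informative about the class.
data[1:5, classes == "tumor"] <- data[1:5, classes == "tumor"] + 2

# Only attempt the configuration when galgo is available.
if (requireNamespace("galgo", quietly = TRUE)) {
  library(galgo)
  bb <- configBB.VarSel(
    data = data,
    classes = classes,
    classification.method = "knn",
    chromosomeSize = 5,
    maxSolutions = 10,
    goalFitness = 0.90,
    saveFile = file.path(tempdir(), "bigbang_demo.Rdata"),
    main = "synthetic demo"
  )
  bb
  # blast(bb)   # uncomment to start the search, as in the example above
}
```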

[Package galgo version 1.0-10 Index]