forwardSelectionModels.BigBang {galgo} | R Documentation |
Gets the "best" models using top-ranked genes and a forward-selection strategy.
## S3 method for class 'BigBang': forwardSelectionModels(.O, filter="none", subset=TRUE, geneIndexSet=NULL, starti=NULL, endi=NULL, fitnessFunc=if (!is.function(.O$data$modelSelectionFunc)) .O$galgo$fitnessFunc else .O$data$modelSelectionFunc, minFitness=NULL, plot=TRUE, decision=c("overall", "average"), plot.type=c("lines", "boxplot"), approach=c("fitness", "error"), pch=20, result=c("all", "models", "fitness"), threshold=0.99, main=.O$main, mord=50, mcol=8, rcol=(if (mcol < 2) c(rep(1, mord), 0) else c(cut(1:mord, breaks = mcol, labels = FALSE), 0)), classFunc=.O$data$classFunc, compute.classes=is.function(classFunc), cex=1, ...)
filter |
The BigBang object can save information about solutions that did not reach the goalFitness . filter=="solutions" ensures that only chromosomes that reached the goalFitness are considered. filter=="none" takes all chromosomes. filter=="nosolutions" considers only chromosomes that did not reach the goalFitness (for comparative purposes). |
subset |
Second level of filtering. subset can be a vector specifying which of the filtered chromosomes are used. It can be a logical vector or a numeric vector (indexes in the order given by $bestChromosomes in the BigBang object). If it is a numeric vector of length one, a positive value takes that many top chromosomes sorted by fitness, and a negative value takes that many from the bottom. |
geneIndexSet |
The gene indexes to use (ignoring filter and subset ). If this is not specified, the indexes are computed using filter and subset . |
starti |
Vector of initial index positions of the models to test. If specified, it should be the same length as endi . If omitted, it defaults to 1 repeated to the length of endi . |
endi |
Vector of final index positions of models to test. |
fitnessFunc |
The function that evaluates the performance (fitness) of every model (chromosome). The actual measure is the mean of the values returned for each chromosome. Thus fitnessFunc can return a single numeric value (as in $galgo$fitnessFunc ) or a numeric vector (as in $data$modelSelectionFunc ). The default is $data$modelSelectionFunc ; if that is not a function, $galgo$fitnessFunc is used. |
minFitness |
The minimum fitness requested. All models with mean fitness above this value will be reported. NULL specifies that the maximum fitness from the results is used. "se*sp" uses the maximum value computed by multiplying the sensitivity and specificity when compute.classes==TRUE . |
decision |
Specifies how to select the model. "overall" selects the model based on the accuracy over all samples, whereas "average" selects the model based on the average accuracy per class. If the number of samples per class is exactly the same, both results are equal. The default is "overall" . If classFunc is not specified or compute.classes==FALSE , decision is forced to "overall" . |
plot |
Logical value indicating whether the result should be displayed. |
plot.type |
"lines" draws a line joining the points. "boxplot" adds a boxplot when fitnessFunc returns more than one value. |
approach |
"fitness" draws fitness. "error" draws error (1-fitness). |
result |
Specifies the desired output. "models" will report only the models above the minFitness . "fitness" will report only the fitness of the models above the minFitness . "all" (default) will report both models and fitness in a list that includes all computed fitnesses and class prediction accuracies (if compute.classes==TRUE ). |
threshold |
Specifies the proportion of minFitness used for selecting models. Defaults to 0.99. |
mord |
Specifies the number of top-ranked genes (see *plot() and others). Defaults to 50. It should not be less than the maximum endi . |
mcol |
Specifies the number of sections for top-rank colouring (see *plot() and others). |
rcol |
Specifies the colours of the sections (see *plot() and others). |
classFunc |
Function that predicts the class. The default is $data$classFunc . |
compute.classes |
Specifies whether class accuracies are computed (and plotted). In non-classification problems, it should be FALSE . |
pch,main,cex |
Plot parameters. |
... |
Other parameters used for plot , fitnessFunc and classFunc . |
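The defaults for rcol and starti can be checked in plain R without a BigBang object. The sketch below evaluates the default rcol expression exactly as it appears in the usage line, then builds starti as described for the case where it is omitted. The endi values are hypothetical, and the idea that model i spans ranks starti[i] to endi[i] is an assumption about how the two vectors are paired, not something this page states.

```r
mord <- 50  # number of top-ranked genes (default from the usage)
mcol <- 8   # number of colour sections (default from the usage)

# Default rcol, copied from the usage line: one section code per rank,
# followed by a trailing 0.
rcol <- if (mcol < 2) c(rep(1, mord), 0) else
  c(cut(1:mord, breaks = mcol, labels = FALSE), 0)

table(rcol)  # how many ranks fall in each section

# starti when omitted: 1 repeated to the length of endi.
endi <- c(5, 10, 20)               # hypothetical final positions
starti <- rep(1, length(endi))

# Assumption: model i uses the ranks from starti[i] to endi[i].
ranges <- Map(seq, starti, endi)
lengths(ranges)                    # number of genes in each model
```

With mcol below 2 the same expression assigns every rank to section 1, so the plot would use a single colour for all top-ranked genes.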
It is expected that fitnessFunc computes the overall fitness (the proportion of correctly classified samples regardless of their classes). However, this value could be slightly different from the curve marked as "(avg)", which is the average fitness per class. This difference is due to the different number of samples per class and the number of times specific samples were used as part of the test set in both the fitness function and the class function.
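The gap between the overall and the per-class average can be seen with a small worked example in base R. The labels below are made up for illustration (8 samples of class A, 2 of class B) and are not taken from any galgo data set.

```r
# Hypothetical true and predicted class labels for an unbalanced problem.
truth <- factor(c(rep("A", 8), rep("B", 2)))
pred  <- factor(c(rep("A", 8), "A", "B"), levels = levels(truth))

correct <- truth == pred

# "overall": proportion of correctly classified samples, ignoring class.
overall <- mean(correct)

# "average": accuracy computed within each class, then averaged.
per.class <- tapply(correct, truth, mean)
average <- mean(per.class)

overall    # 0.9  (9 of 10 samples correct)
per.class  # A = 1, B = 0.5
average    # 0.75
```

With the same number of samples in each class the two measures coincide, which matches the note under decision.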
Depends on result .
Victor Trevino. Francesco Falciani Group. University of Birmingham, U.K. http://www.bip.bham.ac.uk/bioinf
Goldberg, David E. 1989 Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Pub. Co. ISBN: 0201157675
For more information see BigBang . *plot() , *heatmapModels() , *pcaModels() .
# bb is a BigBang object
fsm <- forwardSelectionModels(bb)
fsm
names(fsm)
heatmapModels(fsm, subset=1)
fsm <- forwardSelectionModels(bb, minFitness=0.9,
                              fitnessFunc=bb$galgo$fitnessFunc)
heatmapModels(fsm, subset=1)
pcaModels(fsm, subset=1)
fitnessSplits(bb, chromosomes=list(fsm$models[[1]]))
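For readers without a BigBang object at hand, the general idea of forward selection over ranked genes can be sketched in base R. Everything here is illustrative: the chromosomes, the toy.fitness function and its "informative" genes are hypothetical, ranking genes by their frequency across chromosomes is an assumption, and so is the rule that a model is kept when its mean fitness is at least threshold times minFitness. It is not the galgo implementation.

```r
# Hypothetical chromosomes: each is a vector of gene indexes.
chromosomes <- list(c(3, 7, 9), c(3, 7, 12), c(3, 5, 9), c(7, 3, 20))

# Assumption: genes are ranked by how often they appear (most frequent first).
gene.freq <- sort(table(unlist(chromosomes)), decreasing = TRUE)
ranked <- as.integer(names(gene.freq))

# Hypothetical fitness function returning one value per test split.
toy.fitness <- function(genes) {
  informative <- c(3, 7)
  base <- 0.5 + 0.2 * sum(genes %in% informative)
  penalty <- 0.01 * sum(!(genes %in% informative))
  (base - penalty) + c(-0.02, 0, 0.02)
}

# Forward selection: model k uses the top k ranked genes.
endi <- 1:3
models <- lapply(endi, function(k) ranked[seq_len(k)])

# As described for fitnessFunc, the measure is the mean of the returned values.
mean.fitness <- sapply(models, function(m) mean(toy.fitness(m)))

# minFitness = NULL is described as using the maximum fitness found.
min.fitness <- max(mean.fitness)
threshold <- 0.99

# Assumed selection rule: keep models reaching threshold * minFitness.
selected <- models[mean.fitness >= threshold * min.fitness]
selected
```

In this toy setting only the two-gene model (genes 3 and 7) is kept; adding gene 9 lowers the mean fitness to just under the cut-off.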