R: Gets the “best” models using top-ranked genes and a forward-selection strategy

forwardSelectionModels.BigBang {galgo}

R Documentation

Gets the “best” models using top-ranked genes and a forward-selection strategy

Description

Gets the “best” models using top-ranked genes and a forward-selection strategy.

Usage

## S3 method for class 'BigBang':
forwardSelectionModels(.O,
        filter="none",
        subset=TRUE,
        geneIndexSet=NULL,
        starti=NULL,
        endi=NULL,
        fitnessFunc=if (!is.function(.O$data$modelSelectionFunc)) .O$galgo$fitnessFunc else .O$data$modelSelectionFunc,
        minFitness=NULL,
        plot=TRUE,
        decision=c("overall", "average"),
        plot.type=c("lines", "boxplot"),
        approach=c("fitness", "error"),
        pch=20,
        result=c("all", "models", "fitness"),
        threshold=0.99,
        main=.O$main,
        mord=50,
        mcol=8,
        rcol=(if (mcol < 2) c(rep(1, mord), 0) else c(cut(1:mord, breaks = mcol, labels = FALSE), 0)),
        classFunc=.O$data$classFunc,
        compute.classes=is.function(classFunc),
        cex=1,
        ...)

Arguments

`filter`	The `BigBang` object can save information about solutions that did not reach the `goalFitness`. `filter=="solutions"` ensures that only chromosomes that reached the `goalFitness` are considered. `filter=="none"` takes all chromosomes. `filter=="nosolutions"` considers only no-solutions (for comparative purposes).
`subset`	Second-level filter. `subset` can be a vector specifying which of the filtered chromosomes are used. It can be a logical vector or a numeric vector (indexes in the order given by `$bestChromosomes` in the `BigBang` object). If it is a numeric vector of length one, a positive value takes that many top chromosomes sorted by fitness, and a negative value takes that many from the bottom.
`geneIndexSet`	The gene indexes to use (ignoring `filter` and `subset`). If this is not specified, the indexes are computed using `filter` and `subset`.
`starti`	Vector of initial index positions of the models to test. If specified, it should be the same length as `endi`. If omitted, the default repeats `1` to the same length as `endi`.
`endi`	Vector of final index positions of models to test.
`fitnessFunc`	The function that evaluates the performance (fitness) of every model (chromosome). The actual measure is the “mean” of the values returned for each chromosome. Thus `fitnessFunc` can return a single numeric value (as in `$galgo$fitnessFunc`) or a numeric vector (as in `$data$modelSelectionFunc`). The default is `$data$modelSelectionFunc`; if that is not a function, `$galgo$fitnessFunc` is used.
`minFitness`	The minimum fitness requested. All models with mean fitness above this value will be reported. `NULL` specifies that the maximum fitness from the results is used. `"se*sp"` uses the maximum value computed by multiplying the sensitivity and specificity when `compute.classes==TRUE`.
`decision`	Specifies how to select the model. `"overall"` selects the model based on the accuracy over all samples, whereas `"average"` selects the model based on the average accuracy per class. If the number of samples per class is exactly the same, both results are equal. The default is `"overall"`. If `classFunc` is not specified or `compute.classes==FALSE`, `decision` is forced to `"overall"`.
`plot`	Logical value indicating whether the result should be displayed.
`plot.type`	`"lines"` draws a line joining points. `"boxplot"` adds a boxplot when `fitnessFunc` returns more than one value.
`approach`	`"fitness"` draws fitness. `"error"` draws error (1-fitness).
`result`	Specifies the desired output. `"models"` reports only the models above the `minFitness`. `"fitness"` reports only the fitness of the models above the `minFitness`. `"all"` (default) reports both models and fitness in a list that also includes all computed fitnesses and class prediction accuracies (if `compute.classes==TRUE`).
`threshold`	Specifies the percentage of `minFitness` used for selecting models, expressed as a fraction (the default is `0.99`).
`mord`	Specifies the number of top-ranked genes (`plot()` and others MISSING *). Defaults to 50. It should not be less than the maximum `endi`.
`mcol`	Specifies the number of sections for top-rank colouring (`plot()` and others MISSING *).
`rcol`	Specifies the colours of the sections (`plot()` and others MISSING *).
`classFunc`	Function that predicts the class. The default is `$data$classFunc`.
`compute.classes`	Specifies whether class accuracies are computed (and plotted). In non-classification problems, it should be `FALSE`.
`pch,main,cex`	Plot parameters.
`...`	Other parameters used for `plot`, `fitnessFunc` and `classFunc`.
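
The defaults for `rcol` and `starti` can be explored outside the function. The sketch below evaluates the `rcol` default expression exactly as it appears in the Usage section, using the documented defaults for `mord` and `mcol`. The `starti` line only illustrates the documented behaviour (repeating `1` to the length of `endi`); it is an assumption about the effect, not code taken from the package, and the `endi` values are made up.

```r
# Documented defaults from the Usage section.
mord <- 50   # number of top-ranked genes
mcol <- 8    # number of colour sections

# Default expression for `rcol`, copied from the Usage section.
rcol <- if (mcol < 2) {
  c(rep(1, mord), 0)
} else {
  c(cut(1:mord, breaks = mcol, labels = FALSE), 0)
}

length(rcol)   # one section code per top-ranked gene, plus a trailing 0
table(rcol)    # how many genes fall in each colour section

# Illustration only: the documented default for `starti` repeats 1
# until it matches the length of `endi` (hypothetical end positions).
endi   <- c(5, 10, 20)
starti <- rep(1, length(endi))
```

Each of the first `mord` entries of `rcol` is a section code between 1 and `mcol`, so changing `mcol` changes how finely the top-ranked genes are split into coloured sections.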

Details

It is expected that `fitnessFunc` computes the overall fitness (the proportion of correctly classified samples regardless of their classes). However, this value can differ slightly from the curve marked as "(avg)", which is the average fitness per class. The difference is due to the different number of samples per class and the number of times specific samples were used as part of the test set in both the fitness function and the class function.
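
The gap between overall accuracy and the per-class average can be seen with a small worked example. The sketch below uses made-up labels and predictions (not output from `galgo`) with unbalanced classes; the variable names are illustrative only.

```r
# Toy data: 8 samples of class "A" and 2 of class "B" (made-up values).
truth <- factor(c(rep("A", 8), rep("B", 2)))
# All "A" samples are predicted correctly; one of the two "B" samples is not.
pred  <- factor(c(rep("A", 8), "A", "B"), levels = levels(truth))

correct <- pred == truth

overall   <- mean(correct)                 # proportion correct over all samples
per_class <- tapply(correct, truth, mean)  # accuracy within each class
average   <- mean(per_class)               # unweighted mean of the class accuracies

overall   # 0.9
average   # 0.75
```

With equal numbers of samples per class the two values coincide, which is why `decision="overall"` and `decision="average"` give the same result in the balanced case.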

Value

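Depends on `result`: the selected models (`"models"`), the fitness of the selected models (`"fitness"`), or a list with both plus all computed fitnesses and class prediction accuracies (`"all"`, the default).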
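
To make the relation between `minFitness`, `threshold` and the returned models more concrete, here is a minimal sketch of a forward-selection pass over a gene ranking. It is not the package implementation. The ranking, the `toy_fitness` function, the `>=` comparison against `threshold * minFitness`, and the `fitness` element name are all assumptions made for illustration; only the `models` element name appears in the Examples section.

```r
# Hypothetical gene ranking, most highly ranked gene first.
ranked_genes <- c(12, 7, 33, 4, 19, 25)

# Toy fitness: depends only on model size (made-up values).
toy_fitness <- function(genes) {
  c(0.60, 0.80, 0.905, 0.895, 0.85, 0.91)[length(genes)]
}

# Forward selection: model i contains the genes ranked starti[i]..endi[i].
endi   <- seq_along(ranked_genes)
starti <- rep(1, length(endi))
models <- Map(function(s, e) ranked_genes[s:e], starti, endi)

fitness <- vapply(models, toy_fitness, numeric(1))

# Assumed selection rule: keep models within `threshold` of the best fitness.
threshold  <- 0.99
minFitness <- max(fitness)   # mirrors the documented meaning of minFitness = NULL
keep       <- fitness >= threshold * minFitness

selected <- list(models = models[keep], fitness = fitness[keep])
selected$models   # the 3-gene and the 6-gene models pass the cut-off
```

Under these assumptions a smaller model can be selected alongside the best one when its fitness is close enough, which is the usual reason for applying a tolerance instead of taking only the maximum.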

Author(s)

Victor Trevino. Francesco Falciani Group. University of Birmingham, U.K. http://www.bip.bham.ac.uk/bioinf

References

Goldberg, David E. 1989 Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Pub. Co. ISBN: 0201157675

Examples

   # bb is a BigBang object
   fsm <- forwardSelectionModels(bb)
   fsm
   names(fsm)
   heatmapModels(fsm, subset=1)

   # Same, with an explicit minimum fitness and the galgo fitness function
   fsm <- forwardSelectionModels(bb, minFitness=0.9,
                                 fitnessFunc=bb$galgo$fitnessFunc)
   heatmapModels(fsm, subset=1)
   pcaModels(fsm, subset=1)

   # Inspect the fitness of the first selected model across splits
   fitnessSplits(bb, chromosomes=list(fsm$models[[1]]))

[Package galgo version 1.0-10 Index]