wilcox.selection.split {WilcoxCV}

R Documentation

Wilcoxon-based variable selection in cross-validation (CV) and Monte-Carlo cross-validation (MCCV)

Description

The function wilcox.selection.split performs variable ordering based on the Wilcoxon rank sum test for all niter CV or MCCV iterations.

Usage

wilcox.selection.split(x, y, split, algo = "new", pvalue = FALSE)

Arguments

`x`	a matrix or a data frame of size n x p giving the expression levels of the p variables (genes) for the n observations (arrays). Variables correspond to columns, observations to rows.
`y`	a vector of length n giving the class membership of the n observations (arrays). `y` can be either a factor or a numeric vector and must be coded as 0 and 1.
`split`	A `niter` x `ntest` matrix giving the indices of the `ntest` observations included in each of the `niter` test sets, as generated by the functions `generate.split` or `generate.cv`. The i-th row of `split` gives the indices of the observations included in the test data set for the i-th random splitting iteration.
`algo`	either `"new"` or `"naive"`. If `algo="new"`, the new fast method described in Boulesteix (2007) is used. If `algo="naive"`, results are obtained by running the function `wilcox.test` `niter` times.
`pvalue`	Logical. Should p-values be returned?

Details

The Wilcoxon rank sum statistic is defined as the sum of the X-ranks of the observations with y=0. The Wilcoxon rank sum test is equivalent to the Mann-Whitney test. It is implemented in the function wilcox.test.
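The definition above can be checked on toy data with base R alone. The sketch below is illustrative and not taken from the package; the variable names (`x1`, `w0`, `w_test`) are made up for the example. It computes the rank sum of the observations with y=0 by hand and compares it with the statistic reported by wilcox.test, which is the same quantity shifted by the constant n0(n0+1)/2, where n0 is the number of observations with y=0.

```r
# Toy data: one variable measured on 12 observations, classes coded 0/1.
set.seed(1)
x1 <- rnorm(12)
y  <- rep(c(0, 1), each = 6)

# Rank sum statistic: sum of the ranks of the observations with y = 0.
ranks <- rank(x1)
w0 <- sum(ranks[y == 0])

# wilcox.test() reports the same quantity minus n0 * (n0 + 1) / 2.
n0 <- sum(y == 0)
w_test <- unname(wilcox.test(x1[y == 0], x1[y == 1])$statistic)

# Compare the two versions of the statistic.
same_statistic <- isTRUE(all.equal(w_test, w0 - n0 * (n0 + 1) / 2))
```

Because the two statistics differ only by a constant for fixed class sizes, they lead to the same test.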

In the context of cross-validation (CV) or Monte-Carlo cross-validation (MCCV), wilcox.selection.split computes the Wilcoxon rank sum statistic for each variable in each iteration. At each iteration, a subset of the n observations is excluded from the data set and used as the test data set; the statistic is computed on the remaining observations. The indices of the test observations for each of the niter iterations are given in the niter x ntest matrix split.
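The procedure described above can be sketched in base R as a plain double loop over iterations and variables. This is only an illustration of the idea, not the package's implementation: the function name `naive_wilcox_ordering` is hypothetical, the split matrix is built with `sample` as a stand-in for `generate.split`, and `y` is assumed to be coded 0/1.

```r
# Hypothetical helper: order variables by Wilcoxon p-value in each iteration.
naive_wilcox_ordering <- function(x, y, split) {
  x <- as.matrix(x)
  niter <- nrow(split)
  p <- ncol(x)
  pvalue.split   <- matrix(NA_real_, niter, p)
  ordering.split <- matrix(NA_integer_, niter, p)
  for (i in seq_len(niter)) {
    # Training observations: everything not listed in row i of split.
    train <- setdiff(seq_len(nrow(x)), split[i, ])
    y_train <- y[train]
    for (j in seq_len(p)) {
      x_train <- x[train, j]
      pvalue.split[i, j] <-
        wilcox.test(x_train[y_train == 0], x_train[y_train == 1])$p.value
    }
    # Variable indices sorted by increasing p-value.
    ordering.split[i, ] <- order(pvalue.split[i, ])
  }
  list(ordering.split = ordering.split, pvalue.split = pvalue.split)
}

# Toy data: 30 observations, 5 variables, balanced classes.
set.seed(2)
x <- matrix(rnorm(150), 30, 5)
y <- rep(c(0, 1), 15)

# Stand-in for generate.split: 4 iterations with 10 test observations each.
split <- t(replicate(4, sample(30, 10)))

res <- naive_wilcox_ordering(x, y, split)
```

The loop calls wilcox.test once per variable and iteration, which is the cost that a faster method aims to avoid.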

Value

A list with the following components:

ordering.split A niter x p matrix giving the indices of the variables (genes) ordered by p-value. The first column of ordering.split gives the index of the variable with the lowest p-value in each of the niter iterations, the second column gives the index of the variable with the second lowest p-value, and so on. For example, the indices of the 50 best variables for the i-th iteration are given in the first 50 columns of row i.

pvalue.split Returned only if pvalue=TRUE. A niter x p matrix of p-values. The element in the i-th row and j-th column is the p-value of variable j in the i-th iteration.
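As an illustration of how the two components relate, the following base-R sketch builds a toy p-value matrix and derives a matrix laid out like ordering.split from it. The data are random and the helper names (`top_k`, `selection_freq`) are made up for the example; the sketch only shows how the k best variables of an iteration might be read off, and how often each variable is selected across iterations.

```r
# Toy p-value matrix: 4 iterations (rows) x 6 variables (columns).
set.seed(3)
pvalue.split <- matrix(runif(24), nrow = 4, ncol = 6)

# Row-wise ordering, laid out like ordering.split: column 1 holds the index
# of the variable with the lowest p-value in each iteration.
ordering.split <- t(apply(pvalue.split, 1, order))

# Indices of the k best variables in iteration i.
k <- 2
i <- 1
top_k <- ordering.split[i, 1:k]

# How often each variable lands among the k best across all iterations.
selection_freq <- table(factor(ordering.split[, 1:k],
                               levels = seq_len(ncol(pvalue.split))))
```

Here `top_k` could be used to subset the columns of x before fitting a classifier on the training data of iteration i.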

Author(s)

Anne-Laure Boulesteix (http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/020_professuren/boulesteix/index.html)

References

A. L. Boulesteix (2007). WilcoxCV: an R package for fast variable selection in cross-validation. Bioinformatics 23:1702-1704.

Examples

# load WilcoxCV library
library(WilcoxCV)

# Generate data for 90 observations and 10 variables
x <- matrix(rnorm(900), 90, 10)
y <- sample(c(0, 1), 90, replace = TRUE)

# Generate 50 MCCV splits with ratio 2:1 (60 training, 30 test observations)
my.split <- generate.split(niter = 50, n = 90, ntest = 30)

# Order the variables by the Wilcoxon rank sum test in each of the 50 iterations
# and return the corresponding p-values.
wilcox.selection.split(x = x, y = y, split = my.split, algo = "new", pvalue = TRUE)

[Package WilcoxCV version 1.0-2 Index]