A promising alternative is to base prediction entirely on the relative expression ordering of a small number of genes. The simplest variation on this theme is to base classification on ratios of expression values. This idea was first introduced heuristically and was independently developed as a general, data-driven procedure, the top-scoring pair (TSP) algorithm, which was later applied to learn cancer biomarkers and to induce elementary prediction rules for cancer diagnosis and prognosis. It has also recently been applied to differentiate between gastrointestinal stromal tumors and leiomyosarcomas, yielding a nearly perfect two-gene classifier, and to predict response to the farnesyltransferase inhibitor tipifarnib in acute myeloid leukemia. In each case, the decision requires only a comparison of the expression values of two genes, which also provides a specific hypothesis for follow-up studies.
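The pair-selection rule can be sketched in a few lines of NumPy. This is a minimal sketch, not the published implementation. It assumes the usual TSP score, which is the absolute difference between the two classes in the frequency of the ordering X_i < X_j. The published algorithm's secondary tie-breaking score is omitted, and the names `tsp_fit` and `tsp_predict` are hypothetical. Comparing all pairs at once uses memory proportional to samples × genes², so this is only practical on a modest gene set.

```python
import numpy as np

def tsp_fit(X, y):
    """Return (i, j, label_if_less, score) for the top-scoring gene pair.

    X: (n_samples, n_genes) expression matrix; y: class labels in {0, 1}.
    Score of pair (i, j): |P(X_i < X_j | y=0) - P(X_i < X_j | y=1)|.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    A, B = X[y == 0], X[y == 1]
    # p[i, j] = fraction of samples in the class where gene i is below gene j
    pA = (A[:, :, None] < A[:, None, :]).mean(axis=0)
    pB = (B[:, :, None] < B[:, None, :]).mean(axis=0)
    score = np.abs(pA - pB)
    # First maximizer in row-major order; no secondary tie-breaking (assumption)
    i, j = np.unravel_index(np.argmax(score), score.shape)
    # Predict the class in which the ordering X_i < X_j is more frequent
    label_if_less = 0 if pA[i, j] > pB[i, j] else 1
    return int(i), int(j), label_if_less, float(score[i, j])

def tsp_predict(X, i, j, label_if_less):
    """Classify each sample by the single comparison X_i < X_j."""
    X = np.asarray(X, dtype=float)
    less = X[:, i] < X[:, j]
    return np.where(less, label_if_less, 1 - label_if_less)

# Synthetic data (hypothetical): 20 samples per class, 6 genes.
rng = np.random.default_rng(0)
y = np.array([0] * 20 + [1] * 20)
X = rng.normal(size=(40, 6))
# Plant a reversal: gene 4 sits just above gene 2 in class 0 and just below it in class 1.
X[:, 4] = X[:, 2] + np.where(y == 0, 0.5, -0.5)

i, j, label_if_less, score = tsp_fit(X, y)
pred = tsp_predict(X, i, j, label_if_less)
print(i, j, label_if_less, score, (pred == y).mean())

# A strictly increasing transform of every value preserves all orderings, hence the pair.
print(tsp_fit(np.exp(X), y)[:2])
```

The rule depends only on within-sample comparisons. Replacing every expression value by a strictly increasing function of it (here `np.exp`) therefore selects the same pair. A ratio rule with a tuned cutoff other than 1 does not have this rank invariance in general.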
The TSP algorithm is illustrated in Figure 1 for the Lung data. Here the objective is to distinguish malignant pleural mesothelioma (MPM) from adenocarcinoma (ADCA) of the lung. The purpose of this example is only to visualize the TSP decision process, not to re-analyze the Lung data. In the left panel, MPM and ADCA samples are well separated by comparing the expression values of the genes KIR2DL3 and ROCK2. In the right panel, the comparison is based on BIN1 and Anxa4. The high accuracy obtained, roughly 98%, corroborates published findings from a ratio-based analysis. In that analysis, several genes were first identified based on fold changes, standard t-tests, expression cutoffs, etc. Multiple ratios were then formed and used both individually and in combination.
In contrast, the pairs in Figure 1 are unrestricted, so genes that are not differentially expressed may appear in them. Although ad hoc and not rank-invariant, that ratio-based approach illustrates the power and transparency of simple decision rules. A remaining obstacle to even broader applicability of the TSP methodology is the heterogeneity of the molecular mechanisms underlying the same disease phenotype. In cancer, for example, tumors that look similar under a microscope can present different expression patterns. When this is the case, it is difficult to identify single pairs with good discrimination. This is illustrated by a simple, artificial example in Figure 2. There are two latent subclasses among the cancer samples, captured by the two possible relative orderings of gene 1 and gene 2. These could be two genes, the activity of either being sufficient to activate the same cancer-related pathway. Alternatively, each could flag the activation of an alternative cancer-related pathway. Using these two genes alone, we cannot distinguish between a normal and an ill patient based on their relative ordering, since the cancer phenotype can have either ordering.
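The cancellation in this artificial example can be checked numerically. This is a minimal sketch on hypothetical synthetic data, not a reproduction of Figure 2. It uses a TSP-style score for the single pair (gene 1, gene 2), with gene 1 and gene 2 stored in columns 0 and 1. For illustration, it also assumes that normal samples show each ordering half the time. The group sizes and offsets are arbitrary choices.

```python
import numpy as np

def pair_score(X, y, i, j):
    """TSP-style score |P(X_i < X_j | y=0) - P(X_i < X_j | y=1)| for one pair."""
    less = X[:, i] < X[:, j]
    return abs(less[y == 0].mean() - less[y == 1].mean())

rng = np.random.default_rng(1)
m = 20  # normal samples, and total cancer samples (hypothetical size)

def block(offset, size):
    """size samples of (gene 1, gene 2), with gene 2 = gene 1 + offset."""
    g1 = rng.normal(size=size)
    return np.column_stack([g1, g1 + offset])

# Normals (assumption): each ordering of genes 1 and 2 occurs in exactly half the samples.
normal = block(np.tile([0.3, -0.3], m // 2), m)
# Latent cancer subclass A: gene 1 above gene 2. Subclass B: gene 2 above gene 1.
sub_a = block(-2.0, m // 2)
sub_b = block(+2.0, m // 2)

X = np.vstack([normal, sub_a, sub_b])
y = np.array([0] * m + [1] * m)  # 0 = normal, 1 = cancer
whole = pair_score(X, y, 0, 1)

# Each subclass has a fixed ordering, but normals split 50/50, so each
# subclass alone scores 0.5; pooled, the two subclasses cancel out.
y_sub = np.array([0] * m + [1] * (m // 2))
only_a = pair_score(np.vstack([normal, sub_a]), y_sub, 0, 1)
only_b = pair_score(np.vstack([normal, sub_b]), y_sub, 0, 1)
print(whole, only_a, only_b)
```

Each latent subclass is informative against the normals on its own. Pooling them makes the fraction of cancer samples with gene 1 below gene 2 equal to that of the normals, so the pair's score drops to zero.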