Binary classification with a reject option
Abstract
Motivation: The classification methods typically used in bioinformatics classify all examples, even if the classification is ambiguous, for example, when the example is close to the separating hyperplane in linear classification. For medical applications, it may be better to classify an example only when there is a sufficiently high degree of accuracy, rather than classify all examples with moderate accuracy. Moreover, when all examples are classified, the classification rule has no control over the accuracy of the classifier; the algorithm simply aims to produce a classifier with the smallest error rate possible. In our approach, we fix the accuracy of the classifier and thereby choose a desired risk of error.
Results: Our method consists of defining a rejection region in the feature space. This region contains the examples for which classification is ambiguous. These are rejected by the classifier. The accuracy of the classifier becomes a user-defined parameter of the classification rule. The task of the classification rule is to minimize the rejection region under the constraint that the error rate of the classifier be bounded by the chosen target error. This approach is also exploited in the feature-selection step. The results computed on both synthetic and real data show that classifier accuracy is significantly improved.
Availability: Companion website: http://gsp.tamu.edu/Publications/rejectoption/
Contact: edward@ece.tamu.edu, hanczar_blaise@yahoo.fr
Supplementary information: Supplementary data are available at Bioinformatics online.
1 INTRODUCTION
Microarrays provide simultaneous expression measurements for thousands of genes and are now used in many fields of medical research. One of the most promising applications is the prediction of a biological parameter based on the gene-expression profile. For example, expression profiles can be used to differentiate different types of tumors with different outcomes and thereby assist in the selection of a therapeutic treatment. This task consists of using a training microarray dataset to build a classifier with which to make a prediction for an unknown patient. Diverse methods from pattern recognition have been used: linear discriminant analysis (Dudoit et al., 2002), support vector machines (Furey et al., 2000), neural networks (Khan et al., 2001), etc. Even if these methods produce classifiers with good accuracy, very often they are still insufficiently accurate to be used in medical applications. A diagnosis or a choice of therapeutic strategy must be supported by a very high confidence classifier.
As typically applied in the context of gene-expression classification (e.g., in the previously cited works), classifiers classify all examples even if the classification is doubtful, for example when the example is close to the separating hyperplane. On the other hand, a physician confronted with ambiguous symptoms may refer the patient to another specialist instead of giving an unsafe diagnosis. If this concept is implemented in the classification model, then it may be more useful in practical medical applications. For instance, in cancer treatment, knowing the type of cancer is a crucial factor in defining an efficient therapeutic strategy. A classifier with a 20% error rate in predicting the cancer type of an arbitrary patient may be useless. It can be preferable to have a classifier that predicts the cancer type of only a portion of the patients with a very high accuracy, with the other patients being handled by other techniques.
In this article, we recall the concept of classification with reject option based on Chow's theory. A rejection option is added to classical classification methods and determines whether a given example will be classified or rejected (not classified). Then we present our method of classification based on Chow's works (Chow, 1970) in the context of gene-expression data. The error rate of the classifier becomes a parameter of the classification rule that is chosen by the user. The learning task is to minimize the rate of rejection with respect to the given error rate. We show how to implement this sort of classifier in the context of wrapper feature selection. We test and show the usefulness of the proposed method on both artificial and real data.
2 THEORY OF CLASSIFICATION WITH REJECT OPTION
Consider a classification problem with two classes, C = {C_1, C_2}, where an example is characterized by a feature vector x ∈ R^p and a label y ∈ C. The posterior probability is defined by Bayes' formula:

p(C_i | x) = p(x | C_i) p(C_i) / p(x),
where p(C_i) is the prior probability of class C_i, p(x | C_i) is the conditional probability of x given C_i and p(x) is the probability of x. A classifier is a function f : R^p → C which divides the feature space into two regions, R_1, R_2, one for each predicted class, such that x ∈ R_i means that f(x) = C_i. The performance of a classifier is measured by its error rate,

ε[f] = p(f(x) ≠ y),

which is the probability of making an incorrect classification. The accuracy of a classifier is defined as the probability of making a correct decision.
The classifier minimizing the error is called the Bayes classifier. It predicts the class having the highest posterior probability:

f(x) = arg max_{C_i} p(C_i | x).

It is not possible to obtain a better accuracy than with the Bayes classifier.
If the accuracy of the Bayes classifier is not sufficient for the task at hand, then one can take the approach of not classifying all examples, but only those for which the posterior probability is sufficiently high. Based on this principle, Chow (1970) presented an optimal classifier with reject option. A rejection region R_reject is defined in the feature space and all examples belonging to this region are rejected by the classifier. An example x is accepted only if the probability that x belongs to C_i is higher than or equal to a given probability threshold t:

f(x) = C_i if p(C_i | x) ≥ t; otherwise, x is rejected.
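Chow's acceptance rule above can be sketched in a few lines. This is a minimal illustration assuming posterior probabilities are already available; the function name and the default threshold value are ours, not from the paper:

```python
import numpy as np

def chow_classify(posteriors, t=0.8):
    """Chow's acceptance rule: predict the class with the highest posterior
    probability if it reaches the threshold t, otherwise reject (-1).
    The function name and default threshold are illustrative."""
    posteriors = np.asarray(posteriors)
    best = int(np.argmax(posteriors))
    return best if posteriors[best] >= t else -1

print(chow_classify([0.95, 0.05]))   # confident prediction: class 0
print(chow_classify([0.55, 0.45]))   # ambiguous: -1 (rejected)
```

Raising t enlarges the rejection region and lowers the error on accepted examples, which is exactly the error versus reject tradeoff discussed below.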
The classifier rejects an example if the prediction is not sufficiently reliable. The rejection rate is the probability that the classifier rejects an example,

r[f] = p(x ∈ R_reject).

The acceptance rate is the probability that the classifier accepts an example,

a[f] = p(x ∉ R_reject) = 1 − r[f].
In classification with reject option, we can define two types of error. The error, ε[f], is the probability of making an incorrect classification. The conditional error,

ε_cond[f] = p(f(x) ≠ y | x accepted),

is the probability of making an incorrect classification, given that the classifier has accepted the example. We have the following basic property:

ε[f] = ε_cond[f] · a[f] ≤ ε_cond[f].
There is a general relation between the error and rejection rate: the error rate decreases monotonically while the rejection rate increases (Chow, 1970). Based on this relation, Chow proposes an optimal error versus reject tradeoff.
In Chow's theory, an optimal classifier can be found only if the true posterior probabilities are known. This is rarely the case in practice. Fumera et al. (2000) show that Chow's rule does not perform well if a significant error in probability estimation is present. In this case, they claim that defining different thresholds for each class gives better results. The classification rule becomes:

f(x) = C_i if p(C_i | x) ≥ t_i; otherwise, x is rejected.
Although this kind of classifier is common in the machine learning community, it is rarely used in microarray-based classification. Note that this method is close to the notion of soft classification. The main difference is that in soft classification, the posterior probabilities are the output of the classifier. In classification with reject option, a decision is made based on these posterior probabilities. The output of the classifier is a class or a rejection.
In classification with rejection option, the key parameters are the thresholds t_i that define the reject regions. Several strategies have been proposed to find an optimal reject rule. Landgrebe et al. (2006) define 3D ROC curves for a classifier, where the axes represent the true positive rate, the false positive rate rejected by the classifier and the false positive rate accepted by the classifier. The optimal thresholds are chosen by maximizing the volume under the 2D surface. Dubuisson and Masson (1993) propose a rejection rule for problems where the classes are not well known. They include two rejection options: an ambiguity reject when an example is situated in the area between several classes and a distance reject for examples far from the samples of known classes. Li and Sethi (2006) propose to control the error instead of finding a trade-off between rejection and error rates. They reformulate the problem as: given an error rate for each class, design a classifier with the smallest rejection rate. Our approach is similar in that we propose to control the conditional error of the classifier, not the error.
3 IMPLEMENTATION FOR BIOINFORMATICS
In this section, we present our method of classification with reject option in the context of gene-expression-based classification. A classifier with reject option is composed of two elements: a classifier model and a set of thresholds. We explain this in the following sections and show how to include this concept in the feature selection. In this article, we restrict our work to 2-class classification problems; multi-class problems will be studied in future works.
3.1 Classifier model
For binary classification, a classifier is a map f : R^p → {0, 1}; however, a classifier can be defined via a discriminant function d : R^p → R, where the sign of the function is used to predict the label of a given example: d(x) ≤ 0 implies f(x) = C_1 and d(x) > 0 implies f(x) = C_2. By treating classification in this context, the distance, |d(x)|, of the output from the origin can be used to represent the confidence of the classification. Of interest in the present circumstance is that, whereas Chow's theory is defined using the posterior probabilities, it is not necessary to compute them to utilize a rejection rule. The rejection region can be defined directly via d(x).
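Because the rejection region can be defined directly on the discriminant output d(x), a reject rule needs nothing more than two cut points. A minimal sketch, with illustrative names, following the convention used below (class C_1 below t_2, class C_2 above t_1, rejection in between):

```python
def reject_classify(d, t1, t2):
    """Reject rule on the discriminant output d(x), with t2 < t1:
    class C1 to the left of t2, class C2 to the right of t1, and
    rejection in the ambiguous band between them."""
    if d <= t2:
        return "C1"
    if d >= t1:
        return "C2"
    return "reject"

print(reject_classify(-1.3, t1=0.5, t2=-0.5))   # far from the boundary: C1
print(reject_classify(0.1, t1=0.5, t2=-0.5))    # inside the band: reject
```

Setting t1 = t2 = 0 recovers the regular classifier with no reject option.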
Figure 1 illustrates the two class-conditional densities of the discriminant, the density associated with C_i determining probabilities corresponding to d(x) given C_i. The two vertical lines represent two thresholds, t_1 and t_2, the light gray area between t_1 and t_2 being the rejection region. The area to the left of t_2 is the region where examples are classified into the class C_1. In this region, the dark gray area represents the probability p(f(x) = C_1, accept | y = C_2). We define the conditional error of class C_2 by

ε_2^cond = p(f(x) = C_1 | accept, y = C_2).

Equivalently,

ε_2^cond = p(f(x) = C_1, accept | y = C_2) / p(accept | y = C_2),

which shows how the dark gray region gives the conditional error of class C_2. The conditional error ε_1^cond is defined analogously. Note that the conditional errors depend on both thresholds.
Fig. 1.
Probability distribution of the two classes along the classifier output.
3.2 Threshold selection
The task is to select thresholds that define the regions for the two classes and the rejection region. This choice determines the error reject tradeoff. As seen in Section 2, several optimization strategies have been proposed. Our method is to fix a target conditional error, ε_i^*, for each class. These conditional errors become parameters of the algorithm and the learning objective is not to minimize the error but to minimize the rejection rate under the constraints ε_i^cond ≤ ε_i^*. If t_1 and t_2 are the two thresholds, t_2 < t_1, then the problem can be formalized as an optimization problem with three constraints:

minimize t_1 − t_2 subject to (1) ε_1^cond ≤ ε_1^*, (2) ε_2^cond ≤ ε_2^* and (3) t_2 ≤ t_1.
This minimization problem is represented by Figure 2. The two axes correspond to the values of the thresholds t_1 and t_2, and the three constraints are represented by the three lines, (1), (2) and (3). The domain of validity is represented by the white region. Minimizing t_1 − t_2 is equivalent to minimizing t_1 and maximizing t_2. The solution is represented on the figure by the junction point of the lines (1) and (2). Note that the boundary of the constraint (3) corresponds to classifiers where t_1 = t_2, i.e. classifiers with no reject option. On this line, the origin corresponds to the regular classifier where there is a single threshold at 0.
Fig. 2.
Representation of the optimization problem. The two axes correspond to the values of the two thresholds t_1 and t_2. The three lines (1), (2) and (3) represent the three constraints of the optimization problem. The white region represents the domain of validity. The dotted lines represent the heuristic search to find the optimal solution.
We propose an iterative procedure to find the solution of this optimization problem. Let the function g_{ε_1^*}(t_2) = t_1 [resp. g_{ε_2^*}(t_1) = t_2] give the smallest value of t_1 (resp. the largest value of t_2) satisfying ε_1^cond ≤ ε_1^* (resp. ε_2^cond ≤ ε_2^*) for a given value of t_2 (resp. t_1). t_1 and t_2 are initialized with their maximum and minimum values, respectively, in the search space, represented by the point in the upper left corner of Figure 2. We alternately minimize t_1 with respect to the constraint ε_1^cond ≤ ε_1^* and maximize t_2 with respect to the constraint ε_2^cond ≤ ε_2^*. At the i-th iteration, the threshold pair is (t_1^i, t_2^i), and at the next iteration, t_1^{i+1} = g_{ε_1^*}(t_2^i) and t_2^{i+1} = g_{ε_2^*}(t_1^{i+1}). This procedure is iterated until t_1 cannot be decreased and t_2 cannot be increased. The search is represented in Figure 2 by the dotted line.
Since the functions g_{ε_1^*} and g_{ε_2^*} are monotonically decreasing, the search converges to a unique solution, except in two special cases. First, when the domain of validity is empty, there is no solution: no classifier satisfies the constraints for the target errors. Second, there may be several solutions, all of the type t_1 = t_2, meaning these solutions correspond to classifiers with no reject option. In this case it is not necessary to use a reject option; the regular classifier is sufficiently accurate to respect the target errors.
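The alternating search can be illustrated on empirical discriminant outputs. In this sketch the conditional errors are estimated by simple counts over a grid of observed output values rather than by kernel density estimates, so it shows the idea of the procedure, not the paper's implementation:

```python
import numpy as np

def eps1(d1, t1, t2):
    # conditional error of class C1: accepted class-1 outputs wrongly
    # classified as C2 (d >= t1), among all accepted class-1 outputs
    accepted = np.sum(d1 <= t2) + np.sum(d1 >= t1)
    return np.sum(d1 >= t1) / accepted if accepted else 0.0

def eps2(d2, t1, t2):
    # conditional error of class C2: accepted class-2 outputs wrongly
    # classified as C1 (d <= t2), among all accepted class-2 outputs
    accepted = np.sum(d2 <= t2) + np.sum(d2 >= t1)
    return np.sum(d2 <= t2) / accepted if accepted else 0.0

def search_thresholds(d1, d2, e1_star, e2_star, n_iter=100):
    """Alternately minimize t1 under eps1 <= e1_star and maximize t2
    under eps2 <= e2_star, over the grid of observed output values."""
    grid = np.unique(np.concatenate([d1, d2]))
    t1, t2 = grid[-1], grid[0]                # upper-left corner of Fig. 2
    for _ in range(n_iter):
        ok1 = [t for t in grid if t >= t2 and eps1(d1, t, t2) <= e1_star]
        new_t1 = min(ok1) if ok1 else t1      # smallest feasible t1
        ok2 = [t for t in grid if t <= new_t1 and eps2(d2, new_t1, t) <= e2_star]
        new_t2 = max(ok2) if ok2 else t2      # largest feasible t2
        if new_t1 == t1 and new_t2 == t2:
            break                             # neither threshold can move
        t1, t2 = new_t1, new_t2
    return t1, t2
```

On a toy sample where class C_1 outputs are mostly negative and class C_2 outputs mostly positive, the search returns a rejection band around 0 whose width shrinks as the target errors grow.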
Solving this minimization problem requires estimating the probability densities of the two classes on the classifier output. This estimation is done by the Gaussian kernel density estimation method (Silverman, 1986), the principle being to place a Gaussian distribution on each point and sum all these distributions.
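A minimal version of the kernel density estimator, assuming a fixed illustrative bandwidth (in practice it would be set by a rule of thumb such as Silverman's, or by cross-validation):

```python
import numpy as np

def gaussian_kde(samples, x, bandwidth=0.3):
    """Evaluate a Gaussian kernel density estimate at the points x:
    a Gaussian of width `bandwidth` is placed on every sample and the
    kernels are averaged.  The bandwidth value here is illustrative."""
    samples = np.asarray(samples, dtype=float)[:, None]
    x = np.atleast_1d(x).astype(float)
    z = (x - samples) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return kernels.mean(axis=0) / bandwidth
```

The estimate integrates to 1 and, with a well-chosen bandwidth, approximates the class-conditional density of d(x) as the number of samples grows.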
Since the conditional errors depend on the classifier, it is important to use different subsets to learn the classifier and to compute the thresholds; otherwise the probability estimates used for finding the thresholds will tend to be biased. This means that the training dataset, S_train, should be split into S_model and S_thres, with the classifier learned on S_model and then the thresholds constructed using S_thres and the learned classifier.
3.3 Feature selection
For feature selection we adapt sequential forward search (SFS) to classification with reject option. In the familiar application of SFS, the features providing the lowest error rate are selected; however, in the reject scenario, the selection criterion is no longer the error rate but is instead the size of the rejection region. At each step of the search, we select the feature providing the lowest rejection rate under the conditional error constraints ε_1^cond ≤ ε_1^* and ε_2^cond ≤ ε_2^*. As we have previously noted, the threshold computation can fail when there is no solution to the optimization problem. If the selection of a feature leads to this case, then there is no classifier and this feature is removed from the potential selectable features for this iteration. At the next iteration, this feature will be tested again. In the case where all features lead to failing classifiers, the selection is done by selecting the feature that minimizes the error rate of the classifier with no reject option. This case may occur in the first iterations of the feature selection, when the information contained in the chosen features does not permit building a classifier respecting the target error constraints.
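The adapted SFS loop can be sketched generically. Here `evaluate` is a placeholder for fitting the classifier on a candidate feature subset and running the threshold search; it returns the rejection rate, or None when threshold computation fails. All names are ours, and the fallback to the plain error rate described above is only noted in a comment:

```python
def sfs_reject(all_features, evaluate, n_select):
    """Sequential forward search driven by the rejection rate.  At each
    step, add the feature whose classifier (with thresholds respecting
    the target conditional errors) rejects the fewest examples."""
    selected = []
    for _ in range(n_select):
        scores = {}
        for f in all_features:
            if f in selected:
                continue
            r = evaluate(selected + [f])
            if r is not None:          # failed candidates skipped this round
                scores[f] = r
        if not scores:
            # all candidates failed: the text falls back to the feature
            # minimizing the plain (no-reject) error rate instead
            break
        selected.append(min(scores, key=scores.get))
    return selected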
4 RESULTS AND DISCUSSION
We present results showing the advantage of using a rejection option in classification and the limitations of this method. The experiments use both synthetic and real data. The experiments on synthetic data permit very accurate estimations of the error and rejection rates. The experiments on real data require the use of resampling methods to estimate the error rate and it has been shown that these methods are inaccurate for small-sample problems (Hanczar et al., 2007); nonetheless, we present them here to illustrate the method on real data, keeping in mind that, as always with small samples, the experiments using synthetic data are more definitive owing to better error estimation. In all experiments, we are interested exclusively in the conditional error and to simplify the notation we will call this term the error. We assume that the target errors are equal: ε* = ε_1^* = ε_2^*, and we compute only the total error rate ε. We compare our method with classifiers with no reject option and with classifiers using posterior probabilities. The classifier with posterior probabilities, described in Section 2, has a fixed pre-defined threshold. If the posterior probabilities are lower than this threshold, then the example is rejected. In the following sections, we present some representative results. Supplementary results and details on experimental design can be found on the companion website.
4.1 Synthetic data
The synthetic data are generated from real microarray datasets. We use three microarray datasets: colon, breast and lung cancer datasets, which are detailed in the next sections. A dataset is reduced to its 30 best genes, based on their t-test scores. Then Gaussian mixture models are fit for each of the two classes. N/2 and 5000 examples are, respectively, generated for each class to form the training and test sets. Finally, 1970 noise features are added to the training and test sets. A noise feature is generated for the two classes from the same Gaussian distribution whose mean and standard deviation are of the same order as the other features. Altogether, the synthetic data have two equally likely classes, a training set of N examples, a test set of 10000 examples, 30 relevant features and 1970 irrelevant features. More details are presented on the companion website.
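The generation scheme might be sketched as follows, simplifying the fitted class-conditional Gaussian mixtures to single Gaussians with shifted means (the ±0.5 means, unit variances and random seed are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_synthetic(n_train=200, n_test=10000, n_relevant=30, n_noise=1970):
    """Two equally likely classes; relevant features differ in mean between
    the classes, noise features share one Gaussian for both classes."""
    def sample(n):
        y = rng.integers(0, 2, n)
        mu = np.where(y[:, None] == 0, -0.5, 0.5)        # class-dependent mean
        relevant = rng.normal(mu, 1.0, (n, n_relevant))
        noise = rng.normal(0.0, 1.0, (n, n_noise))       # identical for both classes
        return np.hstack([relevant, noise]), y
    return sample(n_train), sample(n_test)

(Xtr, ytr), (Xte, yte) = make_synthetic(n_test=500)      # small test set for the demo
print(Xtr.shape, Xte.shape)   # (200, 2000) (500, 2000)
```

Because only 30 of the 2000 features carry class information, the feature-selection step is genuinely tested on this design.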
Figure 3 shows the results on synthetic data generated from a colon cancer dataset, with N = 200; the classification rule is an SVM with linear kernel. The dotted line corresponds to the error rate of the classifier with no reject option. In panel A, the full line represents the error rate of the classifier with reject option whose target error rate is 0.1. In panel B, the full line represents the error rate of the classifier using the posterior probabilities whose threshold is set to 0.1. The gray histogram represents the rejection rate whose scale is on the left axis. Up to 20 features are selected by the SFS procedure. In panel A, we see that with no reject option the error rate decreases during the first four iterations and then stays around 0.15. If we apply our algorithm with target error rate 0.1, the error is always around 0.1. The reject rate is around 0.6 with a minimum at 0.4 for three selected features. Note that in classification with no reject option the error rate begins by decreasing strongly and then increases slowly with the number of chosen features, thereby exhibiting the peaking phenomenon (Hua et al., 2005). For the classifier with reject option, the error rate is stable around the target error for any number of selected features. It is interesting to note that the peaking phenomenon can be observed with the rejection rate, the best solution in the reject setting corresponding to the classifier that accepts the maximum number of examples. In panel B, we see the error rate of the classifier using posterior probabilities is between 0.07 and 0.08 and the rejection rate is higher than 0.75. Compared to our method, the classifier using posterior probabilities is more accurate but rejects many more examples. The error/rejection tradeoff is better in our method because both methods respect the constraint (error ≤ 0.1) but our method rejects fewer examples.
Fig. 3.
Result of classification on synthetic data based on the colon dataset. N = 200 and the classification rule is a linear SVM. The dotted line represents the error rate of the classifier with no rejection. In panel A, the full line represents the error rate of the classifier with reject option whose target error rate is 0.1. In panel B, the full line represents the error rate of the classifier using the posterior probabilities whose threshold is set to 0.1. The gray histogram represents the rejection rate whose scale is on the left axis.
Another experiment using the colon cancer dataset has been done in which we vary the target error rate. We use the same parameters as in the previous experiment except that the number of selected features is fixed to 10. The classifier with no reject option still produces an error rate of 0.15. We construct classifiers with reject option with different target error rates. The results are presented in Figure 4. We see that the errors of the classifiers are very near the target error, meaning that the constraint on target error is respected. The rejection rate decreases as the target error increases, going from 0.91 for ε* = 0.05 to 0.12 for ε* = 0.15. Increasing the target error makes the problem easier, the rejection region decreases, and more examples are accepted.
Fig. 4.
Results of classification with reject option on synthetic data. The full line represents the error rate of the classifier and the dotted line represents the situation where the error equals the target error. The gray histogram represents the rejection rate whose scale is on the left axis.
At the last point of the figure (ε* = 0.15), the target error is the same as the error of the classifier with no reject option, which has been found by directly applying the classification rule. One might expect that in this situation there would be no rejection region and all the examples would be accepted. This is not the case: 12% of the examples are rejected, even though the classifier with no reject option shows that it would be possible to classify all examples at the target error. This apparent anomaly occurs because for the classifier with reject option the training data have been evenly split into two sets, one for model learning and the other for threshold computation. That means the classification rule is applied on only half of the training set in the case of classification with reject option and, therefore, the classifier designed with reject option is less powerful than the classifier designed with no reject option. This is the first limitation: if the target error is near the error obtained by the classifier with no reject option, then there is no gain in using the classifier with reject option.
Algorithm performance is influenced by the training set size. Figure 5 shows the results on the lung cancer dataset with an SVM classifier. The error rates of the classifier without and with reject option are depicted by the cross and circle lines, respectively. The difficulty of classifier design depends on the size of the training set: the bigger the training set, the easier the design. Therefore, it is not appropriate to fix the same target error for all training set sizes. We have chosen to set the target error to half of the error obtained by the classifier with no reject option. This target error is represented by the dotted line. The rejection rate is represented by the gray histogram. We see that for a training set size of N = 200 or more the target error is respected with a rejection rate between 0.25 and 0.5. For N = 100 the error rate of the classifier with no reject option is 0.09 and the target error is 0.045. The classifier with reject option does not respect the target error constraint, its error being 0.06. With N = 50 the error rate of the classifier with no reject option is 0.18 and the target error is 0.09. There are no results for the classifier with reject option because classifier construction fails; there was no solution to the optimization problem during the threshold computation step. These problems are related to the density estimations of the classes on the classifier output. This estimation is done with N/4 examples for each of the two classes, which means 12 and 25 examples for the N = 50 and N = 100 problems, respectively. When the number of examples is too low, the density estimations are very far off and lead to bad reject options. This is the second limitation of the method. If the number of examples is too low to estimate the class densities accurately, then the classifier construction may fail or the classifier may not respect the target error constraints.
Fig. 5.
Result of classification on artificial data based on the lung dataset. N = 200 and the classification rule is a linear SVM. The cross line represents the error rate of the classifier with no rejection option. The circle line represents the error rate of the classifier with reject option, whose target error rate is represented by the dotted line. The gray histogram represents the rejection rate whose scale is on the left axis.
4.2 Real data
We have applied our approach on three real microarray datasets. We have used the lung cancer dataset (Bhattacharjee, 2001), where the task is to discriminate the adenocarcinomas from the other types of cancers. The data contain 139 adenocarcinomas and 64 cancers of another type. The colon cancer dataset (Alon et al., 1999) contains the genetic profiles of 39 patients affected by colon cancer and 23 non-affected patients. The breast cancer dataset (van de Vijver, 2002) has 295 patients affected by breast cancer, 115 belonging to the good-prognosis class and 180 to the poor-prognosis class. We have reduced the three datasets to a selection of the 2000 genes with highest variance.
Unlike the synthetic data, there is no test set with which to evaluate classifier performance. We use k-fold cross-validation, which is an iterative procedure where the data are randomly divided into k subsets. During the i-th loop, the feature selection and model learning are done on the k−1 subsets not containing the i-th subset and the designed classifier is evaluated on the i-th subset. The final estimate is the mean of the results of the k iterations. We use 10-fold cross-validation in our experiments. As noted previously, cross-validation is not very reliable in small-sample settings (Hanczar et al., 2007), and therefore these results should be viewed with caution.
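The plain k-fold split used above can be sketched as:

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Shuffle the n sample indices and cut them into k nearly equal folds;
    iteration i trains on all folds but the i-th and tests on the i-th."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)

folds = kfold_indices(62, 10)   # e.g. the 62 patients of the colon dataset
print(len(folds), sum(len(f) for f in folds))   # 10 62
```

Note that feature selection and threshold computation must both sit inside the loop, on the training folds only, or the error estimate will be optimistically biased.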
Figure 6 shows the results of classification on the lung cancer dataset as a function of the number of selected features. The target error of each class is fixed to 0.05. In panel A, the classification rule is the Fisher discriminant. Owing to the high variance of cross-validation, the error curves are unsteady; nevertheless, we can make some tentative statements. The error rate of the classifier with no reject option decreases until 15 features and then stays around 0.08. For the classifier with reject option, the error rate is higher than the target error rate and the rejection rate is high with <10 features, perhaps meaning that there are insufficient features, which would be consistent with the classifier with no reject option. From 10 features onward, the error rate respects the target error constraints and the rejection rate is stabilized at around 0.5. In panel B the classification rule is an SVM with linear kernel. The error of the classifier with no reject option is around 0.1. For any number of selected features, the classifier with reject option reaches the target error rate. From 3 features onward, the rejection rate is stabilized at around 0.37. These results indicate that on real data, using a reject option with the two classifiers improves their accuracy.
Fig. 6.
Result of classification on the lung cancer dataset. In panel A, the classification rule is the Fisher discriminant. Panel B is an SVM with linear kernel. The dotted line represents the error rate of the classifier with no rejection option and the full line represents the error rate of the classifier with reject option whose target error rate is 0.05. The gray histogram represents the rejection rate whose scale is on the left axis.
As previously remarked, there are limitations to these methods. For example, a low number of examples has a bad impact on the results. For the colon cancer dataset, the classifier with no reject option has an error rate from 0.15 to 0.17. With the target error set to 0.1, the error rate of the classifier with reject option is highly variable, from 0.14 to 0.53, and is much higher than the target error. The rejection rate is very high, always >0.9. These poor results are not unexpected because the colon cancer dataset contains 39 and 23 patients for the two classes. That means that during the cross-validation procedure, the probability densities of the two classes are estimated with only 17 and 10 examples, respectively. With so small a number of examples, density estimation is very inaccurate and leads to wrong thresholds.
The breast cancer dataset illustrates another limitation of trying to improve classification accuracy with a reject option. The target error is set to 0.2 and the error of the classifier with reject option varies between 0.28 and 0.53. The rejection rate is very high, >0.85. Furthermore, classifier construction fails 75% of the time. In this case, the problem does not come from the threshold computation but from the feature-label distribution and the class split in the sample data. The error rate of the classifier with no reject option is between 0.3 and 0.35, but the good-prognosis class represents only 34% of all examples. This means that the classifier has the same accuracy as the majority classifier that predicts all examples to be in the poor-prognosis class. In effect, the classifier does not discriminate between the two class densities. Computation of the threshold cannot improve the accuracy of the classifier. This result demonstrates the last limitation of our method: if the regular classifier has no discriminatory power, then the incorporation of a reject option will not improve its accuracy.
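The majority-classifier comparison above can be made concrete with a small sanity check. The helper below, `beats_majority`, is hypothetical (not from the paper): it tests whether a classifier's accuracy exceeds the frequency of the most common class, since otherwise no rejection threshold can help.

```python
def beats_majority(y_true, y_pred):
    """True if the classifier's accuracy exceeds that of always
    predicting the most frequent class; if not, tuning a rejection
    threshold cannot rescue it (sanity-check sketch)."""
    majority = max(y_true.count(c) for c in set(y_true)) / len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return accuracy > majority

# Two-thirds of these examples are class 1; always predicting 1 already
# reaches that accuracy, so matching it shows no discriminatory power.
y_true = [1, 1, 1, 1, 0, 0]
print(beats_majority(y_true, [1, 1, 1, 1, 1, 1]))  # False
print(beats_majority(y_true, [1, 1, 1, 1, 0, 1]))  # True
```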
5 CONCLUSION
We have presented a new approach to the classification of gene-expression data. The principle is to add a reject option to the regular classifier. Only the examples for which the classification is sufficiently reliable are classified. The rejection region is defined by two thresholds. If an example belongs to the rejection region, then the example is rejected; otherwise, it is accepted. Unlike a regular classifier, the proposed method allows the user to control the error rate of the classifier. The error rate becomes a parameter of the classifier design, and performance now depends on the rejection rate. The classifier respecting the target error constraint with the minimal rejection rate is the best. We have also shown how to include this approach in feature selection. A reject option can be added to many classification rules. We have demonstrated it on the Fisher discriminant and SVM.
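The design criterion just stated, the minimal rejection rate subject to the target error constraint, can be sketched in a few lines. The brute-force search below assumes a symmetric band [-t, t] on a signed discriminant score for simplicity; the paper instead derives two, possibly asymmetric, thresholds from estimated class-conditional densities.

```python
def min_rejection_band(scores, labels, target_error=0.05):
    """Return the smallest symmetric band half-width t such that the
    error rate on accepted examples (|score| > t) is at most
    target_error, together with the resulting rejection rate.
    Brute-force sketch over candidate widths taken from the data."""
    preds = [1 if s > 0 else 0 for s in scores]
    for t in sorted({0.0} | {abs(s) for s in scores}):
        accepted = [i for i, s in enumerate(scores) if abs(s) > t]
        if not accepted:
            break
        err = sum(preds[i] != labels[i] for i in accepted) / len(accepted)
        if err <= target_error:
            return t, 1 - len(accepted) / len(scores)
    return None  # constraint cannot be met on this sample

scores = [-2.0, -1.0, -0.2, 0.1, 1.0, 2.0]   # signed discriminant scores
labels = [0, 0, 1, 0, 1, 1]                  # true classes
t, rejection = min_rejection_band(scores, labels, target_error=0.05)
print(t, round(rejection, 3))
```

In this toy example the two misclassified points lie closest to the boundary, so a narrow band rejects exactly them and the constraint is met with a rejection rate of one-third.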
We have shown on both synthetic and real data that this method can significantly improve classifier accuracy; nevertheless, we have shown three conditions under which the method cannot be used: (1) if the target error is close to the error obtained by the classifier with no reject option, then there is no benefit to using the classifier with reject option; (2) if the sample size is too low to obtain decent estimates of the class densities, then the classifier design may fail or the classifier may not respect the target error constraint and (3) if the regular classifier lacks discriminatory power beyond that of the majority classifier on the sample data, then adding a reject option will not improve its accuracy. Using a classifier with an error constraint can facilitate the construction of more reliable classifiers for medical applications, where the confidence of the diagnosis must be very high.
ACKNOWLEDGEMENTS
We would like to acknowledge the Translational Genomics Research Institute, the French Ministry of Foreign Affairs and the NSF (CCF-0514644) for providing support for this research.
Conflict of Interest: none declared.
REFERENCES
Alon,U. et al. (1999) Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. PNAS, 96, 6745-6750.
Bhattacharjee,A. et al. (2001) Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. PNAS, 98, 13790-13795.
Chow,C. (1970) On optimum recognition error and reject tradeoff. IEEE Trans. Inf. Theory, 16, 41-46.
Dubuisson,B. and Masson,M. (1993) A statistical decision rule with incomplete knowledge about classes. Pattern Recognit., 26, 155-165.
Dudoit,S. et al. (2002) Comparison of discrimination methods for classification of tumors using gene expression data. J. Am. Stat. Assoc., 97, 77-87.
Fumera,G. et al. (2000) Multiple reject thresholds for improving classification reliability. In Advances in Pattern Recognition: Joint IAPR International Workshops, SSPR 2000 and SPR 2000, Alicante, Spain, p. 863.
Furey,T.S. et al. (2000) Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics, 16, 906-914.
Hanczar,B. et al. (2007) Decorrelation of the true and estimated classifier errors in high-dimensional settings. EURASIP J. Bioinform. Syst. Biol., 2007, Article ID 38473.
Hua,J. et al. (2005) Optimal number of features as a function of sample size for various classification rules. Bioinformatics, 21, 1509-1515.
Khan,J. et al. (2001) Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat. Med., 7, 673-679.
Landgrebe,T. et al. (2006) The interaction between classification and reject performance for distance-based reject-option classifiers. Pattern Recognit. Lett., 27, 908-917.
Li,M. and Sethi,I.K. (2006) Confidence-based classifier design. Pattern Recognit., 39, 1230-1240.
Silverman,B.W. (1986) Density Estimation. Chapman and Hall.
van't Veer,L.J. et al. (2002) A gene-expression signature as a predictor of survival in breast cancer. N. Engl. J. Med., 347, 1999-2009.
Author notes
Associate Editor: Olga Troyanskaya
© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org