
U.W. Bangor - School of Informatics - Mathematics Preprints 1999

Pattern Recognition & Fuzzy Systems


99.04 : KUNCHEVA, L.I.

Using measures of similarity and inclusion for multiple classifier fusion by decision templates

Abstract:

Decision templates (DT) are a technique for classifier fusion for continuous-valued individual classifier outputs. The individual outputs considered here sum up to the same value (e.g., statistical classifiers, yielding some estimates of the posterior probabilities for the classes). First, the DT fusion algorithm is explained. Second, we show that two similarity measures (S_1 and S_2) and two inclusion indices (I_1 and I_2) between fuzzy sets (see Dubois and Prade, 1980) produce the same DT classifier. The equivalence is proven by showing that for every object submitted for classification, all 4 measures induce the same ordering on the set of class labels (through DT fusion), thereby assigning the object to the same class.

Published in:

Fuzzy Sets and Systems 122 (2001) 401-407.

Download:

gzipped postscript: lkfss.ps.gz


99.05 : KUNCHEVA, L.I. and STEIMANN, F.

Fuzzy Diagnosis (Editorial)

Abstract:

Starting from the pioneering publication of Lotfi Zadeh in 1965, fuzzy sets have been applied to many fields in which uncertainty plays a key role. Medicine, often on the borderline between science and art, is an excellent exponent: vagueness, linguistic uncertainty, hesitation, measurement imprecision, natural diversity, subjectivity -- all these are prominently present in medical diagnosis.
The paper defines "fuzzy diagnosis" in a broad sense and outlines two stages:
Stage 1 -- using the patient record as the main source of information, AI-type modeling, no automatic tuning of membership functions; and
Stage 2 -- medical signal and image processing, merging AI and pattern recognition methodologies, automatic tuning of membership functions and extraction of rules from data.
Next we discuss when and why we need fuzzy class labels in medicine, and point again at the contradiction between transparency and accuracy of fuzzy diagnostic systems.

Published in:

Artificial Intelligence in Medicine 16(2) (1999) 121-128.

Download:

gzipped postscript: lkaim.ps.gz


99.16 : BEZDEK, J.C., KELLER, J.M., KRISHNAPURAM, R. & KUNCHEVA, L.I.

Will the real Iris data please stand up?

Abstract:

This correspondence points out several published errors in replicates of the well-known IRIS data, which was collected in 1935 by Anderson, but first published in 1936 by Fisher.

Published in:

IEEE Transactions on Fuzzy Systems 7(3) (1999) 368-369.


99.17 : BEZDEK, J.C. & KUNCHEVA, L.I.

Fuzzy pattern recognition

Abstract:

An encyclopaedia article on fuzzy pattern recognition. Fuzzy labelling and fuzzy clustering are the main topics.

Published in:

Wiley Encyclopedia of Electrical and Electronics Engineering,
John G. Webster (ed.), John Wiley and Sons, 8 (1999) 173-181.


99.22 : KUNCHEVA, L.I. & BEZDEK, J.C.

Presupervised and postsupervised prototype classifier design

Abstract:

The generalized nearest prototype classifier (GNPC) uses "soft" labeling of the prototypes in the classes. Based on how the prototypes are found we distinguish between _presupervised_ and _postsupervised_ GNPC designs. We derive the conditions for optimality (relative to the standard Bayes error rate) of two designs where prototypes represent: (1) the components of class-conditional mixture densities (presupervised design) or (2) the components of the unconditional mixture density (postsupervised design). An artificial data set and the "satimage" data set from the database ELENA are used to experimentally study the two approaches. A Radial Basis Function (RBF) network is used as a representative of each GNPC type. Neither the theoretical nor the experimental results indicate clear reasons to prefer one of the approaches. The postsupervised GNPC design tends to be more robust and less accurate than the presupervised one.

Published in:

IEEE Transactions on Neural Networks 10(5) (1999) 1142-1152.

Download:

gzipped postscript: lktnn.ps.gz


99.23 : KUNCHEVA, L.I., WRENCH, J., JAIN, L.C. & AL-ZAIDAN, A.S.

A fuzzy model of heavy metal loadings in Liverpool Bay

Abstract:

We design a fuzzy model of the loadings of 10 heavy metals in Liverpool Bay. Each metal concentration is associated with a fuzzy set "contaminated", defined over the set of 70 sampling sites. The higher the concentration, the higher the degree of membership of the site. Six overall loading indices are calculated using aggregation connectives between fuzzy sets. The loading indices are then interpolated and plotted on a map. A visual inspection shows that: (i) product aggregation is most indicative of the locations of the disposal grounds; (ii) mean aggregation reflects well the sediment movement in the bay; (iii) maximum aggregation indicates all highly contaminated sites. The proposed fuzzy model is easy to implement and the results are directly interpretable.
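As an illustration of the aggregation step described in this abstract, the following is a minimal sketch only. The abstract does not say how membership degrees are derived from concentrations, so min-max scaling across sites is an assumption made here; the metal names, the four-site data and the function names `memberships` and `loading_indices` are hypothetical. Only three of the six connectives (product, mean, maximum) are shown.

```python
from math import prod

def memberships(concentrations):
    """Map one metal's concentrations to degrees of membership in "contaminated".

    Assumption: min-max scaling across sites, so a higher concentration
    always gives a higher membership degree in [0, 1].
    """
    lo, hi = min(concentrations), max(concentrations)
    if hi == lo:
        return [0.0 for _ in concentrations]
    return [(c - lo) / (hi - lo) for c in concentrations]

def loading_indices(site_memberships):
    """Aggregate one site's membership degrees over all metals."""
    return {
        "product": prod(site_memberships),
        "mean": sum(site_memberships) / len(site_memberships),
        "maximum": max(site_memberships),
    }

# Hypothetical data: 3 metals measured at 4 sites (not taken from the paper).
concentrations = {
    "metal_a": [1.0, 2.0, 3.0, 5.0],
    "metal_b": [10.0, 10.0, 30.0, 50.0],
    "metal_c": [0.2, 0.8, 0.4, 1.0],
}
mu = {metal: memberships(values) for metal, values in concentrations.items()}
n_sites = 4
indices = [loading_indices([mu[metal][s] for metal in mu]) for s in range(n_sites)]

for s, idx in enumerate(indices):
    print(s, idx)
```

The product index drops to zero as soon as one metal has zero membership at a site, whereas the maximum index is high if any single metal is high, which is one way to read the different behaviours reported in the abstract.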

Published in:

Environmental Modelling & Software 15 (2000) 161-167.

Download:

gzipped postscript: lkems.ps.gz


99.31 : KUNCHEVA, L.I. & JAIN, L.C.

Nearest Neighbor Classifier: Simultaneous Editing and Feature Selection

Abstract:

Nearest neighbor classifiers demand significant computational resources (time and memory). Editing of the reference set and feature selection are two different approaches to this problem. Here we encode the two approaches within the same genetic algorithm and simultaneously select features and reference cases. Two data sets were used: the Satimage data and a generated data set. The GA was found to be an expedient solution compared to editing followed by feature selection, feature selection followed by editing, and the individual results from feature selection and editing.
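The idea of encoding editing and feature selection in one chromosome can be sketched as follows. This is an illustrative sketch, not the GA used in the paper: the abstract does not give the fitness function or the genetic operators, so the penalised 1-NN accuracy, truncation selection, one-point crossover and bit-flip mutation below are all assumptions, as are the toy data and every function name.

```python
import random

def decode(chrom, n_cases):
    """Split one bit string into (selected case indices, selected feature indices)."""
    cases = [i for i in range(n_cases) if chrom[i]]
    feats = [j for j, bit in enumerate(chrom[n_cases:]) if bit]
    return cases, feats

def nn_accuracy(chrom, X_ref, y_ref, X_val, y_val):
    """1-NN accuracy on validation data using only the selected cases and features."""
    cases, feats = decode(chrom, len(X_ref))
    if not cases or not feats:
        return 0.0
    correct = 0
    for x, y in zip(X_val, y_val):
        nearest = min(cases, key=lambda i: sum((X_ref[i][j] - x[j]) ** 2 for j in feats))
        correct += (y_ref[nearest] == y)
    return correct / len(X_val)

def fitness(chrom, data, penalty=0.01):
    # Assumed fitness: accuracy minus a small penalty on the fraction of bits set.
    return nn_accuracy(chrom, *data) - penalty * sum(chrom) / len(chrom)

def evolve(data, n_bits, pop_size=20, generations=30, p_mut=0.05, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, data), reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection; the best survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)  # one-point crossover
            child = [bit ^ (rng.random() < p_mut) for bit in a[:cut] + b[cut:]]
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda c: fitness(c, data))

# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
X_ref = [[0.0, 5.0], [0.1, 1.0], [1.0, 4.9], [0.9, 1.1]]
y_ref = [0, 0, 1, 1]
X_val = [[0.05, 1.0], [0.95, 5.0]]
y_val = [0, 1]
data = (X_ref, y_ref, X_val, y_val)

# Chromosome = 4 case bits followed by 2 feature bits.
best = evolve(data, n_bits=len(X_ref) + len(X_ref[0]))
print(decode(best, len(X_ref)), fitness(best, data))
```

Because both decisions share one bit string, a single fitness evaluation scores a reference subset together with the feature subset it is measured in, rather than fixing one before searching for the other.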

Published in:

Pattern Recognition Letters 20 (1999) 1149-1156.

Download:

gzipped postscript: lkprl.ps.gz


99.32 : KUNCHEVA, L.I. & JAIN, L.C.

Designing classifier fusion systems by genetic algorithms

Abstract:

We suggest two simple ways to use a genetic algorithm (GA) to design a multiple classifier system. The first GA version selects disjoint feature subsets to be used by the individual classifiers, whereas the second version selects (possibly) overlapping feature subsets and also the types of the individual classifiers. The two GAs have been tested with four real data sets: Heart, Satimage, Letters, and Forensic glasses (10-fold cross-validation, except for Satimage where we used only two splits). We used 3-classifier systems and basic types of individual classifiers (the linear and quadratic discriminant classifiers and the logistic classifier). The multiple classifier systems designed with the two GAs were compared against classifiers using: (a) all features; (b) the best feature subset found by the sequential backward selection (SBS) method; and (c) the best feature subset found by a GA (individual classifier!). We found that: (1) the multiple classifier system derived through the GA, Version 2, yielded the smallest training error rate in all experiments; (2) with Satimage and Forensic glasses data it also produced the smallest test error rate. Generalizing on the basis of these experiments is not straightforward because the differences between the error rates in the comparison appeared to be too small. GA design can be made less prone to overtraining by including in the fitness function penalty terms accounting for the number of features used.

Published in:

IEEE Transactions on Evolutionary Computation 4 (2000) 327-336.

Download:

gzipped postscript: lkec.ps.gz


99.33 : KUNCHEVA, L.I., BEZDEK, J.C. & DUIN, R.P.W.

Decision Templates for Multiple Classifier Fusion

Abstract:

Multiple classifier fusion may generate more accurate classification than each of the constituent classifiers. Fusion is often based on fixed combination rules like the product and average. Only under strict probabilistic conditions can these rules be justified. We present here a simple rule for adapting the class combiner to the application. A set of c decision templates (one per class) is estimated with the same training set that is used for the set of classifiers. These templates are then matched to the decision profile of new incoming objects by some similarity measure. We compare 11 versions of our model with 14 other techniques for classifier fusion on the Satimage and Phoneme datasets from the database ELENA. Our results show that decision templates based on integral type measures of similarity are superior to the other schemes on both data sets.
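The template-and-match scheme outlined in this abstract can be sketched roughly as below. This is a sketch under stated assumptions: a decision profile is taken to be an L x c table holding the c class supports from each of L classifiers, each template is assumed to be the per-class mean of the training profiles, and the similarity used (one minus the mean squared difference) is just one possible choice, not necessarily one of the 11 versions compared in the paper. All names and numbers are hypothetical.

```python
def decision_templates(profiles, labels, n_classes):
    """Average the training decision profiles class by class.

    profiles[k] is an L x c nested list for training object k.
    Assumes every class has at least one training object.
    """
    L, c = len(profiles[0]), len(profiles[0][0])
    sums = [[[0.0] * c for _ in range(L)] for _ in range(n_classes)]
    counts = [0] * n_classes
    for dp, y in zip(profiles, labels):
        counts[y] += 1
        for i in range(L):
            for k in range(c):
                sums[y][i][k] += dp[i][k]
    return [[[v / counts[j] for v in row] for row in sums[j]] for j in range(n_classes)]

def similarity(dp, dt):
    """Assumed measure: 1 minus the mean squared difference between profile and template."""
    diffs = [(a - b) ** 2 for r1, r2 in zip(dp, dt) for a, b in zip(r1, r2)]
    return 1.0 - sum(diffs) / len(diffs)

def classify(dp, templates):
    """Return the label of the most similar template, plus all similarity scores."""
    scores = [similarity(dp, dt) for dt in templates]
    return max(range(len(scores)), key=scores.__getitem__), scores

# Hypothetical training profiles: 2 classifiers (rows) x 2 classes (columns).
train_profiles = [
    [[0.9, 0.1], [0.8, 0.2]],
    [[0.7, 0.3], [0.6, 0.4]],
    [[0.2, 0.8], [0.3, 0.7]],
    [[0.4, 0.6], [0.1, 0.9]],
]
train_labels = [0, 0, 1, 1]

templates = decision_templates(train_profiles, train_labels, n_classes=2)
label, scores = classify([[0.6, 0.4], [0.55, 0.45]], templates)
print(label, scores)
```

Unlike a fixed product or average rule, the templates here are estimated from training data, so the combiner uses the whole profile, including the support each classifier gives to the other classes.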

Published in:

Pattern Recognition 34 (2001) 299-314.

Download:

gzipped postscript: lkpr.ps.gz


99.37 : BEZDEK, J.C. & KUNCHEVA, L.I.

Point prototype generation and classifier design

Abstract:

We consider point prototype construction for nearest prototype classifier design. The distinctions between pre- and post-supervised learning, and also between selection and extraction of point prototypes are discussed. Numerical examples based on the Iris data are given to contrast and compare various models. Our calculations suggest that: (i) presupervision yields better nearest prototype classifiers than post-supervision, independent of the type of prototypes chosen; (ii) selection is (arguably) better than extraction for finding point prototypes for classification; (iii) in post-supervised designs, sequential (local) methods such as vector quantization may produce better prototypes than batch (global) methods such as fuzzy c-means when the number of prototypes is larger than the number of labeled classes; and (iv) among post-supervised designs, self-organizing feature maps produce classifiers that are intermediate between those based on local and global prototype updating.

Published in:

E. Oja and S. Kaski (eds.) Kohonen Maps, Elsevier, Amsterdam (1999) 71-96.


99.44 : AL-ZAIDAN, A.S. & KUNCHEVA, L.I.

Selecting fuzzy connectives to represent Heavy Metal Distribution in Liverpool Bay

Abstract:

This paper continues our previous work on constructing indices of the distribution of heavy metals in Liverpool Bay. Heavy metal concentrations are measured regularly on a grid of locations (sites) in the bay. Each metal concentration is associated with a fuzzy set "contaminated", defined over the set of sites. Six overall loading indices are calculated using aggregation connectives between fuzzy sets: product, minimum, geometric mean, arithmetic mean, competition jury and maximum. Here we select a set of distinct indices using several measures of similarity between fuzzy sets. The measures and their properties are introduced and discussed. The indices are grouped by thresholding the similarity measures and a representative of each group is chosen. We conclude that a set of four loading indices based on the {minimum, product, maximum, and arithmetic mean} is required to describe fully the metal distribution, resolving the waste dumping site, highlighting the highly contaminated (or endangered) regions, and accounting for the sediment movement in the bay.
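Grouping indices by thresholding a similarity measure might look like the sketch below. The abstract names neither the similarity measures nor the grouping rule, so both are assumptions here: the measure is the ratio of summed minima to summed maxima over sites, and grouping is greedy, with the first member of each group serving as its representative. The index names, values and threshold are hypothetical and do not reproduce the paper's results.

```python
def fuzzy_similarity(a, b):
    """Assumed measure: sum of pointwise minima over sum of pointwise maxima.

    Returns 1.0 when both fuzzy sets are empty (all memberships zero).
    """
    den = sum(max(x, y) for x, y in zip(a, b))
    if den == 0:
        return 1.0
    return sum(min(x, y) for x, y in zip(a, b)) / den

def group_by_threshold(indices, threshold):
    """Greedy grouping: an index joins the first group whose representative
    (the group's first member) is at least `threshold` similar to it."""
    groups = []
    for name, values in indices.items():
        for group in groups:
            if fuzzy_similarity(values, indices[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups

# Hypothetical loading indices over 4 sites.
indices = {
    "index_a": [0.1, 0.2, 0.9, 1.0],
    "index_b": [0.1, 0.25, 0.85, 1.0],
    "index_c": [0.9, 0.8, 0.1, 0.0],
}

groups = group_by_threshold(indices, threshold=0.9)
representatives = [group[0] for group in groups]
print(groups, representatives)
```

With these toy values index_a and index_b fall into one group and index_c forms its own, so two representatives would be kept; a lower threshold merges more indices and keeps fewer.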

Published in:

Proc. 4th International Conference on Knowledge-Based Intelligent Engineering Systems & Allied Technologies (KES'2000), Brighton (2000) 602-605.

Download:

gzipped postscript: lkKES00b.ps.gz

