Improving the OPTICS Algorithm with the Imperialist Competitive Algorithm: Automatically Choosing the Best Parameters
Abstract
Clustering based on similarity is one of the most important stages in data analysis and a valuable tool for data mining. For a wide range of data sets, particularly those with unusual shapes, common clustering methods cannot derive a clear pattern. Density-based clustering algorithms were introduced to address this issue and enable clustering of such data. Among density-based clustering algorithms, Ordering Points To Identify the Clustering Structure (OPTICS) finds density-based clusters in spatial data. Density-based methods describe clusters as dense regions separated by sparse ones. An informal definition of a cluster is: “for each point within a cluster, the neighborhood of a given radius ε must contain at least a minimum number of points µ.” Specifying these two parameters is difficult for arbitrary types of data and requires considerable effort to achieve the desired results. The main goal of this research is to use meta-heuristic methods, in particular the Imperialist Competitive Algorithm (ICA), to estimate these parameters (ε, µ) precisely, so that OPTICS produces accurate, high-quality clusters for any data set. To evaluate the proposed method, we used data sets designed for classification, removed the class labels, and compared the resulting clusters with the original labels; the comparison showed that our method produces precise class labels.
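To make the role of the two parameters concrete, the following is a minimal sketch (not the paper's implementation) of running OPTICS with hand-picked values, using scikit-learn. In the paper, ε and µ are tuned automatically by ICA; here `max_eps` (playing the role of ε) and `min_samples` (µ) are simply fixed for illustration.

```python
# Sketch only: OPTICS with manually chosen parameters, the step the
# paper automates with ICA. Parameter values here are illustrative.
import numpy as np
from sklearn.cluster import OPTICS

rng = np.random.default_rng(0)
# Two dense blobs plus sparse noise: the kind of data where density-based
# clustering succeeds and centroid-based methods may struggle.
blob_a = rng.normal(loc=(0, 0), scale=0.3, size=(50, 2))
blob_b = rng.normal(loc=(5, 5), scale=0.3, size=(50, 2))
noise = rng.uniform(low=-2, high=7, size=(10, 2))
X = np.vstack([blob_a, blob_b, noise])

# max_eps bounds the neighborhood radius (the abstract's ε);
# min_samples is the minimum neighborhood size (the abstract's µ).
clustering = OPTICS(min_samples=5, max_eps=1.0,
                    cluster_method="dbscan", eps=0.5).fit(X)
labels = clustering.labels_  # -1 marks points treated as noise
print(sorted(set(labels)))
```

Poorly chosen values of `max_eps` and `min_samples` would merge the blobs or dismiss them as noise, which is exactly why the paper searches for these parameters automatically.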
References
C. J. Matheus, P. K. Chan, G. Piatetsky-Shapiro, Systems for knowledge discovery in databases, Knowledge and Data Engineering, IEEE Transactions on 5 (6) (1993) 903-913.
N. Soni, A. Ganatra, Categorization of several clustering algorithms from different perspective: a review, International Journal of Advanced Research in Computer Science and Software Engineering 2 (8) (2012) 63-68.
M. Ankerst, M. M. Breunig, H.-P. Kriegel, J. Sander, OPTICS: ordering points to identify the clustering structure, in: ACM SIGMOD Record, Vol. 28, No. 2, ACM, 1999, pp. 49-60.
M. Ester, H.-P. Kriegel, J. Sander, X. Xu, A density-based algorithm for discovering clusters in large spatial databases with noise, in: KDD, Vol. 96, 1996, pp. 226-231.
M. C. Cowgill, R. J. Harvey, L. T. Watson, A genetic algorithm approach to cluster analysis, Computers & Mathematics with Applications 37 (7) (1999) 99-108.
M. Mitchell, An introduction to genetic algorithms, MIT press, 1998.
J. H. Holland, Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence, U Michigan Press, 1975.
A. Colorni, M. Dorigo, V. Maniezzo, et al., Distributed optimization by ant colonies, in: Proceedings of the first European conference on artificial life, Vol. 142, Paris, France, 1991, pp. 134-142.
S. Kirkpatrick, C. D. Gelatt, M. P. Vecchi, Optimization by simulated annealing, Science 220 (4598) (1983) 671-680.
E. Atashpaz-Gargari, C. Lucas, Imperialist competitive algorithm: an algorithm for optimization inspired by imperialistic competition, in: Evolutionary Computation, 2007. CEC 2007. IEEE Congress on, IEEE, 2007, pp. 4661-4667.
J. C. Dunn, Well-separated clusters and optimal fuzzy partitions, Journal of cybernetics 4 (1) (1974) 95-104.
T. Caliński, J. Harabasz, A dendrite method for cluster analysis, Communications in Statistics - Theory and Methods 3 (1) (1974) 1-27.
M. K. Pakhira, S. Bandyopadhyay, U. Maulik, Validity index for crisp and fuzzy clusters, Pattern recognition 37 (3) (2004) 487-501.
D. L. Davies, D. W. Bouldin, A cluster separation measure, Pattern Analysis and Machine Intelligence, IEEE Transactions on (2) (1979) 224-227.
C.-H. Chou, M.-C. Su, E. Lai, A new cluster validity measure and its application to image compression, Pattern Analysis and Applications 7 (2) (2004) 205-220.
archive.ics.uci.edu.
www.mathworks.com.