Link-Based Cluster Ensemble Approach with Different Similarity Measures
In data mining, clustering algorithms subdivide a dataset, generally using a distance-based similarity measure (e.g., Euclidean distance), so that data points in the same partition are more similar to each other than to points in different partitions. Clustering becomes a more challenging problem when the data are categorical, that is, when there is no inherent distance measure between data values. Many clustering algorithms are available, but some cannot be applied directly to categorical data. In cluster ensembles, the underlying ensemble-information matrix presents only cluster-data point relations, with many entries left unknown. This paper presents an analysis showing that this problem degrades the quality of the clustering result. It then proposes a new link-based approach that improves the conventional matrix by discovering these unknown entries from the similarity between clusters in the ensemble. Finally, a graph partitioning technique is applied to a weighted bipartite graph obtained from the refined matrix to produce the final clustering result.
Keywords- Clustering, categorical data, cluster ensembles, data mining, similarity measures.
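The refinement step described in the abstract can be illustrated with a small sketch. It assumes, for illustration only, that each categorical attribute acts as one base clustering. It also uses a weighted connected-triple style measure: clusters are linked by Jaccard overlap, and two clusters are similar when they share linked neighbours. This is one possible choice of similarity and is not necessarily the exact measure used in this paper. All function names and the decay factor `decay` are hypothetical. An entry of the refined matrix is 1 when the point belongs to the cluster. Otherwise it is `decay` times the similarity between that cluster and the cluster of the same base clustering that contains the point. The final bipartite graph partitioning step is not shown.

```python
from itertools import combinations


def build_clusters(ensemble):
    """Map (clustering_id, label) -> set of point indices for every base clustering."""
    clusters = {}
    for m, labels in enumerate(ensemble):
        for x, label in enumerate(labels):
            clusters.setdefault((m, label), set()).add(x)
    return clusters


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def wct_similarity(clusters):
    """Hypothetical weighted connected-triple style similarity, normalised to [0, 1]."""
    ids = list(clusters)
    # Link weights between overlapping clusters; clusters of one clustering never overlap.
    w = {}
    for a, b in combinations(ids, 2):
        weight = jaccard(clusters[a], clusters[b])
        if weight > 0:
            w[(a, b)] = w[(b, a)] = weight
    neighbours = {c: {d for d in ids if (c, d) in w} for c in ids}
    # Sum, over shared neighbours k, the weaker of the two links a-k and b-k.
    raw = {}
    for a, b in combinations(ids, 2):
        common = neighbours[a] & neighbours[b]
        raw[(a, b)] = sum(min(w[(a, k)], w[(b, k)]) for k in common)
    top = max(raw.values(), default=0.0) or 1.0
    sim = {}
    for (a, b), value in raw.items():
        sim[(a, b)] = sim[(b, a)] = value / top
    return sim


def refined_matrix(ensemble, decay=0.9):
    """Return (columns, rows): the refined cluster-association matrix."""
    clusters = build_clusters(ensemble)
    sim = wct_similarity(clusters)
    columns = sorted(clusters, key=lambda c: (c[0], str(c[1])))
    rows = []
    for x in range(len(ensemble[0])):
        row = []
        for column in columns:
            m = column[0]
            own = (m, ensemble[m][x])  # cluster of x in base clustering m
            # Known entry: 1 if x is in the cluster; otherwise fill in an estimate.
            row.append(1.0 if own == column else decay * sim[(own, column)])
        rows.append(row)
    return columns, rows


# Toy categorical data; each attribute is treated as one base clustering (assumption).
data = [("red", "small", "round"), ("red", "small", "oval"),
        ("blue", "large", "oval"), ("blue", "large", "square")]
ensemble = [list(column) for column in zip(*data)]
columns, rows = refined_matrix(ensemble)
for column, value in zip(columns, rows[0]):
    print(column, round(value, 2))
```

In the binary matrix, point 0 would have a 0 in every cluster it does not belong to. In the refined row, `(2, 'oval')` receives a nonzero value because "oval" shares linked neighbours with "round", while `(2, 'square')` stays at 0. Replacing `wct_similarity` with another function of the same shape is one way to compare different similarity measures.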