Talk

Structure validation in clustering by stability analysis

  • Joachim M. Buhmann (Department of Computer Science, ETH Zürich)
A3 01 (Sophus-Lie room)

Abstract

Partitioning data sets into groups is an important preprocessing step for compression, prototype extraction, or outlier removal. Various criteria of connectedness or proximity have been proposed to group data according to structural similarity, but in general it is unclear which method or model to use. In the spirit of information theory, we propose a decision process to determine the information that can be extracted from data, conditioned on a hypothesis class of structures. Appropriate models are selected by maximizing the amount of information that can be reliably learned from data in the presence of noise. Empirical evidence for this model selection concept is provided by cluster validation in bioinformatics and in computer security, namely the analysis of microarray data and the multi-label clustering of Boolean data for role-based access control.

Katharina Matschke

MPI for Mathematics in the Sciences