19th GAMM-Seminar Leipzig on
High-dimensional problems - Numerical treatment and applications

Max-Planck-Institute for Mathematics in the Sciences
Inselstr. 22-26, D-04103 [O->]Leipzig
Phone: +49.341.9959.752, Fax: +49.341.9959.999

  19th GAMM-Seminar
January, 23th-25th, 2003
  Abstracts ->
  All seminars  
  All proceedings  
  Abstract Michael Griebel, Thu, 13.30-14.20 Previous Contents Next  
  Sparse grid methods for density estimation and regression in data mining
Michael Griebel (Uni Bonn)

High dimensional data are nowadays collected in huge amounts and in many different scientific and commercial areas. Examples are internet portals, e-warehouses or financial transactions but also hyperspectral imagery, DNA microarrays or astronomy. These data need to be analyzed to find meaningful structure in the data which then can help in further decision making. This is the aim of data mining. Here, important tasks are to estimate the density of given data points and to obtain a classifier function for given data points with associated class labels by means of regression.

We present a new approach to the density estimation and regression problem. It is based on the regularization network approach but, in contrast to other methods which employ ansatz functions associated to data points, we use basis functions coming from a grid in the usually high-dimensional feature space for the minimization process. To cope with the curse of dimensionality, we employ sparse grids. Thus, in the most simple case, only tex2html_wrap_inline19 instead of tex2html_wrap_inline21 grid points and unknowns are involved. Here d denotes the dimension of the feature space and tex2html_wrap_inline25 gives the mesh size.

Or methods compute a density estimate or a nonlinear classifier but scales only linearly with the number of data points. It is well suited for data mining applications where the amount of data is very large, but where the dimension of the feature space is moderately high. We further extend the method to so-called anisotropic sparse grids, where different a-priori chosen mesh sizes can be used for the discretization of each attribute, and we discuss a new dimension-adaptive sparse grid technique with superior properties for higher dimensional problems.

We test the method using standard test problems from the UCI repository and problems with huge synthetical data sets in up to 14 dimensions. It turns out that our new method achieves correctness rates which are competitive to that of the best existing methods.

Acknowledgement: This is joint work with Jochen Garcke, IAM, Univ. Bonn.

    Previous Contents Next  

Last updated:
30.11.2004 Impressum
Concept, Design and Realisation
[O->]Jens Burmeister (Uni Kiel), Kai Helms (MPI Leipzig)
Valid HTML 4.0!