Sparse grid methods for density estimation and regression in data mining
Michael Griebel (Uni Bonn)
High dimensional data are nowadays collected in huge amounts and in many different scientific and commercial areas. Examples are internet portals, ewarehouses or financial transactions but also hyperspectral imagery, DNA microarrays or astronomy. These data need to be analyzed to find meaningful structure in the data which then can help in further decision making. This is the aim of data mining.
Here, important tasks are to estimate the density of given data points
and to obtain a classifier function for given data points with associated class labels by means of regression.
We present a new approach to the density estimation and regression problem.
It is based on the regularization network approach
but, in contrast to other methods which employ ansatz functions associated
to data points, we use basis functions coming from a grid in the usually
highdimensional feature space for the minimization process.
To cope with the curse of dimensionality, we employ sparse grids. Thus, in the most simple case,
only instead of grid points and unknowns
are involved. Here d denotes the dimension of the feature space and
gives the mesh size.
Or methods compute a density estimate or a nonlinear classifier but scales only linearly
with the number of data points.
It is well suited for data mining applications
where the amount of data is very large, but where the dimension of the
feature space is moderately high.
We further extend the method to socalled anisotropic sparse grids,
where different apriori chosen mesh sizes can be used for the discretization of
each attribute, and we discuss a new dimensionadaptive sparse grid technique with superior properties for higher dimensional problems.
We test the method using standard test problems from the UCI repository and
problems with huge synthetical data sets in up to 14 dimensions.
It turns out that our new method achieves
correctness rates which are competitive to that of the best existing methods.
Acknowledgement: This is joint work with Jochen Garcke, IAM, Univ. Bonn.
