Some Statistical Challenges Coming with Non-Euclidean Data
- Stephan Huckemann (University of Göttingen)
Abstract
A wide range of statistical applications rely on the central limit theorem (CLT), which guarantees that, for random vectors with finite second moments, the sample mean is asymptotically normal at rate square-root-n. For random deviates taking values in a Riemannian manifold, the celebrated CLT of Bhattacharya and Patrangenaru (2005) extends this result to Fréchet means of the intrinsic distance, under a collection of additional assumptions.
Subsequently, the Bhattacharya-Patrangenaru CLT has been extended to Fréchet means of more general distances, e.g. Procrustes distances on shape spaces, to "distances" between data and a data descriptor, a geodesic say, and even to nested sequences of random descriptors, such as principal nested spheres.
The latter, in particular, generalizes the classical asymptotics of principal components by Anderson (1963). Still, for all of these general scenarios, analogs of the above-mentioned additional assumptions are required to hold. Meanwhile, the geometric meaning of these assumptions has been studied in exemplary cases: the distribution near the cut locus of the mean may be responsible for rates slower than square-root-n with non-normal limiting distributions, a phenomenon we call smeariness. Asymptotic non-normality may also be observed on non-complete manifolds, even without cut loci. For non-manifold spaces allowing only for a Riemannian stratification, for example BHV phylogenetic tree spaces, "infinitely fast" rates have been observed, making straightforward statistical testing impossible; this phenomenon is called stickiness.