To classify an observation, we assume that each class can be represented by a probability distribution, which may itself be the result of a previous estimation step. A classic example is Fisher's classification of iris species based on length measurements of their sepals and petals, with one distribution per species. With the increasing relevance of machine learning methods, classification remains an active research topic; applications include object classification in image recognition and text classification, often illustrated by spam filters. Although classification problems arise almost everywhere in the digital world and numerous algorithmic solutions are under development, even elementary mathematical foundations appear to have been treated only incompletely, or only for special cases, so far.
Framing classification in terms of statistical decision theory, we consider a classification problem as a family of probability distributions $(P_i : i \in I)$ with a finite class index set $I$ serving as the decision space, and we investigate several optimality criteria for randomised decision procedures. In this setting, we show that a generalization of the Neyman-Pearson lemma characterizes all admissible procedures, that is, procedures with minimal error probabilities. In certain binary problems, this characterization yields procedures representable by nonlinear class-separating hypersurfaces. Hyperplanes therefore generally do not provide admissible classification, even if the training data are linearly separable. The aim of this talk is to present further geometrical conditions for admissibility based on the risk set, to deduce an analytical method for determining admissible procedures, in particular those that additionally satisfy the minimax condition, and to indicate further questions we intend to pursue.
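As a minimal illustration of why likelihood-ratio procedures need not correspond to hyperplanes, consider a hypothetical univariate binary problem (not taken from the talk) with Gaussian class densities of unequal variances. A Neyman-Pearson-type procedure compares the likelihood ratio to a threshold, and with unequal variances the log-ratio is quadratic in $x$, so the acceptance region for one class is a union of two intervals rather than a half-line:

```python
import math

# Assumed toy setup: class 0 ~ N(0, 1), class 1 ~ N(1, 2).
# A Neyman-Pearson-type (likelihood-ratio) procedure decides class 1
# iff p1(x) / p0(x) >= threshold. Because the variances differ, the
# log-likelihood ratio is quadratic in x, not linear.

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def classify(x, threshold=1.0):
    """Likelihood-ratio test: return 1 iff p1(x)/p0(x) >= threshold."""
    ratio = normal_pdf(x, 1.0, 2.0) / normal_pdf(x, 0.0, 1.0)
    return 1 if ratio >= threshold else 0

# The region assigned to class 1 consists of both tails:
print(classify(-5.0), classify(0.0), classify(5.0))  # → 1 0 1
```

Since the class-1 region here is the union of two tails, no single cut point on the real line (i.e., no hyperplane in one dimension) reproduces this procedure, in line with the observation above that linear separation is generally not admissible.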