| Title: | Quantitative Taxonomy Methods of A.A. Lyubishchev (1943) |
|---|---|
| Description: | Implements the multivariate classification methods of Alexander Alexandrovich Lyubishchev (1890-1972), as described in his 1943 manuscript 'Programma obshchey sistematiki' Lyubishchev (1943) <https://www.zin.ru/animalia/coleoptera/rus/lyubis05.htm> and published in Lubischew (1962) <https://www.jstor.org/stable/2527894>. Provides divergence_coefficient() for measuring separation between groups on continuous features, scatter_ellipse() for fitting covariance ellipses per class, transgression() for detecting ellipse overlap, and classify() for Bayesian posterior classification. These methods predate and are more general than the binary-character similarity coefficients of Sokal and Sneath (1963) that appear in other R packages. |
| Authors: | Akzhan Berdeyev [aut, cre] |
| Maintainer: | Akzhan Berdeyev <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-06-23 09:26:59 UTC |
| Source: | https://github.com/cran/lyubishchev |
`classify()` assigns posterior class probabilities to a new specimen using the Edgeworth-Pearson multivariate Gaussian likelihood for each class scatter ellipse. For each class, it computes the log-likelihood of the specimen under a multivariate normal with the class mean and covariance. A softmax over the per-class log-likelihoods then yields the posterior probabilities, which is equivalent to Bayes' rule with equal prior probabilities for all classes.
```r
classify(specimen, ellipses)
```
| Argument | Description |
|---|---|
| `specimen` | A numeric vector of feature values for a single observation. |
| `ellipses` | A named list of scatter ellipses as returned by `scatter_ellipse()`. |
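The log-likelihood for class $k$ is

$$\ell_k = -\frac{1}{2}\left[p \log(2\pi) + \log\lvert\Sigma_k\rvert + (x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k)\right],$$

where $p$ is the number of features, $\mu_k$ and $\Sigma_k$ are the class mean and covariance, and $x$ is the specimen.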
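The per-class computation can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's `classify()`: the helper `gaussian_posteriors()` is hypothetical, and each ellipse is assumed to be a list with components `mean` and `cov` (the package's covariance component may be named differently). It uses `mahalanobis()` for the quadratic form, `determinant()` for a stable log-determinant, and subtracts the largest log-likelihood before exponentiating.

```r
# Hypothetical helper for illustration; it is not the package's classify().
# Each ellipse is assumed to be a list with a mean vector `mean` and a
# covariance matrix `cov` (the package's covariance component may be
# named differently).
gaussian_posteriors <- function(x, ellipses) {
  p <- length(x)
  loglik <- vapply(ellipses, function(e) {
    d2 <- mahalanobis(x, e$mean, e$cov)  # (x - mu)' Sigma^{-1} (x - mu)
    logdet <- as.numeric(determinant(e$cov, logarithm = TRUE)$modulus)
    -0.5 * (p * log(2 * pi) + logdet + d2)
  }, numeric(1))
  # Softmax over log-likelihoods; subtracting the maximum first keeps exp()
  # from underflowing when every log-likelihood is very negative.
  w <- exp(loglik - max(loglik))
  w / sum(w)
}

# Per-class means and covariances for iris, built directly in base R
ellipses <- lapply(split(iris[, 1:4], iris$Species),
                   function(d) list(mean = colMeans(d), cov = cov(d)))
post <- gaussian_posteriors(c(5.1, 3.5, 1.4, 0.2), ellipses)
round(post, 4)
```

Because no class prior enters the softmax, every class implicitly gets the same prior; unequal priors would add $\log \pi_k$ to each $\ell_k$ before normalising.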
A named list with one element per class. Each element is a list with components:

- Squared Mahalanobis distance from the specimen to the class centroid.
- Multivariate Gaussian log-likelihood of the specimen under the class.
- `posterior`: posterior probability of the class (softmax over the per-class log-likelihoods). Posteriors sum to 1 across classes.
Lubischew, A.A. (1962). On the use of discriminant functions in taxonomy. Biometrics, 18(4), 455-477.
```r
ellipses <- scatter_ellipse(iris[, 1:4], iris$Species)
specimen <- c(5.1, 3.5, 1.4, 0.2)
result <- classify(specimen, ellipses)
sapply(result, function(r) r$posterior)
```
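`divergence_coefficient()` computes Lyubishchev's divergence coefficient between two groups measured on one or more continuous features. The coefficient summarises the standardised separation between the group means, summed across features:

$$D = \sum_{j=1}^{p} \frac{(\bar{x}_{a,j} - \bar{x}_{b,j})^2}{s_{a,j}^2 + s_{b,j}^2},$$

where $\bar{x}_{g,j}$ and $s_{g,j}^2$ are the mean and (sample) variance of feature $j$ in group $g \in \{a, b\}$. Features whose pooled variance $s_{a,j}^2 + s_{b,j}^2$ is zero are skipped to avoid division by zero.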
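A minimal base-R sketch of the coefficient, assuming it is the sum over features of the squared difference in group means divided by the sum of the two sample variances. `divergence_sketch()` is a hypothetical name, and the package's own implementation may differ in details such as how the pooled variance is defined.

```r
# Hypothetical re-implementation for illustration, not the package source.
# Assumes D = sum_j (mean_a[j] - mean_b[j])^2 / (var_a[j] + var_b[j]).
divergence_sketch <- function(a, b) {
  a <- as.matrix(a)  # a plain numeric vector becomes a one-column matrix
  b <- as.matrix(b)
  num <- (colMeans(a) - colMeans(b))^2
  den <- apply(a, 2, var) + apply(b, 2, var)  # sample variances (n - 1 denominator)
  keep <- den > 0  # skip features with zero pooled variance
  sum(num[keep] / den[keep])
}

divergence_sketch(c(1, 2, 3), c(4, 5, 6))  # (2 - 5)^2 / (1 + 1) = 4.5

setosa <- as.matrix(iris[iris$Species == "setosa", 1:4])
versicolor <- as.matrix(iris[iris$Species == "versicolor", 1:4])
divergence_sketch(setosa, versicolor)
```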
```r
divergence_coefficient(a, b)
```
| Argument | Description |
|---|---|
| `a` | A numeric matrix or data frame for the first group, with one row per observation and one column per feature. A numeric vector is treated as a single-feature group. |
| `b` | A numeric matrix or data frame for the second group, with the same columns (features) as `a`. |
This is the measure described in Lyubishchev's 1943 manuscript and later published in English by Lubischew (1962). It predates the binary-character similarity coefficients of Sokal and Sneath (1963) and is more general than them in that it operates directly on continuous measurements rather than on presence/absence characters.
A single numeric value, the divergence coefficient. Larger values indicate greater separation between the groups.
Lyubishchev, A.A. (1943). Programma obshchey sistematiki [Program of General Systematics]. Manuscript, 22 November 1943.
Lubischew, A.A. (1962). On the use of discriminant functions in taxonomy. Biometrics, 18(4), 455-477.
```r
setosa <- as.matrix(iris[iris$Species == "setosa", 1:4])
versicolor <- as.matrix(iris[iris$Species == "versicolor", 1:4])
divergence_coefficient(setosa, versicolor)
```
`scatter_ellipse()` fits a covariance ellipse to each class in a labelled multivariate data set. For every class it computes the centroid (mean vector), the feature covariance matrix and the sample size. These ellipses are the building blocks for `transgression()` and `classify()`.
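The per-class summaries can be sketched in base R with `split()`, `colMeans()` and `cov()`. The helper `ellipse_sketch()` and the component name `cov` are hypothetical; `mean` and `n_samples` are the names used in the package example. `cov()` uses the $n - 1$ denominator, which is assumed here rather than confirmed for the package.

```r
# Illustrative sketch only, not the package's scatter_ellipse().
# The component name `cov` is a hypothetical choice; `mean` and `n_samples`
# follow the names used in the package examples.
ellipse_sketch <- function(X, y) {
  X <- as.matrix(X)
  groups <- split(seq_len(nrow(X)), as.character(y))  # row indices per class
  lapply(groups, function(idx) {
    Xi <- X[idx, , drop = FALSE]
    list(mean = colMeans(Xi),
         cov = cov(Xi),            # sample covariance (n - 1 denominator)
         n_samples = length(idx))
  })
}

ell <- ellipse_sketch(iris[, 1:4], iris$Species)
ell[["setosa"]]$n_samples  # 50
```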
```r
scatter_ellipse(X, y)
```
| Argument | Description |
|---|---|
| `X` | A numeric matrix or data frame of observations, with one row per observation and one column per feature. |
| `y` | A vector of class labels of length `nrow(X)`. |
A named list with one element per class. Each element is itself a list with components:

- `mean`: numeric vector of feature means for the class.
- Feature covariance matrix for the class.
- `n_samples`: integer count of observations in the class.

The names of the list are the class labels (coerced to character).
Lubischew, A.A. (1962). On the use of discriminant functions in taxonomy. Biometrics, 18(4), 455-477.
```r
ellipses <- scatter_ellipse(iris[, 1:4], iris$Species)
ellipses[["setosa"]]$mean
ellipses[["setosa"]]$n_samples
```
`transgression()` tests whether two class scatter ellipses overlap, in Lyubishchev's sense of "transgression" between groups. The centroids are compared using the squared Mahalanobis distance under the pooled covariance of the two classes, and that distance is compared against the chi-squared quantile at the requested confidence level, with degrees of freedom equal to the number of features. When the distance is below this threshold, the groups are deemed to transgress (overlap).
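The test can be sketched in base R as below. This is an illustration under assumptions, not the package's `transgression()`: the helper name `transgression_sketch()`, its returned component names, and the definition of the pooled covariance as the $(n-1)$-weighted average of the two class covariances are all assumptions.

```r
# Illustrative sketch, not the package's transgression(). Assumes each
# ellipse is a list with `mean`, `cov` and `n_samples`, and that the pooled
# covariance is the (n - 1)-weighted average of the two class covariances.
# The returned component names are hypothetical.
transgression_sketch <- function(ea, eb, confidence = 0.95) {
  pooled <- ((ea$n_samples - 1) * ea$cov + (eb$n_samples - 1) * eb$cov) /
    (ea$n_samples + eb$n_samples - 2)
  d2 <- mahalanobis(ea$mean, eb$mean, pooled)      # squared distance
  thr <- qchisq(confidence, df = length(ea$mean))  # chi-squared quantile
  list(distance = d2, threshold = thr, overlap = d2 < thr, ratio = d2 / thr)
}

# Per-class summaries for iris, built directly in base R
ell <- lapply(split(iris[, 1:4], iris$Species), function(d)
  list(mean = colMeans(d), cov = cov(d), n_samples = nrow(d)))
transgression_sketch(ell$versicolor, ell$virginica)
```

With the default `confidence = 0.95` and four iris features, the threshold is `qchisq(0.95, 4)`, about 9.49.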
```r
transgression(ellipses, class_a, class_b, confidence = 0.95)
```
| Argument | Description |
|---|---|
| `ellipses` | A named list of scatter ellipses as returned by `scatter_ellipse()`. |
| `class_a` | Name (character) of the first class in `ellipses`. |
| `class_b` | Name (character) of the second class in `ellipses`. |
| `confidence` | Confidence level for the chi-squared threshold, between 0 and 1. Defaults to 0.95. |
A list with components:

- Squared Mahalanobis distance between the two centroids under the pooled covariance.
- Chi-squared threshold at the requested confidence, with degrees of freedom equal to the number of features.
- Logical; `TRUE` when the distance is below the threshold (the ellipses overlap).
- Ratio of the Mahalanobis distance to the threshold. Values above 1 indicate well-separated groups.
Lubischew, A.A. (1962). On the use of discriminant functions in taxonomy. Biometrics, 18(4), 455-477.
```r
ellipses <- scatter_ellipse(iris[, 1:4], iris$Species)
transgression(ellipses, "versicolor", "virginica")
```