# Biarchetype analysis

Simultaneous learning of samples and features based on extremes

Categories: archetypal analysis, matrix factorization

Published: November 16, 2022

## Introduction

Archetypal analysis (AA) was introduced by Cutler & Breiman (1994). Let $$X$$ be a (real-valued) data matrix whose rows represent the samples and whose columns represent the features. They defined the archetypes as convex combinations of the data samples, i.e. $$Z = BX$$ where $$B$$ is a stochastic matrix. In addition, the data samples are approximated by convex combinations of the archetypes, i.e. $$X \simeq AZ$$ where $$A$$ is also a stochastic matrix. This is equivalent to solving the following optimization problem:

\begin{aligned} \mathop{\mathrm{arg\,min\,}}_{A,B} \quad & \|X - ABX \|^2 \\ \textrm{s.t.} \quad & \\ & \sum\nolimits_{k=1}^K A_{mk} = 1 \text{ with } A_{mk} \in [0, 1] \text{ for each } m=1,\dots, M \\ & \sum\nolimits_{m=1}^M B_{km} = 1 \text{ with } B_{km} \in [0, 1] \text{ for each } k=1,\dots, K \\ \end{aligned} \tag{1}
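As a quick sanity check, the objective of Equation 1 can be evaluated directly in NumPy. The stochastic matrices below are random placeholders (drawn from a Dirichlet distribution so each row sums to 1), not fitted solutions:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K = 100, 5, 3  # samples, features, archetypes

X = rng.normal(size=(M, N))

# Random row-stochastic matrices: each row sums to 1.
A = rng.dirichlet(np.ones(K), size=M)  # M x K
B = rng.dirichlet(np.ones(M), size=K)  # K x M

Z = B @ X                             # archetypes: convex combinations of samples
rss = np.linalg.norm(X - A @ Z) ** 2  # objective of Equation 1
```

Fitting AA amounts to minimizing `rss` over `A` and `B` subject to the simplex constraints, typically by alternating updates of the two matrices.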

Intuitively, the archetypes lie on the boundary of the convex hull of the data, so they represent extreme, easily interpretable profiles, and each sample is explained as a mixture (the rows of $$A$$) of these extremes (Mørup & Hansen, 2012).

## BiArchetype Analysis

In biarchetype analysis (BiAA), the archetypes are assumed to be convex combinations of the data in both dimensions, i.e. $$Z = BXC$$ where $$B$$ and $$C$$ are stochastic matrices. At the same time, the data is approximated by convex combinations of the archetypes, i.e. $$X \simeq AZD$$ where $$A$$ and $$D$$ are also stochastic matrices.

This is equivalent to solving the following optimization problem, where $$\ell(X \mid \tilde{X})$$ is a loss function:

\begin{aligned} \mathop{\mathrm{arg\,min\,}}_{A,B,C,D} \quad & \ell(X|ABXCD) \\ \textrm{s.t.} \quad & \\ & \sum\nolimits_{k=1}^K A_{mk} = 1 \text{ with } A_{mk} \in [0, 1] \text{ for each } m=1,\dots, M \\ & \sum\nolimits_{m=1}^M B_{km} = 1 \text{ with } B_{km} \in [0, 1] \text{ for each } k=1,\dots, K \\ & \sum\nolimits_{n=1}^N C_{nl} = 1 \text{ with } C_{nl} \in [0, 1] \text{ for each } l=1,\dots, L \\ & \sum\nolimits_{l=1}^L D_{ln} = 1 \text{ with } D_{ln} \in [0, 1] \text{ for each } n=1,\dots, N \\ \end{aligned} \tag{2}
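The BiAA factorization can be sketched analogously. Again, the four stochastic matrices are random placeholders rather than fitted solutions; note that $$A$$ and $$B$$ are stochastic along rows while $$C$$ and $$D$$ are stochastic along columns, matching the constraints in Equation 2:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 60, 40  # data dimensions
K, L = 4, 3    # archetypes along samples and along features

X = rng.random(size=(M, N))

A = rng.dirichlet(np.ones(K), size=M)    # M x K, rows sum to 1
B = rng.dirichlet(np.ones(M), size=K)    # K x M, rows sum to 1
C = rng.dirichlet(np.ones(N), size=L).T  # N x L, columns sum to 1
D = rng.dirichlet(np.ones(L), size=N).T  # L x N, columns sum to 1

Z = B @ X @ C        # K x L matrix of biarchetypes
X_tilde = A @ Z @ D  # reconstruction, same shape as X
```

Fitting BiAA then means minimizing $$\ell(X \mid \tilde{X})$$ over the four matrices subject to the simplex constraints.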

In Equation 2, just as Seth & Eugster (2016) proposed for archetypal analysis, $$\ell$$ could be a negative log-likelihood function. Therefore,

• for Bernoulli distributions, $$\ell$$ is defined as $\ell(X | \tilde{X}) = -\sum_{m=1}^M \sum_{n=1}^N \left[ X_{mn}\ln (\tilde{X}_{mn}) + (1 - X_{mn}) \ln (1 - \tilde{X}_{mn}) \right] \tag{3}$

• and for normal distributions, $\ell(X | \tilde{X}) = MN \ln \left(\sigma {\sqrt {2\pi }}\right) + {\frac {1}{2\sigma^2}}\sum_{m=1}^M \sum_{n=1}^N \left( {X_{mn}-\tilde{X}_{mn} }\right)^{2} \tag{4}$
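Both losses can be written compactly in NumPy. A minimal sketch (for the Bernoulli case, the entries of `X_tilde` must lie strictly in $(0, 1)$ so the logarithms are finite):

```python
import numpy as np

def bernoulli_nll(X, X_tilde):
    """Negative Bernoulli log-likelihood, Equation 3."""
    return -np.sum(X * np.log(X_tilde) + (1 - X) * np.log(1 - X_tilde))

def normal_nll(X, X_tilde, sigma=1.0):
    """Negative normal log-likelihood, Equation 4."""
    M, N = X.shape
    return (M * N * np.log(sigma * np.sqrt(2 * np.pi))
            + np.sum((X - X_tilde) ** 2) / (2 * sigma ** 2))

rng = np.random.default_rng(2)
X = rng.integers(0, 2, size=(10, 4)).astype(float)
X_tilde = np.full_like(X, 0.5)

# Every entry contributes -ln(0.5) = ln 2, so the total is 40 * ln 2.
print(bernoulli_nll(X, X_tilde))
```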

## Example

```python
import numpy as np
import matplotlib.pyplot as plt

# Plot an Archimedean spiral in polar coordinates.
r = np.arange(0, 2, 0.01)
theta = 2 * np.pi * r

fig, ax = plt.subplots(subplot_kw={'projection': 'polar'})
ax.plot(theta, r)
ax.set_rticks([0.5, 1, 1.5, 2])  # fewer radial ticks
ax.grid(True)
plt.show()
```

## References

Cutler, A., & Breiman, L. (1994). Archetypal analysis. Technometrics, 36, 338–347. https://doi.org/10.1080/00401706.1994.10485840
Mørup, M., & Hansen, L. K. (2012). Archetypal analysis for machine learning and data mining. Neurocomputing, 80, 54–63. https://doi.org/10.1016/j.neucom.2011.06.033
Seth, S., & Eugster, M. J. A. (2016). Probabilistic archetypal analysis. Machine Learning, 102, 85–113. https://doi.org/10.1007/s10994-015-5498-8

## Citation

BibTeX citation:
```bibtex
@online{2022,
  author = {},
  title = {Biarchetype Analysis},
  date = {2022-11-16},
  url = {https://aleixalcacer.com/posts/2022-06-11_archetypal-analysis},
  langid = {en}
}
```