Simultaneous learning of samples and features based on extremes

archetypal analysis

matrix factorization

Published

November 16, 2022

Introduction

Archetypal analysis was introduced by Cutler & Breiman (1994). Let \(X\) be a (real-valued) data matrix whose rows represent the samples of the data and whose columns represent the features. They defined the archetypes as convex combinations of the data samples, i.e. \(Z = BX\) where \(B\) is a stochastic matrix. In addition, the data samples are approximated by convex combinations of the archetypes, i.e. \(X \simeq AZ\) where \(A\) is also a stochastic matrix. This is equivalent to solving the following optimization problem:

\[
\begin{aligned}
\mathop{\mathrm{arg\,min\,}}_{A,B} \quad & \|X - ABX \|^2 \\
\textrm{s.t.} \quad & \\
& \sum\nolimits_{k=1}^K A_{mk} = 1 \text{ with } A_{mk} \in [0, 1] \text{ for each } m=1,\dots, M \\
& \sum\nolimits_{m=1}^M B_{km} = 1 \text{ with } B_{km} \in [0, 1] \text{ for each } k=1,\dots, K \\
\end{aligned}
\tag{1}\]
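Equation 1 can be attacked in several ways; the sketch below uses plain projected gradient descent with NumPy, projecting the rows of \(A\) and \(B\) onto the probability simplex after each step. The function names (`project_simplex`, `archetypal_analysis`) are my own and this is not Cutler & Breiman's original alternating-least-squares algorithm, just a minimal illustration of the constrained problem.

```python
import numpy as np

def project_simplex(V):
    """Euclidean projection of each row of V onto the probability simplex
    (sort-based algorithm of Duchi et al., 2008)."""
    U = np.sort(V, axis=1)[:, ::-1]          # rows sorted in descending order
    css = np.cumsum(U, axis=1) - 1.0
    idx = np.arange(1, V.shape[1] + 1)
    rho = (U - css / idx > 0).sum(axis=1)    # size of the support per row
    theta = css[np.arange(V.shape[0]), rho - 1] / rho
    return np.maximum(V - theta[:, None], 0.0)

def archetypal_analysis(X, K, n_iter=500, lr=1e-3, seed=0):
    """Projected-gradient sketch of Equation 1: min ||X - A B X||^2
    over row-stochastic A (M x K) and B (K x M)."""
    rng = np.random.default_rng(seed)
    M = X.shape[0]
    A = project_simplex(rng.random((M, K)))
    B = project_simplex(rng.random((K, M)))
    for _ in range(n_iter):
        Z = B @ X                  # archetypes, K x N
        R = A @ Z - X              # residual, M x N
        grad_A = 2 * R @ Z.T
        grad_B = 2 * A.T @ R @ X.T
        A = project_simplex(A - lr * grad_A)
        B = project_simplex(B - lr * grad_B)
    return A, B
```

The projection step is what keeps every row of \(A\) and \(B\) a set of convex-combination weights, exactly as the constraints of Equation 1 require.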

TODO: Interpretation

BiArchetype Analysis

In BiAA, the archetypes are assumed to be convex combinations of the data in both dimensions, i.e. \(Z = BXC\) where \(B\) and \(C\) are stochastic matrices. At the same time, the data is approximated by convex combinations of the archetypes, i.e. \(X \simeq AZD\) where \(A\) and \(D\) are also stochastic matrices.

This is equivalent to solving the following optimization problem:

\[
\begin{aligned}
\mathop{\mathrm{arg\,min\,}}_{A,B,C,D} \quad & \ell(X|ABXCD) \\
\textrm{s.t.} \quad & \\
& \ell(X | \tilde{X}) \text{ should be a loss function} \\
& \sum\nolimits_{k=1}^K A_{mk} = 1 \text{ with } A_{mk} \in [0, 1] \text{ for each } m=1,\dots, M \\
& \sum\nolimits_{m=1}^M B_{km} = 1 \text{ with } B_{km} \in [0, 1] \text{ for each } k=1,\dots, K \\
& \sum\nolimits_{n=1}^N C_{nl} = 1 \text{ with } C_{nl} \in [0, 1] \text{ for each } l=1,\dots, L \\
& \sum\nolimits_{l=1}^L D_{ln} = 1 \text{ with } D_{ln} \in [0, 1] \text{ for each } n=1,\dots, N \\
\end{aligned}
\tag{2}\]
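Note how the constraints read: \(A\) and \(B\) are row-stochastic, while \(C\) and \(D\) are column-stochastic, so the biarchetypes \(Z = BXC\) mix both samples and features. A minimal projected-gradient sketch of Equation 2 with a squared-error loss follows; as above, the function names are hypothetical and this is an illustration, not the reference algorithm.

```python
import numpy as np

def project_rows(V):
    """Euclidean projection of each row of V onto the probability simplex."""
    U = np.sort(V, axis=1)[:, ::-1]
    css = np.cumsum(U, axis=1) - 1.0
    idx = np.arange(1, V.shape[1] + 1)
    rho = (U - css / idx > 0).sum(axis=1)
    theta = css[np.arange(V.shape[0]), rho - 1] / rho
    return np.maximum(V - theta[:, None], 0.0)

def biaa(X, K, L, n_iter=500, lr=1e-3, seed=0):
    """Projected-gradient sketch of Equation 2 with squared-error loss.

    A (M x K) and B (K x M) are row-stochastic; C (N x L) and D (L x N)
    are column-stochastic, matching the constraints of Equation 2.
    """
    rng = np.random.default_rng(seed)
    M, N = X.shape
    A = project_rows(rng.random((M, K)))
    B = project_rows(rng.random((K, M)))
    C = project_rows(rng.random((L, N))).T   # columns of C sum to 1
    D = project_rows(rng.random((N, L))).T   # columns of D sum to 1
    for _ in range(n_iter):
        Z = B @ X @ C                        # biarchetypes, K x L
        R = A @ Z @ D - X                    # residual, M x N
        gA = 2 * R @ D.T @ Z.T
        gB = 2 * A.T @ R @ D.T @ C.T @ X.T
        gC = 2 * X.T @ B.T @ A.T @ R @ D.T
        gD = 2 * Z.T @ A.T @ R
        A = project_rows(A - lr * gA)
        B = project_rows(B - lr * gB)
        C = project_rows((C - lr * gC).T).T  # project columns via transpose
        D = project_rows((D - lr * gD).T).T
    return A, B, C, D
```

The transposed projection for \(C\) and \(D\) is the only asymmetry: their simplex constraints run over rows (indices \(n\) and \(l\)) rather than over columns.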

In Equation 2, just as Seth & Eugster (2016) proposed for archetypal analysis, \(\ell\) could be a negative log-likelihood function. Therefore,

for Bernoulli distributions \(\ell\) is defined as \[
\ell(X | \tilde{X}) = -\sum_{m=1}^M \sum_{n=1}^N \left[ X_{mn}\ln (\tilde{X}_{mn}) + (1 - X_{mn}) \ln (1 - \tilde{X}_{mn}) \right]
\tag{3}\]

and for normal distributions, \[
\ell(X | \tilde{X}) = MN \ln \left(\sigma {\sqrt {2\pi }}\right) + {\frac {1}{2\sigma^2}}\sum_{m=1}^M \sum_{n=1}^N \left( {X_{mn}-\tilde{X}_{mn} }\right)^{2}
\tag{4}\]
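Both losses are direct to implement. The snippet below is a straightforward NumPy transcription of Equations 3 and 4; the small `eps` clip in the Bernoulli case is an added numerical safeguard (not part of the equations) to keep \(\ln(0)\) out of the sum.

```python
import numpy as np

def bernoulli_nll(X, X_tilde, eps=1e-12):
    """Negative Bernoulli log-likelihood of Equation 3.
    eps clips X_tilde away from {0, 1} to avoid log(0)."""
    X_tilde = np.clip(X_tilde, eps, 1 - eps)
    return -np.sum(X * np.log(X_tilde) + (1 - X) * np.log(1 - X_tilde))

def normal_nll(X, X_tilde, sigma=1.0):
    """Negative normal log-likelihood of Equation 4."""
    M, N = X.shape
    return M * N * np.log(sigma * np.sqrt(2 * np.pi)) \
        + np.sum((X - X_tilde) ** 2) / (2 * sigma ** 2)
```

For a perfect reconstruction the Bernoulli loss vanishes (up to the clipping), while the normal loss reduces to its constant term \(MN \ln(\sigma\sqrt{2\pi})\).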