Visualise your fitted
non-linear dimension reduction model
in the high-dimensional data space

Jayani P. G. Lakshika

Joint work with Prof Dianne Cook, Dr Paul Harrison, Dr Michael Lydeamore, Dr Thiyanga S. Talagala

Motivation

Single-cell gene expression: same data, different NLDR + hyper-parameters

Which is the most reasonable representation of the structure(s) present in the
high-dimensional data?

How do you decide which is the most reasonable representation?

This is the published figure.

Peripheral Blood Mononuclear Cells (PBMC)

Here is the \(9\text{-}D\) data viewed using a grand tour, a sequence of linear projections into \(2\text{-}D\).

Software: langevitour

Show “model-in-the-data-space”

data-in-the-model-space

model-in-the-data-space

data-in-the-model-space

What is the model?

data-in-the-model-space

model-in-the-data-space

Overview of method

1. Construct the \(2\text{-}D\) model

2. Lift the model into high dimensions

Steps of the algorithm

1. Construct the \(2\text{-}D\) model

a. NLDR layout, b. hexagon bins (hex_binning() and geom_hexgrid()), c. bin centroids (merge_hexbin_centroids()), d. triangulated centroids (tri_bin_centroids(), gen_edges(), update_trimesh_index(), and geom_trimesh()).
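The hexagon-binning step assigns every point of the \(2\text{-}D\) layout to a hexagon. As a rough illustration only (this is not quollr's hex_binning(), whose arguments and output are not shown in these slides), the sketch below bins points into a regular hexagonal grid by choosing the nearer of two staggered rectangular lattices of hexagon centres. The names `hex_centre` and `width`, and the example points, are hypothetical.

```python
import math
from collections import Counter

def hex_centre(x, y, width):
    """Return the centre of the regular hexagon containing (x, y).

    Hexagon centres form two staggered rectangular lattices; the
    nearest centre over both lattices identifies the hexagon.
    `width` (assumed name) is the distance between neighbouring centres.
    """
    dy = width * math.sqrt(3)  # row spacing within one lattice
    # Lattice A: centres at (i * width, j * dy)
    ax, ay = round(x / width) * width, round(y / dy) * dy
    # Lattice B: lattice A shifted by half a cell in both directions
    bx = (math.floor(x / width) + 0.5) * width
    by = (math.floor(y / dy) + 0.5) * dy
    da = (x - ax) ** 2 + (y - ay) ** 2
    db = (x - bx) ** 2 + (y - by) ** 2
    return (ax, ay) if da <= db else (bx, by)

# Hypothetical 2-D NLDR layout (not real data)
layout = [(0.1, 0.1), (0.2, -0.1), (0.5, 0.8), (1.1, 0.0)]
bins = [hex_centre(x, y, width=1.0) for x, y in layout]
counts = Counter(bins)  # number of points falling in each hexagon
```

Hexagons with very small counts are the kind of bins that a low-density removal step would target.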

Steps of the algorithm

2. Lift the model into high dimensions

avg_highd_data()

show_langevitour()
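One way to read the lifting step: each \(2\text{-}D\) bin is mapped to the average of the high-dimensional observations that fall in it. The sketch below illustrates that idea under this assumption; it is not the implementation of avg_highd_data(), and `lift_centroids`, `bin_ids`, and the toy data are hypothetical names and values.

```python
def lift_centroids(bin_ids, highd):
    """Average the high-dimensional rows within each bin.

    bin_ids[i] is the (assumed) hexagon id of observation i;
    highd[i] is its vector of p variables.
    """
    sums, counts = {}, {}
    for h, row in zip(bin_ids, highd):
        if h not in sums:
            sums[h] = [0.0] * len(row)
            counts[h] = 0
        for j, value in enumerate(row):
            sums[h][j] += value
        counts[h] += 1
    # One p-dimensional mean per bin
    return {h: [s / counts[h] for s in sums[h]] for h in sums}

# Toy example: three observations in 3-D, two bins
bin_ids = [0, 0, 1]
highd = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [10.0, 10.0, 10.0]]
lifted = lift_centroids(bin_ids, highd)
```

Connecting these lifted points with the same edges as the \(2\text{-}D\) triangulation gives a wireframe that can be viewed together with the data in a tour.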

Factors for fitting and measuring fit

NLDR layout, different methods and different hyper-parameters
Number of bins
Bin start position
Low density removal (find_low_dens_hex())

HBE in high dimensions: the square root of the mean, over observations, of the squared distance between observed and fitted values (glance())

\[\sqrt{\frac{1}{n}\sum_{h = 1}^{b}\sum_{i = 1}^{n_h}\sum_{j = 1}^{p} ({x}_{hij} - C^{(p)}_{hj})^2}\]

\(n =\) the number of observations,

\(b =\) the number of bins,

\(n_h =\) the number of observations in the \(h^{th}\) bin,

\(p =\) the number of variables,

\({x}_{hij} =\) the value of the \(j^{th}\) variable for the \(i^{th}\) observation in the \(h^{th}\) hexagon,

\(C^{(p)}_{hj} =\) the \(j^{th}\) coordinate of the fitted value for the \(h^{th}\) hexagon in \(p\text{-}D\).
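The error formula can be checked on a small example. The sketch below evaluates it directly, assuming each observation's fitted value is the high-dimensional centroid of its bin; the function name `hbe` and the toy data are hypothetical, and this is not the code behind glance().

```python
import math

def hbe(bin_ids, highd, centroids):
    """Root of the mean (over n observations) squared distance
    between each observation and its bin's fitted point."""
    n = len(highd)
    total = 0.0
    for h, row in zip(bin_ids, highd):
        # Inner sum over the p variables for one observation
        total += sum((x - c) ** 2 for x, c in zip(row, centroids[h]))
    return math.sqrt(total / n)

# Toy example: n = 3 observations, p = 3 variables, b = 2 bins
bin_ids = [0, 0, 1]
highd = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [10.0, 10.0, 10.0]]
centroids = {0: [2.0, 3.0, 4.0], 1: [10.0, 10.0, 10.0]}
error = hbe(bin_ids, highd, centroids)
```

Here the two observations in bin 0 each contribute a squared distance of 3 and the observation in bin 1 contributes 0, so the value is \(\sqrt{6/3} = \sqrt{2}\).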

HBE of candidates

Chosen fit for PBMC data set

tSNE with perplexity: 30

Clusters with small separations, non-linear clusters

Dense points, filled-out clusters

Prediction into \(2\text{-}D\)

Predict a new observation’s value in the NLDR, for any method (predict_emb())
For a new observation:
- Determine the closest bin centroid in high dimensions using the fitted model
- Predict its position to be the centroid of this bin in \(2\text{-}D\)
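The two prediction steps amount to a nearest-centroid lookup. A minimal sketch, assuming Euclidean distance and that each bin has both a high-dimensional and a \(2\text{-}D\) centroid stored under the same id; `predict_2d` and the example values are hypothetical, and this is not the implementation of predict_emb().

```python
import math

def predict_2d(new_obs, highd_centroids, lowd_centroids):
    """Find the bin whose high-D centroid is closest to new_obs
    and return that bin's 2-D centroid."""
    nearest = min(
        highd_centroids,
        key=lambda h: math.dist(new_obs, highd_centroids[h]),
    )
    return lowd_centroids[nearest]

# Toy fitted model: two bins, each with a high-D and a 2-D centroid
highd_centroids = {0: [2.0, 3.0, 4.0], 1: [10.0, 10.0, 10.0]}
lowd_centroids = {0: (0.0, 0.0), 1: (1.0, 0.0)}

pred = predict_2d([9.0, 9.5, 11.0], highd_centroids, lowd_centroids)
```

Because the lookup only needs the fitted centroids, it does not depend on which NLDR method produced the layout.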

quollr

questioning how a high-dimensional object looks in low-dimensions using r

Interactivity

Summary

We provide a method to create a model from an NLDR layout that
can be displayed with the data to assess the fit.

This makes it easier for researchers to decide which
NLDR layout is best for their work.

An additional benefit is that, for any NLDR method, you can now
predict where new observations will be
positioned in the layout.

R package

Draft paper
