Interactive diagnostics for NLDR visualizations

Background

Non-linear dimension reduction (NLDR) methods such as t-SNE and UMAP are popular for visualizing high-dimensional data, and clustering is often visible in the resulting views. However, these methods can also introduce artifacts, showing clear patterns that are not actually present in the data. To identify such issues, it is useful to compare the NLDR view with a display based on linear dimension reduction, for example a grand tour, whose projections are faithful representations of the data distribution. The two approaches should be considered complementary: NLDR makes it easier to find even small groups in the data, while tour methods allow us to verify such findings and potentially to interpret them in terms of the original variables.
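To make the notion of a linear view concrete, the sketch below draws a single random 2D projection in base R, which is the kind of frame a grand tour moves through. It is only an illustration under stated assumptions: `random_projection` is a hypothetical helper written for this example, `iris` stands in for any numeric dataset, and this is not how detourr generates or interpolates its tour paths.

```r
# Hypothetical helper: an orthonormal p x d basis obtained from the
# QR decomposition of a random Gaussian matrix.
random_projection <- function(p, d = 2) {
  qr.Q(qr(matrix(rnorm(p * d), nrow = p, ncol = d)))
}

set.seed(1)
X <- scale(as.matrix(iris[, 1:4]))   # centre and scale the four measurements
A <- random_projection(ncol(X))      # 4 x 2 orthonormal basis
Y <- X %*% A                         # 150 x 2 projected coordinates

# The columns of A are orthonormal, so t(A) %*% A is the 2 x 2 identity.
stopifnot(isTRUE(all.equal(crossprod(A), diag(2))))

# One frame of a tour-like view, coloured by species.
plot(Y, col = as.integer(iris$Species), pch = 19,
     xlab = "Projection 1", ylab = "Projection 2")
```

Each plotted coordinate is a fixed linear combination of the original variables, so the columns of `A` can be read as the contribution of each variable to the view. An NLDR embedding offers no such direct link, which is why the linear view is useful for checking it.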

In practice we want linked displays showing both the NLDR view and the tour view, where linked brushing can be used to trace groups of points across the two displays, so that an analyst can investigate structures identified in the NLDR display.

Related work

This idea has previously been implemented in R as a Shiny app that uses vegawidget for the interactive display. That implementation is limited, in particular in the display types available, which is especially challenging for large datasets. A much more general implementation of interactive tour displays has since become available in the detourr R package. Related work is also available in the quollr package, which shows the 2D model in a tour of the high-dimensional space.

Details of your coding project

In this project you will use Shiny, NLDR implementations and the detourr package to implement a GUI that enables flexible exploration of NLDR views via linked tour displays. In particular, you will include the slice tour and sage displays and explore to what extent these can help when diagnosing NLDR displays for large datasets. In addition, you will implement a visualization based on the quollr method for display in detourr. The app will be developed as an R package and should be made available via CRAN at the end of the project.

Expected impact

NLDR has become ubiquitous in the visualization of high-dimensional data, but its limitations are well known. We will provide an accessible tool for diagnosing NLDR views, allowing users to identify potential issues, select views that represent structures in the data more accurately, and build trust in interpretations derived from the NLDR view.

Mentors

EVALUATING MENTOR: Ursula Laa [email protected] has contributed to the liminal package development and has extensive experience with multivariate data visualization and the development of R packages. She has mentored a successful GSoC project in 2024.
Eun-Kyung Lee [email protected] is an expert in tour visualizations and has been developing R packages since 2003.

Tests

Contributors, please do one or more of the following tests before contacting the mentors above.

Easy: install liminal, run one of the examples described in https://sa-lee.github.io/liminal/articles/liminal.html and capture an interesting view obtained by using the linked brushing.
Medium: Use the detourr R package to generate two different tour paths for any example data, and show both tours side by side with linked brushing between the two displays.
Hard: Write a small Shiny application that shows the t-SNE view for a simple data example (e.g. palmerpenguins), where the user can select the perplexity parameter from the GUI and the display updates whenever the value changes.

Solutions of tests

Contributors, please post a link to your test results here.

EXAMPLE CONTRIBUTOR 1 NAME, LINK TO GITHUB PROFILE, LINK TO TEST RESULTS.

Divendra Yadav, GitHub Profile, Easy Solution, Medium Solution, Hard Solution
Afraaz Ali, GITHUB, EASY TEST, MEDIUM TEST, HARD TEST
Vaibhav Manihar, GitHub, Easy Test, Medium Test, Hard Test

Jiayi Qian - Github link
