bddashboard: Interactive Biodiversity Data Dashboard
Aggregating publicly available biodiversity data from heterogeneous sources (e.g., scientific research, citizen science, and natural history collections) has the potential to answer a staggering variety of research questions. Yet biodiversity data are prone to various data quality issues and biases, which may invalidate their use in research. Furthermore, handling biodiversity big data requires complex technical and analytical skills. The bdverse is a family of R packages that form a general framework for facilitating biodiversity data science. It comprises various packages in a hierarchical structure, providing different R functionalities and a GUI (modular Shiny apps) that can easily be adopted by users with or without programming capabilities. Hopefully, the bdverse will serve as a sustainable and agile infrastructure that enhances the value of biodiversity data by allowing users to conveniently employ R for user-level data standardization, exploration, quality assessment, and cleaning.
Data quality issues may encompass missing, doubtful or wrong information in one of a record's many attributes (e.g. taxonomic, spatial or temporal), a formatting inconsistency, or a potential duplication caused by various aggregation mechanisms (most of which are untraceable). Another type of quality control is also vital: the identification and removal of data that are not necessarily erroneous, but rather unsuitable for a particular application or purpose (i.e. data fitness for use). These case-specific procedures derive from the user's own research questions, the intended analyses and algorithms, the data being used, and the properties of the chosen species or taxonomic group. Hence, even before considering the challenges of developing a robust research analysis, building and performing a comprehensive data quality assessment is overwhelmingly demanding. Therefore, supplying users with a flexible, reproducible and exceptionally user-friendly toolset is the only practical course of action. Diagnostic visualization can unveil hidden patterns and anomalies in the data, and allows quick and efficient exploration of massive datasets. An interactive and flexible dashboard that can be easily deployed locally or remotely would be a highly valuable biodiversity informatics tool.
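As a rough illustration of the record-level checks described above, the following base R sketch flags missing names, missing or out-of-range coordinates, and potential duplicates. The column names assume Darwin Core terms (scientificName, decimalLatitude, decimalLongitude, eventDate), and the helper `flag_quality_issues` is a hypothetical name used for illustration only; it is not part of the bdverse API.

```r
# Toy occurrence records; column names assume Darwin Core terms.
records <- data.frame(
  scientificName   = c("Panthera leo", "Panthera leo", NA, "Vulpes vulpes", "Panthera leo"),
  decimalLatitude  = c(-1.95, -1.95, 10.2, 95.0, NA),
  decimalLongitude = c(34.7, 34.7, 20.1, 12.5, 30.0),
  eventDate        = c("2015-06-01", "2015-06-01", "2016-01-12", "2017-03-03", "2018-09-09"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: adds one logical flag column per check.
flag_quality_issues <- function(df) {
  lat <- df$decimalLatitude
  lon <- df$decimalLongitude
  # Missing or empty taxon name
  df$missing_name <- is.na(df$scientificName) | df$scientificName == ""
  # At least one coordinate is missing
  df$missing_coords <- is.na(lat) | is.na(lon)
  # Coordinates present but outside the valid latitude/longitude range
  df$invalid_coords <- !df$missing_coords & (abs(lat) > 90 | abs(lon) > 180)
  # Same name, place and date as an earlier record
  key_cols <- c("scientificName", "decimalLatitude", "decimalLongitude", "eventDate")
  df$possible_duplicate <- duplicated(df[, key_cols])
  df
}

flagged <- flag_quality_issues(records)

# Number of records raised by each check
flag_cols <- c("missing_name", "missing_coords", "invalid_coords", "possible_duplicate")
print(colSums(flagged[, flag_cols]))
```

Flagging rather than deleting keeps the decision about fitness for use with the user, who can then inspect the flagged records before removing anything.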
To the best of our knowledge, no user-level biodiversity data dashboard exists. The closest project is the Rshiny LifeWatch Data Explorer, developed by the Flanders Marine Institute (VLIZ) in 2015-2016. This interactive online tool gives access only to sensor data collected in the framework of the Flemish LifeWatch project.
The dashboard.demo package, created by Rahul Chauhan during GSoC 2019, is one of the ongoing projects in the bdverse. It provides an interactive Shiny dashboard that allows users to visualize the temporal, taxonomic and spatial aspects of biodiversity data without worrying about coding. During the project, Rahul evaluated different interactive plotting libraries (e.g. plotly, r2d3, D3), explored and implemented different types of interactivity and reactivity, and mastered Shiny modularization. Last year's project serves as a stepping stone for developing a production-ready, state-of-the-art biodiversity data dashboard.
Currently, a basic prototype of the dashboard is ready (the dashboard.demo R package) and it can be found here. So far, we have explored all major visualization libraries in R and concluded that plotly is the most mature and offers the best value for effort. We have also concluded that leaflet offers the most diverse and high-quality mapping features, and that DT (an R interface to the DataTables library) is best suited for rendering tabular data.
dashboard.demo currently has functions to draw a limited set of interactive and reactive visualizations, but a major portion of plotly and leaflet remains to be explored. In short, dashboard.demo still lacks most exploration methods and much of the interactivity, and it is confined to a few specific use cases. More interactive visualizations with drill-down capabilities will allow users to select suspicious records and explore them further, which can greatly facilitate user-level data cleaning procedures.
We have identified five key features whose incorporation will allow us to develop a flexible and robust dashboard template:
- Data field selection feature - to enhance dashboard flexibility.
- Summary tables - to supply easy-to-capture data overview.
- Interactive data tables - via DT (an R interface to the DataTables library).
- Interactive maps - via Leaflet for R (open-source JavaScript libraries for interactive maps).
- Interactive and reactive visualization - via Plotly for R.
During the last GSoC, we explored and compared various visualization packages and, in the end, created a dashboard as a proof of concept. In this project, we plan to improve these five major features within dashboard.demo, which will enable graphic interactivity with drill-down capabilities, and to prepare the package for a CRAN release by developing sufficient tests, integrating CI/CD, and submitting it for software peer review.
Currently, dashboard.demo is confined to specific use cases. The DataSummary tab is meant to give an overview of your dataset before visualization. One of the main tasks of this project is to explore and add more plots and visualization tools that help in summarizing the data.
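One simple overview that a summary tab could present is per-field completeness for the fields the user selects. The sketch below uses base R only; the helper name `summarise_fields`, its `fields` argument and the example columns are hypothetical and are not taken from dashboard.demo.

```r
# Hypothetical helper: one row per selected field, reporting missing values,
# completeness and the number of distinct non-missing values.
summarise_fields <- function(df, fields = names(df)) {
  fields <- intersect(fields, names(df))  # ignore fields that are not in the data
  # NA counts as missing; so does an empty string in character columns
  is_missing <- function(x) is.na(x) | (is.character(x) & x == "")
  data.frame(
    field = fields,
    n_missing = vapply(fields, function(f) sum(is_missing(df[[f]])),
                       integer(1), USE.NAMES = FALSE),
    pct_complete = vapply(fields, function(f) round(100 * mean(!is_missing(df[[f]])), 1),
                          numeric(1), USE.NAMES = FALSE),
    n_distinct = vapply(fields, function(f) length(unique(df[[f]][!is_missing(df[[f]])])),
                        integer(1), USE.NAMES = FALSE),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Toy records; column names assume Darwin Core terms.
records <- data.frame(
  scientificName  = c("Panthera leo", "Panthera leo", "", "Vulpes vulpes"),
  decimalLatitude = c(-1.95, NA, 10.2, 48.1),
  year            = c(2015, 2015, 2016, NA),
  stringsAsFactors = FALSE
)

# Summarise only the fields the user picked
overview <- summarise_fields(records, fields = c("scientificName", "decimalLatitude"))
print(overview)
```

Because the result is a plain data frame, it could be handed to a table widget such as DT unchanged.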
leaflet is one of the most popular packages for spatial visualization. During the last GSoC, we explored all major packages used for spatial visualization and found that leaflet is the main player. Although we have explored leaflet, its full functionality is not yet implemented in dashboard.demo. One of the tasks in this project is to put leaflet's remaining potential to use in dashboard.demo and, most importantly, to identify the weaknesses of each feature type.
We have implemented some plotly visualizations, but a large number of them remain to be implemented in dashboard.demo. Our target is to identify and add those visualizations that are interactive and can also be used reactively with other visualizations, for all three categories, i.e. spatial, temporal and taxonomic.
Our prototype dashboard.demo contains reactive elements. This was implemented so that different plots can communicate with each other. Our next target is to improve user experience for field selection by adding more reactive elements and plots in our dashboard.
Interactivity is one of the most important and most liked features of dashboard.demo. Although plots are interactive thanks to plotly, we still need to increase interactivity in tables. One of the important tasks is to make tables interact with plots.
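A minimal sketch of how a table could follow a selection made in a plot is shown below, assuming the shiny, plotly and DT packages are installed. It is not the dashboard.demo implementation: the names `select_records` and `build_linked_app`, the `id` column and the toy data are hypothetical. The filtering logic is kept in a plain function so that it can be checked without starting an app.

```r
# Hypothetical helper: keep the rows whose id was selected in the plot.
# With no selection, all rows are returned.
select_records <- function(records, selected_ids) {
  if (is.null(selected_ids) || length(selected_ids) == 0) return(records)
  records[records$id %in% selected_ids, , drop = FALSE]
}

# Builds the app object; shiny, plotly and DT are only needed when this is called.
build_linked_app <- function(records) {
  ui <- shiny::fluidPage(
    plotly::plotlyOutput("scatter"),
    DT::DTOutput("table")
  )
  server <- function(input, output, session) {
    output$scatter <- plotly::renderPlotly({
      # `key` attaches each record id to its point so it comes back on selection
      p <- plotly::plot_ly(
        records, x = ~year, y = ~decimalLatitude, key = ~id,
        type = "scatter", mode = "markers", source = "scatter"
      )
      p <- plotly::layout(p, dragmode = "select")
      plotly::event_register(p, "plotly_selected")
    })
    # Re-evaluated whenever the box/lasso selection in the plot changes
    selected <- shiny::reactive({
      ev <- plotly::event_data("plotly_selected", source = "scatter")
      select_records(records, unlist(ev$key))
    })
    output$table <- DT::renderDT(selected())
  }
  shiny::shinyApp(ui, server)
}

# Toy records; column names other than `id` assume Darwin Core terms.
records <- data.frame(
  id              = c("r1", "r2", "r3", "r4"),
  year            = c(2015, 2016, 2016, 2018),
  decimalLatitude = c(-1.95, 10.2, 48.1, 51.5),
  stringsAsFactors = FALSE
)

# Launch only in an interactive session with all three packages available.
pkgs <- c("shiny", "plotly", "DT")
if (interactive() && all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))) {
  shiny::runApp(build_linked_app(records))
}
```

The reverse direction (a row clicked in the table highlighting a point in the plot) would need the table's row-selection input, which this sketch leaves out.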
One of the most important tasks of this project is to develop a framework of comprehensive tests for the dashboard, as this is crucial for the production version. In order to identify bugs and failures, dashboard.demo will be tested with different datasets that vary in size, taxon, and data publisher. Bugs will be fixed and workarounds will be developed for failing features (which will be omitted if unresolved), coupled with the development of appropriate unit tests. Integration tests for reactivity and shinytest tests for the UI will also be developed. Once a sufficient suite of tests is implemented, different CI/CD strategies will be evaluated, seeking a good balance between test sensitivity and test maintenance.
The bddashboard package will be submitted to rOpenSci for a software peer review, and a short manuscript for JOSS will be written. After its hopefully smooth acceptance, we will submit it to CRAN. The package will be added to the bdverse family of packages and a new bdverse version will be pushed to CRAN.
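As a sketch of what a reactivity test might look like, the example below tests a small Shiny module with `shiny::testServer()` (available from shiny 1.5.0) and testthat. The module `missing_count_server`, the helper `count_missing` and the toy data are hypothetical and only stand in for real dashboard modules; the test is skipped when the required packages are not installed.

```r
# Hypothetical pure helper: number of missing values in one field.
count_missing <- function(df, field) sum(is.na(df[[field]]))

# Hypothetical module: reports the missing-value count for the selected field.
missing_count_server <- function(id, records) {
  shiny::moduleServer(id, function(input, output, session) {
    n_missing <- shiny::reactive({
      shiny::req(input$field)  # wait until a field has been chosen
      count_missing(records, input$field)
    })
    output$summary <- shiny::renderText(paste("Missing values:", n_missing()))
  })
}

# Toy records; column names assume Darwin Core terms.
records <- data.frame(
  decimalLatitude = c(-1.95, NA, 10.2, 48.1),
  year            = c(2015, NA, 2016, NA)
)

# Run the reactivity test only when shiny (>= 1.5.0) and testthat are available.
can_test <- requireNamespace("shiny", quietly = TRUE) &&
  requireNamespace("testthat", quietly = TRUE) &&
  utils::packageVersion("shiny") >= "1.5.0"

if (can_test) {
  testthat::test_that("missing-value count reacts to field selection", {
    shiny::testServer(missing_count_server, args = list(records = records), {
      # Simulate the user picking a field, then check the reactive value
      session$setInputs(field = "decimalLatitude")
      testthat::expect_equal(n_missing(), 1L)
      # Changing the input should invalidate and recompute the reactive
      session$setInputs(field = "year")
      testthat::expect_equal(n_missing(), 2L)
      testthat::expect_equal(output$summary, "Missing values: 2")
    })
  })
}
```

Keeping the computation in a plain function such as `count_missing` means most of the logic can be covered by ordinary unit tests, leaving the slower UI-level tests for the parts that genuinely depend on the browser.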
R, Shiny (advanced level), data visualization, HTML, JavaScript, testing (testthat; shinytest), CI/CD.
Developing novel interactive visualizations coupled with a modular dashboard system for biodiversity data, one that can easily be employed by R experts and novices alike, will undoubtedly promote biodiversity research. This project also has the potential to spark a practical scheme for encapsulating key interactivity and reactivity functionality, together with testing units, within a bdvis object. Engineering such an object would significantly speed up our ability to develop a diverse collection of dashboards without compromising robustness and integrity.
- Thiloshon Nagarajah is the Shiny lead of the bdverse development team. He is a past GSoC and GCI student for the Fedora Project, Sahana Foundation and R Language. Thiloshon joined bdverse as a Google Summer of Code student developer in 2017 and has since been a student, contributor, mentor and now a core member of the bdverse team. All things Shiny in the bdverse are Thiloshon's magic.
- Vijay Barve is the author and maintainer of bdvis and a key member of the bdverse development team. Vijay is a biodiversity data scientist and has been a GSoC student and mentor with the R project organization since 2012. Vijay has contributed to several packages on CRAN.
- Tomer Gueta is the founding director of the bdverse project. He is a postdoctoral fellow at the Faculty of Civil and Environmental Engineering at the Technion, working with Prof. Yohay Carmel. His research deals with developing tools and methodologies for data-intensive biodiversity research. Over the last three years, Tomer has served as a GSoC mentor with the R project organization.