WiLDSI Data Portal

This portal was developed as part of the WiLDSI project with input from the DSI Scientific Network to enable the exploration and quantification of the use and provision of nucleotide sequence data (NSD) / Digital Sequence Information (DSI) in the scientific literature. The underlying dataset is the result of an ETL pipeline that extracts sequence records from the European Nucleotide Archive and links them to citations in open-access publications aggregated in Europe PubMed Central. The dataset is updated regularly using automatic methods.
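The linking step can be pictured with a small sketch. This is not the WiLDSI pipeline itself, only a toy illustration of the general idea: it scans publication text for accession-like identifiers and records which publications mention each one. The regular expression is a simplified assumption about accession formats, and the function name, publication IDs, and sample texts are hypothetical.

```python
import re
from collections import defaultdict

# Simplified, assumed pattern for accession-like identifiers:
# one letter + 5 digits, or two letters + 6 digits, with an optional
# version suffix such as ".3" that is matched but not captured.
ACCESSION_RE = re.compile(r"\b([A-Z]\d{5}|[A-Z]{2}\d{6})(?:\.\d+)?\b")


def link_accessions(publications):
    """Map each accession-like identifier to the publication IDs that mention it."""
    links = defaultdict(set)
    for pub_id, text in publications.items():
        for match in ACCESSION_RE.finditer(text):
            # group(1) is the accession without the version suffix
            links[match.group(1)].add(pub_id)
    return dict(links)


# Hypothetical publication IDs and texts, invented for illustration only.
sample = {
    "PUB1": "Sequences were deposited under accession MN908947.3 and U12345.",
    "PUB2": "We reanalysed MN908947 from a public archive.",
}

links = link_accessions(sample)
for accession, pubs in sorted(links.items()):
    print(accession, sorted(pubs))
```

A production pipeline would presumably rely on curated cross-references and stricter identifier validation rather than a single pattern; the sketch only shows the shape of the resulting sequence-to-publication mapping.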

A weekly database dump is available for download: https://wildsi-dl.ipk-gatersleben.de/data/

This web application enables the discovery of data appropriate for biogeographical studies, the exploration of collaborative networks, and the profiling of the flow of access and benefit relating to sequence data.

A data note and a research article published in tandem in GigaScience provide more detailed information on our methods for extracting nucleotide sequence data and linking it with associated publications, as well as the interpretation and potential implications of the results.

A persistent copy of the dataset version used in these papers is published under the DOI: 10.5447/ipk/2021/8.

The Team

IPK Gatersleben, Germany

  • Jorge Garcia (Development, Design)
  • Matthias Lange, PhD (Concept, Supervision, Advisory)
  • Jens Freitag, PhD (Supervision, Use cases)

Leibniz-Institute DSMZ, Germany

  • Andrew L. Hufton, PhD (Development, Use cases, Test)
  • Amber H. Scholz, PhD (Project lead, Use cases)

WiLDSI Project repository (GitHub)