Many labs have existing infrastructure and workflows, so BDDS utilities are designed to work both independently of other BDDS components and alongside existing frameworks. The utilities allow users to find correlations in data (SORC Dashboard), discover outliers (BDQC), unambiguously name and identify research data products (MINID), and assemble large and complex data sets (BDBag). The BDBag package is a collection of utilities for working with BagIt packages. Ensuring data integrity during exchange between components is critical for large data sets, where records may be lost in transfer.
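The integrity concern can be illustrated with a checksum manifest, the general idea behind BagIt packages. The following is a minimal sketch using only the Python standard library; it does not use the BDBag API, and the function names (`write_manifest`, `verify_manifest`, `demo`) are hypothetical. It assumes a BagIt-style layout in which payload files sit under `data/` and `manifest-sha256.txt` lists one checksum and relative path per line.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import List

MANIFEST_NAME = "manifest-sha256.txt"  # BagIt-style manifest file name


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(bag: Path) -> None:
    """Record a checksum for every payload file under bag/data (hypothetical helper)."""
    lines = []
    for path in sorted(p for p in (bag / "data").rglob("*") if p.is_file()):
        rel = path.relative_to(bag).as_posix()
        lines.append(f"{sha256_of(path)}  {rel}")
    (bag / MANIFEST_NAME).write_text("\n".join(lines) + "\n", encoding="utf-8")


def verify_manifest(bag: Path) -> List[str]:
    """Return a list of problems: files that are missing or whose checksum changed."""
    problems = []
    for line in (bag / MANIFEST_NAME).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, rel = line.split(maxsplit=1)
        target = bag / rel
        if not target.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_of(target) != expected:
            problems.append(f"checksum mismatch: {rel}")
    return problems


def demo() -> List[str]:
    """Build a tiny package, simulate a damaged transfer, and report what changed."""
    with tempfile.TemporaryDirectory() as tmp:
        bag = Path(tmp)
        payload = bag / "data"
        payload.mkdir()
        (payload / "a.txt").write_text("ACGT\n", encoding="utf-8")
        (payload / "b.txt").write_text("TTGA\n", encoding="utf-8")
        write_manifest(bag)
        assert verify_manifest(bag) == []  # an intact package reports no problems
        (payload / "a.txt").write_text("ACGA\n", encoding="utf-8")  # simulated corruption
        (payload / "b.txt").unlink()  # simulated lost record
        return verify_manifest(bag)


if __name__ == "__main__":
    for problem in demo():
        print(problem)
```

A receiver that runs the verification step after transfer can detect both altered and missing files before any analysis starts. The BDBag utilities themselves should be preferred for real packages; this sketch only shows the underlying check.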
Because biological data, annotations, and tools change constantly, validating results is difficult not only for reviewers and peers but also for data publishers. MINID addresses this problem by giving data publishers a system for unambiguously identifying data products.
Additional Utilities
Code for Ultrafast Comparison of Personal Genomes
https://github.com/gglusman/genome-fingerprints
Reproducible high-performance pipeline for generating footprints using Docker containers
Docker container (https://hub.docker.com/r/bd2kbdds/dnase_footprinting/), workflow definitions (https://bdds.globusgenomics.org/workflow/list_published)
Software for TReNA, available on Bioconductor. It makes use of the footprints.
http://bioconductor.org/packages/release/bioc/html/trena.html
The footprints can be accessed at www.trena.org (which redirects to a GitHub page with download instructions).