Here we present the extended support material for our 'Reproducibility for Everyone' workshop at ASPB #PlantBio18. All material is available under a CC BY 4.0 license.

Please feel free to remix, copy, distribute, use, improve, and snowball.

The extended handout is available as PDF, Word, and Google Doc.

The extended presentation is available as PDF, PPTX, and Google Doc.

Supported by:

ASPB, Addgene, eLife, CodeOcean, and free labor for the love of better science

The handout is also copied below for ease of use.

Reproducibility Resources & Tools

Data management

Harvard University Data Management page https://datamanagement.hms.har...
Kbroman Lab (Short primer on data storage and handling from Kbroman)
Purdue Library (Short primer on data management and file naming conventions)
Data One Best Practices (Detailed resource on how to handle data throughout its life-cycle)
Mantra (Free online course for those who handle digital data)
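
The file naming primers above boil down to names that sort well and can be read by both people and scripts. A minimal sketch of one such convention follows; the pattern (ISO date, project, description, two-digit version) and the helper name make_filename are assumptions made for illustration, not a convention prescribed by any of the listed resources.

```python
import re
from datetime import date


def make_filename(project: str, description: str, version: int,
                  extension: str, on: date) -> str:
    """Build a sortable, machine-readable file name.

    Assumed pattern (hypothetical): YYYY-MM-DD_project_description_vNN.ext
    """
    def slug(text: str) -> str:
        # Lower-case, then turn every run of other characters into one hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return (f"{on.isoformat()}_{slug(project)}_{slug(description)}"
            f"_v{version:02d}.{extension.lstrip('.')}")


print(make_filename("Root Growth", "plate 3 raw counts", 1, ".csv",
                    date(2018, 7, 15)))
```

Putting the date first in ISO format means an alphabetical listing is also a chronological one, and avoiding spaces keeps the names safe on the command line.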

Electronic Lab Notebooks (ELN)

Harvard University ELN guide (Great summary about current ELNs and what they do)
Benchling (free)
Evernote (free and $)
Labguru ($)
sciNote (open source, free)
Open Science Framework (free)


GitHub (code repository; free for public repos)
Jupyter Notebooks (open source web-app for creating & sharing live code, equations, and more)
Code Ocean (computational reproducibility platform; free to upload, share & publish executable code with DOI; pay for more computing time over freemium limit)
Conda and BioConda (An operating-system-independent package and environment manager for the command line)
Docker and Biocontainers (A container ecosystem to package code and data on the command line)
Binder (A tool that turns your GitHub repository into a Docker image that runs online in the cloud)
Galaxy (A web-based bioinformatics platform with a graphical interface. Needs local set-up for larger data handling.)


Addgene (nonprofit plasmid repository)
CiteAb (antibody search engine with results sorted by citations)
Quartzy (manage lab inventory)


Bio-Protocol (A peer-reviewed protocol journal; free to read & publish) (an open access repository of science methods; free to read & publish)


DataDryad (curated digital repository; free to access, $120 to publish dataset up to 20GB)
Figshare (free digital repository, 5GB per file limit)
Zenodo (free digital repository; 50GB per dataset limit)

Data Visualization

Beyond Bar Graphs (Free Tools & Resources for Creating More Transparent Figures for Small Datasets)
Interactive Dotplot Tool (create dotplots, box plots, violin plots, show subgroups or display clusters of non-independent data)
Interactive Linegraph Tool (examine different summary statistics, focus on groups, time points or conditions of interest, examine lines for any individual in the dataset, view change scores)
Other free tools:

Tutorial - Plotting in R on youtube

Customized interactive visualizations (Shiny)

Claus Wilke blog post (contains several links to his upcoming book about data visualization)


Collection of useful resources
Tutorial - Data Analysis and Visualization in Python
Data Carpentry: An Introduction to Python for Data Analysis and Visualization - Tracy Teal PyCon 2016 Tutorial
PyData Packages (incl. Matplotlib, Seaborn, Numpy, Pandas, and many more important for data analysis and visualization)

Statistical Analysis

Handbook of Biological Statistics (Web page from John H. McDonald and others from the University of Delaware, with PDF download links to a free book on statistics in biology and its R implementation)
Scipy stats lectures (Lecture on statistics in Python using SciPy)
Nature Stats for Biologist resources

Practical tips for reproducibility

1.    Plan for reproducibility before you start
a.    Write a study plan or protocol and track new versions.
b.    Set up a reproducible project using an electronic lab notebook to organize and track your work. Avoid saving proprietary file formats.
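
One way to start a project reproducibly is to create the same folder skeleton and a plain-text README every time. The sketch below is only an illustration: the folder names, their descriptions, and the function create_project are hypothetical choices, not a layout recommended by the workshop.

```python
import tempfile
from pathlib import Path

# Hypothetical layout; rename or extend the folders to suit your lab.
SUBFOLDERS = {
    "data/raw": "original data, never edited by hand",
    "data/processed": "cleaned data produced by scripts in code/",
    "code": "analysis scripts, tracked with version control",
    "docs": "study plan, protocols, and notes",
    "results": "figures and tables that can be regenerated",
}


def create_project(root: Path, title: str) -> Path:
    """Create the folder skeleton and a plain-text README under `root`."""
    for folder in SUBFOLDERS:
        (root / folder).mkdir(parents=True, exist_ok=True)
    readme = root / "README.txt"
    if not readme.exists():  # keep any README that is already there
        lines = [title, ""]
        lines += [f"{folder}: {purpose}" for folder, purpose in SUBFOLDERS.items()]
        readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme


# Demo in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    readme = create_project(Path(tmp) / "root-growth-2018", "Root growth study")
    print(readme.read_text(encoding="utf-8"))
```

The README is written as plain text so it stays readable without any particular software, in line with the advice to avoid proprietary formats.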

2.    Keep track of things
a.    Preregister important study design and analysis information. Free tools to help you make your first registration include AsPredicted, Open Science Framework, and Registered Reports. Clinical trials use
b.    Track changes to your files using version control.
c.    Document everything done by hand in a README file and data dictionary. Karl Broman’s Data Organization module:
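
A data dictionary can be started automatically and then finished by hand. The sketch below drafts one entry per column of a small CSV table; the column names, the type labels, and the helpers infer_type and data_dictionary are assumptions made for illustration, and the description field is left empty on purpose so that a person fills it in.

```python
import csv
import io


def infer_type(values):
    """Guess a simple type label from a column's non-empty string values."""
    values = [v for v in values if v != ""]
    if not values:
        return "empty"
    for label, cast in (("integer", int), ("decimal", float)):
        try:
            for v in values:
                cast(v)
            return label
        except ValueError:
            continue  # at least one value did not parse; try the next type
    return "text"


def data_dictionary(csv_text):
    """Return one draft entry per column: name, guessed type, example value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames or []:
        # Missing cells come back as None from DictReader; treat them as empty.
        column = [(row[name] or "") for row in rows]
        entries.append({
            "variable": name,
            "type": infer_type(column),
            "example": next((v for v in column if v), ""),
            "description": "",  # to be written by hand
        })
    return entries


# Made-up example table, for illustration only.
sample = "plant_id,height_cm,genotype\n1,12.5,Col-0\n2,,Ler\n"
for entry in data_dictionary(sample):
    print(entry)
```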

3.    Report your research transparently
a.    Share your protocols and interventions explicitly and transparently.
b.    Write a transparent report. Guidelines from the Equator Network or processes like Registered Reports can help.

4.    Archive & share your materials
a.    Share and license your research
i.    Data
1.    Avoid supplementary files; license and share your data using a repository. How to License Research Data:
ii.    Materials & reagents
1.    License your published materials so they can be reused. Creative Commons License Picker:
2.    Deposit reagents and seeds with repositories such as Addgene and seed repositories.
iii.    Software
1.    License your code using Code Ocean or GitHub. Open Source Initiative: About Open Source Licences:
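
Before depositing files in a repository, it can help to record a checksum for each one so that you and later users can confirm the archived copies are intact. This is a minimal sketch under stated assumptions: the manifest file name MANIFEST.sha256 and the helpers sha256_of and write_manifest are hypothetical, and no repository listed above is known to require this step.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(folder: Path, manifest_name: str = "MANIFEST.sha256") -> Path:
    """Write one 'checksum  relative/path' line per file found in `folder`."""
    manifest = folder / manifest_name
    lines = []
    for path in sorted(folder.rglob("*")):
        if path.is_file() and path != manifest:  # skip folders and the manifest
            relative = path.relative_to(folder).as_posix()
            lines.append(f"{sha256_of(path)}  {relative}")
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest


# Demo in a throwaway directory with a made-up data file.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "counts.csv").write_bytes(b"plant_id,count\n1,42\n")
    print(write_manifest(folder).read_text(encoding="utf-8"))
```

Rerunning the function after downloading the archive and comparing the two manifests shows whether any file changed along the way.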

Further reading

●    Ten Simple Rules for Reproducible Computational Research:
●    Reproducibility in Science:
●    Open Science MOOC: and https://opensciencemooc.github...
●    Tools and Resources for Reproducibility Series at
●    Managing Laboratory Notebooks
●    General File and Folder Organization
●    File Naming Conventions

Example studies

Gene family innovation, conservation and loss on the animal stem lineage
○    Paper:
○    Protocols:
○    Data:

A robust method for transfection in choanoflagellates illuminates their cell biology and the ancestry of animal septins
○    Paper:
○    Protocols:
○    Constructs:

Implicating candidate genes at GWAS signals by leveraging topologically associating domains
○    Paper:
○    Code:
○    Docker workflow:

mcSCRB-seq: sensitive and powerful single-cell RNA sequencing
○    Protocol:
○    Paper:
○    Code:

TransRate: reference-free quality assessment of de novo transcriptome assemblies
○    Paper:
○    Code:
○    Tutorial:

Genomic insights into members of the candidate phylum Hyd24-12 common in mesophilic anaerobic digesters
○    Paper:
○    Code:

Experimenting with Reproducibility: a case study of Robustness in Bioinformatics
○    Paper:
○    Code:

A Bayesian Mixture Modelling Approach For Spatial Proteomics
○    Paper:
○    Code: