Material for our #PlantBio18 "Reproducibility for Everyone" workshop
Here we present the extended support material for our 'Reproducibility for Everyone' workshop at ASPB #PlantBio18. All material is available under a CC BY 4.0 license.
Please feel free to remix, copy, distribute, use, improve, and snowball.
The handout is also copied below for ease of use.
Reproducibility Resources & Tools
Harvard University Data Management page https://datamanagement.hms.har...
Kbroman Lab http://kbroman.org/dataorg/ (Short primer on data storage and handling from Karl Broman)
Purdue Library http://guides.lib.purdue.edu/c... (Short primer on data management and file naming conventions)
Data One Best Practices https://www.dataone.org/best-p... (Detailed resource on how to handle data throughout its life-cycle)
Mantra https://mantra.edina.ac.uk/ (Free online course for those who handle digital data)
Electronic Lab Notebooks (ELN)
Harvard University ELN guide https://tinyurl.com/Harvard-EL... (Great summary about current ELNs and what they do)
Benchling https://benchling.com/ (free)
Evernote https://evernote.com/ (free and $)
Labguru https://www.labguru.com/ ($)
sciNote https://scinote.net/ (open source, free)
Open Science Framework https://osf.io/ (free)
GitHub https://github.com/ (code repository; free for public repos)
Jupyter Notebooks http://jupyter.org/ (open source web-app for creating & sharing live code, equations, and more)
Code Ocean https://codeocean.com/ (computational reproducibility platform; free to upload, share & publish executable code with DOI; pay for more computing time over freemium limit)
Conda and BioConda https://conda.io/docs/ and https://bioconda.github.io/ (An operating-system-independent package and environment manager for the command line)
Docker and Biocontainers https://docs.docker.com/ and http://biocontainers.pro (A container ecosystem to package code and data on the command line)
Binder https://mybinder.org/ (A tool that turns your GitHub repository into a Docker image that runs online in the cloud)
Galaxy https://usegalaxy.org/ (A web-based bioinformatics platform with a graphical interface. Needs local set-up for larger data handling.)
Addgene https://www.addgene.org/ (nonprofit plasmid repository)
CiteAb https://www.citeab.com/ (antibody search engine with results sorted by citations)
Quartzy https://www.quartzy.com/ (manage lab inventory)
Bio-Protocol https://bio-protocol.org/ (A peer-reviewed protocol journal; free to read & publish)
protocols.io http://protocols.io/ (an open access repository of science methods; free to read & publish)
DataDryad http://datadryad.org/ (curated digital repository; free to access, $120 to publish dataset up to 20GB)
Figshare https://figshare.com/ (free digital repository, 5GB per file limit)
Zenodo https://zenodo.org/ (free digital repository; 50GB per dataset limit)
Beyond Bar Graphs (Free Tools & Resources for Creating More Transparent Figures for Small Datasets) https://tinyurl.com/ecrbeyondb...
Interactive Dotplot Tool http://statistika.mfub.bg.ac.r... (create dotplots, box plots, violin plots, show subgroups or display clusters of non-independent data)
Interactive Linegraph Tool (examine different summary statistics, focus on groups, time points or conditions of interest, examine lines for any individual in the dataset, view change scores): http://statistika.mfub.bg.ac.r...
Other free tools: https://twitter.com/T_Weissger...
Tutorial - Plotting in R on youtube
Customized interactive visualizations (Shiny) https://www.frontiersin.org/ar...
Claus Wilke blog post http://serialmentor.com/blog/2... (contains several links to his upcoming book about data visualization)
Collection of useful resources https://github.com/schmelling/...
Tutorial - Data Analysis and Visualization in Python
Data Carpentry: An Introduction to Python for Data Analysis and Visualization - Tracy Teal PyCon 2016 Tutorial
PyData Packages (incl. Matplotlib, Seaborn, Numpy, Pandas, and many more important for data analysis and visualization) https://pydata.org/downloads.h...
Handbook of Biological Statistics! http://www.biostathandbook.com... and http://rcompanion.org/rcompani... (Web page from John H. McDonald and others from the University of Delaware with PDF download links to a free book on stats in biology and its R implementation).
Scipy stats lectures https://tinyurl.com/scipystats (Lecture on stats in python using scipy) see also https://www.statsmodels.org/st... for more stats in python
Nature Stats for Biologist resources https://www.nature.com/collect...
Practical tips for reproducibility
1. Plan for reproducibility before you start
a. Write a study plan or protocol and track new versions.
b. Set up a reproducible project using an electronic lab notebook to organize and track your work. Avoid saving proprietary file formats.
2. Keep track of things
a. Preregister important study design and analysis information. Free tools to help you make your first registration include AsPredicted, Open Science Framework, and Registered Reports. Clinical trials use Clinicaltrials.gov.
b. Track changes to your files using version control.
c. Document everything done by hand in a README file and data dictionary. Karl Broman’s Data Organization module: http://kbroman.org/dataorg/pag...
3. Report your research transparently
a. Share your protocols and interventions explicitly and transparently.
b. Write a transparent report. Guidelines from the Equator Network or processes like Registered Reports can help.
4. Archive & share your materials
a. Share and licence your research
i. Data
1. Avoid supplementary files; licence and share your data using a repository. How to License Research Data: http://www.dcc.ac.uk/resources/how-guides/license-research-data.
ii. Materials & reagents
1. Licence your published materials so they can be reused. Creative Commons License Picker: https://creativecommons.org/ch...
2. Deposit reagents with repositories like Addgene, and seeds with seed repositories.
iii. Code
1. Licence your code using Code Ocean or GitHub. Open Source Initiative: About Open Source Licences: https://opensource.org/licenses.
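As an illustration of tip 2c (documenting your data in a data dictionary), here is a minimal Python sketch that drafts a data dictionary skeleton from a CSV table. It is not taken from the workshop material or from Karl Broman's module. The function names (build_data_dictionary, infer_type, to_csv), the output columns, and the example plant data are all hypothetical. The type guess is deliberately crude, and the description and units fields are left as placeholders to be filled in by hand.

```python
import csv
import io

# Hypothetical column layout for the data dictionary itself.
FIELDS = ["column", "inferred_type", "n_missing", "example", "description", "units"]


def infer_type(values):
    """Crude guess at a column type from its non-empty string values."""
    if not values:
        return "empty"
    for caster, name in ((int, "integer"), (float, "number")):
        try:
            for value in values:
                caster(value)
            return name
        except ValueError:
            continue
    return "text"


def build_data_dictionary(csv_text):
    """Return one draft dictionary entry per column of a CSV table."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for column in reader.fieldnames or []:
        # Treat empty cells (and cells missing from short rows) as missing.
        values = [row[column] for row in rows if row[column] not in ("", None)]
        entries.append({
            "column": column,
            "inferred_type": infer_type(values),
            "n_missing": len(rows) - len(values),
            "example": values[0] if values else "",
            "description": "TODO: describe by hand",
            "units": "TODO",
        })
    return entries


def to_csv(entries):
    """Serialize the draft entries as CSV text, ready to save next to the data."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


# Made-up example table; replace with the contents of your own file.
example_csv = (
    "plant_id,genotype,height_cm,notes\n"
    "p01,Col-0,12.5,\n"
    "p02,Col-0,11.0,late flowering\n"
    "p03,phyB,18.2,\n"
)
print(to_csv(build_data_dictionary(example_csv)))
```

The generated file is only a starting point: the value of a data dictionary comes from the descriptions and units you write yourself, and from keeping it under version control alongside the data and README.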
● Ten Simple Rules for Reproducible Computational Research: http://journals.plos.org/plosc...
● Reproducibility in Science: http://ropensci.github.io/repr...
● Open Science MOOC: https://opensciencemooc.eu/ and https://opensciencemooc.github...
● Tools and Resources for Reproducibility Series at protocols.io: goo.gl/r7GKMA
● Managing Laboratory Notebooks http://colinpurrington.com/tip...
● General File and Folder Organization https://zapier.com/blog/organi...
● File Naming Conventions http://www.exadox.com/en/artic...
Gene family innovation, conservation and loss on the animal stem lineage
○ Paper: https://doi.org/10.7554/eLife....
○ Protocols: dx.doi.org/10.17504/protocols.io.kwscxees
○ Data: https://doi.org/10.6084/m9.fig...
A robust method for transfection in choanoflagellates illuminates their cell biology and the ancestry of animal septins
○ Paper: https://doi.org/10.1101/343111
○ Protocols: http://www.protocols.io/groups...
○ Constructs: http://www.addgene.org/Nicole_...
Implicating candidate genes at GWAS signals by leveraging topologically associating domains
○ Paper: https://dx.doi.org/10.1038/ejh...
○ Code: https://zenodo.org/record/1639...
○ Docker workflow: https://zenodo.org/record/1665...
TransRate: reference-free quality assessment of de novo transcriptome assemblies
○ Paper: https://dx.doi.org/10.1101%2Fg...
○ Code: https://github.com/Blahah/tran...
○ Tutorial: http://hibberdlab.com/transrat...