Where do you start when you want to sequence the first representative species of an entire class of land plants? Ferns are notorious for having gigantic genomes, some as large as 145 Gb (more than a thousand times larger than that of Arabidopsis thaliana). Here, Dr. Fay-Wei Li discusses some of the complexities his team faced in sequencing not one, but two fern genomes (Azolla filiculoides and Salvinia cucullata), recently published in Nature Plants. Of particular note: Azolla coexists with a nitrogen-fixing symbiont that has allowed farmers in Southeast Asia to use Azolla as a "green manure" for over 1,000 years, so Fay-Wei also sequenced the genome of the symbiont (Nostoc azollae) to better understand this intriguing symbiosis. Fay-Wei's paper has something for big-data enthusiasts of all stripes: genomics, transcriptomics, and comparative and evolutionary analyses. Enjoy!
Congratulations on sequencing not just the first, but the first two genomes from an entire class of land plants! What were some of the first things you needed to take into consideration before initiating this genome sequencing project?
Fay-Wei: Genome size! Ferns are notorious for their humongous genomes (>10 Gb), but we were able to find “outliers” (i.e., Azolla and Salvinia) that we could work with.
Do you think that this project would have been feasible without PacBio long read sequencing technology?
Fay-Wei: We could definitely sequence and assemble the genomes without PacBio, but the quality would not be as nice.
Along those lines, did the whole-genome duplications (WGDs) in Azolla create assembly problems?
Fay-Wei: Not really. The WGDs happened a while ago and the sequences are divergent enough to not cause much trouble. Plus having PacBio long reads helped.
It sounds like you initially went into this project aiming to sequence just Azolla until your flow cytometry results revealed that the Salvinia cucullata genome was one-third the size of Azolla's. Were you anticipating that Salvinia would have a small genome, or was this a particularly fortuitous finding?
Fay-Wei: We suspected that Azolla’s relatives might have smallish genomes too, and that’s why we looked into Salvinia. But we were still pleasantly surprised by the finding!
Fern genomes sound crazy! Why do you think many fern genomes are so large, and why do you think Salvinia’s genome is so reduced?
Fay-Wei: It’s a bit hard to say now, because we don’t have a “normal” fern genome to compare against. But we are working on it! (My gut says Salvinia’s genome is highly reduced.)
In your paper you discuss the relationship between Azolla and its particularly interesting and ancient symbiont, the nitrogen-fixing cyanobacterium Nostoc. Was the lack of horizontal gene transfer (HGT) between Azolla and Nostoc surprising to you, given the ancient origins of the symbiosis and the presence of HGT events between Azolla and other bacteria?
Fay-Wei: We were indeed somewhat perplexed by this result. But I should note that although we found no evidence of HGT, that doesn’t mean there wasn’t any (e.g., some transfers might have escaped our detection).
What are your preferred ways of storing and sharing the data generated, particularly across so many groups?
Fay-Wei: We set up an FTP site to share raw data, and for the assemblies and annotations, Google Drive is pretty handy.
Do you have any genome tools or references that you would like to promote?
Fay-Wei: You can BLAST, browse, and download the genome data at FernBase. Why not include ferns in your next big data analyses?
If you were to start a similar project today, would you generate, organize, store, or otherwise deal with your data in any different manner? If so, how?
Fay-Wei: I’d be more careful with version control. It’s a nightmare when multiple versions are floating around among the collaborators, even if the difference is only one gene model.
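One lightweight way to catch the "which version do you have?" problem is for every collaborator to compare checksums of their assembly and annotation files. The Python sketch below is a generic illustration, not the workflow used in this project: it builds a manifest of SHA-256 digests for every file in a directory and reports files that are missing or differ between two manifests. The directory layout and file names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks so large assemblies fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(directory):
    """Map each file's path (relative to `directory`) to its SHA-256 digest."""
    root = Path(directory)
    return {
        str(p.relative_to(root)): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_manifests(mine, theirs):
    """Report files present on only one side, and shared files whose contents differ."""
    only_mine = sorted(set(mine) - set(theirs))
    only_theirs = sorted(set(theirs) - set(mine))
    changed = sorted(k for k in set(mine) & set(theirs) if mine[k] != theirs[k])
    return {"only_mine": only_mine, "only_theirs": only_theirs, "changed": changed}
```

Because even a single edited gene model changes a file's digest, a non-empty `changed` list flags exactly the kind of silent divergence described above; keeping text annotations under a version-control system such as git would additionally record what changed and when.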
What was the most difficult/challenging part of completing this project?
Fay-Wei: Convincing everyone (including me) that it’s time to wrap up and write the paper.
Do you have any advice or insight for graduate students, post-docs, or budding computational biologists looking to know more about computation?
Fay-Wei: EVERY BIOLOGIST SHOULD KNOW HOW TO CODE!!!!