Researcher Dan Fergus pipettes a DNA sample at the North Carolina Museum of Natural Sciences. Photo (C) Paige Brown.
The process of DNA sequencing requires many copies of each strand of DNA. Fortunately, these copies can be made even without the help of the cells in which the DNA was originally found. Before we sequence the 16S rRNA gene, we make many copies of it from the DNA of all the microbes in our original sample. To do so, we use one of the most important methods in molecular biology, the polymerase chain reaction, or PCR. This reaction enables us to do two important things: 1) make billions of copies of the selected region of the 16S rRNA gene; and 2) add a short piece of extra DNA to the ends of each copy to tag it in a way that tells us exactly which sample it came from. The tag is a bit like putting your name on the first page of a book; your name doesn’t change the information inside, but it lets everyone know it is your book. PCR itself is usually not a terribly difficult method, but we can run into a lot of trouble doing it with microbe samples. We want to study the microbes collected from specific locations in your home or on your body; the trouble is that microbes are everywhere and can very easily contaminate our samples. If just a few microbes from your hands, tabletop, or breath get into any of the components of the reaction, they can sink the study. Thus, we have to take great care to open tubes and work with the swabs under the cleanest conditions possible. We want to be sure we are getting microbes from our swabs and not from our researchers or the lab bench!
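To get a feel for how PCR reaches billions of copies, here is a minimal sketch of the idealized arithmetic: in a perfect reaction, every cycle doubles the number of copies. The function names, the `efficiency` parameter, and the choice of 30 cycles are our own illustrative assumptions, not details of the protocol described above; real reactions fall short of perfect doubling.

```python
import math

def copies_after(cycles, starting_copies=1, efficiency=1.0):
    """Idealized PCR: each cycle multiplies the copy count by (1 + efficiency).

    efficiency=1.0 means perfect doubling (an assumption; real reactions do worse).
    """
    return starting_copies * (1.0 + efficiency) ** cycles

def cycles_to_reach(target_copies, starting_copies=1, efficiency=1.0):
    """Smallest whole number of cycles that yields at least target_copies."""
    growth_needed = target_copies / starting_copies
    return math.ceil(math.log(growth_needed) / math.log(1.0 + efficiency))

# Hypothetical example: one starting molecule, 30 cycles of perfect doubling.
print(f"{copies_after(30):,.0f} copies after 30 cycles")
print(f"{cycles_to_reach(1e9)} cycles needed to pass one billion copies")
```

Lowering `efficiency` (for example to 0.9) shows how quickly the final yield drops when each cycle copies only some of the molecules.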
After PCR, we have billions of copies of 16S rRNA gene fragments tagged with individual identifiers that uniquely label each sample. At this point, we can actually mix all the samples together for sequencing. We will generally use one of two next-generation sequencing platforms, named for and by the companies that patented them, 454 or Illumina. Next-generation sequencing is a set of technologies, available only in the last decade, that has allowed us to move from sequencing just a handful of DNA fragments at a time to sequencing tens of thousands and up to billions of fragments all at once. By applying next-generation sequencing to our microbe projects, we can produce thousands of sequences from hundreds of samples all at once. We choose one next-generation sequencing technology over another based on cost, convenience, and the number and lengths of DNA sequences it will produce. Illumina, for example, can produce 3 billion sequence reads of approximately 200 nucleotides each, while 454 produces about 1 million reads of up to 700 nucleotides. These sequencing methods are performed at a core facility for genomics research and can take anywhere from as little as a day for 454 to more than a week for Illumina, once the samples are loaded into the machines. Usually the turn-around time from submitting samples to receiving sequence results is a few weeks. We receive our results as a massive electronic sequence file.
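The trade-off between the two platforms is easier to see with a little arithmetic. The sketch below uses only the approximate read counts and read lengths quoted above (700 nucleotides is an upper bound for 454); the number of pooled samples is a hypothetical value we chose for illustration, and the function names are our own.

```python
# Approximate figures quoted in the text above.
platforms = {
    "Illumina": {"reads": 3_000_000_000, "read_length": 200},
    "454": {"reads": 1_000_000, "read_length": 700},  # read length is "up to" 700
}

# Hypothetical number of tagged samples mixed into a single run.
SAMPLES_POOLED = 200

def total_nucleotides(platform):
    """Total nucleotides per run: number of reads times read length."""
    return platform["reads"] * platform["read_length"]

def reads_per_sample(platform, samples_pooled):
    """Average reads per sample if the run is shared evenly among samples."""
    return platform["reads"] // samples_pooled

for name, platform in platforms.items():
    print(f"{name}: {total_nucleotides(platform):,} nucleotides per run, "
          f"about {reads_per_sample(platform, SAMPLES_POOLED):,} reads per sample")
```

In short, one platform gives far more reads, while the other gives longer reads; which matters more depends on the question being asked.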
When we learn that the sequences are done, we first feel excitement, followed by the realization that we have now produced gigabytes of data that need to be turned into meaningful results. At this point, the challenge of seeing can become a source of anxiety. One approach to dealing with these data is to work with individuals whose expertise is in data itself (rather than, say, microbes). Another is to rely upon software developed specifically to deal with such data. We often choose the latter and use a program called QIIME (pronounced “chime”). We import all our data into QIIME and give it the list of the special identifier tags we added during PCR so that the program can sort all of the sequences based on the samples from which they originated. QIIME groups the sequences from each sample into clusters of highly similar sequences and compares those clusters to a database of known microbial 16S sequences, called the “Greengenes” database, to figure out the identity of the microbes in each sample.
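The sorting step, in which each sequence is assigned to its sample by its identifier tag, can be sketched in a few lines. This is only an illustration of the idea, not how QIIME itself is implemented: the tag sequences, tag length, sample names, and reads below are all made up, and the sketch assumes the tag sits at the very start of each read and must match exactly (real tools also tolerate sequencing errors).

```python
from collections import defaultdict

BARCODE_LENGTH = 8  # assumed tag length, for illustration only

# Hypothetical mapping from identifier tag to the sample it labels.
barcode_to_sample = {
    "ACGTACGT": "kitchen_counter",
    "TGCATGCA": "pillowcase",
}

def demultiplex(reads, barcode_to_sample, barcode_length=BARCODE_LENGTH):
    """Sort reads by their leading tag and strip the tag from each read."""
    by_sample = defaultdict(list)
    unassigned = []
    for read in reads:
        tag, gene_fragment = read[:barcode_length], read[barcode_length:]
        sample = barcode_to_sample.get(tag)
        if sample is None:
            unassigned.append(read)  # unknown tag: set aside rather than guess
        else:
            by_sample[sample].append(gene_fragment)
    return dict(by_sample), unassigned

# Made-up reads: tag followed by a short stand-in for the 16S fragment.
reads = [
    "ACGTACGT" + "GGCTAACTCC",
    "TGCATGCA" + "GGCTAACTAC",
    "ACGTACGT" + "TTGACGGTAC",
    "NNNNNNNN" + "GGCTAACTCC",
]

by_sample, unassigned = demultiplex(reads, barcode_to_sample)
for sample, fragments in by_sample.items():
    print(sample, fragments)
print("unassigned:", unassigned)
```

Just like the name on the first page of a book, the tag is removed before the sequence itself is compared to anything else.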
In some cases we can only identify microbes at very broad taxonomic levels – like phylum or family. In other cases, we can identify microbes all the way down to the level of the genus or even, in some cases, the species. Our ability to do so has to do with how closely sequences in our sample match the known sequences in the database, and how much those sequences vary from one another. In the approach I have described here, we almost never know which particular strain of a microbe we are dealing with. For example, while many species of the genus Staphylococcus are beneficial partners of humans, living on and protecting our skin from pathogens, other species of Staphylococcus are pathogens. The difference between partner and pathogen, friend and foe, can be as subtle as a few nucleotides. As a result, seeing a complete picture of who lives on or around us often requires additional steps on top of what I have described here; typically it requires a more complete reading of the DNA for the specific kinds of microbes we are particularly interested in studying.
In the end, our computer analyses result in huge tables where the hundreds of species we find are listed in the rows, the sample numbers are listed in the columns, and the relative abundance of each species in each sample is listed in the cells of the table. These tables are very familiar to biologists who study animals or plants, but the process we use to make these tables, the process I have just described, is very different.
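To make the shape of such a table concrete, here is a small sketch that turns raw sequence counts into relative abundances, with species in rows and samples in columns. The counts, sample names, and function name are invented for illustration; they are not results from our study.

```python
# Hypothetical sequence counts: rows are taxa, columns are samples.
counts = {
    "Staphylococcus": {"pillowcase": 60, "tv_screen": 10},
    "Corynebacterium": {"pillowcase": 30, "tv_screen": 10},
    "Bacillus": {"pillowcase": 10, "tv_screen": 30},
}
samples = ["pillowcase", "tv_screen"]

def relative_abundance(counts, samples):
    """Divide each count by its sample's total so every column sums to 1."""
    totals = {s: sum(row.get(s, 0) for row in counts.values()) for s in samples}
    return {
        taxon: {s: (row.get(s, 0) / totals[s] if totals[s] else 0.0) for s in samples}
        for taxon, row in counts.items()
    }

table = relative_abundance(counts, samples)

# Print the table: one row per taxon, one column per sample.
print(f"{'taxon':16s}" + "".join(f"{s:>12s}" for s in samples))
for taxon, row in table.items():
    print(f"{taxon:16s}" + "".join(f"{row[s]:12.2f}" for s in samples))
```

Using proportions rather than raw counts lets us compare samples fairly even when one sample happened to yield many more sequences than another.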
These tables are fun to look through, but we must do a little more to really understand the data. Here’s where the information you contributed in Participant Questionnaires comes in, information like whether you have dogs at home or use antibacterial soap. We combine this information with the table of species to figure out what factors affect the microbes that live with us or on us. In one approach, we use a statistical method called Principal Components Analysis to look for clear groups of microbial communities among the samples and to see whether any of the factors about where and how you live might explain why you have one microbial community or another. For example, using this approach we have found that homes with dogs had different microbes living on their TV screens and pillowcases than homes without dogs. Our analyses didn’t answer why your dog has such an effect, or why so many other things do and don’t have effects. That just means we have to get busy doing more science!
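For readers curious about what Principal Components Analysis does, here is a minimal sketch under stated assumptions. It finds the single direction along which the samples differ the most (the first principal component) using a simple method called power iteration, then gives each sample a score along that direction. The abundances, sample names, and function names are invented for illustration, and real analyses rely on established statistical software rather than hand-written code like this.

```python
import math

# Hypothetical relative abundances (rows: samples, columns: three made-up taxa).
samples = {
    "dog_home_1": [0.6, 0.3, 0.1],
    "dog_home_2": [0.5, 0.4, 0.1],
    "no_dog_home_1": [0.2, 0.2, 0.6],
    "no_dog_home_2": [0.1, 0.3, 0.6],
}

def center(rows):
    """Subtract each column's mean so the data are centered on zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iterations=500):
    """Approximate the leading eigenvector of cov by power iteration."""
    d = len(cov)
    # Start from a non-uniform vector: relative abundances sum to 1, so the
    # all-ones direction carries no variance and would be a poor start.
    v = [1.0] + [0.0] * (d - 1)
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

centered = center(list(samples.values()))
pc1 = first_component(covariance(centered))

# Each sample's score is its centered abundance profile projected onto PC1.
scores = {name: sum(x * w for x, w in zip(row, pc1))
          for name, row in zip(samples, centered)}
for name, score in scores.items():
    print(f"{name:14s} PC1 score = {score:+.3f}")
```

In this made-up example, the two dog homes land on one side of zero and the two dog-free homes on the other, which is the kind of grouping we look for before asking which questionnaire answers might explain it.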