In this guest post, Dan Fergus, a researcher at the North Carolina Museum of Natural Sciences, picks up where Rob left off in the previous post, explaining how we use genetics and molecular biology to see the invisible life that covers our bodies and homes.
Many of you have participated in one of our microbiome projects, using sterile swabs to collect bacteria and archaea from your pillow, your doorframes, or even your belly button. You then close that swab back in its tube, seal it in an envelope and anxiously wait to learn the identity of the microbes for which you kindly provide a nice, comfortable home. But, you may wonder, what is the process that gets us from a dirty-looking swab to the identification of your microbes? Hopefully we can shed some light on this question by providing this short primer to walk you through the lab work from beginning to end (in essence, through our metaphorical lens).
When we receive samples in the mail the first thing we do is stick them in the freezer. We need to have enough samples to make processing time- and cost-effective. While we wait for enough samples to come in, the freezer is the best place for them. By keeping the samples chilled to -20°C (-4°F) we can slow or stop microbial growth, making, for a while, the animate world inanimate. While a few microbes are extremophiles that can survive very cold temperatures, our -20°C freezers are cold enough to greatly slow or even completely stop their reproduction. In general, most of the microbes we sample will not survive storage in the freezer. Another important thing about the freezing process is that it protects the DNA inside the microbes. DNA left at room temperature is very susceptible to being broken down, or degraded, by enzymes or other bacteria. Putting the samples in the freezer keeps the degradation process to a minimum so that we can wait weeks or months to acquire enough samples before proceeding to the next step of the analysis.
Once we have enough samples in hand, it is time to extract the microbes’ DNA. To do this, we have to break the microbe cells open. This can be done mechanically by crushing or grinding the microbes or chemically with detergents that dissolve the cell membranes just like dish soap does for the grease clinging to your dirty pots and pans. In our lab we typically use a combination of both methods. We put the end of that swab in a tube with a solution that contains small beads (imagine tiny pieces of gravel) and a little detergent. We then shake that tube vigorously for about 15 minutes. The beads bang and smash the microbes over and over, breaking the cells open. As this happens, DNA begins to spill out. After 15 minutes of thorough shaking, we’re left with a well-mixed soup of microbe guts that includes broken cell parts, proteins, and DNA.
Next it is time to clean up. We need to isolate the DNA from the other broken up material. We do this, and I find this step amazing, by running the mixture through a small silica strainer called a column (silica is a glasslike material) that grabs the DNA and lets other debris run through. Under very salty and acidic conditions, DNA will bind very tightly to silica, allowing us to wash away all the other impurities and cell debris while the DNA stays bound to the column. When we are sure we have sufficiently removed all the impurities from the DNA, we add a little bit of warm, pure water (ordinary tap water is full of microbes and DNA, so it won’t work) to release the DNA from the silica column, all clean and ready to use in the next step. Though this process can be done in just a few hours, it can take several days to clean up and isolate all of the samples that have accumulated in the freezer.
Now we must prepare the DNA for sequencing. But we don’t want to read the entire genomes of all the microbes on the swab. Instead we want to look at a short segment (about 250 nucleotides long – nucleotides are the A’s, G’s, T’s and C’s that compose DNA) of one well-studied gene called 16S ribosomal RNA – known as 16S rRNA, for short. This gene is found in all bacteria (and also in that other ancient lineage of single-celled life, the archaea) and spells out the recipe needed for the cell to make some of the machinery involved in making new proteins. Parts of the 16S rRNA gene sequence are extremely variable among different strains of microbes, providing a “fingerprint” that we can use to distinguish one type of microbe from another.
The process of DNA sequencing requires many copies of each strand of DNA. Fortunately, these copies can be made even without the help of the cells in which the DNA was originally found. Before we sequence the 16S rRNA gene, we make many copies of it from the DNA of all the microbes in our original sample. To do so, we use one of the most important methods in molecular biology, the polymerase chain reaction, or PCR. This reaction enables us to do two important things: (1) make billions of copies of the selected region of the 16S rRNA gene; and (2) add a short piece of extra DNA to the ends of each copy to tag it in a way that tells us exactly which sample it came from. The tag is a bit like putting your name on the first page of a book; your name doesn’t change the information inside, but it lets everyone know it is your book. PCR itself is usually not a terribly difficult method, but we can run into a lot of trouble doing it with microbe samples. We want to study the microbes collected from specific locations in your home or on your body; the trouble is that microbes are everywhere and can very easily contaminate our samples. If just a few microbes from your hands, tabletop, or breath get in any of the components of the PCR reaction, it can sink the study. Thus, we have to take great care to open tubes and work with the swabs under the cleanest conditions possible. We want to be sure we are getting microbes from our swabs and not from our researchers or the lab bench!
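For readers who like to tinker, here is a minimal Python sketch of why PCR reaches billions of copies so quickly. It assumes an idealized reaction in which every cycle exactly doubles the number of copies; real reactions are less efficient. The function names and cycle counts are made up for illustration and are not our lab protocol.

```python
def copies_after_cycles(starting_copies: int, cycles: int) -> int:
    """Idealized PCR: every cycle doubles the number of copies."""
    return starting_copies * 2 ** cycles


def cycles_needed(starting_copies: int, target_copies: int) -> int:
    """Smallest number of ideal doubling cycles that reaches the target."""
    copies = starting_copies
    cycles = 0
    while copies < target_copies:
        copies *= 2
        cycles += 1
    return cycles


# One starting copy, doubled 30 times, passes the one-billion mark.
print(copies_after_cycles(1, 30))      # 1073741824
print(cycles_needed(1, 1_000_000_000)) # 30
```

The point of the sketch is the shape of the growth: each extra cycle doubles the total, so a few dozen cycles are enough even when a swab carries very little DNA. The same doubling is why a handful of stray microbes can contaminate a reaction so badly.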
After PCR, we have billions of copies of 16S rRNA gene fragments tagged with individual identifiers that uniquely label each sample. At this point, we can actually mix all the samples together for sequencing. We will generally use one of two next-generation sequencing technology platforms, named for and by the companies that patented them, 454 or Illumina. Next-generation sequencing is a set of technologies that have become available in the last decade and have allowed us to move from sequencing just a handful of DNA fragments at a time to sequencing tens of thousands and up to billions of fragments all at once. By applying next-generation sequencing to our microbe projects, we can produce thousands of sequences from hundreds of samples all at once. We choose one next-generation sequencing technology over another based on cost, convenience, and the number and lengths of DNA sequences they will produce. Illumina, for example, can produce 3 billion sequence reads of approximately 200 nucleotides each while 454 does about 1 million reads of up to 700 nucleotides. These sequencing methods are performed at a core facility for genomics research and can take from as little as a day for 454 to longer than a week for Illumina, once the samples are loaded into the machines. Usually the turn-around time from submitting samples to receiving sequence results is a few weeks. We receive our results as a massive electronic sequence file.
When we learn that the sequences are done we first feel excitement followed by the realization that we have now produced gigabytes of data that need to be turned into meaningful results. Here, the challenge of seeing can turn anxious. One approach to dealing with these data is to work with individuals whose expertise is in data itself (rather than, say, microbes). Another is to rely upon software developed specifically to deal with such data. We often choose the latter and use a program called QIIME (pronounced “chime”). We import all our data into QIIME and give it the list of the special identifier tags we added during PCR so that the program can sort all of the sequences based on the samples from which they originated. QIIME groups the sequences from each sample into clusters of highly similar sequences and compares those clusters to a database of known microbial 16S sequences, called the “Greengenes” database, to figure out the identity of the microbes in each sample.
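To make the sorting step concrete, here is a small Python sketch of the idea of using tags to assign sequences back to samples. It is not how QIIME is implemented. It assumes every tag has the same length and sits at the very start of each read, and the tags, sample names, and reads are all made up for illustration.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Sort reads into samples using the tag at the start of each read.

    Assumes all tags share one length and sit at the start of the read.
    Reads with an unrecognized tag are set aside rather than guessed at.
    """
    tag_length = len(next(iter(barcode_to_sample)))
    by_sample = defaultdict(list)
    unassigned = []
    for read in reads:
        tag, gene_fragment = read[:tag_length], read[tag_length:]
        sample = barcode_to_sample.get(tag)
        if sample is None:
            unassigned.append(read)
        else:
            # Keep only the gene fragment; the tag has done its job.
            by_sample[sample].append(gene_fragment)
    return dict(by_sample), unassigned


# Hypothetical tags and reads, for illustration only.
barcodes = {"ACGT": "pillowcase", "TGCA": "doorframe"}
reads = [
    "ACGTGGATTAGATACCC",
    "TGCAGGATTAGAAACCC",
    "ACGTGGATTAGATACCC",
    "GGGGTTTTAAAACCCC",  # tag not in our list
]
by_sample, unassigned = demultiplex(reads, barcodes)
print(by_sample)
print(unassigned)
```

Setting aside reads with unknown tags, instead of forcing them into the nearest sample, is a deliberately cautious choice in this sketch: a read assigned to the wrong swab is worse than a read thrown away.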
In some cases we can only identify microbes at very broad taxonomic levels – like phylum or family. In other cases, we can identify microbes all the way down to the level of the genus or even, in some cases, the species. Our ability to do so has to do with how closely sequences in our sample match the known sequences in the database, and how much those sequences vary from one another. In the approach I have described here, we almost never know which particular strain of a microbe we are dealing with. For example, while many species of the genus Staphylococcus are beneficial partners of humans, living on and protecting our skin from pathogens, other species of Staphylococcus are pathogens. The difference between partner and pathogen, friend and foe, can be as subtle as a few nucleotides. As a result, seeing a complete picture of who lives on or around us often requires additional steps on top of what I have described here; typically it requires a more complete reading of the DNA for the specific kinds of microbes we are particularly interested in studying.
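The idea of "how closely sequences match" can be illustrated with a few lines of Python. This is only a sketch: it compares two sequences of equal length position by position, whereas real tools first align the sequences and handle insertions and deletions. The two "reference" sequences and their names are invented for the example.

```python
def percent_identity(seq_a, seq_b):
    """Percent of positions that match between two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch only compares sequences of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def best_match(query, reference):
    """Return (name, percent identity) of the closest reference sequence."""
    name = max(reference, key=lambda n: percent_identity(query, reference[n]))
    return name, percent_identity(query, reference[name])


# Made-up ten-letter "database", for illustration only.
reference = {
    "Genus A": "ACGTACGTAC",
    "Genus B": "ACGTTTGTAC",
}
print(best_match("ACGTACGTAA", reference))  # ('Genus A', 90.0)
```

When the best match is close to 100 percent and clearly better than the runner-up, a confident name at a fine taxonomic level is reasonable; when several references tie, or none matches well, only a broader label is justified.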
In the end, our computer analyses result in huge tables where the hundreds of species we find are listed in the rows, the sample numbers are listed in the columns, and the relative abundance of each species in each sample is listed in the cells of the table. These tables are very familiar to biologists who study animals or plants, but the process we use to make these tables, the process I have just described, is very different.
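As a rough illustration of what such a table looks like, the Python sketch below turns raw read counts into relative abundances, with kinds of microbes in the rows and samples in the columns. The counts, the placeholder names "Taxon B" and "Taxon C", and the function name are all made up for the example.

```python
def relative_abundance(counts):
    """Convert read counts into per-sample fractions that sum to 1.

    `counts` maps taxon -> {sample -> number of reads}.
    A sample with no reads at all gets 0.0 everywhere.
    """
    samples = sorted({s for per_sample in counts.values() for s in per_sample})
    totals = {
        s: sum(per_sample.get(s, 0) for per_sample in counts.values())
        for s in samples
    }
    return {
        taxon: {
            s: (per_sample.get(s, 0) / totals[s] if totals[s] else 0.0)
            for s in samples
        }
        for taxon, per_sample in counts.items()
    }


# Hypothetical read counts, for illustration only.
counts = {
    "Staphylococcus": {"sample_1": 60, "sample_2": 10},
    "Taxon B": {"sample_1": 30, "sample_2": 10},
    "Taxon C": {"sample_1": 10, "sample_2": 20},
}
table = relative_abundance(counts)
for taxon, row in table.items():
    print(taxon, row)
```

Dividing by each sample's total matters because different swabs yield different numbers of reads; fractions let us compare a sample with 100 reads to one with 40.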
These tables are fun to look through, but we must do a little more to really understand the data. Here’s where the information you contributed in Participant Questionnaires comes in, information like whether you have dogs at home or use antibacterial soap. We combine this information with the table of species to figure out what factors affect the microbes that live with us or on us. In one approach, we use a statistical method called Principal Components Analysis to look for clear groups of microbial communities among the samples and to examine whether any of the factors about where and how you live might explain why you have one microbial community or another. For example, using this approach we have found that homes with dogs had different microbes living on their TV screens and pillowcases than homes without dogs. Our analyses didn’t answer why your dog has such an effect, or why so many other things do and don’t have effects. It just means we have to get busy doing more science!
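For the curious, here is a minimal Python sketch of the idea behind Principal Components Analysis: find the single direction along which the samples differ most, then see where each sample falls along it. It is not our analysis pipeline. It finds only the first component, using a simple technique called power iteration in place of the numerical libraries a real analysis would use, and the four "homes" and their abundances are invented.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, scores) for the first principal component.

    `rows` is a list of samples, each a list of relative abundances.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in rows]
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]
    # Start from the taxon with the largest variance. An all-ones start
    # would fail here, because abundances in each sample sum to 1.
    start = max(range(p), key=lambda k: cov[k][k])
    direction = [1.0 if j == start else 0.0 for j in range(p)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * direction[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break  # no variation at all in the data
        direction = [x / norm for x in nxt]
    scores = [sum(r[j] * direction[j] for j in range(p)) for r in centered]
    return direction, scores


# Made-up abundances of three taxa in four homes, for illustration only.
homes = ["dog_1", "dog_2", "no_dog_1", "no_dog_2"]
abundances = [
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.2, 0.6],
]
direction, scores = first_principal_component(abundances)
for home, score in zip(homes, scores):
    print(home, round(score, 3))
```

In this toy data the two homes with dogs land on one side of zero and the two without land on the other, which is the kind of grouping we look for. The sign of the scores is arbitrary; only which samples fall together matters.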