Wednesday, August 29, 2012

Join in the chorus!

The other day I was learning a dance choreography from a good friend of mine.  She was having trouble remembering which combination went where, because the song repeats itself many times, and it's hard to know "where" in the song you are just by hearing a clip.

It struck me that this challenge was not unlike the challenge that scientists face trying to assemble genomes.  Genomes often contain repeat regions: stretches of sequence that are repeated in other parts of the genome.  In a way, repeat regions are like the chorus of a song.  When you hear a clip of only the chorus, you might not know if it's at the beginning of the song, somewhere in the middle, or near the end.

The trick with dancing a choreography that has lots of repeats is to memorize the bit that comes right before and right after each repeat.  Those bits help knit the repeat to its correct position in the song.  Genome assemblers can do the same thing if they receive sequences containing a bit of flanking DNA along with the repeat region.  Then each genetic chorus has its proper context.  
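For the curious, here is a minimal Python sketch of that idea. The toy genome, the repeat "GATTACA", and the helper name find_positions are all made up for illustration; real assemblers are far more sophisticated. A read containing only the repeat matches two places, while a read that carries a little flanking DNA matches just one.

```python
def find_positions(genome: str, read: str) -> list[int]:
    """Return every start position where the read matches the genome exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)  # keep looking past this match
    return positions


# Hypothetical toy genome: the same "chorus" (GATTACA) appears twice.
REPEAT = "GATTACA"
GENOME = "AAGT" + REPEAT + "CCGG" + REPEAT + "TTCG"

# A read that is only the chorus could belong in either spot.
chorus_only = find_positions(GENOME, REPEAT)

# A read with two letters of flanking DNA on each side of the first copy.
FLANKED_READ = "GT" + REPEAT + "CC"
with_flanks = find_positions(GENOME, FLANKED_READ)

print("chorus only:", chorus_only)
print("with flanks:", with_flanks)
```

The flanks do for the read what the lead-in and lead-out steps do for the dancer: they pin the chorus to one spot.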

Saturday, April 14, 2012

Short Stories and Novels

When you get right down to it, your genome--the sum total of your DNA--is a story.  Sure, it's not written in English: DNA has its own code, with far fewer letters than we humans use to describe our world.  Yet this one code describes organisms as complex as ourselves and as elegantly simple as a virus.  As someone working in the field of DNA sequence analysis, I am thrilled to live in an age when whole genome sequencing is not only possible but also becoming faster and cheaper by the year.  Still, the process of determining an entire genome is quite the puzzle.

I want you to imagine your genome is right in front of you.  Maybe it's a big, leatherbound tome or a crisp, new paperback.  In a perfect world, you would just open the book and read it, cover to cover.  But the challenge of DNA sequencing is that the current technology can't do that yet.  Instead of reading your genome cover to cover, you can only see, for example, 10 words at a time.  Somehow, you have to put the story back together.

If someone gave you an envelope with the fragments of a haiku in it, you could probably paste the whole thing together all on your own.  But if someone hands you a shoebox with "The Cat in the Hat" fragments instead, it might take you a bit longer to put together.  And what if someone delivers you an office space filled with boxes of a hefty Stephen King novel, eh?  You'd probably need a lot of time and a lot of help.
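To make the puzzle concrete, here is a rough Python sketch of one way to paste fragments back together: repeatedly find the two pieces whose ends overlap the most and glue them. This greedy approach is only a toy stand-in, not a description of how any particular assembler works, and the function names (overlap, greedy_assemble) and the example sentence are my own inventions.

```python
def overlap(a: list[str], b: list[str]) -> int:
    """Number of words in the longest ending of a that is also a beginning of b."""
    for size in range(min(len(a), len(b)), 0, -1):
        if a[-size:] == b[:size]:
            return size
    return 0


def greedy_assemble(fragments: list[list[str]]) -> list[list[str]]:
    """Merge the best-overlapping pair of pieces until nothing overlaps."""
    pieces = [list(f) for f in fragments]
    while len(pieces) > 1:
        best_size, best_i, best_j = 0, -1, -1
        for i, a in enumerate(pieces):
            for j, b in enumerate(pieces):
                if i != j and overlap(a, b) > best_size:
                    best_size, best_i, best_j = overlap(a, b), i, j
        if best_size == 0:
            break  # no remaining pieces share any words at their ends
        merged = pieces[best_i] + pieces[best_j][best_size:]
        pieces = [p for k, p in enumerate(pieces) if k not in (best_i, best_j)]
        pieces.append(merged)
    return pieces


# Hypothetical example: four shuffled, overlapping four-word fragments.
FRAGMENTS = [
    "jumps over the lazy".split(),
    "the quick brown fox".split(),
    "over the lazy dog".split(),
    "brown fox jumps over".split(),
]

story = greedy_assemble(FRAGMENTS)
print(" ".join(story[0]))
```

Notice that every piece gets compared against every other piece on each pass, so the work grows quickly as the pile of fragments grows.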

Bigger stories mean bigger problems.  That is the challenge of sequencing a genome, in a nutshell.  There are plenty of things that make the problem more difficult.  For example:
  • What if your "story" is a poem with repeated phrases?  How do you know where to put the fragment that matches 6 different places?  
  • What if, instead of seeing 10 words at a time, you could see 20 but there would be lots of typos?
  • What if the story is so huge it would take you several lifetimes to put together on your own? 
On the other hand, what can make solving the problem easier?
  • What if I gave you a draft copy of your story, and had you match the fragments to the draft?
  • What if I gave you a roomful of interns to help?
  • Or better yet, what if I gave you a roomful of computers to help?
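Matching fragments to a draft, typos and all, can also be sketched in a few lines of Python. This is a simplified illustration under my own assumptions: the helper best_match, the draft sentence, and the misspelled fragment are hypothetical, and the sketch only tolerates swapped letters, not missing or extra ones.

```python
def best_match(draft: str, fragment: str) -> tuple[int, int]:
    """Slide the fragment along the draft; return (position, mismatches) of the best fit."""
    best_pos, best_mismatches = -1, len(fragment) + 1
    for pos in range(len(draft) - len(fragment) + 1):
        window = draft[pos:pos + len(fragment)]
        # Count the letters that differ between this window and the fragment.
        mismatches = sum(1 for x, y in zip(window, fragment) if x != y)
        if mismatches < best_mismatches:
            best_pos, best_mismatches = pos, mismatches
    return best_pos, best_mismatches


# Hypothetical draft copy of the story and a fragment with one typo.
DRAFT = "the quick brown fox jumps over the lazy dog"
FRAGMENT = "jumps ovxr the"

position, typos = best_match(DRAFT, FRAGMENT)
print(f"best fit starts at letter {position} with {typos} typo(s)")
```

With a draft in hand, each fragment only has to be compared against the draft rather than against every other fragment, which is a big part of why a draft makes the job easier.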
Where is the technology right now?  Well, the stretches of DNA we can read at one time keep getting longer.  Imagine how much easier it is to piece together a story if you have whole chapters instead of sentences!  But you still might get the wrong answer if the story is long or very complex.  So these days scientists use computers to help solve the problem as fast as possible.  Over time, reading the sequence becomes cheaper, and piecing the story together becomes faster.  Truly, it is an exciting time to be a life scientist!  The ethical implications of all the information we glean from genomes, of course, are a discussion for another day . . .