Wednesday, August 29, 2012

Join in the chorus!

The other day I was learning a piece of choreography from a good friend of mine.  She was having trouble remembering which combination went where, because the song repeats itself many times, and it's hard to know "where" in the song you are just by hearing a clip.

It struck me that this is not unlike the challenge that scientists face when trying to assemble genomes.  Genomes often contain repeat regions: stretches of sequence that also appear in other parts of the genome.  In a way, repeat regions are like the chorus of a song.  When you hear a clip of only the chorus, you might not know if it's at the beginning of the song, somewhere in the middle, or near the end.

The trick to dancing choreography with lots of repeats is to memorize the bit that comes right before and right after each repeat.  Those bits help knit the repeat to its correct position in the song.  Genome assemblers can do the same thing if the sequences they receive contain a bit of flanking DNA along with the repeat region.  Then each genetic chorus has its proper context.
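
For the curious, here is a toy Python sketch of that idea. It is only an illustration, not how a real assembler works: the "genome", the "chorus" sequence, and the helper find_positions are all made up for this example, and matching is done by exact string search.

```python
def find_positions(genome: str, read: str) -> list[int]:
    """Return every 0-based start index at which `read` occurs in `genome`."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        # Keep searching one character further along, so overlapping hits count too.
        start = genome.find(read, start + 1)
    return positions


# Hypothetical toy genome: unique "verses" separated by two copies of one "chorus".
CHORUS = "GATTACA"
GENOME = "AAAC" + CHORUS + "TTTG" + CHORUS + "CCGT"

# A clip of only the chorus could belong in two places.
chorus_only = CHORUS

# The same chorus plus two letters of flanking sequence on each side.
with_flanks = "AC" + CHORUS + "TT"

print(find_positions(GENOME, chorus_only))  # [4, 15]
print(find_positions(GENOME, with_flanks))  # [2]
```

The chorus by itself matches at two positions, so there is no way to tell which copy it came from.  Add a couple of letters on either side and only one position fits.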