Thursday, March 26, 2020

Deciphering Caesar's Cipher

Right now I'm taking Harvard's CS50x online computer science course: https://cs50.harvard.edu/x/2020/

Since I have no "formal" computer science training, I wanted to go back and take a course that would give me some of the fundamentals my colleagues have.  You'd be surprised how much coding you can do without that degree, for certain.  But being able to do a thing is not the same as knowing why you do it the way you do.

This week my problem set included implementing a version of Caesar's Cipher--a way to encrypt messages by shifting the letter by a pre-arranged value.  For example, if you and I both agreed we'd shift every letter by 1, then 'HELLO' would become 'IFMMP.'  Which takes some thought but you can do it in your head, right?  But what if I told you we'd be shifting them by 20?  A+20 doesn't immediately make sense.  But fortunately computers already store letters as numbers. 

BEHOLD THE ASCII TABLE

So now it should be as simple as adding the starting letter's value to the key, then finding the new key in that ASCII table, right?  Oh yeah that's what I thought, until I remembered that the alphabet is 26 characters long. 

If your starting letter is A and your key is 20, you'll still land somewhere in the same alphabet.
If your starting letter is A and your key is 26, you'll land beyond Z . . . and then what do you do? 

Side note: "The Land Beyond Z" sounds like a pretty atmospheric novel title, doesn't it?  

As a human, you probably already know what to do: figure out how far past Z you've gone, and drag your finger back to A on that ASCII and keep counting.  You've got to "wrap around" to the beginning of the alphabet again.  But how do we tell a computer to do that?  I'll give you a short version, and a longer version:


TL;DR

Use of modulo in this case is like this game Cliffhangers on Price is Right!  When you know what's beyond your range, what remains, you can pick up this poor yodeler and set him back at square one and let him climb whatever steps he had left.  I'm going to pretend this mountain is 0-indexed because then it's perfect for the alphabetic range:



Longer Version / Deeper Dive / Down the Rabbit Hole We Go

encrpyted letter = ((starting letter) - 'A' + key) % 26 + 'A'

It works, and late at night I got here by googling around.  It worked but I didn't understand why.  Now I think I do.  Let's start here:

(starting letter) - 'A

A through Z is our valid range of characters.  If we take that Cliffhanger game analogy, past Z is a cliff!  So this part of the equation tells us how far away from our start, from A, we are right now.  

((starting letter) - 'A' + key)

Adding the key tells us how far we'd go from where we are, with no limits in place.  How far past the cliff's edge would the yodeler go, if he couldn't fall?

((starting letter) - 'A' + key) % 26
Here's where it gets fun!  Or at least, fun to me.  % in code can mean 'modulo' or 'remainder', where you are dividing a number by another number, and handing back what's left.  Here we are dividing the distance we could travel by 26, because that's how far the valid range goes.  How far past the cliff does our yodeler go?  If the number is smaller than 26, then the remainder is 0.  If the number is greater than 26, we get back what's left over--how far he's gone past the cliff.  

((starting letter) - 'A' + key) % 26 + 'A'
Now we know how far he's gone past the cliff, so we add that number to our starting point, 'A.'  In other words, instead of falling off the cliff, we picked up the yodeler and dropped him back at square one, at 'A', and then allowed him to climb the remaining steps to our new letter.


Let's practice this some more!

So if we take that complicated example, starting with A and having to change it by 26:

Given 
A = 65
Z = 90
Key = 26

((65)-65) = 0
Relative to our starting point, A, we're at 0

0 + 26 = 26
We want to move by 26 letters

26 % 26 = 0
There's nothing left over.

0 + 65 = 65
Fun!  By hurrying up we've stayed exactly in the same place.  Or more precisely, our yodeler moved the full length of our range and ended up back at the beginning of his climb.  

If we used 27 as our key instead . . .

((65)-65) = 0
Relative to our starting point, A, we're at 0

0 + 27 = 27
We want to move by 27 letters

27 % 26 = 1
We've got 1 left over.  We've exceed our range by one.

1 + 65 = 66
66 is B.  So the yodeler traveled the range, had one step left over, and instead of stepping into thin air, we picked him up and plopped him back at A and took one more step.  

To me, what's neat about all of this is that it doesn't matter how many times bigger than 26 the key is: with modulo we don't care how many times one number goes into another, only what's left over.  We know our starting point, so we apply the steps left to that.  

What if that poor yodeler never had to fall off a cliff!?  A thought for another time, perhaps.  Thanks for joining me on this mental journey! 



From DNA code to computer code

Hey all, it's been awhile right?  A lot has happened since I last posted here.  Significant milestones have included:

  • Completing my PhD thesis in Medical Microbiology and Immunology
  • Joining the private sector to work on bioinformatic software . . . 
  • Then pivoting to software for medical diagnostics
  • THEN pivoting both learning and location to work on software in the Financial Technology space in a whole new state!  
With that big set of shifts, the topics I'm going to cover will also shift.  I'm learning lots of good things and I want to share them!  

The fun part for me is that code is code is code.  And by that, I mean I've spent years with DNA code and lately I've spent my time instead studying computer code.  They are different.  They are the same.  Stay tuned!

Monday, June 16, 2014

Waiting for the moonflower

I grew up on a ridge in rural West Virginia.  Before the scurry of summer camps, internships, and eventually grad school and Adult Life, I spent many summer days just hanging around the house, helping out around the garden, flipping over rocks, catching crawdads.  For our family, one of our favorite evening activities was to sit out on the front lawn, our chairs faced in towards the house.  We were waiting for the moonflowers to open. 

Our moonflowers, more specifically Datura inoxia in this case, opened in the evening, around 8PMish ET if I recall.  Usually there would only be 1 or 2 blossoms opening each night.  We'd sit and watch the white umbrella-like bloom puff, slowly unfurl, eventually pop open.  And the fragrance was pure magic: like the cleanest soap you've ever smelled. 

I still marvel at the regularity of this event: it would always be around the same time each night, within 20 minutes or so.   While some folks were waiting for their regularly scheduled TV program, we were happy waiting for the moonflower's show. 

Some day, I'd like to have my own moonflowers to watch in the evening.  Unfortunately, they are not good for households with small children: they are part of the Solanaceae family, related to nightshade.  The entire plant contains dangerous alkaloids, such as atropine, and even touching their leaves is not a good idea.  Beautiful but dangerous, as many plants are. 

These days, my schedule revolves more around daycare dropoffs/pickups, heading to and from the office, and naptimes and bedtimes.  Hard to imagine a time when all we did in the evening was sit, chat, and wait for a flower to open!

Wednesday, August 29, 2012

Join in the chorus!

The other day I was learning a dance choreography from a good friend of mine.  She was having trouble remembering which combination went where, because the song repeats itself many times, and it's hard to know "where" in the song you are just by hearing a clip.

It struck me that this challenge was not unlike the challenge that scientists face trying to assemble genomes.  Genomes often contain repeat regions: stretches of sequence that are repeated in other parts of the genome.  In a way, repeat regions are like the chorus of a song.  When you hear a clip of only the chorus, you might not know if it's at the beginning of the song, somewhere in the middle, or near the end.

The trick with dancing a choreography that has lots of repeats is to memorize the bit that comes right before and right after each repeat.  Those bits help knit the repeat to its correct position in the song.  Genome assemblers can do the same thing if they receive sequences containing a bit of flanking DNA along with the repeat region.  Then each genetic chorus has its proper context.  

Saturday, April 14, 2012

Short Stories and Novels

When you get right down to it, your genome--the sum total of your DNA--is a story.  Sure, it's not written in English: DNA has its own code, with far fewer letters than we humans use to describe our world.  Yet this one code describes organisms as complex as ourselves and as elegantly simple as a virus.  As someone working in the field of DNA sequence analysis, I am thrilled to live in an age when whole genome sequencing is not only possible, it's becoming faster and cheaper by the year.  Still, the process of determining an entire genome is quite the puzzle. 

I want you to imagine your genome is right in front of you.  Maybe it's a big, leatherbound tome or a crisp, new paperback.  In a perfect world, you would just open the book and read it, cover to cover.  But the challenge of DNA sequencing is that the current technology can't do that, yet.  Instead of reading your genome cover to cover, you can only see, for example, 10 words at a time.    Somehow, you have to put the story back together. 


If someone gave you an envelope with the fragments of a haiku in it, you could probably paste the whole thing together all on your own.  But if someone hands you a shoebox with "The Cat in the Hat" fragments instead, it might take you a bit longer to put together.  And what if someone delivers you an office space filled with boxes of a hefty Stephen King novel, eh?  You'd probably need alot of time and alot of help. 

Bigger stories mean bigger problems.  That is the challenge of sequencing a genome, in a nutshell.  There are plenty of things that make the problem more difficult.  For example:
  • What if your "story" is a poem with repeated phrases?  How do you know where to put the fragment that matches 6 different places?  
  • What if, instead of seeing 10 words at a time, you could see 20 but there would be lots of typos?
  • What if the story is so huge it would take you several lifetimes to put together on your own? 
On the other hand, what can make solving the problem easier?
  • What if I gave you a draft copy of your story, and had you match the fragments to the draft?
  • What if I gave you a roomful of interns to help?
  • Or better yet, what if I gave you a roomful of computers to help?
Where is the technology right now?  Well, the stretches of DNA we can read at one time become longer and longer.  Imagine how much easier it is to piece together a story if you have whole chapters instead of sentences!  But you still might get the wrong answer, if the story long or very complex.  So these days scientists use computers to help solve the problems as fast as possible.  Over time, reading the sequence becomes cheaper, and piecing the story together becomes faster.  Truly, it is an exciting time to be a life scientist!  The ethical implications of all the information we glean from genomes, of course, is a discussion for another day . . . 

Tuesday, January 26, 2010

Quorum call, without voices

Is there anybody out there?
When it comes to political intrigue, it’s important to know whether you stand united or alone. Are you surrounded by friends or enemies? Is it time to lead the charge or wait for reinforcements? Tallying how many supporters you have—or how many enemies are waiting in your midst—can be critical to your success.

For bacteria, things aren’t much different. Bacteria survey their surroundings for others of their kind, and others NOT of their kind. They may wait until they have sufficient numbers to launch an attack on your body, or to coordinate effective mining of a nutrient source. But bacteria don’t have eyes, ears, or a mouth; they cannot see, hear, or speak. So how do they tell who is out there?

The peril of the tennis ball
Time to use your imagination, folks. Imagine you are in a windowless room. The door is closed. You are wearing a blindfold and earplugs. You cannot speak. In your hand is a fuzzy tennis ball. When you throw the tennis ball, it bounces off the walls until, eventually, it stops. Think about the chances of the ball ricocheting off a wall and hitting you in the head.

Now remember: you cannot speak, you cannot hear, you cannot see. But, if I put another person in the room with a tennis ball to throw, do you think you’d be able to tell you were not alone in the room? Maybe. How about if I dropped in TEN people, each with one tennis ball? How often do you think you’d be smacked square in head?

It probably seems strange to imagine all this blindfolded tennis ball throwing, and I’m pretty sure it won’t give you a better understanding of human politics. But if you understand how you’d be able to sense the presence of more people in the room based on how often you got a tennis ball concussion, you understand the microbial behavior called quorum sensing.

Bacteria don’t throw tennis balls, they “throw” chemical signals
The tennis ball in our thought-experiment represents a quorum-sensing signal. These are chemical compounds that bacteria can “throw” out into the environment, and “catch” if they hit the cell. In this way, bacteria can sense if they are relatively alone, or if they are surrounded by thousands of others.

If you play around with the tennis ball room scenario, you can start to envision more complex issues with quorum sensing. Can you imagine ways to trick the bacteria into thinking they were not alone? Or trick a crowd of bacteria into thinking they were all alone?

Monday, January 25, 2010

Microbial Grappling Hooks

It's a common theme for my view of microbes: for many real-world problems, microbes have already developed a solution.

Want to turn biomass into fuel? Microbes can do it!
Want to seed clouds for rain? Microbes can do it!
Want to move around using a grappling hook? Microbes can do it!

I've had the honor of contributing a blog entry to the delightful blog "Small Things Considered" on the topic of microbial grappling hooks, known more technically as "Type IV pili." It's basically a rope or filament that is sticky at the end. The bacteria can extend this filament out of the cell, and if the sticky end catches on something the microbe can pull itself along by reeling the filament back in. This is just one of many fascinating structures bacteria can use to interact with the world around them.

If you want a little more technical detail on how Type IV pili work, check out my "Small Things Considered" blog entry on the subject!