Thursday, March 26, 2020

Deciphering Caesar's Cipher

Right now I'm taking Harvard's CS50x online computer science course: https://cs50.harvard.edu/x/2020/

Since I have no "formal" computer science training, I wanted to go back and take a course that would give me some of the fundamentals my colleagues have.  You'd be surprised how much coding you can do without that degree, for certain.  But being able to do a thing is not the same as knowing why you do it the way you do.

This week my problem set included implementing a version of Caesar's Cipher--a way to encrypt messages by shifting the letter by a pre-arranged value.  For example, if you and I both agreed we'd shift every letter by 1, then 'HELLO' would become 'IFMMP.'  Which takes some thought but you can do it in your head, right?  But what if I told you we'd be shifting them by 20?  A+20 doesn't immediately make sense.  But fortunately computers already store letters as numbers. 

BEHOLD THE ASCII TABLE

So now it should be as simple as adding the starting letter's value to the key, then finding the new key in that ASCII table, right?  Oh yeah that's what I thought, until I remembered that the alphabet is 26 characters long. 

If your starting letter is A and your key is 20, you'll still land somewhere in the same alphabet.
If your starting letter is A and your key is 26, you'll land beyond Z . . . and then what do you do? 

Side note: "The Land Beyond Z" sounds like a pretty atmospheric novel title, doesn't it?  

As a human, you probably already know what to do: figure out how far past Z you've gone, and drag your finger back to A on that ASCII and keep counting.  You've got to "wrap around" to the beginning of the alphabet again.  But how do we tell a computer to do that?  I'll give you a short version, and a longer version:


TL;DR

Use of modulo in this case is like this game Cliffhangers on Price is Right!  When you know what's beyond your range, what remains, you can pick up this poor yodeler and set him back at square one and let him climb whatever steps he had left.  I'm going to pretend this mountain is 0-indexed because then it's perfect for the alphabetic range:



Longer Version / Deeper Dive / Down the Rabbit Hole We Go

encrpyted letter = ((starting letter) - 'A' + key) % 26 + 'A'

It works, and late at night I got here by googling around.  It worked but I didn't understand why.  Now I think I do.  Let's start here:

(starting letter) - 'A

A through Z is our valid range of characters.  If we take that Cliffhanger game analogy, past Z is a cliff!  So this part of the equation tells us how far away from our start, from A, we are right now.  

((starting letter) - 'A' + key)

Adding the key tells us how far we'd go from where we are, with no limits in place.  How far past the cliff's edge would the yodeler go, if he couldn't fall?

((starting letter) - 'A' + key) % 26
Here's where it gets fun!  Or at least, fun to me.  % in code can mean 'modulo' or 'remainder', where you are dividing a number by another number, and handing back what's left.  Here we are dividing the distance we could travel by 26, because that's how far the valid range goes.  How far past the cliff does our yodeler go?  If the number is smaller than 26, then the remainder is 0.  If the number is greater than 26, we get back what's left over--how far he's gone past the cliff.  

((starting letter) - 'A' + key) % 26 + 'A'
Now we know how far he's gone past the cliff, so we add that number to our starting point, 'A.'  In other words, instead of falling off the cliff, we picked up the yodeler and dropped him back at square one, at 'A', and then allowed him to climb the remaining steps to our new letter.


Let's practice this some more!

So if we take that complicated example, starting with A and having to change it by 26:

Given 
A = 65
Z = 90
Key = 26

((65)-65) = 0
Relative to our starting point, A, we're at 0

0 + 26 = 26
We want to move by 26 letters

26 % 26 = 0
There's nothing left over.

0 + 65 = 65
Fun!  By hurrying up we've stayed exactly in the same place.  Or more precisely, our yodeler moved the full length of our range and ended up back at the beginning of his climb.  

If we used 27 as our key instead . . .

((65)-65) = 0
Relative to our starting point, A, we're at 0

0 + 27 = 27
We want to move by 27 letters

27 % 26 = 1
We've got 1 left over.  We've exceed our range by one.

1 + 65 = 66
66 is B.  So the yodeler traveled the range, had one step left over, and instead of stepping into thin air, we picked him up and plopped him back at A and took one more step.  

To me, what's neat about all of this is that it doesn't matter how many times bigger than 26 the key is: with modulo we don't care how many times one number goes into another, only what's left over.  We know our starting point, so we apply the steps left to that.  

What if that poor yodeler never had to fall off a cliff!?  A thought for another time, perhaps.  Thanks for joining me on this mental journey! 



From DNA code to computer code

Hey all, it's been awhile right?  A lot has happened since I last posted here.  Significant milestones have included:

  • Completing my PhD thesis in Medical Microbiology and Immunology
  • Joining the private sector to work on bioinformatic software . . . 
  • Then pivoting to software for medical diagnostics
  • THEN pivoting both learning and location to work on software in the Financial Technology space in a whole new state!  
With that big set of shifts, the topics I'm going to cover will also shift.  I'm learning lots of good things and I want to share them!  

The fun part for me is that code is code is code.  And by that, I mean I've spent years with DNA code and lately I've spent my time instead studying computer code.  They are different.  They are the same.  Stay tuned!