May 24, 2005

Reading The Story of Life

[Image: dnaseq.jpeg, a short piece of a four-color DNA sequencing scan]

Who wants to learn about DNA sequencing? Come on, come on, it's pretty simple and really interesting! Of course you've all heard about the Human Genome Project and how they managed to figure out the sequence of all of those crazy letters that make up who we are, in so few words. I won't tell you everything about how the human genome was put together, but here's the start of it all...

First I should begin with a little review. Think way back into the mists of time... biology classes... ok, enough conjuring. Remember that DNA is a linear molecule made up of nucleotides that pair with one another to make a double-stranded molecule that twists into a double helix. All that stuff is cool for other things, but for now all you need to know is that DNA is like a very, very long string of letters, using only the letters A, C, T and G. You can also remember that DNA gets copied - it happens every time a cell divides, and it's a pretty simple process. A DNA polymerase - a little molecular machine that makes DNA - starts at one end of the DNA molecule and works its way to the other, copying as it goes. One important thing: for particular reasons, this copying is always done in the same direction. It's not important what the chemical term is, but for now we'll assume that it always goes left to right (which is the way scientists write it anyways).

The copying mechanism is simple. In order for anything to happen at all, the polymerase needs to have free nucleotides hanging around, to use as the raw materials to make the new piece of DNA. The process always begins with a primer - basically a little piece of a related molecule called RNA which gives the polymerase somewhere to start. The polymerase moves on to the first position that doesn't have a partner, and then sits there and waits for a free nucleotide to come on by. If that one matches, then great -- it sticks it on, and moves on. Otherwise, it doesn't use the mismatched one, but instead waits for the right match. (Usually. When it sticks the wrong one on, that's called a mutation and if it happens in real life, you could get anything from a different hair color to, well, very bad things...) And the thing just keeps going like this until it runs out of DNA to copy, then everything falls apart and you're done.
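If you like seeing things as code, here is a minimal sketch of that copying step in Python. It is a toy model, not real biochemistry: the names `PAIRS` and `copy_strand` are made up for this example, the primer is left out, and the strand is written left to right as agreed above.

```python
# Toy model of a polymerase copying a template, one position at a time.
# PAIRS and copy_strand are illustrative names, not from any library.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def copy_strand(template):
    """Build the new strand by pairing each template letter with its partner."""
    new_strand = []
    for base in template:
        # The polymerase waits for the matching nucleotide, then sticks it on.
        new_strand.append(PAIRS[base])
    return "".join(new_strand)

print(copy_strand("CTCACCCTGTAGGTGTTCCAGG"))  # GAGTGGGACATCCACAAGGTCC
```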

Scientists knew about this whole copying thing for a while before they even figured out how to get the sequence of what they were copying! But a very bright guy named Sanger figured out how to use this natural process to his advantage. It involves a special molecule called a dideoxy nucleotide. The only difference between a dideoxy nucleotide and a regular nucleotide is that the dideoxy one can be added to a growing DNA chain, but once it's on there, you can't add any more. It physically doesn't have the attachment site where the next one would go. "Well, that's all well and good," you might say, "but what good does it do us to break the thing as it's working?" Just remember, scientists are very good at carefully breaking things to figure out how they work...

It works like this. Let's go back and start copying again. But this time we'll throw in a few of these dideoxy nucleotides. What do you think will happen? Well, if we have a lot more of the regular nucleotides around, it'll be just fine -- until we happen to grab a dideoxy one. That one will go on just fine, but once it's on, the game's up, everything falls apart and you have to start over again. This all happens randomly, though, so you get a whole mess of different molecules. But, since DNA is always copied going the same direction, and starting from the same place (with the same primer) they all begin with the same sequence!

A picture helps:

Template: CTCACCCTGTAGGTGTTCCAGG
          ----------------------
Copies:   GAGTGGGACATCCACAa
          GAGTGGGa
          GAGTGGGACATc
          GAGt
          GAGTGg
          GAGTGGGACATCCACAAGGTc
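A soup like this one can be imitated with a small simulation. This is only a sketch under assumptions of my own: the function name `terminated_copy` is hypothetical, and the 10% chance of grabbing a dideoxy nucleotide at each position is an arbitrary number picked for illustration, not a real lab ratio.

```python
import random

PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def terminated_copy(template, dideoxy_chance, rng):
    """Copy the template until a dideoxy nucleotide happens to be added.

    The terminating letter is written in lowercase, as in the picture above.
    """
    strand = ""
    for base in template:
        partner = PAIRS[base]
        if rng.random() < dideoxy_chance:
            # Dideoxy nucleotide: it goes on, but nothing more can be added.
            return strand + partner.lower()
        strand += partner
    return strand  # reached the end of the template without terminating

rng = random.Random(0)  # fixed seed, so repeated runs give the same soup
template = "CTCACCCTGTAGGTGTTCCAGG"
for _ in range(6):
    print(terminated_copy(template, 0.1, rng))  # 0.1 is an assumed chance
```

Every line it prints starts with the same letters and stops at a random spot, which is the whole trick.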

That's just a mess. But what if we sort all of these sequences by their length...

Template: CTCACCCTGTAGGTGTTCCAGG
          ----------------------
Copies:   GAGTGGGACATCCACAAGGTc
          GAGTGGGACATCCACAa
          GAGTGGGACATc
          GAGTGGGa
          GAGTGg
          GAGt

Ahh, now we're getting somewhere. This is just a short list, but since there are so many molecules in solution, there will be a bunch for every possible length. If we could just take this mixed-up soup and sort it by the length of each DNA molecule, we could just read off the last nucleotide in each sequence, shortest to longest, and that would tell us the entire sequence of the copy! (And since every letter in the copy pairs with a letter in the template, that gives us the template too.)
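Here is a rough sketch of that read-off step in Python. The function name `read_sequence` is made up for this example, and it assumes every fragment has at least one letter. Lengths that have no fragment in the list show up as a dot.

```python
def read_sequence(fragments):
    """Sort fragments by length and read off the last letter of each one.

    Lengths with no fragment in the list are shown as '.'.
    """
    ordered = sorted(fragments, key=len)  # shortest first
    last_base = {len(f): f[-1].upper() for f in ordered}
    longest = len(ordered[-1])
    return "".join(last_base.get(n, ".") for n in range(1, longest + 1))

# The six copies from the picture: too few to cover every length.
picture = ["GAGTGGGACATCCACAa", "GAGTGGGa", "GAGTGGGACATc",
           "GAGt", "GAGTGg", "GAGTGGGACATCCACAAGGTc"]
print(read_sequence(picture))  # ...T.G.A...C....A...C

# With at least one fragment of every length, the whole copy can be read.
full_copy = "GAGTGGGACATCCACAAGGTCC"
every_length = [full_copy[:n] for n in range(1, len(full_copy) + 1)]
print(read_sequence(every_length))  # GAGTGGGACATCCACAAGGTCC
```

Six molecules leave a lot of holes, which is why it matters that a real reaction has a bunch of molecules for every possible length.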

Well, that sorting method exists. It's called gel electrophoresis, and while the nuts and bolts aren't too important, what is important to understand about it is that short, light pieces of DNA move through a gel much quicker than long, heavy pieces of DNA. If you look at the gel after it's finished separating everything, you see bands, and the bands at the far end are made of DNA molecules that are smaller than the ones in the bands at the near end. And if you do it just right, you can separate molecules that differ in length by only one nucleotide.

So then, the whole thing put together: You take the DNA you want to sequence, a bit of primer, a lot of regular nucleotides and a few dideoxy nucleotides, mix them all up for a little bit, then put them through gel electrophoresis and you get these bands where each band is made of a bunch of molecules which all have the same sequence and end in the same dideoxy nucleotide.

Uh oh, we forgot one thing! How do we actually figure out what that nucleotide really is? They used to do a kind of messy thing where they would actually run four of these reactions side by side, each with only one kind of dideoxy nucleotide, and use radioactivity to sense things... it took a lot of time and resources. Nowadays, it's really neat. What they do is use dideoxy nucleotides with little fluorescent molecules attached to them. Each kind of dideoxy nucleotide (A, C, T, G) has a different color attached to it. Then they take the gel that they got from gel electrophoresis and scan it with a laser and read all four colors at the same time. If you look at the top of this page, you'll see a short piece of one of these scans. Each different color line is a different kind of dideoxy nucleotide, and the peaks of those lines are the actual bands that you can see in the gel! Since each band has only one kind of dideoxy molecule at the end, you only get one color per band. The letters above the peaks correspond to the sequence that the computer figured out -- simply by taking the color of the highest peak at each position. In case you were wondering, it's not too important why they're all different heights.
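The "highest peak wins" idea is easy to sketch in Python. The intensity numbers below are invented for illustration, and `call_bases` is a hypothetical name. It also assumes the scan has already been chopped into one reading per band, which real software has to work out for itself.

```python
# Invented intensities for the four color channels at five band positions.
traces = {
    "A": [0.1, 0.8, 0.1, 0.0, 0.2],
    "C": [0.0, 0.1, 0.2, 0.1, 0.1],
    "G": [0.9, 0.2, 0.7, 0.1, 0.6],
    "T": [0.1, 0.0, 0.1, 0.5, 0.1],
}

def call_bases(traces):
    """At each position, pick the letter whose color has the highest peak."""
    positions = len(next(iter(traces.values())))
    return "".join(
        max(traces, key=lambda base: traces[base][i])
        for i in range(positions)
    )

print(call_bases(traces))  # GAGTG
```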

So that's how they read DNA sequences. If you bug me, I can tell you more about those primers - they're key to the whole thing. (In the sequencing lab they're usually short synthetic pieces of DNA rather than RNA.) Oh yeah, there's one other thing that you have to keep in mind. This process works great - for sequences up to about 500 nucleotides. Maybe 700 if you're lucky. More than that, the bands get too squished together. So then, the human genome - which has 3 billion nucleotides - had to be broken up into these little pieces. And each piece was sequenced. Several times, to make sure they didn't have any mistakes. And then a computer had to take these tiny little fragments and put them together... but that's a tale for another day.
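To get a feel for the size of that job, here is some back-of-the-envelope arithmetic in Python. The genome size and the 500-nucleotide read length come from above. The coverage of 8 is purely an assumption standing in for "several times", and `reads_needed` is a made-up helper name.

```python
import math

def reads_needed(genome_length, read_length, coverage):
    """Rough count of reads if every stretch is sequenced `coverage` times."""
    return math.ceil(genome_length / read_length) * coverage

genome_length = 3_000_000_000  # nucleotides in the human genome
read_length = 500              # nucleotides per read
coverage = 8                   # assumption: "several times" taken as 8

print(reads_needed(genome_length, read_length, 1))         # 6000000
print(reads_needed(genome_length, read_length, coverage))  # 48000000
```

Real fragments overlap at random instead of lining up end to end, so treat these numbers as a floor rather than a real project plan.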

Posted by kgutwin at May 24, 2005 07:40 PM

Comments

Hey, this all sounds great and is very interesting, but how does this help to solve the problem of my Leukemia??

Posted by: Gramps at May 27, 2005 11:52 AM