Thursday, July 19, 2007

The Chemical Structure of Double-Stranded DNA

Double-stranded DNA consists of two complementary polynucleotide chains in which the bases on one strand are hydrogen-bonded to the bases on the other strand. Only two pairs of bases can form regular interactions in which the edge of one base matches the edge of another, so that the two bases are joined by hydrogen bonds while lying in the same plane.

The A/T and G/C base pairs each have one purine (A or G) and one pyrimidine (T or C), which means that the width of each base pair from one side to the other is almost the same. When two polynucleotide strands are laid side by side, as in the figure, the distance between the sugar residues on the two strands is the same for every base pair. This means that double-stranded DNA is a very regular structure even though the sequence of base pairs can be very different in different parts of the molecule.

The two strands are said to be complementary because all the bases in one strand are paired with the complementary bases on the other strand (A with T and G with C). This can only happen in a way that generates a regular structure if the two strands are antiparallel. If you look at the structure shown above you can determine the directionality of each strand by following the rules described in DNA Is a Polynucleotide. The left hand strand runs in the 5′→3′ direction from top to bottom while the right hand strand reads 3′→5′ from top to bottom.

The discovery of the structure of DNA by Watson & Crick was only made possible when they realized that the two strands of the helix had to be antiparallel.

There’s a convention for writing DNA sequences. They all have to be written in the same direction and that’s 5′→3′. Thus, the sequence of the bases on the left hand strand is AGTC and the sequence of the bases on the right hand strand is GACT. This may seem a bit confusing if you don’t understand the convention.

One of the classic questions on undergraduate exams is to give the sequence of one strand of double-stranded DNA (e.g., TAACTGGCGGA) and ask students to write down the sequence of the other strand. You’d be surprised at how many students haven’t paid attention when we discuss antiparallel strands in DNA and naming conventions.
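Here is a minimal Python sketch of how that exam question can be answered mechanically, assuming the sequence contains only the four letters A, C, G, and T. The function name `reverse_complement` is my own choice for illustration and not part of any particular library. It pairs each base with its complement and then reverses the result, so the answer is written 5′→3′ like the input.

```python
# Watson-Crick pairing: A with T, G with C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the partner strand of `seq`, written 5'->3' like the input.

    Assumes `seq` is itself written 5'->3' and uses only A, C, G, T.
    """
    seq = seq.upper()
    unknown = set(seq) - set("ACGT")
    if unknown:
        raise ValueError(f"unexpected characters in sequence: {sorted(unknown)}")
    # Complement every base, then reverse because the strands are antiparallel.
    return seq.translate(COMPLEMENT)[::-1]


print(reverse_complement("AGTC"))         # GACT, the right hand strand in the figure
print(reverse_complement("TAACTGGCGGA"))  # TCCGCCAGTTA
```

Forgetting the reversal step gives ATTGACCGCCT for the exam sequence. That is the other strand read 3′→5′, so it does not follow the convention.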

© Laurence A. Moran and Pearson/Prentice Hall 2007


  1. Bah, anti-parallel questions are easy. It is when they start combining parallel and anti-parallel, coding and non-coding strands, mRNA, open reading frame, and translation in one question that things start getting a bit crazy. For instance "here is a single strand of DNA from the beginning of a protein, write down the mRNA sequence and the amino acid sequence". You have to find the start codon (which may or may not be on the strand given), find the open reading frame, then get direction correct, then convert it to an amino acid sequence using that reading frame.

    If you are in a particularly sadistic mood you can then make them figure out the most likely conformation for the polypeptide and the free energy loss from forming that conformation.

  2. When I ask those questions I usually add that it's a DNA sequence from the middle of a coding region and there are usually at least two open reading frames. :-)

  3. Yes, if it is a question about open reading frame or if you specified which was the coding strand.

    But if there are two open reading frames then you don't get a unique solution, and it is much harder to come up with a DNA sequence that can lead to two good folding configurations (i.e. ones you can solve without a computer) as opposed to just one.

    I suppose you could design a sequence so that there is only one open reading frame even if you look at both strands, but that would be a long problem, checking for six open reading frames. If you wanted to be really evil you could leave out the 5' and 3' labels, making them check 12 open reading frames. But that is getting downright cruel, not to mention unfeasibly long, if you combine it with determining the proper folding configuration and the free energy change. If you add counting the number of conformational states and the free energy change from a point mutation that changes a residue from a long hydrophobic to a short hydrophilic (or vice versa) you could have a whole molecular biology midterm with just one question.

  4. With two open reading frames you can almost always tell which one is correct by looking at the amino acid sequence. It only takes a little knowledge of proteins to see that some amino acid sequences are very unlikely.

    I'm not sure how you can write a sequence without identifying the 5' and 3' ends since the convention is that the 5' end of a sequence is always on the left. This would be the top strand of double-stranded DNA, by convention.

  5. This comment has been removed by the author.

  6. Sir, How can one tell if an amino acid sequence is correct or not??

  7. You can often recognize incorrect amino acid sequences because they contain odd combinations of amino acids and unusual compositions. Look for sequences that have lots of prolines and tryptophans.
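The six reading frames discussed in the comments above can be listed with a short Python sketch. It assumes the standard genetic code, a sequence written 5′→3′ using only A, C, G, and T, and a sequence from the middle of a coding region, so it translates every complete codon and does not search for a start codon. The names `translate` and `six_frames` are hypothetical helpers chosen for illustration. Stop codons are shown as `*`.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq: str) -> str:
    """Translate complete codons from the first base; ignore leftover bases."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def six_frames(seq: str) -> dict[str, str]:
    """Translate three frames on the given strand and three on its complement."""
    forward = seq.upper()
    # The other strand, rewritten 5'->3' so it can be read the same way.
    reverse = forward.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


for label, peptide in six_frames("TAACTGGCGGA").items():
    print(label, peptide)
```

For the exam sequence from the post, frame +1 begins with a stop codon (TAA), which rules it out immediately as a frame from the middle of a coding region. Choosing among the remaining frames takes the kind of judgment about amino acid composition described in the comments.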