Molecular Clocks

From SkepticWiki


Introduction

We can produce an equation linking the number of generations since the separation of two populations, the mutation rate, the size of the genome, and the genetic difference between them.

We can use this equation in two ways.

First, we can use it to test our concepts of evolution and paleontology. If, from studying the fossil record, we think we know that two species have a common ancestor and how long ago they diverged, then we can put this together with our knowledge about mutation rates and genome sizes to predict how much genetic difference there should be between the two populations. We can then measure the actual divergence to see whether the evolutionary prediction is accurate.

Second, once the accuracy of these and other predictions has convinced us that the theory of evolution and of common descent is accurate, we can start using the formula the other way round. If we know the genetic difference between two species, the mutation rate, and the genome size, then we can calculate the time since their divergence. That is, we can use the accumulation of genetic difference as a clock to tell us how long ago the last common ancestor of the two species lived.

The reasoning

In this section, we shall show how to derive a relationship between the number of generations since the separation of two populations, the mutation rate, the size of the genome, and the genetic difference between them.

We shall explain the most basic and simple model possible: in practice scientists use more biologically realistic and statistically sophisticated methods. However, this article should give you the basic ideas behind the concept of a molecular clock.

Also for the sake of simplicity, we shall do the calculation just for diploid organisms like ourselves.

Recall first of all that very nearly all mutations are neutral. Recall also that (as discussed in the article on Genetic Drift) if we have a number of neutral variations at a site within a population, one of these variants will eventually win out over the others simply as the result of random fluctuations in the proportions of the variants in the gene pool: all but one of the variants will eventually fluctuate to 0%, leaving one variant the undisputed champion, occurring in both copies of the site in every animal in the population. Such a variant (now a "variant" no longer) is said to have become fixed in the gene pool.

Now, a new neutral variation thrown up by mutation is one of 2N versions (some identical) of the site, where N is the size of the population (the 2, of course, is there because we are dealing with diploid organisms). Because the variation is neutral, this means that it has no better nor worse chance of going on to fixation in the gene pool than the other 2N-1 versions of the site. It follows that when it first arises, its probability of fixation is 1/2N. A more detailed version of the proof will be found here.
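The 1/2N result can be checked numerically. The following is a minimal sketch, assuming a Wright-Fisher-style model in which each of the 2N gene copies in a generation is drawn at random from the previous generation's pool. The model choice, the function names, the population size and the number of trials are illustrative assumptions, not part of the article's argument.

```python
import random


def neutral_variant_fixes(copies, rng):
    """Follow one new neutral variant among `copies` gene copies.

    Returns True if the variant reaches fixation, False if it is lost.
    """
    count = 1  # the new mutation starts out as a single copy
    while 0 < count < copies:
        freq = count / copies
        # Each copy in the next generation is drawn at random from the
        # current pool, so it carries the variant with probability `freq`.
        count = sum(1 for _ in range(copies) if rng.random() < freq)
    return count == copies


def estimate_fixation_probability(population_size, trials, seed=0):
    """Estimate the fixation probability of a new neutral variant (diploid)."""
    rng = random.Random(seed)
    copies = 2 * population_size  # 2N copies of the site
    fixed = sum(neutral_variant_fixes(copies, rng) for _ in range(trials))
    return fixed / trials


N = 10  # deliberately tiny population so the simulation runs quickly
estimate = estimate_fixation_probability(N, trials=10_000)
print(f"simulated: {estimate:.4f}   predicted 1/2N: {1 / (2 * N):.4f}")
```

With a small population the simulated fraction should land close to 1/2N; larger populations behave the same way but take longer to simulate.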

Now, let the probability of the mutation arising in any one copy of the site in a generation be μ. Then the expected number of times it arises in the population in a generation will be 2Nμ, since there are 2N copies of the site. So the expected number of such mutations that arise in any generation and eventually go on to fixation is 2Nμ/2N; and since we can cancel the 2N on the top and bottom of this fraction this works out to be just equal to μ.

So, let M be the average mutation rate, expressed in mutations per site per generation, and let the number of sites be s. Since the rate of fixation at each site equals the mutation rate at that site, the expected number of fixations per generation is simply Ms. Hence if g generations go by, the expected number of fixations is gMs.

So, consider what happens when you take a population and divide it into two populations that are unable to interbreed. Each of them will separately undergo different mutations and different fixations: after g generations, each will have undergone gMs fixation events. Therefore, the genetic difference between them since separation is given by gMs + gMs, or, more simply, 2gMs. We should of course require a slightly more complicated formula if for some reason the two populations breed at different rates.
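The chain of reasoning so far can be written compactly. The symbols N, μ, M, s and g are the article's own; D, standing for the expected genetic difference between the two populations, is a label introduced here only for convenience.

```latex
\begin{align*}
\text{rate of fixation per site per generation} &= 2N\mu \cdot \frac{1}{2N} = \mu \\
\text{expected fixations in one population after } g \text{ generations} &= gMs \\
D &= gMs + gMs = 2gMs
\end{align*}
```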

The reader will notice that we have assumed that the two populations will have, and fix, different mutations. This is an acceptable approximation to reality, because the probability of them having any significant number of identical mutations at identical sites is ridiculously small, because mutation rates are low and genomes are huge.

You will also note that throughout our reasoning, we have used the approximation that all mutations are neutral. There are good reasons to think that this is a good approximation to the truth: from measurements of the mutation rate in humans[1], we can conclude that the average human being has about a hundred new mutations not inherited from his or her parents, and that these apparently do not do much good or harm.

We may therefore draw the following conclusion:

Conclusion: If a population is separated into two populations that do not interbreed, then to a good degree of approximation the genetic difference between them will be equal to two times the number of generations since separation, times the mutation rate, times the number of sites in the genome (that is, 2gMs).
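As a minimal sketch, the conclusion translates directly into a small function. The function and parameter names are illustrative. The optional `generations_other` argument is an assumption about what the "slightly more complicated formula" for populations breeding at different rates might look like: it simply adds the two populations' expected fixations separately.

```python
def expected_divergence(generations, mutation_rate, sites, generations_other=None):
    """Expected genetic difference between two separated populations.

    generations        -- generations since separation (g) in one population
    mutation_rate      -- mutations per site per generation (M)
    sites              -- number of sites compared (s)
    generations_other  -- generations elapsed in the other population;
                          defaults to the same g, which gives 2gMs
    """
    if generations_other is None:
        generations_other = generations
    # gMs for one population plus gMs for the other
    return (generations + generations_other) * mutation_rate * sites


# Hypothetical figures, chosen only to show the call:
print(expected_divergence(generations=1_000, mutation_rate=1e-8, sites=1_000_000))
```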

There is one other useful approximation we can make. To measure the genetic difference between two populations, we would need to have complete genetic sequences for them both, which we usually don't. However, it is easy to count up the divergence in fewer sites than the whole genome, and plug into our formula the number of sites we looked at and the amount of genetic difference we found at those sites. By the Law of Large Numbers, the more sites we look at, the better this procedure will approximate the true figures we'd get from looking at entire genome sequences.
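The effect of sample size can be illustrated with a short simulation. This is a sketch under stated assumptions: each sampled site is treated as differing independently with the same probability, and the "true" per-site divergence of 0.0119 is a hypothetical value picked for illustration. The function name is likewise illustrative.

```python
import random


def sampled_divergence_per_site(true_rate, sites_sampled, rng):
    """Fraction of sampled sites that differ between the two populations.

    Each site is assumed to differ independently with probability `true_rate`.
    """
    differing = sum(1 for _ in range(sites_sampled) if rng.random() < true_rate)
    return differing / sites_sampled


rng = random.Random(1)
TRUE_RATE = 0.0119  # hypothetical per-site divergence, for illustration only
for n in (100, 10_000, 500_000):
    estimate = sampled_divergence_per_site(TRUE_RATE, n, rng)
    print(f"{n:>7} sites sampled: estimated divergence per site = {estimate:.5f}")
```

Small samples can miss the true figure by a wide margin, while estimates from larger samples cluster ever more tightly around it.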

Another practical issue is that it is easier and cheaper to compare the proteins that the genes code for than the genes themselves, because it's easier to find comparable molecules, such as histones or haemoglobins, in the blood of organisms than to locate comparable gene sequences in their genomes.

Testing evolution

The way to test any theory is to compare its predictions against reality. The results stated above, in conjunction with the fossil record, allow us to predict the amount of genetic difference between two species, where the fossil record is reasonably good.

Take humans and chimpanzees as an example. Paleontologists claim that the fossil record shows that they diverged about seven million years ago. This leads to a prediction about what we should expect to see if we look at the chimp and human genomes.

Take the average time between generations to be 20 years (not an unreasonable figure, given the documented lifespan and breeding habits of chimpanzees[2]). Hence, in seven million years we would have 350,000 generations.

The rate of single nucleotide substitutions in primates can be found directly by observing the rate at which people exhibit genetic diseases caused by dominant single nucleotide substitutions which are not inherited from their parents, and so represent new mutations: the figure is 1.7 × 10⁻⁸ single nucleotide substitutions per nucleotide per generation (figure from A.S. Kondrashov, Direct estimates of human per nucleotide mutation rates at 20 loci causing Mendelian diseases).

The last figure we need is the size of the genome: approximately three billion sites.

So plugging these figures into the equation derived above, we get a prediction: the divergence (counting only single nucleotide substitutions) between the chimpanzee and human genomes should be approximately 35,000,000 single nucleotide substitutions.
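The arithmetic behind this prediction can be reproduced in a few lines. All four input figures are the ones quoted in this section; only the variable names are introduced here.

```python
YEARS_SINCE_SPLIT = 7_000_000      # divergence date from the fossil record
GENERATION_TIME_YEARS = 20         # assumed average time between generations
MUTATION_RATE = 1.7e-8             # substitutions per nucleotide per generation
GENOME_SITES = 3_000_000_000       # approximate genome size in sites

# g: number of generations since the two lineages separated
generations = YEARS_SINCE_SPLIT // GENERATION_TIME_YEARS

# 2gMs: expected single nucleotide differences between the two genomes
predicted_substitutions = 2 * generations * MUTATION_RATE * GENOME_SITES

print(f"generations since divergence: {generations:,}")
print(f"predicted substitutions: {predicted_substitutions:,.0f}")
```

The product comes to roughly 35.7 million, in line with the approximate figure of 35,000,000 given above.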

And it is (see Ebersberger et al., Genomewide Comparison of DNA Sequences between Humans and Chimpanzees).

Applying evolution

Once we have been convinced, by the success of this and other evolutionary predictions, that the theory of evolution is true, then we can start using genetic difference as a measure of how long ago the common ancestors of two species lived. We do this by algebraically rearranging our formula to read: the number of generations since separation is given by the amount of genetic difference divided by twice the mutation rate times the number of sites. This can allow us to refine our knowledge of the history of life in cases where our present knowledge of the fossil record is uninformative.
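A minimal sketch of the rearranged formula follows. The function names are illustrative, and converting generations to years assumes a known, constant generation time, which is a simplification.

```python
def generations_since_split(difference, mutation_rate, sites):
    """Rearranged clock formula: g = difference / (2 * M * s)."""
    return difference / (2 * mutation_rate * sites)


def years_since_split(difference, mutation_rate, sites, generation_time_years):
    """Convert the estimated number of generations into years."""
    generations = generations_since_split(difference, mutation_rate, sites)
    return generations * generation_time_years


# Running the human and chimpanzee figures from this article backwards:
# about 35.7 million differences over three billion sites.
estimate = years_since_split(
    difference=35_700_000,
    mutation_rate=1.7e-8,
    sites=3_000_000_000,
    generation_time_years=20,
)
print(f"estimated time since divergence: {estimate:,.0f} years")
```

Feeding the predicted difference back in recovers a figure of about seven million years, which is only a consistency check on the algebra, not independent evidence.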

We emphasize again that this use of the formula rests on knowing that evolution has taken place. We can use the genetic difference between two species to calculate the time over which they have separated from their common ancestor only if they did, in fact, have a common ancestor. If they had, instead, been specially created by God with that amount of difference between them, then application of this method would still give us a "time since separation", but it wouldn't be accurate since they wouldn't, in fact, have separated.

Applying and testing creationism

The newer sort of creationist will admit the formation of species within "created kinds". In brief, the idea is that Noah only needed to take a few "created kinds" on the Ark, and that their lineages diverged to produce modern species: so, for example, he might only have taken two of the "cat kind" on the Ark, which, after the Flood, diversified into such things as lions, tigers, cheetahs and so forth. For more information, see our main article on Created Kinds.

In principle, then, a creationist could use the genetic clock to find the date of separation of any two species that he classes as the same "created kind", and which he therefore believes to have diverged from a common ancestry.

We predict that no creationist will ever do so, partly because they have no interest in doing science, and partly because it would give them results that they really, really don't want to hear.

Misconceptions

It is fairly easy to predict that any creationist exposed to information about molecular clocks will confuse the two methods we have outlined above, of testing and of using the theory of evolution, and start rambling about "evolutionary assumptions" and "circular reasoning".

We trust that this article has made the difference clear. We test evolution by taking the claims made by paleontologists about evolutionary events and their timing; we use these claims to predict degrees of genetic difference, and we see how the prediction compares to reality. The success of such predictions confirms that the paleontologists know what they're talking about; conversely, repeated failure would falsify the theory: if we found a hundred times more genetic difference between chimps and humans than is predicted by the claim that they diverged from a common ancestor seven million years ago, then this claim would have been disproven.
