[Added later: Clearly the Nobel committee was influenced by this post.]
Our view of the world beyond our senses is necessarily influenced by the instruments we use to investigate it. There is far more protein in cells than nucleic acid, and to the naive observer the primary sequence of a polypeptide (with 20 possible monomer units) seems a far richer source of information than that of a nucleic acid, with only 4. Consequently it wasn't until Griffith's bacterial transformation experiments in 1928, and Avery, MacLeod, and McCarty's 1944 demonstration that the "transforming principle" was DNA, that people began to think seriously about nucleic acid rather than protein as the heredity chemical. The final coup came in 1953, when Watson and Crick solved the double-helix structure of DNA and with it provided a mechanism for DNA to be the biological information carrier.
Only in the last two decades have we begun to understand the importance and diversity of RNA in Earth's biochemistry, possibly because RNA is harder to work with than DNA. Tom Cech and Sidney Altman received the 1989 Nobel Prize in Chemistry for their work in the 1980s showing that RNA has catalytic properties, which stem from its extra (relative to DNA) 2' hydroxyl. Besides its crucial role in the ribosome, RNA can also catalyze its own splicing.
This last observation was critical for anyone trying to puzzle out the chemical origins of life on Earth. DNA by itself in solution is actually quite boring; it just sits there. If life began with DNA, it would have stopped there too. Crick and Orgel, speculating about life's chemical origins in the 1960s, were forced to conclude that there must have been either an all-protein world (from which the transition to DNA as the heredity chemical is unclear) or an all-RNA world. They also speculated, pessimistically, that these events were so unlikely that perhaps replicator chemistry diffuses through space more easily than we realize, so that it had to happen only once in any given galaxy to spread to other stars (panspermia, first discussed by Arrhenius more than half a century earlier). Once work on RNA's catalytic (and autocatalytic) abilities began, the RNA world started to seem much more feasible. Pointing to its initial favorability, we now know that some RNAs splice themselves without any protein help, and that ribozymes evolved in the laboratory can attach amino acids to tRNAs. Free nucleotides are still the universal energy currency of cells. The Miller-Urey experiments showed that prebiotic processes could produce amino acids; these studies were later supplemented by the discovery of nucleobases in meteorites, suggesting that nucleobases form abiotically on asteroids and could plausibly have originated prebiotically.
So far this has been merely a review of the RNA world hypothesis. It's anything but proven, but it's the best story we have, and we even have pathogens active today that are merely self-reproducing (though host-dependent) RNAs: viroids. The idea is that once nucleic acid had been established as the dominant replicator, some interaction occurred between RNA and DNA that moved DNA upstream in the causal hierarchy, that is to say, put it chemically in charge, possibly owing to its greater stability and conservatism: it is less reactive, and it exists as a double-stranded template that carries its own backup copy. Like the seawater-like salt concentrations found in the cells of Earth organisms, the activity of catalytic RNAs is a biochemical fossil, along with the messenger intermediates delivering sequence information to the ribosomes, energy carriers like ATP, and cofactors like NAD.
Segregation of Fitness Factors Within Membranes
Whatever the molecular basis for Earth's first chemical replicators, if they had a diffusible element to their reproduction, they had a commonly recognized problem to solve: how to sequester advantage. That is, if there was a developing RNA world where autocatalytic RNAs had developed a set of consistent interactions to polymerize amino acids and take advantage of a more diverse chemistry, how did they keep neighboring strands from benefiting? Imagine that one RNA molecule, call it RNA-1, has the trick of directing the synthesis of a nifty oligopeptide that fetches ribonucleotides, so that RNA-1 replicates just incrementally faster than any neighboring RNA in solution and increases its numbers faster than its competitors - right? Wrong. That nifty oligopeptide is going to diffuse away and is just as likely to help any other RNA in RNA-1's neighborhood. So, from RNA-1's fitness standpoint, why bother? Until there's a closed feedback loop - until that oligopeptide is for some reason more likely to associate with RNA-1 than with RNA-1's neighbors - there's no point. (Frequently overlooked in this scenario is defense: RNA-1 also wants to keep its neighbors from eating up all the free nucleotides around it, poisoning RNA-1's phosphodiester formation, or even hydrolyzing RNA-1 itself.)
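The diffusion argument can be made concrete with a deliberately crude toy model that is not from the original argument: standard replicator dynamics for a "public good." Everything below is a hypothetical simplification. A helper RNA pays a replication cost `c` to make a product worth `b`; in a well-mixed pool that benefit is shared by every RNA in proportion to how many helpers there are, while in the compartmentalized case each vesicle holds only one kind of RNA, so the benefit stays with the helpers that made it.

```python
# Toy replicator-dynamics sketch (hypothetical parameters, not measured values):
# a "helper" RNA pays cost c to make a diffusible product of benefit b.


def mixed_pool(p: float, b: float, c: float, generations: int) -> float:
    """Well-mixed solution: the helper's product diffuses to every RNA.

    p is the starting fraction of helpers; returns the helper fraction
    after the given number of generations.
    """
    for _ in range(generations):
        shared = b * p                 # everyone gets the same boost
        w_helper = 1 + shared - c      # helpers also pay the cost
        w_other = 1 + shared           # non-helpers free-ride
        mean_w = p * w_helper + (1 - p) * w_other
        p = p * w_helper / mean_w
    return p


def compartments(p: float, b: float, c: float, generations: int) -> float:
    """Extreme compartmentalization: each vesicle holds only helpers or only
    non-helpers, so the product benefits only the helpers that made it."""
    for _ in range(generations):
        w_helper = 1 + b - c           # benefit stays inside the membrane
        w_other = 1.0                  # no helpers nearby, no benefit
        mean_w = p * w_helper + (1 - p) * w_other
        p = p * w_helper / mean_w
    return p


if __name__ == "__main__":
    start, b, c, gens = 0.5, 0.3, 0.05, 50
    print(f"helpers, mixed pool:   {mixed_pool(start, b, c, gens):.3f}")
    print(f"helpers, compartments: {compartments(start, b, c, gens):.3f}")
```

With any positive cost, helpers lose ground in the mixed pool however large `b` is, because the benefit is shared equally with free-riders; in the compartment version they spread whenever `b > c`. Real protocells would sit somewhere in between, with mixed membership in each vesicle.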
The solution, of course, was the lipid membranes that now surround all cells (but not all replicators). Membranes form a self-nonself boundary and sequester diffusible benefits while providing a defense against chemical predation. The details of how this might have begun are still up for grabs, since there would have to have been mechanisms to open the membrane from inside and to gather and react to information from outside. While admittedly all this is highly speculative just-so discussion, the central point is that it's very difficult to imagine how a well-elaborated RNA-protein interaction machinery could have developed prior to membrane encapsulation of RNAs and their associated products.
What Was the Mechanism for the Transition to the DNA World?
If we assume that cells are "about" making more nucleic acid, then DNA's stability and conservatism do in fact seem to make it a better reservoir of information. Still, RNA world speculations tend to be a little short on details about exactly how such a transition could have come about. At this point, we're talking about a reproducing RNA molecule surrounded by a membrane with some set of RNA > protein chemical rules; in modern-day terms, we have a lipid bilayer with ribosomal RNA that can reproduce. The rRNA reproduces either on its own or with the help of an RNA-dependent RNA polymerase, like the one influenza still uses today (in fact, many eukaryotes also have endogenous RNA-dependent RNA polymerases).
There are at least two ways we can imagine the transition occurring.
A. An intracellular transition. RNA-protein cells developed a reverse transcriptase that gradually assembled a DNA mirror of the cells' RNA genome. If the advantage of such a DNA mirror was as a backup in the event that the RNA genome was damaged, it could only have been selected for if there was also some mechanism to convert DNA back into RNA (in modern terms, transcription). Because of DNA's conservatism, cells that relied more and more on DNA would be favored, eventually leading to a cell where the only reproduction was of DNA, not RNA. One test of this model is to look for a most recent common ancestor of RNA-dependent RNA polymerases that is older than the last common ancestor of reverse transcriptases, which is in turn older than that of classical DNA-dependent RNA polymerases (with reference to RNA Pol I, which transcribes rRNA), which is in turn older than that of DNA polymerases. It should be pointed out that most eukaryotic cells code for reverse transcriptases; some of these are critical for DNA maintenance (telomerase, for example), but most do not obviously benefit anything but their own reproduction (selfish elements), and they make up substantial portions of eukaryotic genomes. Selfish elements and junk DNA are thought to be much scarcer in prokaryotic genomes because of selection pressure on fecundity.
B. Nucleus-as-endosymbiont. Viewed in a non-nucleocentric way, eukaryotic cells have three kinds of membrane-bound organelles containing genomes: chloroplasts, mitochondria, and nuclei. Nuclei are unique among these three in that they export nucleic acids to interact extensively outside their own membranes. If even a world of highly "committed" biochemical structures (like the modern one) leaves latitude for the survival of both RNA and DNA viruses, we can presume that there would have been room for membrane-bound DNA viruses in the membrane-bound RNA world. A DNA virus could infect an RNA-only cell (similar to Philip Bell's concept of a DNA virus as the ancestor of all eukaryotic nuclei). DNA viruses in the RNA world would need reverse transcriptase for their reproduction; if we also presume that, over multiple infections, such a virus picked up the RNA cell's ribosomal and transfer RNA genes, those genes would eventually be incorporated into the viral DNA that became the nucleus - or, in today's terms, into the cell's DNA. The obvious objection here is that this assumes the first DNA cell was a eukaryote. There are three answers. First, the model could still function without a membrane-bound DNA molecule; we just couldn't explain the membrane-segregated nucleus in terms of endosymbiosis. Second, assuming phylogenetic relationships do not conflict with this account, it can be further argued that, as with selfish elements and junk DNA, it is not implausible that ancestral cells were more complex than modern bacteria; prokaryotes would then be a more stripped-down later version, driven toward simplicity by the need for fecundity. That is to say, since the advent of DNA cells, prokaryotes have lost their internal membranes, rather than eukaryotes having gained them. Third, and most important for later points, the phylogenetic relationship between eukaryotes and the last universal common ancestor (LUCA), as inferred from ribosomal RNA, is at this stage still unclear.
Both scenarios fly in the face of the central dogma; still, the RNA world is the best-supported account of the origin of life, and the details of the transition to the DNA world are sketchy. In the first, DNA is merely a backup for the (at that time) true RNA genome - a set of rRNA and possibly tRNA genes - and the DNA backup gradually usurps RNA's role. In the second, a DNA virus remains permanently in a cell and absorbs RNA genes for what would later become our rRNA genes (but coded in DNA).
Speculation about pre-DNA world biochemistry can be disorienting. Taken out of their central-dogmatic context, the definitions of genotype and phenotype become less clear - in the RNA world they overlap strongly - and there seems to be no clear causal starting point in the information cascade. Which leads to the question: is there even such a clear starting point in the modern DNA world?
Why the Centrality of DNA? Cyclic Cause and Effect
We humans are limited in our pattern-recognition abilities, and when we encounter a new complex phenomenon we necessarily think of it in terms of isolable cause-and-effect narratives. Consequently, it's useful in biomedical research to think of DNA as being at the top of a causal cascade that results ultimately in the reproduction of more DNA (or in the production of protein, which results in the reproduction of more DNA). In this view, cells and whole organisms - phenotypes - are survival machines that DNA uses to make more of itself. Put concretely, the function of an apple tree isn't to make apples; it isn't even to make more apple trees. It's to make more apple tree DNA.
Of course, no DNA molecule ever reproduced itself without the help of a host of specialized proteins, and every DNA molecule in existence today is causally downstream of a set of protein-and-RNA mediated events going back billions of years (just as all of them are causally downstream from DNA). This seems trivial, but leads immediately to the question: why is DNA alone given a privileged place in that cyclic sequence of events?
There are two reasons this is so. First, in the middle decades of the twentieth century a number of biochemical pathways, some of them cyclic, were elucidated, among them the urea cycle, glycolysis, and the central dogma of molecular biology. In all of these, some entity or set of entities A gives B gives C, a clear and isolable stepwise set of inputs and outputs. Second, and special to DNA, there is information. The glucose monomers in a glycogen molecule are chemically equivalent, a chain of zero-zero-zero-zero. DNA contains a quaternary code that has no clear function apart from its information content; it has a meaning, in that its triplets correspond to amino acids. Cells don't consume it for energy; cells don't build walls or tubes out of it. It's there to be read, and the only thing that determines its meaning is other DNA molecules.
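To put rough numbers on "information" in the Shannon sense (a framing assumed here for illustration, not one the argument itself uses): an alphabet of k equally likely, independent monomers can carry at most log2(k) bits per position. A nucleic acid therefore tops out at 2 bits per residue, a polypeptide at about 4.32 (the 20-versus-4 contrast from the opening), and a glucose-only chain like glycogen at 0, ignoring its branching pattern. The function names below are hypothetical helpers.

```python
import math
from collections import Counter


def max_bits_per_monomer(alphabet_size: int) -> float:
    """Upper bound on information per position: log2 of the alphabet size,
    reached only if every monomer is equally likely and independent."""
    return math.log2(alphabet_size)


def composition_entropy(seq: str) -> float:
    """Shannon entropy (bits per position) of a sequence's monomer composition.
    This ignores order entirely and says nothing about what a sequence means."""
    n = len(seq)
    return sum(-(k / n) * math.log2(k / n) for k in Counter(seq).values())


if __name__ == "__main__":
    for name, k in [("nucleic acid", 4), ("polypeptide", 20), ("glycogen (glucose only)", 1)]:
        print(f"{name}: {max_bits_per_monomer(k):.2f} bits/position max")
    print(composition_entropy("GGGGGGGG"))  # one repeated monomer: a 'chain of zeros'
    print(composition_entropy("ACGTACGT"))  # all four bases equally common
```

Note that these numbers measure capacity, not meaning: a scrambled string and a working gene with the same base composition score identically, which is why "information" alone cannot settle what DNA is for.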
Two Thought Experiments, and the Meaning of Life
Is that last statement true at all? Of course not. Without even getting into epigenetic phenomena, the on-its-own-quite-inert DNA molecule doesn't do anything that proteins (and the RNAs that made those proteins) don't let it do. Without those RNAs and proteins, DNA "means" nothing. The multiple inputs upstream from DNA, and the relatively pristine outflows downstream from it, make it a convenient point in the process for us to manipulate cells. DNA matters only because of what it means and because of its association with the proteins that actually do the work of the cell. The fact that its conservatism allows us to trace ancestry doesn't force us to conclude that it's the point of the cell. Thought experiment: if tomorrow physicists found some bizarre particle physics technique to trace cells based on something in the lipid membranes of lysosomes, would that mean the cell was about lysosomes?
No, you answer - unless that particle physics tag on lysosomes correlated with some property of lysosomes that interacted with the rest of the world in a consistent way, like the DNA-protein feedback loop. Then you'd have something.
Forgive the mental biology, but thought experiments allow us to think about problems without the anesthesia of the familiar masking patterns. So indulge me: imagine that the world's first biochemical string theorist is working on an exotic n-dimensional model of physics and discovers that, unique among known molecules, DNA has a prominent, nonrandom structure in the higher-dimensional model. Applying this theory to phylogenetic trees, it becomes clear that some new property of DNA, its n-dimensional conformation, is actually more conserved over time, and correlates better with protein function, than its 5'-to-3' nucleotide sequence. Wouldn't that be exciting? It would become obvious that we shouldn't be so hung up on the primary sequence; we would then reasonably conclude that this higher-dimensional structure is somehow what cells are "about" and what evolution has been selecting for.
Given the title of the post, you've probably guessed where this is leading. My argument is that life on Earth is best understood in terms of an RNA World that is masked, to our eyes, by the DNA it uses as a backup. At least as much as it's about DNA, life on Earth is about ribosomes. Apple trees are one way that ribosomes can make more ribosomes, as are slime molds, great white sharks, and koalas. We can think of the function of DNA as both a) a backup of ribosomes and b) one stage in the manufacture of all the rest of the ancillary machines that ensure the survival of ribosomes. How is this different from making the same argument about any other cellular element, say that the point of life is proteasomes? Not only are rRNA sequences (not to mention tRNA anticodons) the most conserved nucleic acids in living things; what is conserved is the structure of the molecule, over and above its sequence. Even more specifically than ribosomes, what is being preserved, selected for, and served in a giant feedback loop by the rest of the structures in the cell is this consistent set of RNA-protein interactions.
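The claim that structure is conserved over and above sequence is usually illustrated with compensatory base changes: a substitution on one strand of a helix is tolerated when its partner base changes to match. A minimal sketch, assuming toy 12-nucleotide hairpins invented for illustration (not real rRNA), dot-bracket notation for secondary structure, and a simplified pairing rule (Watson-Crick pairs plus G-U wobble, with no pseudoknots or folding energies):

```python
# Simplified pairing rule: Watson-Crick pairs plus G-U wobble.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

HAIRPIN = "((((....))))"       # dot-bracket: 4-bp stem, 4-nt loop
ORIGINAL = "GGCAGAAAUGCC"      # hypothetical toy sequences, not real rRNA
COMPENSATED = "CAUGGAAACAUG"   # every stem pair swapped for another valid pair
BROKEN = "AGCAGAAAUGCC"        # one uncompensated change in the stem


def pair_table(structure: str) -> dict:
    """Map each '(' position to the position of its matching ')'."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs[stack.pop()] = i
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def fits(seq: str, structure: str) -> bool:
    """True if every base pair the structure calls for is an allowed pair."""
    if len(seq) != len(structure):
        return False
    return all((seq[i], seq[j]) in ALLOWED_PAIRS for i, j in pair_table(structure).items())


def identity(a: str, b: str) -> float:
    """Fraction of aligned positions carrying the same nucleotide."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    for name, seq in [("compensated", COMPENSATED), ("broken", BROKEN)]:
        print(f"{name}: identity {identity(ORIGINAL, seq):.2f}, "
              f"keeps hairpin: {fits(seq, HAIRPIN)}")
```

The compensated variant shares only a third of its bases with the original yet keeps the hairpin, while the single-change variant is over 90% identical and loses a pair. This is roughly the logic behind comparative analysis of rRNA secondary structure, though real analyses look for covariation across alignments of many sequences rather than a single pair.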
There are many possible objections to this shift in viewpoint, and I will address two. The first is the existence of viruses, which are replicators that contain no ribosomes. Viruses are "genes that got away," but they have no metabolism of their own and cannot reproduce independently of ribosomes. In terms of this ribosome-centric hierarchy, viruses are a peripheral curiosity in the same way as prions, which are proteins that reproduce their shape (and therefore their physical properties) but likewise cannot reproduce chemically in the absence of ribosomes; of course, in terms of biomedical relevance to human life, viruses are anything but a curiosity (but this may serve to obscure their hierarchical triviality). Viroids, on the other hand, are RNA-only replicators that can in fact reproduce without ribosomes, relying on host RNA polymerase (Pol II in the case of viroids that replicate in the nucleus), and some are partly autocatalytic, cleaving themselves with built-in ribozymes.
A second objection frequently made is that the structure of rRNA and the protein-nucleic acid interactions that structure mediates (and the rich chemistry that sprouted around it on the young Earth) were driven by thermodynamics - that there's only one way to build the interaction surface, and only one set of 20 amino acids that could ever have been chosen. If the shape and activity of a ribosome are really just thermodynamic fate, then rRNA carries no heritable information, there is no meaningful travel over time in the fitness landscape, and there is no possibility of "frozen accidents" that commit a system to climbing toward local optima, any more than there is for a star or a fire. Yes, fire spreads and consumes fuel, but the commitments of dumb physics force it to behave the way it does; there's no difference in the flame on a birthday candle whether you lit it from a lighter or at the edge of a burning pine forest. One way to test whether ribosomes are stuck in one shape would be to run synthetic biology experiments with simple ribosome systems in which you swap out the 20 legacy amino acids in search of a more robust set of monomers, with some kind of feedback to allow selection. But that would be an extremely complicated experiment. I rather think that the burden of proof is on the claimant who argues that the anticodon system we ended up with is necessarily the universally best one, and that RNA-descended aliens from Alpha Centauri would necessarily use the same set we do.
- Our understanding of molecular biology and the evolution of the cell is constrained by the useful conventions we use to study chemical processes in the cell. Despite the wide acceptance of the RNA World hypothesis, we still view life on Earth as being about DNA.
- The transition from the RNA world to the DNA world was plausibly mediated either by the development of reverse transcriptase and DNA-dependent RNA polymerase activity, or by an endosymbiosis event involving a DNA virus infection and uptake by the virus of rRNA genes.
- DNA is best thought of as a back-up of a) rRNA and b) all other types of catalytic molecules (today, mostly proteins) all of whose function is to ensure the survival of rRNA.
- rRNA genes are the most conserved among living things; notably, the structure of the rRNA itself is even more conserved than the primary sequence.
- A better way of thinking about life than considering it to be about DNA is to think of it as about the propagation of ribosomes, or specifically, about propagating a specific set of protein-RNA interactions mediated by a specific type of chemical structure.