Origins of Life
The origin of life is an unsolved scientific problem. There are plenty of ideas but few established facts. It is generally agreed that all life today evolved by common descent from a single primitive life form. We do not know how this early form came about, but scientists think it arose through a natural process, perhaps 3.9 billion years ago.
Researchers in the Evolutionary Bioinformatics Laboratory at the University of Illinois, in collaboration with German scientists, have been using bioinformatics techniques to probe the world of proteins for answers to questions about the origins of life. Proteins are formed from chains of amino acids and fold into three-dimensional structures that determine their function. According to crop sciences professor Gustavo Caetano-Anollés, very little is known about the evolutionary drivers of this folding.
Proteins are long-chain molecules built from small units called amino acids, which are joined by peptide bonds. Each protein consists of one or more polypeptide chains folded into a globular or fibrous shape.
Proteins are essential to all cells. Like other biological macromolecules, such as polysaccharides and nucleic acids, proteins take part in virtually every cellular process.
To investigate what drives folding, the researchers looked at all known protein structures as defined in the Structural Classification of Proteins (SCOP) database and mined their presence in 989 fully sequenced genomes. In a previous study, researchers in Caetano-Anollés's group used SCOP and genomic information to reconstruct phylogenomic trees that describe the history of the protein world. The current research is based on these types of trees.
"They are not the standard trees that people see in phylogenetic analysis," he said. "In phylogenetic analysis, usually the tips of the trees, the leaves, are organisms or microbes. In these, they are entire biological systems." In contrast, the leaves of these new trees are protein domains, which are compact evolutionary units of structure and function. Proteins are usually complex combinations of several domains.
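The genome census mentioned earlier can be pictured as a table with one row per protein structure (for instance, a SCOP fold) and one column per genome, recording whether that structure was found in that genome. The Python sketch below illustrates that idea under stated assumptions; it is not the group's actual pipeline. The function names `presence_matrix` and `genomes_per_fold`, the genome and fold labels, and the choice of plain presence/absence (rather than, say, counts) are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def presence_matrix(
    genome_folds: Dict[str, Set[str]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Build a fold-by-genome presence/absence table (hypothetical helper).

    genome_folds maps a genome label to the set of fold labels detected in it.
    Returns (fold_ids, genome_ids, matrix), where matrix[r][c] is 1 if fold
    fold_ids[r] occurs in genome genome_ids[c], else 0. Rows are structures
    (the leaves of the trees); columns are genomes.
    """
    genome_ids = sorted(genome_folds)
    # Union of all folds seen in any genome; set().union() with no sets is empty.
    fold_ids = sorted(set().union(*genome_folds.values()))
    matrix = [
        [1 if fold in genome_folds[g] else 0 for g in genome_ids]
        for fold in fold_ids
    ]
    return fold_ids, genome_ids, matrix


def genomes_per_fold(matrix: List[List[int]]) -> List[int]:
    """How many genomes contain each fold (row sums of the table)."""
    return [sum(row) for row in matrix]


# Hypothetical toy census: three genomes, three folds.
census = {
    "genome_A": {"fold_1", "fold_2"},
    "genome_B": {"fold_2"},
    "genome_C": {"fold_2", "fold_3"},
}
folds, genomes, table = presence_matrix(census)
for fold, row, n in zip(folds, table, genomes_per_fold(table)):
    print(fold, row, n)
```

Sorting the labels keeps rows and columns in a stable order, so the same census always produces the same table regardless of dictionary or set ordering.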
"We have a world of about 90,000 of these structures, but they seem to be always producing the same designs," he said. Over the last 10 years, he has been part of the effort to map these designs, which are called folds because they are determined by the way the protein chains fold on themselves. To date, approximately 1,300 folds have been characterized.
For the current study, the researchers identified protein sequences in the genomes that had the same folding structure as known proteins. They then used bioinformatics techniques to compare these structures with one another on a time scale and determine when each became part of a particular organism.
This allowed them to map protein structures and organisms onto a timeline. Directly calculating the folding speed of all these proteins would be impossible with today's technology, so the researchers took advantage of the fact that a protein always folds at the same points and used a measure called size-modified contact order (SMCO). Contact order describes how a protein chain establishes links between its own segments: it reflects the distance along the chain between residues that touch each other in the folded structure. When points that are close together on the chain come together, they generally form helical structures; when distant points come together, they form beta strands that interact with each other to form sheets.
In effect, contact order captures the balance between local and distant connections. Experimental studies have shown that it correlates with folding speed: proteins with a higher contact order, dominated by distant contacts, tend to fold more slowly. The measure is normalized (size modified) to account for protein length, which also affects folding speed. Tracking SMCO along the timeline, the researchers saw a peculiar pattern.
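To make the measure concrete, here is a minimal Python sketch that computes contact order from a list of residue contacts. It assumes the contacts are already known as pairs of chain positions (in practice they would be derived from a 3D structure, for example with a distance cutoff); the function names, the toy contact lists, and the local/distant cutoff are hypothetical. Absolute contact order is the mean sequence separation of contacting residues, and relative contact order divides it by chain length. For the size-modified variant this sketch assumes the form ACO / L**0.3; that exponent is an assumption and should be checked against the study's own definition before use.

```python
from typing import List, Tuple

Contact = Tuple[int, int]  # (i, j): chain positions of two residues in contact


def absolute_contact_order(contacts: List[Contact]) -> float:
    """Mean sequence separation |i - j| over all contacting residue pairs."""
    if not contacts:
        raise ValueError("need at least one contact")
    return sum(abs(i - j) for i, j in contacts) / len(contacts)


def relative_contact_order(contacts: List[Contact], length: int) -> float:
    """Absolute contact order divided by chain length L."""
    return absolute_contact_order(contacts) / length


def size_modified_contact_order(
    contacts: List[Contact], length: int, exponent: float = 0.3
) -> float:
    """ASSUMED form ACO / L**exponent; verify the exponent against the study."""
    return absolute_contact_order(contacts) / length**exponent


def local_distant_counts(
    contacts: List[Contact], max_local_sep: int = 4
) -> Tuple[int, int]:
    """Count local (|i - j| <= max_local_sep) vs distant contacts.

    max_local_sep is a hypothetical cutoff chosen for illustration.
    """
    local = sum(1 for i, j in contacts if abs(i - j) <= max_local_sep)
    return local, len(contacts) - local


# Toy contact lists for a hypothetical 50-residue chain.
helix_like = [(1, 5), (2, 6), (3, 7), (4, 8)]  # i -> i+4, helix-style contacts
sheet_like = [(2, 40), (4, 38), (6, 36)]       # strands far apart on the chain
L = 50
for name, contacts in [("helix-like", helix_like), ("sheet-like", sheet_like)]:
    print(
        name,
        local_distant_counts(contacts),
        round(relative_contact_order(contacts, L), 2),
        round(size_modified_contact_order(contacts, L), 2),
    )
```

On the same 50-residue chain, the sheet-like contacts give a much larger contact order than the helix-like ones (relative contact order 0.68 versus 0.08); by the experimental correlation described above, that would point to slower folding.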
"What we see is an hourglass," said Caetano-Anollés. "At the beginning, proteins seem not to be folding so fast. And then, as time progresses, there's a tendency to fold faster and faster. And then it reaches a critical point, and at this point we have a tendency that reverses, that seems to go back again to slow folding."
This critical point coincides with what he calls the Big Bang of protein evolution. Approximately 1.5 billion years ago, more complex domain structures and multi-domain proteins emerged with the appearance of multicellular organisms. The amino acid chains that make up proteins also became shorter at this point in time. Overall, however, the tendency toward higher folding speed dominates.
Why does folding speed matter? "If the protein does not fold, in the vast majority of cases it will not have a function. So folding implies functionality. And speed of folding implies speed of achieving that functionality," he explained.
"For a cell, that's very important, because if proteins are very slow folders, there is a time lag to when that function will be accessible to the cell." Fast folders are also less susceptible to aggregation, or clumping together, which helps them stay functional. Moreover, proteins that fold rapidly are more likely to fold correctly.
"The complexities of the biological functions of molecules are still poorly understood," he said. "If we mix the world of molecular dynamics with the world of molecular evolution, we can then determine what aspects of sequences are important for molecular dynamics, and therefore, we can apply them to genetic engineering, synthetic biology, and so on."