Functions of the DNA genetic code. Basic properties of the genetic code and their significance

The genetic code is a unified system for recording hereditary information in nucleic acid molecules as a sequence of nucleotides. It is based on an alphabet of only four letters, the nucleotides, which are distinguished by their nitrogenous bases: A, T, G and C.

The main properties of the genetic code are as follows:

1. The genetic code is triplet. A triplet (codon) is a sequence of three nucleotides that encodes one amino acid. Since proteins contain 20 amino acids, each of them clearly cannot be encoded by a single nucleotide: DNA contains only four types of nucleotides, so 16 amino acids would remain unencoded. Two nucleotides are not enough either, since pairs can encode only 4² = 16 amino acids. Therefore, the smallest number of nucleotides that can encode one amino acid is three, which gives 4³ = 64 possible nucleotide triplets.

2. Redundancy (degeneracy) of the code is a consequence of its triplet nature: one amino acid can be encoded by several triplets, since there are 20 amino acids and 64 triplets. The exceptions are methionine and tryptophan, each of which is encoded by only one triplet. In addition, some triplets perform special functions. In mRNA, three of them, UAA, UAG and UGA, are stop codons, i.e. signals that terminate synthesis of the polypeptide chain. The methionine triplet AUG, when it stands at the beginning of the coding sequence, serves as the start (initiation) codon that begins reading; in that position it also specifies the first amino acid, methionine.

3. Along with redundancy, the code has the property of unambiguity: each codon corresponds to only one specific amino acid.

4. The code is collinear, i.e. the sequence of nucleotides in a gene corresponds exactly to the sequence of amino acids in the protein.

5. The genetic code is non-overlapping and continuous, that is, it contains no “punctuation marks.” Codons do not overlap, and once reading starts at a given codon it proceeds continuously, triplet after triplet, until a stop signal (termination codon) is reached. For example, the mRNA sequence AUGGUGCUUAAUGUG is read only as the triplets AUG, GUG, CUU, AAU, GUG, and not as AUG, UGG, GGU, GUG, etc., or AUG, GGU, UGC, CUU, etc., or in any other way (for example, codon AUG, punctuation mark G, codon UGC, punctuation mark U, etc.).

6. The genetic code is universal, that is, the nuclear genes of all organisms encode information about proteins in the same way, regardless of the level of organization and systematic position of these organisms.
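Properties 1, 2 and 5 can be checked with a few lines of code. This is a minimal Python sketch, not a reference implementation: the 64-letter string is the standard genetic code (NCBI translation table 1) written in UCAG order, the example mRNA is the item-5 sequence written as AUGGUGCUUAAUGUG, and the names `min_codon_length`, `CODON_TABLE`, `degeneracy` and `split_codons` are illustrative, not taken from any library.

```python
from __future__ import annotations

from collections import Counter


# Property 1: n nucleotides over a 4-letter alphabet give 4 ** n distinct words.
def min_codon_length(symbols_needed: int, alphabet_size: int = 4) -> int:
    """Smallest word length n such that alphabet_size ** n >= symbols_needed."""
    n = 1
    while alphabet_size ** n < symbols_needed:
        n += 1
    return n


# Property 2: the standard genetic code in UCAG order. The first base varies
# slowest and the third base fastest; "*" marks a stop codon.
BASES = "UCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"  # first base U
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
# Number of codons that specify each amino acid (or the stop signal).
degeneracy = Counter(CODON_TABLE.values())


# Property 5: the code is read in non-overlapping triplets.
def split_codons(seq: str, step: int = 3) -> list[str]:
    """Cut seq into 3-letter windows, advancing `step` letters each time.
    step=3 is the real, non-overlapping reading; step=1 or step=2 give the
    overlapping readings that the genetic code does not use."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, step)]


print([4 ** n for n in (1, 2, 3)])  # [4, 16, 64]
print(min_codon_length(20))         # 3
print(degeneracy["M"], degeneracy["W"], degeneracy["L"])  # 1 1 6
print(sorted(c for c, aa in CODON_TABLE.items() if aa == "*"))  # ['UAA', 'UAG', 'UGA']

mrna = "AUGGUGCUUAAUGUG"
print(split_codons(mrna))         # ['AUG', 'GUG', 'CUU', 'AAU', 'GUG']
print(split_codons(mrna, 1)[:4])  # ['AUG', 'UGG', 'GGU', 'GUG']
```

Adding a stop signal to the 20 amino acids (21 symbols) still requires only three nucleotides, which is why the 64 triplets leave room for both degeneracy and stop codons.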

Gene - the structural and functional unit of heredity that controls the development of a particular trait or property. Parents pass on a set of genes to their offspring during reproduction. Russian scientists made a great contribution to the study of the gene, among them E.A. Simashkevich, Yu.A. Gavrilova and O.V. Bogomazova (2011).

It has now been established in molecular biology that genes are sections of DNA that carry a complete unit of information: the structure of one protein molecule or of one RNA molecule. These and other functional molecules determine the development, growth and functioning of the organism.

In addition, each gene is associated with a number of specific regulatory DNA sequences, such as promoters, which are directly involved in regulating the gene's expression. Regulatory sequences can lie close to the open reading frame that encodes the protein or to the start of the RNA sequence, as promoters do (the so-called cis-regulatory elements), or at distances of many millions of base pairs (nucleotides), as enhancers, insulators and suppressors do (these are sometimes classified as trans-regulatory elements). Thus, the concept of a gene is not limited to the coding region of DNA; it is broader and also includes regulatory sequences.

The term gene originally appeared as a theoretical unit of the discrete transmission of hereditary information. The history of biology records disputes over which molecules could carry hereditary information. Most researchers believed that only proteins could be such carriers, since their structure (20 amino acids) allows far more variants than that of DNA, which is built from only four types of nucleotides. It was later proven experimentally that it is DNA that carries hereditary information, a conclusion expressed as the central dogma of molecular biology.

Genes can undergo mutations, random or targeted changes in the sequence of nucleotides in the DNA chain. A mutation can change the sequence, and therefore the biological properties, of a protein or RNA, which in turn may cause generally or locally altered or abnormal functioning of the organism. In some cases such mutations are pathogenic, because they cause disease, or are lethal at the embryonic stage. However, not every change in the nucleotide sequence alters protein structure (thanks to the degeneracy of the genetic code) or changes the sequence significantly, and such changes are not pathogenic. In particular, the human genome is characterized by single nucleotide polymorphisms and copy number variations, such as deletions and duplications, which account for about 1% of the entire human nucleotide sequence. Single nucleotide polymorphisms, in particular, define different alleles of a single gene.
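The role of degeneracy described above can be made concrete by classifying a single-nucleotide substitution inside one codon. This is a minimal sketch under simplifying assumptions: it looks at one mRNA codon in isolation using the standard genetic code, and the function name `classify_substitution` and its labels are illustrative, not a standard tool.

```python
# Standard genetic code in UCAG order (first base slowest); "*" marks a stop codon.
BASES = "UCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"  # first base U
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_substitution(codon: str, position: int, new_base: str) -> str:
    """Classify one single-nucleotide substitution inside an mRNA codon.

    position is 0, 1 or 2 (first, second or third base of the codon).
    """
    mutant = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if before == after:
        return "synonymous"  # degeneracy: same amino acid, protein unchanged
    if before == "*":
        return "stop-loss"   # a stop codon now reads as an amino acid
    if after == "*":
        return "nonsense"    # premature stop codon truncates the protein
    return "missense"        # a different amino acid is incorporated


print(classify_substitution("GCC", 2, "U"))  # synonymous (GCU is still alanine)
print(classify_substitution("GCC", 1, "U"))  # missense (GUC is valine)
print(classify_substitution("UGG", 2, "A"))  # nonsense (UGA is a stop codon)
```

Substitutions at the third codon position are the ones most often synonymous, which is one reason many point mutations leave the protein unchanged.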

The monomers that make up each DNA strand are complex organic compounds consisting of a nitrogenous base (adenine (A), thymine (T), cytosine (C) or guanine (G)), the five-carbon sugar (pentose) deoxyribose, after which DNA itself is named, and a phosphoric acid residue. These compounds are called nucleotides.

Gene properties

discreteness - immiscibility of genes;
stability - the ability to maintain structure;
lability - the ability to mutate repeatedly;
multiple allelism - many genes exist in a population in multiple molecular forms;
allelicity - the genotype of a diploid organism contains only two forms of a gene;
specificity - each gene encodes its own trait;
pleiotropy - multiple effect of a gene;
expressivity - the degree of expression of a gene in a trait;
penetrance - the frequency of manifestation of a gene in a phenotype;
amplification - increasing the number of copies of a gene.

Classification

Structural genes are unique components of the genome, each a single sequence encoding a specific protein or certain types of RNA (see also housekeeping genes).
Functional genes regulate the functioning of structural genes.

Genetic code - a method, common to all living organisms, of encoding the amino acid sequence of proteins with a sequence of nucleotides.

DNA uses four nucleotides, adenine (A), guanine (G), cytosine (C) and thymine (T), which in Russian-language literature are designated by the letters А, Г, Ц and Т. These letters make up the alphabet of the genetic code. RNA uses the same nucleotides, except that thymine is replaced by a similar nucleotide, uracil, designated by the letter U (У in Russian-language literature). In DNA and RNA molecules, nucleotides are arranged in chains, which yields sequences of genetic letters.

Genetic code

Nature uses 20 different amino acids to build proteins. Each protein is a chain or several chains of amino acids in a strictly defined sequence. This sequence determines the structure of the protein, and therefore all of its biological properties. The set of amino acids is also universal for almost all living organisms.

Genetic information is realized in living cells (that is, the protein encoded by a gene is synthesized) through two template processes: transcription (synthesis of mRNA on a DNA template) and translation of the genetic code into an amino acid sequence (synthesis of a polypeptide chain on an mRNA template). Three consecutive nucleotides are sufficient to encode the 20 amino acids as well as the stop signal that marks the end of the protein sequence. A set of three nucleotides is called a triplet. The accepted abbreviations for amino acids and codons are shown in the figure.
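The two template processes can be sketched as string operations. This is a simplified Python illustration, not a model of the real machinery: the DNA template is a hypothetical example, translation naively starts at the first AUG (real start-site selection depends on sequence context), and the names `transcribe` and `translate` are illustrative.

```python
# Standard genetic code in UCAG order (first base slowest); "*" marks a stop codon.
BASES = "UCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"  # first base U
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Base pairing used when RNA is copied from a DNA template strand (T pairs with A,
# and A in the template gives U in the RNA).
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_5to3: str) -> str:
    """Return the mRNA (5'->3') copied from a DNA template strand given 5'->3'.

    The mRNA is complementary and antiparallel to the template, so each base
    is paired and the result is read in reverse order.
    """
    return "".join(DNA_TO_RNA_PAIR[b] for b in reversed(template_5to3))


def translate(mrna: str) -> str:
    """Read non-overlapping codons from the first AUG until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break  # stop codon: release the finished chain
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("TTATTTCCAGGCCAT")  # hypothetical template strand
print(mrna)             # AUGGCCUGGAAAUAA
print(translate(mrna))  # MAWK (Met-Ala-Trp-Lys, then stop)
```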

Properties

Triplet nature - the meaningful unit of the code is a combination of three nucleotides (a triplet, or codon).
Continuity - there are no punctuation marks between triplets, that is, the information is read continuously.
Non-overlapping - the same nucleotide cannot simultaneously be part of two or more triplets (this does not hold for some overlapping genes of viruses, mitochondria and bacteria, which encode several proteins read in shifted frames).
Uniqueness (specificity) - a given codon corresponds to only one amino acid (however, in Euplotes crassus the UGA codon encodes two amino acids, cysteine and selenocysteine).
Degeneracy (redundancy) - several codons can correspond to the same amino acid.
Universality - the genetic code works the same way in organisms of different levels of complexity, from viruses to humans (genetic engineering methods rely on this); there are a number of exceptions, shown in the table in the Variations in the Standard Genetic Code section below.
Noise immunity - nucleotide substitution mutations that do not change the class of the encoded amino acid are called conservative; those that do change the class of the encoded amino acid are called radical.
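Noise immunity can be illustrated by checking whether a single-base substitution keeps the encoded amino acid in the same class. This is a minimal sketch under a stated assumption: the four-way grouping in `AA_CLASS` (nonpolar, polar, positively charged, negatively charged) is one common textbook simplification chosen here for illustration, and classifications differ for residues such as glycine, cysteine, histidine and tyrosine. The name `substitution_effect` is illustrative.

```python
# Standard genetic code in UCAG order (first base slowest); "*" marks a stop codon.
BASES = "UCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"  # first base U
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# ASSUMPTION: a simplified side-chain grouping; other textbooks draw the lines differently.
AA_CLASS = {
    **dict.fromkeys("GAVLIPFMW", "nonpolar"),
    **dict.fromkeys("STCYNQ", "polar"),
    **dict.fromkeys("KRH", "positive"),
    **dict.fromkeys("DE", "negative"),
}


def substitution_effect(codon: str, position: int, new_base: str) -> str:
    """Label a single-base substitution (position 0, 1 or 2) as conservative
    (amino acid class unchanged, including no change at all) or radical."""
    mutant = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if "*" in (before, after):
        return "involves stop codon"  # outside the conservative/radical scheme
    return "conservative" if AA_CLASS[before] == AA_CLASS[after] else "radical"


print(substitution_effect("GCC", 1, "U"))  # conservative (Ala -> Val, both nonpolar)
print(substitution_effect("GAU", 1, "C"))  # radical (Asp -> Ala, charge lost)
```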

Protein biosynthesis and its stages

Protein biosynthesis - a complex multi-stage process of synthesizing a polypeptide chain from amino acid residues, which takes place on the ribosomes of the cells of living organisms with the participation of mRNA and tRNA molecules.

Protein biosynthesis can be divided into the stages of transcription, processing and translation. During transcription, the genetic information encoded in DNA molecules is read and written into mRNA molecules. During a series of successive processing steps, fragments not needed at later stages are removed from the mRNA, and nucleotide sequences are edited. After the mRNA is transported from the nucleus to the ribosomes, protein molecules are synthesized by attaching individual amino acid residues to the growing polypeptide chain.

Between transcription and translation, the mRNA molecule undergoes a series of sequential modifications that turn it into a mature, functional template for synthesis of the polypeptide chain. A cap is attached to the 5′ end and a poly-A tail to the 3′ end, which increases the lifespan of the mRNA. The emergence of processing in the eukaryotic cell made it possible to combine a gene's exons in different ways and so obtain a greater variety of proteins from a single DNA nucleotide sequence; this is called alternative splicing.
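The combinatorial gain from alternative splicing can be shown with a deliberately simple model. This sketch assumes hypothetical exon names and only one splicing pattern, cassette exons that are each independently included or skipped while the first and last exons are always kept; the function name `isoforms` is illustrative.

```python
from __future__ import annotations

from itertools import product


def isoforms(first: str, cassettes: list[str], last: str) -> list[list[str]]:
    """Enumerate mRNA exon layouts when every cassette exon is independently
    included or skipped; the first and last exons are always kept."""
    layouts = []
    for choice in product((True, False), repeat=len(cassettes)):
        middle = [exon for exon, keep in zip(cassettes, choice) if keep]
        layouts.append([first, *middle, last])
    return layouts


variants = isoforms("E1", ["E2", "E3", "E4"], "E5")  # hypothetical exons
print(len(variants))  # 8, i.e. 2 ** 3
print(variants[0])    # ['E1', 'E2', 'E3', 'E4', 'E5']
print(variants[-1])   # ['E1', 'E5']
```

Under this model each additional independent cassette exon doubles the number of possible transcripts. Real genes also use alternative 5′ and 3′ splice sites, mutually exclusive exons and intron retention, which this sketch ignores.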

Translation is the synthesis of a polypeptide chain according to the information encoded in messenger RNA. Amino acids are brought into sequence by transfer RNA (tRNA), which forms complexes with amino acids called aminoacyl-tRNAs. Each amino acid has its own tRNA, carrying an anticodon that “matches” the corresponding mRNA codon. During translation the ribosome moves along the mRNA, and the polypeptide chain grows as it does so. Energy for protein biosynthesis is provided by ATP.
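The codon-anticodon “match” is complementary, antiparallel base pairing. This minimal sketch assumes strict Watson-Crick pairing and ignores wobble pairing at the codon's third position, which in real cells lets one tRNA read several codons; the function name `anticodon` is illustrative.

```python
# Watson-Crick pairs between RNA bases.
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon(codon: str) -> str:
    """Return the tRNA anticodon, written 5'->3', that pairs with an mRNA codon
    written 5'->3'. The strands are antiparallel, so the codon is read in
    reverse while each base is replaced by its partner."""
    return "".join(RNA_PAIR[b] for b in reversed(codon))


print(anticodon("GCC"))  # GGC (pairs with an alanine codon)
print(anticodon("AUG"))  # CAU (initiator/methionine codon)
```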

The finished protein molecule is then released from the ribosome and transported to the appropriate place in the cell. To reach their active state, some proteins require additional post-translational modification.

The leading science journal Nature reported the discovery of a second genetic code, a kind of “code within a code” that was recently cracked by molecular biologists and computer programmers. Moreover, to identify it they used not evolutionary theory but information technology.

The new code is called the splicing code. It is located inside DNA. This code controls the underlying genetic code in a very complex yet predictable manner: it governs how and when genes and regulatory elements are assembled. Unraveling this code within a code helps shed light on some long-standing mysteries of genetics that surfaced after the Human Genome Project. One was why an organism as complex as a human has only about 20,000 genes (scientists expected to find many more). Why are genes broken up into segments (exons) separated by noncoding elements (introns), which are then joined together (that is, spliced) after transcription? And why are genes switched on in some cells and tissues but not in others? For two decades molecular biologists tried to work out the mechanisms of genetic regulation. This paper makes a very important point about what is really happening. It does not answer all the questions, but it does demonstrate that the internal code exists: a system for transmitting information that can be deciphered so clearly that scientists can predict, with uncanny precision, how the genome will behave in certain situations.

Imagine that you hear an orchestra in the next room. You open the door, look inside and see only three or four musicians playing instruments. That, says Brendan Frey, who helped crack the code, is what the human genome looks like. He says: “We could only detect 20,000 genes, but we knew they made up a huge number of protein products and regulatory elements. How? One method is called alternative splicing.” Different exons (parts of genes) can be assembled in different ways. “For example, three genes for the protein neurexin can create more than 3,000 genetic messages that help control the brain's wiring,” says Frey. The article also says that scientists know that 95% of our genes are alternatively spliced, and that in most cases the transcripts (RNA molecules produced by transcription) are expressed differently in different cell types and tissues. Something must control how these thousands of combinations are assembled and expressed. This is the task of the splicing code.

Readers who want a quick overview of the discovery can read the Science Daily article entitled “Researchers Who Cracked the 'Splicing Code' Uncover the Mystery Behind Biological Complexity.” The article says: “Scientists at the University of Toronto have gained fundamental new insights into how living cells use a limited number of genes to form incredibly complex organs like the brain.” The coverage in Nature itself begins with an article by Heidi Ledford, “Code Within Code.” This was followed by a paper by Tejedor and Valcárcel entitled “Gene Regulation: Cracking the Second Genetic Code.” Finally, the clincher was a paper by a team of researchers from the University of Toronto led by Benjamin J. Blencowe and Brendan J. Frey, “Cracking the Splicing Code.”

This paper is a victory for information science that recalls the codebreakers of World War II. The team's methods included algebra, geometry, probability theory, vector calculus, information theory, program code optimization and other best practices. What they did not need was evolutionary theory, which is never mentioned in the scientific papers. Reading the paper, you can see how much strain the authors of this overture are under:

“We describe a 'splicing code' that uses combinations of hundreds of RNA properties to predict tissue-specific changes in the alternative splicing of thousands of exons. The code establishes new classes of splicing patterns, recognizes different regulatory programs in different tissues, and identifies mutation-verified regulatory sequences. We have uncovered widespread regulatory strategies, including the use of unexpectedly large combinations of properties; the establishment of low levels of exon inclusion that are overcome by properties of specific tissues; the appearance of properties deeper in introns than previously thought; and the modulation of splice variant levels by structural characteristics of the transcript. The code helped identify a class of exons whose inclusion silences expression in adult tissues by activating mRNA degradation, and whose exclusion promotes expression during embryogenesis. The code facilitates the discovery and detailed characterization of regulated alternative splicing events on a genome-wide scale.”

The team that cracked the code included specialists from the Department of Electrical and Computer Engineering as well as from the Department of Molecular Genetics. (Frey himself also works for Microsoft Research, a division of Microsoft Corporation.) Like the codebreakers of yesteryear, Frey and Barash developed “a new method of computer-assisted biological analysis that detects ‘code words’ hidden inside the genome.” Using the huge amount of data generated by molecular geneticists, the team “reverse engineered” the splicing code until they could predict how it would behave. Once the researchers had figured that out, they tested the code against mutations and observed how exons were inserted or deleted. They found that the code could even cause tissue-specific changes or act differently depending on whether the mouse was an adult or an embryo. One gene, Xpo4, is associated with cancer; the researchers noted: “These data support the conclusion that Xpo4 gene expression must be tightly controlled to avoid possible deleterious consequences, including tumorigenesis (cancer), since it is active during embryogenesis but is reduced in abundance in adult tissues.” They were evidently surprised by the level of control they saw. Intentionally or not, Frey used the language of intelligent design rather than that of random variation and selection. He noted: “Understanding a complex biological system is like understanding a complex electronic circuit.”

Heidi Ledford said that the apparent simplicity of the Watson-Crick genetic code, with its four bases, triplet codons, 20 amino acids and 64 DNA “characters,” hides a whole world of complexity beneath it. Embedded within this simpler code, the splicing code is far more complex.

But between DNA and proteins lies RNA, a separate world of complexity. RNA is a transformer that sometimes carries genetic messages and sometimes controls them, and it involves many structures that can influence its function. In a paper published in the same issue, a team of researchers led by Benjamin J. Blencowe and Brendan J. Frey of the University of Toronto in Ontario, Canada, report efforts to unravel a second genetic code that can predict how segments of messenger RNA transcribed from a specific gene can mix and match to form a variety of products in different tissues. This process is known as alternative splicing. This time there is no simple table; instead there are algorithms that combine more than 200 different properties of DNA with determinations of RNA structure.

The work of these researchers shows the rapid progress computational methods have made in building models of RNA. Besides helping to understand alternative splicing, computer science helps scientists predict RNA structures and identify small regulatory RNAs that do not code for proteins. “It's a wonderful time,” says Christopher Burge, a computational biologist at the Massachusetts Institute of Technology in Cambridge. “We will have great success in the future.”

Computer science, computational biology, algorithms and codes: these concepts were not part of Darwin's vocabulary when he developed his theory. Mendel had a very simplified model of how traits are distributed during inheritance. And the idea that traits are encoded was introduced only in 1953. We now see that the original genetic code is regulated by an even more complex code embedded within it. These are revolutionary ideas. Moreover, there is every sign that this level of control is not the last. Ledford reminds us, for example, that RNA and proteins have three-dimensional structures. The functions of molecules can change when their shape changes. Something must control folding so that the three-dimensional structure does what the function requires. In addition, access to genes appears to be controlled by another code, the histone code. This code is written in molecular markers, or “tails,” on histone proteins, which serve as centers around which DNA is wound and supercoiled. Describing our times, Ledford speaks of a “continuous renaissance in RNA informatics.”

Tejedor and Valcárcel agree that complexity lies behind the simplicity. “The concept is very simple: DNA makes RNA, which then makes protein,” they begin their article. “But in reality everything is much more complicated.” In the 1950s we learned that all living organisms, from bacteria to humans, share a basic genetic code. But we soon realized that complex organisms (eukaryotes) have a strange and puzzling feature: their genomes contain peculiar sections, introns, that must be removed so that the exons can join together. Why? Today the fog is clearing: “The main advantage of this mechanism is that it allows different cells to choose alternative ways of splicing the precursor messenger RNA (pre-mRNA), so that one gene produces different messages,” they explain, “and then different mRNAs can code for different proteins with different functions.” You get more information out of less code, provided there is another code inside the code that knows how to do it.

What makes the splicing code so difficult to crack is that exon assembly is controlled by many factors: sequences near exon boundaries, intron sequences, and regulatory factors that either help or inhibit the splicing machinery. Moreover, “the effects of a particular sequence or factor may vary depending on its location relative to intron-exon boundaries or other regulatory motifs,” Tejedor and Valcárcel explain. “Therefore, the most challenging part of predicting tissue-specific splicing is computing the algebra of the myriad motifs and the relationships among the regulatory factors that recognize them.”

To solve this problem, the team fed a computer a huge amount of data about RNA sequences and the conditions under which they were formed. “The computer was then tasked with identifying the combination of properties that would best explain the experimentally established tissue-specific selection of exons.” In other words, the researchers reverse engineered the code. Like the codebreakers of World War II, once scientists know the algorithm, they can make predictions: “It correctly and accurately identified alternative exons and predicted their differential regulation between pairs of tissue types.” And like any good scientific theory, the discovery provided new insight: “This allowed us to provide new insight into previously identified regulatory motifs and pointed to previously unknown properties of known regulators, as well as unexpected functional connections between them,” the researchers noted. “For example, the code implies that the inclusion of exons leading to processed proteins is a common mechanism controlling gene expression during the transition from embryonic to adult tissue.”

Tejedor and Valcárcel consider the published work an important first step: “The work... is better viewed as the discovery of the first fragment of a much larger Rosetta Stone needed to decipher the alternative messages of our genome.” According to them, future research will undoubtedly improve knowledge of this new code. At the end of their article they briefly mention evolution, and in a very unusual way. They say: “It doesn't mean that evolution created these codes. It means that progress will require understanding how the codes interact. Another surprise was that the degree of conservation observed to date raises the question of the possible existence of ‘species-specific codes.’”

The code probably operates in every single cell and must therefore account for more than 200 types of mammalian cells. It must also cope with a huge variety of alternative splicing patterns, far beyond simple decisions to include or skip a single exon. The limited evolutionary conservation of alternative splicing regulation (estimated at about 20% between humans and mice) raises the question of whether species-specific codes exist. Moreover, the coupling between RNA processing and gene transcription influences alternative splicing, and recent evidence implicates DNA packaging by histone proteins and covalent modifications of histones (the so-called epigenetic code) in regulating splicing. Future methods will therefore have to establish the precise interplay between the histone code and the splicing code. The same applies to the still poorly understood influence of complex RNA structures on alternative splicing.

Codes, codes and more codes. The fact that scientists say virtually nothing about Darwinism in these articles suggests that evolutionary theorists who cling to old ideas and traditions will have a lot to think about after reading them. But those who are enthusiastic about the biology of codes will find themselves at the forefront. They have a great opportunity to use the exciting web application that the codebreakers created to encourage further research. It can be found on the University of Toronto website called Alternative Splicing Prediction Website. Visitors will look in vain for any mention of evolution there, despite the old axiom that nothing in biology makes sense without it. A 2010 version of that saying might read: “Nothing in biology makes sense unless viewed in the light of computer science.”

Links and notes

We are glad we were able to report this story on the day it was published. It may be one of the most significant scientific papers of the year. (Of course, every big discovery has been made by other groups of scientists, like that of Watson and Crick.) All we can say is: “Wow!” This discovery is a remarkable confirmation of creation by design and a huge challenge to the Darwinian empire. We wonder how evolutionists will try to adjust their simplistic story of random mutations and natural selection, invented back in the 19th century, in light of these new data.

Do you understand what Tejedor and Valcárcel are saying? Species may have their own codes, unique to them. “It will therefore be up to future methods to establish the precise interaction between the histone [epigenetic] code and the splicing code,” they note. Translated, this means: “Darwinists have nothing to do with this. They just can't handle it.” If the simple Watson-Crick genetic code was a problem for Darwinians, what will they say now about a splicing code that creates thousands of transcripts from the same genes? How will they cope with the epigenetic code that controls gene expression? And who knows, perhaps other codes are involved in this incredible “interaction,” which we are only beginning to learn about, like a Rosetta Stone just beginning to emerge from the sand?

Now, when we think about codes and computer science, we begin to think of different paradigms for new research. What if the genome acts in part as a storage network? What if it involves cryptography or compression algorithms? We should keep in mind modern information systems and information storage technologies. We may even discover elements of steganography. There are undoubtedly additional robustness mechanisms, such as duplication and error correction, that may help explain the existence of pseudogenes. Copies of the entire genome may be a response to stress. Some of these phenomena may be useful indicators of historical events that have nothing to do with a universal common ancestor, but that help explore comparative genomics within the framework of informatics and robust design, and help understand the causes of disease.

Evolutionists find themselves in great difficulty. Researchers tried to modify the code, and all they got was cancer and mutations. How are they going to navigate the fitness landscape if it is mined with disasters waiting to happen as soon as someone starts tampering with these inextricably linked codes? We know there is some built-in resilience and portability, but the whole picture is of an incredibly complex, engineered, optimized information system, not a random collection of parts that can be endlessly tinkered with. The very idea of a code is a concept of intelligent design.

A. E. Wilder-Smith gave this point special emphasis. A code presupposes an agreement between two parties, and an agreement is made in advance. It involves planning and purpose. We use the symbol SOS, as Wilder-Smith would say, by convention as a distress signal. SOS does not look like a disaster. It does not smell like a disaster. It does not feel like a disaster. People would not understand that these letters represent disaster if they did not understand the agreement itself. Similarly, the codon for alanine, GCC, does not look, smell or feel like alanine. The codon would have nothing to do with alanine unless there were a pre-established agreement between the two coding systems (the protein code and the DNA code) that “GCC must mean alanine.” To carry out this agreement, a family of transducers, the aminoacyl-tRNA synthetases, translates one code into the other.

This should have strengthened design theory in the 1950s, and many creationists preached it effectively. But evolutionists are like smooth-talking salesmen. They created their fairy tales about Tinkerbell, who breaks code and creates new species through mutation and selection, and convinced many people that miracles could still happen today. Well, we are now in the 21st century, and we know the epigenetic code and the splicing code, two codes that are much more complex and dynamic than the simple DNA code. We know about codes within codes, codes above codes and codes below codes: we know a whole hierarchy of codes. This time evolutionists cannot bluff us with a finger in a pocket posing as a gun and win us over with fine speeches, when real guns are set up on both sides, a whole arsenal aimed at their main structural elements. It's all a game. A whole era of computer science has grown up around them; they have long gone out of fashion and look like ancient Greeks trying to take on modern tanks and helicopters with spears.

It is sad to say, but evolutionists do not understand this, or even if they do, they are not going to give up. Incidentally, this week, just as the article on the splicing code was published, some of the angriest and most hateful rhetoric against creationism and intelligent design in recent times was heard. We will hear many more examples like it. And as long as they hold the microphones and control the institutions, many people will take their bait, thinking that science still gives them good reason. We tell you all this so that you will read this material, study it, understand it, and equip yourself with the information you need to defeat this bigoted, misleading nonsense with the truth. Now, go ahead!

The genetic code, expressed in codons, is a system for encoding information about the structure of proteins that is inherent in all living organisms on the planet. Deciphering it took a decade, but science had known that it existed for almost a century. Universality, specificity, unidirectionality and especially the degeneracy of the genetic code are of great biological significance.

History of discoveries

The problem of coding has always been key in biology. Science moved rather slowly toward the template structure of the genetic code. With the discovery of the double-helical structure of DNA by J. Watson and F. Crick in 1953 began the effort to unravel the structure of the code itself, which inspired awe at the greatness of nature. The linear structure of proteins and the equally linear structure of DNA implied a genetic code: a correspondence between two texts written in different alphabets. While the alphabet of proteins was known, the symbols of DNA became the subject of study by biologists, physicists and mathematicians.

There is no point in describing every step in solving this riddle. A direct experiment proving that there is a clear and consistent correspondence between DNA codons and protein amino acids was carried out in 1964 by C. Yanofsky and S. Brenner. This was followed by the period of deciphering the genetic code in vitro (in a test tube), using protein synthesis in cell-free systems.

The fully deciphered code of E. coli was made public in 1966 at a symposium of biologists in Cold Spring Harbor (USA). Then the redundancy (degeneracy) of the genetic code was discovered. What this means is quite simple to explain.

Decoding continues

Deciphering the hereditary code was one of the most significant events of the last century. Today, science continues to study in depth the mechanisms of molecular coding, its systemic features, and the excess of symbols that expresses the degeneracy of the genetic code. A separate field studies the emergence and evolution of the system for coding hereditary material. Evidence of the connection between polynucleotides (DNA) and polypeptides (proteins) gave impetus to the development of molecular biology, and that, in turn, to biotechnology, bioengineering, and discoveries in breeding and crop production.

Dogmas and rules

The central dogma of molecular biology states that information is transferred from DNA to messenger RNA, and then from mRNA to protein. In the reverse direction, information can pass from RNA to DNA, and it can also pass from RNA to another RNA.

But the template, or basis, always remains DNA. All other fundamental features of information transfer reflect this template nature: transfer proceeds by synthesizing new molecules on a template, and these become the structure for reproducing hereditary information.

Genetic code

The linear structure of protein molecules is encoded by codons (triplets) of nucleotides, which come in only 4 kinds (adenine, guanine, cytosine, and thymine, replaced by uracil in RNA); complementary pairing of nucleotides spontaneously leads to the formation of another nucleotide chain. An equal number and the chemical complementarity of nucleotides are the main condition for such synthesis. But when a protein molecule is formed, there is no such correspondence in number or kind between the monomers (DNA nucleotides versus protein amino acids). This is the hereditary code: a system for recording the sequence of amino acids in a protein as a sequence of nucleotides (codons).
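As a minimal sketch of the complementarity described above, assuming a DNA template strand written 5'→3' and the pairing A-U, T-A, G-C, C-G, the hypothetical helper `transcribe` below builds the complementary mRNA:

```python
# Pairing of a DNA template base with the RNA base laid down opposite it.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_5to3: str) -> str:
    """Return the mRNA (5'->3') complementary to a DNA template strand given 5'->3'."""
    # The template is read 3'->5', so walk it backwards while complementing.
    return "".join(DNA_TO_RNA[base] for base in reversed(template_5to3.upper()))


coding = "ATGGCTTAA"    # coding (sense) strand, 5'->3'
template = "TTAAGCCAT"  # its complementary template strand, 5'->3'
mrna = transcribe(template)
print(mrna)  # AUGGCUUAA: the coding strand with T replaced by U
```

The mRNA matches the coding strand letter for letter except that uracil replaces thymine, which is what equal numbers and complementarity of nucleotides guarantee.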

The genetic code has several properties:

Tripletity.
Unambiguity.
Directionality.
Non-overlapping.
Redundancy (degeneracy) of the genetic code.
Universality.

Let's briefly describe these properties, focusing on their biological significance.

Tripletity, continuity and the presence of stop signals

Each of the 61 sense triplets (codons) of nucleotides corresponds to one amino acid. Three triplets carry no amino acid information and serve as stop codons. Each nucleotide in the coding chain belongs to a triplet and is not read on its own. The chain of nucleotides responsible for one protein begins with a start codon and ends with a stop codon; they start and stop translation (the synthesis of a protein molecule), respectively.

Specificity, non-overlap and unidirectionality

Each codon (triplet) codes for only one amino acid. Each triplet is independent of its neighbors and does not overlap them: one nucleotide can belong to only one triplet in the chain. Protein synthesis always proceeds in only one direction, from the start codon to the stop codon.
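The reading rules just listed (triplets, no overlap, one direction, start at AUG, stop at a stop codon) can be sketched in Python. The table string is the standard code in U, C, A, G order; `translate_orf` is a hypothetical helper, and starting at the first AUG is a simplification of how real start sites are chosen.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons in UCAG order (third base varies fastest); "*" = stop.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}


def translate_orf(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon (one-letter codes)."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    # Consecutive, non-overlapping triplets, read in one direction only.
    for i in range(start, len(mrna) - 2, 3):
        aa = CODE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate_orf("AUGGGUGCUUAAUGUG"))  # MGA: AUG GGU GCU, then stop at UAA
```

Reading AUG, GGU, GCU, UAA in turn, each nucleotide belongs to exactly one codon, and the downstream "AUGUG" is never reached.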

Redundancy of the genetic code

Each triplet of nucleotides codes for one amino acid or a stop signal. There are 64 triplets in total, of which 61 encode amino acids (sense codons) and three are nonsense codons, that is, they do not encode an amino acid (stop codons). The redundancy (degeneracy) of the genetic code shows in how single substitutions in a triplet behave: they can be radical (changing the class of the encoded amino acid) or conservative (not changing the class of the amino acid). The count is easy: each of the three positions in a triplet can be replaced by 4 − 1 = 3 other nucleotides, giving 3 × 3 = 9 possible single substitutions per triplet, so the total number of possible nucleotide substitutions over the 61 sense codons is 61 × 9 = 549.

The degeneracy of the genetic code shows in the fact that 549 variants are far more than are needed to encode information about 21 amino acids. Of the 549 variants, 23 substitutions produce stop codons, 134 + 230 = 364 substitutions are conservative, and 162 substitutions are radical.

Rule of degeneracy and exclusion

If two codons share the same first two nucleotides and their third nucleotides belong to the same class (purine or pyrimidine), they carry information about the same amino acid. This is the rule of degeneracy, or redundancy, of the genetic code. There are two exceptions, AUA and UGA: the first encodes isoleucine, although by this rule it should encode methionine (like AUG), and the second is a stop codon, although by this rule it should encode tryptophan (like UGG).
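The rule can be checked mechanically against the standard code. This sketch (the helper `rule_exceptions` is hypothetical) compares every XY-pyrimidine pair and every XY-purine pair of codons and reports the pairs whose meanings differ.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons in UCAG order (third base varies fastest); "*" = stop.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}

PYRIMIDINES, PURINES = "UC", "AG"


def rule_exceptions(code):
    """Codon pairs XYa/XYb with a, b of the same class but different meanings."""
    exceptions = []
    for x, y in product(BASES, repeat=2):
        for a, b in (PYRIMIDINES, PURINES):
            first, second = x + y + a, x + y + b
            if code[first] != code[second]:
                exceptions.append((first, code[first], second, code[second]))
    return exceptions


print(rule_exceptions(CODE))  # [('UGA', '*', 'UGG', 'W'), ('AUA', 'I', 'AUG', 'M')]
```

Only two of the 32 same-class pairs break the rule, and they are exactly UGA/UGG and AUA/AUG.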

The meaning of degeneracy and universality

It is these two properties of the genetic code that have the greatest biological significance. All the properties listed above are characteristic of the hereditary information of all forms of living organisms on our planet.

The degeneracy of the genetic code has adaptive significance, since the code for one amino acid is duplicated several times. It also means that the third nucleotide in the codon is less significant (degenerate). This minimizes the chance that mutational damage in DNA leads to gross disturbances in the structure of the protein, and so serves as a protective mechanism for living organisms on the planet.
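The reduced significance of the third nucleotide can be measured directly. This sketch (the helper `synonymous_by_position` is hypothetical) counts, for each codon position, how many single substitutions from the 61 sense codons of the standard code leave the amino acid unchanged.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons in UCAG order (third base varies fastest); "*" = stop.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}


def synonymous_by_position(code):
    """Substitutions from sense codons that keep the same amino acid, per codon position."""
    counts = [0, 0, 0]
    for codon, aa in code.items():
        if aa == "*":
            continue  # only sense codons are mutated
        for pos in range(3):
            for base in BASES:
                if base != codon[pos] and code[codon[:pos] + base + codon[pos + 1:]] == aa:
                    counts[pos] += 1
    return counts


print(synonymous_by_position(CODE))  # [8, 0, 126]
```

Almost all synonymous substitutions fall at the third position; the few at the first position come from the six-codon leucine and arginine series.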

The series of articles describing the origin of the genetic code can be treated as an investigation into events that have left many traces behind. However, understanding these articles requires some effort to grasp the molecular mechanisms of protein synthesis. This article is the introductory one in a series of the author's publications devoted to the origin of the genetic code, and is the best place to start getting acquainted with this topic.
The genetic code (GC) is usually defined as the method (rule) by which the primary structure of a protein is encoded in DNA or RNA. The literature most often describes it as an unambiguous correspondence between a sequence of three nucleotides in a gene and either one amino acid in the synthesized protein or the end point of protein synthesis. However, this definition contains two errors. They concern the 20 so-called canonical amino acids, which are part of the proteins of all living organisms without exception. These amino acids are protein monomers. The errors are as follows:

1) There are not 20 canonical amino acids, but only 19. An amino acid is a substance that contains both an amino group (-NH2) and a carboxyl group (-COOH). The protein monomer proline, however, is not an amino acid, since it contains an imino group instead of an amino group; it is therefore more correct to call proline an imino acid. Nevertheless, in all articles devoted to the GC, for convenience, I will write about 20 amino acids, with this nuance implied. Amino acid structures are shown in Fig. 1.

Fig. 1. Structures of the canonical amino acids. Amino acids have constant parts, shown in black in the figure, and variable parts (or radicals), shown in red.

2) The correspondence of amino acids to codons is not always unambiguous. For cases where unambiguity is violated, see below.

The emergence of the GC means the emergence of encoded protein synthesis. This event is one of the key steps in the evolutionary formation of the first living organisms.

The structure of the GC is shown in circular form in Fig. 2.

Fig. 2. The genetic code in circular form. Inner circle: first letter of the codon; second circle: second letter; third circle: third letter; fourth circle: amino acids in three-letter abbreviation. P: polar amino acids; NP: non-polar amino acids. To make the symmetry clear, the chosen order of the letters U, C, A, G is important.

So, let's begin describing the main properties of the GC.

1. Tripletity. Each amino acid is encoded by a sequence of three nucleotides.

2. Presence of intergenic punctuation marks. Intergenic punctuation marks are the nucleic acid sequences at which translation begins or ends.

Translation cannot begin at just any codon, only at a strictly defined start codon. The start codon is the AUG triplet, from which translation begins. This triplet encodes either methionine or another amino acid, formylmethionine (in prokaryotes), which can only be incorporated at the beginning of protein synthesis. At the end of each gene encoding a polypeptide there is at least one of the 3 stop codons, or stop signals: UAA, UAG, UGA. They terminate translation (protein synthesis on the ribosome).

3. Compactness, or absence of intragenic punctuation marks. Within a gene, every nucleotide is part of a meaningful codon.

4. Non-overlapping. Codons do not overlap with each other; each has its own ordered set of nucleotides, which does not overlap with similar sets of neighboring codons.

5. Degeneracy. The reverse correspondence, from amino acid to codon, is ambiguous; this property is called degeneracy. A series is the set of codons that code for one amino acid, in other words, a group of equivalent codons. Let's write a codon as XYZ. If XY alone specifies the "sense" (i.e., the amino acid), the codon is called strong. If a particular Z is also needed to determine the meaning of the codon, the codon is called weak.
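Under this definition, a root XY is strong when all four codons XYU, XYC, XYA, XYG share one meaning. A sketch that lists the strong roots of the standard code (the helper `strong_roots` is hypothetical):

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons in UCAG order (third base varies fastest); "*" = stop.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}


def strong_roots(code):
    """Two-letter roots XY whose four codons XYZ all have the same meaning."""
    roots = []
    for x, y in product(BASES, repeat=2):
        meanings = {code[x + y + z] for z in BASES}
        if len(meanings) == 1:
            roots.append(x + y)
    return roots


strong = strong_roots(CODE)
print(strong)  # ['UC', 'CU', 'CC', 'CG', 'AC', 'GU', 'GC', 'GG']
print(4 * len(strong), "strong codons,", 64 - 4 * len(strong), "weak codons")
```

Exactly half of the 64 codons (8 roots × 4) turn out to be strong.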

The degeneracy of the code is closely related to the ambiguity of codon-anticodon pairing. (An anticodon is a sequence of three nucleotides on a tRNA that can pair complementarily with a codon on messenger RNA; for more details, see two articles: "Molecular mechanisms for ensuring code degeneracy" and "Lagerkvist's rule. Physico-chemical justification of Rumer's symmetries and relations".) One anticodon on a tRNA can recognize one to three codons on an mRNA.

6. Unambiguity. Each triplet encodes only one amino acid or is a translation terminator.

There are three known exceptions.

First. In prokaryotes, the AUG codon encodes formylmethionine when it stands in the first (start) position, and methionine in any other position. At the beginning of a gene, formylmethionine is encoded not only by the usual methionine codon AUG but also by the valine codon GUG or the leucine codon UUG, which within the gene encode valine and leucine, respectively.

In many proteins, the formylmethionine is cleaved off or its formyl group is removed, converting it into ordinary methionine.
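A simplified sketch of this position-dependent reading, assuming the ORF has already been located so that its first codon is the start codon (real start-site selection also depends on upstream signals, which this ignores). The helper name and the `fM` label for formylmethionine are hypothetical notation.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons in UCAG order (third base varies fastest); "*" = stop.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}

INITIATORS = {"AUG", "GUG", "UUG"}  # start codons named in the text


def translate_prokaryotic(orf: str) -> list:
    """Translate an ORF whose first codon is the start codon; 'fM' marks formylmethionine."""
    if orf[:3] not in INITIATORS:
        raise ValueError("ORF must begin with AUG, GUG or UUG")
    protein = ["fM"]  # in the start position, any initiator is read as formylmethionine
    for i in range(3, len(orf) - 2, 3):
        aa = CODE[orf[i:i + 3]]  # inside the gene, GUG and UUG keep their usual meanings
        if aa == "*":
            break
        protein.append(aa)
    return protein


print(translate_prokaryotic("GUGGUGUUGUAA"))  # ['fM', 'V', 'L']
```

The same GUG triplet is read as formylmethionine at the start and as valine one codon later.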

Second. In 1986, several groups of researchers discovered that the UGA stop codon on mRNA can encode selenocysteine (see Fig. 3), provided that it is followed by a special nucleotide sequence.

Fig. 3. Structure of the 21st amino acid, selenocysteine.

In E. coli, selenocysteyl-tRNA recognizes the UGA codon in mRNA during translation, but only in a certain context: for the UGA codon to be recognized as a sense codon, a sequence of 45 nucleotides located after it is important.

This example shows that, when necessary, a living organism can change the meaning of the standard genetic code. In this case the genetic information contained in genes is encoded in a more complex way: the meaning of a codon is determined by the context of a specific extended nucleotide sequence and with the participation of several highly specific protein factors. Importantly, selenocysteine tRNA has been found in representatives of all three branches of life (archaea, eubacteria and eukaryotes), which points to the ancient origin of selenocysteine synthesis and its possible presence in the last universal common ancestor (this will be discussed in other articles). Most likely, selenocysteine occurs in all living organisms without exception. But in any single organism it is found in no more than a few dozen proteins. It is part of the active sites of enzymes, and in some homologues of these enzymes ordinary cysteine can function in the same position.

Until recently, it was believed that the UGA codon could be read either as selenocysteine or as a stop signal, but it has recently been shown that in ciliates of the genus Euplotes the UGA codon encodes either cysteine or selenocysteine. See "Genetic code allows for discrepancies".

Third exception. In some prokaryotes (5 species of archaea and one eubacterium; the information on Wikipedia is very outdated), a special amino acid, pyrrolysine (Fig. 4), is found. It is encoded by the UAG triplet, which in the canonical code serves as a translation terminator. It is assumed that in this case, as with selenocysteine, UAG is read as a pyrrolysine codon thanks to a special structure in the mRNA. Pyrrolysine tRNA carries the anticodon CUA and is aminoacylated by a class 2 aminoacyl-tRNA synthetase (ARSase) (for the classification of ARSases, see the article "Codases help to understand how the genetic code").

UAG is rarely used as a stop codon, and when it is used, it is often followed by another stop codon.

Fig. 4. Structure of the 22nd amino acid, pyrrolysine.

7. Universality. After the deciphering of the GC was completed in the mid-1960s, it was long believed that the code is the same in all organisms, which indicates the unity of origin of all life on Earth.

Let's try to understand why the GC is universal. If even one coding rule changed in an organism, the structure of a significant proportion of its proteins would change. Such a change would be too drastic and therefore almost always lethal, since a change in the meaning of just one codon can affect, on average, 1/64 of all amino acids in the organism's proteins.
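To see why a single reassignment is so drastic, here is a rough calculation under a strongly simplifying assumption: every position in a protein uses the reassigned codon independently with probability 1/64 (real codon usage is far from uniform). The probability that a protein of n codons contains at least one such codon is then 1 − (63/64)^n; the helper name is hypothetical.

```python
def share_affected(protein_length: int, codon_share: float = 1 / 64) -> float:
    """Probability that a protein contains at least one copy of a reassigned codon,
    assuming each position independently uses it with probability codon_share."""
    return 1 - (1 - codon_share) ** protein_length


for n in (100, 300, 1000):
    print(f"{n} codons: {share_affected(n):.1%} of such proteins would change")
```

Under these assumptions, about four out of five 100-codon proteins and about 99% of 300-codon proteins would be altered by a single reassignment.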

This leads to one very important idea: the GC has hardly changed since its formation more than 3.5 billion years ago. This means that its structure bears a trace of its origin, and analysis of this structure can help to understand exactly how the GC could have arisen.

In fact, the GC may differ somewhat in bacteria, in mitochondria, and in the nuclear genomes of some ciliates and yeasts. At present at least 17 genetic codes are known that differ from the canonical one at 1 to 5 codons. Across all known deviations from the universal GC, 18 different codon reassignments occur. Most of the deviations from the standard code, 10 of them, are known in mitochondria. Notably, the mitochondria of vertebrates, flatworms and echinoderms use different codes, whereas those of molds, protozoa and coelenterates share a single code.

The evolutionary proximity of species does not at all guarantee that their GCs are similar. Genetic codes can differ even between species of mycoplasma (some species have the canonical code, while others have different ones). A similar situation is observed in yeasts.

It is important to note that mitochondria are descendants of symbiotic organisms that adapted to life inside cells. They have a greatly reduced genome, and some of their genes have moved to the cell nucleus. Therefore, changes in their GC are no longer so drastic.

Exceptions discovered later are of special interest from an evolutionary point of view, since they can help shed light on the mechanisms of code evolution.

Table 1. Mitochondrial codes in various organisms.

Codon	Universal code	Vertebrates	Invertebrates	Yeast	Plants
UGA	STOP	Trp	Trp	Trp	STOP
AUA	Ile	Met	Met	Met	Ile
CUA	Leu	Leu	Leu	Thr	Leu
AGA	Arg	STOP	Ser	Arg	Arg
AGG	Arg	STOP	Ser	Arg	Arg
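Table 1 can be turned into a small lookup sketch: start from the standard code and overlay each lineage's reassignments. Only the five codons listed in the table are changed here, which is an assumption for illustration (the real mitochondrial codes differ at other codons too; for example, yeast mitochondria read all four CUN codons as threonine), and the helper names are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons in UCAG order (third base varies fastest); "*" = stop.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}

# Only the reassignments listed in Table 1 (one-letter codes, "*" = stop).
MITO_OVERRIDES = {
    "plants": {},
    "vertebrates": {"UGA": "W", "AUA": "M", "AGA": "*", "AGG": "*"},
    "invertebrates": {"UGA": "W", "AUA": "M", "AGA": "S", "AGG": "S"},
    "yeast": {"UGA": "W", "AUA": "M", "CUA": "T"},
}


def mito_code(lineage: str) -> dict:
    """Standard code with the Table 1 reassignments for one lineage applied on top."""
    return {**CODE, **MITO_OVERRIDES[lineage]}


def read_codons(mrna: str, code: dict) -> str:
    """Codon-by-codon readout; it does not stop at '*', so stop reassignments stay visible."""
    return "".join(code[mrna[i:i + 3]] for i in range(0, len(mrna) - 2, 3))


seq = "AUGAUAUGACUAAGA"  # AUG AUA UGA CUA AGA
for lineage in MITO_OVERRIDES:
    print(f"{lineage:>13}: {read_codons(seq, mito_code(lineage))}")
```

The same five codons read as MI*LR in plant mitochondria (standard code), MMWL* in vertebrates, MMWLS in invertebrates and MMWTR in yeast.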

Three mechanisms for changing the amino acid encoded by a codon.

The first is when a certain codon is not used (or almost not used) by an organism because of the uneven occurrence of certain nucleotides (G+C content) or nucleotide combinations. Such a codon may then disappear from use completely (for example, through loss of the corresponding tRNA) and can later be recruited to encode another amino acid without significant harm to the organism. This mechanism may be responsible for the emergence of some code dialects in mitochondria.

The second is the conversion of a stop codon into a sense codon. In this case some of the translated proteins may acquire extra amino acids at their ends. However, the situation is partly rescued by the fact that many genes end with not one but two stop codons, since translation errors in which stop codons are read as amino acids are possible.

The third is possible ambiguous reading of certain codons, as is the case in some fungi.

8. Connectivity. Groups of equivalent codons (that is, codons that code for the same amino acid) are called series. The GC contains 21 series, including the stop codons. In what follows, for definiteness, a group of codons will be called connected if from each codon of the group one can reach every other codon of the same group by successive single-nucleotide substitutions that stay within the group. Of the 21 series, 18 are connected, 2 series contain one codon each, and only 1 series, that of serine, is disconnected and breaks up into two connected subseries.

Fig. 5. Connectivity graphs for some code series: a, the connected series of valine; b, the connected series of leucine; the serine series is disconnected and splits into two connected subseries. The figure is taken from V.A. Ratner's article "The genetic code as a system."
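Connectivity can be checked with a breadth-first search inside each series, moving only by single-nucleotide substitutions that stay within the series. A sketch on the standard code (helper names are hypothetical):

```python
from collections import defaultdict, deque
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons in UCAG order (third base varies fastest); "*" = stop.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}


def neighbours(codon: str):
    """All codons reachable by one nucleotide substitution."""
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                yield codon[:pos] + base + codon[pos + 1:]


def components(series):
    """Split a set of codons into subsets connected by substitutions that stay in the set."""
    remaining, groups = set(series), []
    while remaining:
        seed = remaining.pop()
        group, queue = {seed}, deque([seed])
        while queue:  # breadth-first search restricted to the series
            for nb in neighbours(queue.popleft()):
                if nb in remaining:
                    remaining.remove(nb)
                    group.add(nb)
                    queue.append(nb)
        groups.append(sorted(group))
    return sorted(groups)


series = defaultdict(list)
for codon, meaning in CODE.items():
    series[meaning].append(codon)

parts = {meaning: components(codons) for meaning, codons in series.items()}
print(len(series), "series")  # 21 series
print({m: p for m, p in parts.items() if len(p) > 1})
# {'S': [['AGC', 'AGU'], ['UCA', 'UCC', 'UCG', 'UCU']]}
```

Only the serine series splits, into UCN and AGU/AGC; Met and Trp are the two single-codon series, which leaves 18 connected multi-codon series.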

The connectivity property can be explained by the fact that during the period of formation the GC captured new codons, which were minimally different from those already used.

9. Regularity of amino acid properties by triplet root. All amino acids encoded by triplets with root U are non-polar, have no extreme properties or sizes, and have aliphatic radicals. All triplets with root C are strong, and the amino acids they encode are relatively small. All triplets with root A are weak and encode polar amino acids that are not small. Codons with root G are characterized by extreme and anomalous amino acids and series. They encode the smallest amino acid (glycine), the longest and flattest (tryptophan), the longest and most irregular (arginine), and the most reactive (cysteine), and they form an anomalous subseries for serine.
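Grouping the standard code by root can be sketched as below. Reading "root" as the middle nucleotide of the triplet is an interpretation (it is the reading under which the anomalous serine subseries AGU/AGC falls under root G), the helper name is hypothetical, and stop codons are left out.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons in UCAG order (third base varies fastest); "*" = stop.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}


def by_middle_root(code):
    """Amino acids (one-letter codes) encoded under each middle nucleotide."""
    groups = {b: set() for b in BASES}
    for codon, meaning in code.items():
        if meaning != "*":
            groups[codon[1]].add(meaning)
    return {b: "".join(sorted(s)) for b, s in groups.items()}


print(by_middle_root(CODE))
# {'U': 'FILMV', 'C': 'APST', 'A': 'DEHKNQY', 'G': 'CGRSW'}
```

Root U collects the hydrophobic residues, root C the small ones, root A the polar and charged ones, and root G the mixed group with glycine, tryptophan, arginine, cysteine and the second serine subseries.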

10. Blockiness. The universal GC is a "block" code: amino acids with similar physicochemical properties are encoded by codons that differ from each other by one base. The block structure of the code is clearly visible in the following figure.

Fig. 6. Block structure of the GC. Amino acids with an alkyl group are shown in white.

Fig. 7. Color representation of the physicochemical properties of amino acids, based on the values given in Stryer's "Biochemistry". Left: hydrophobicity. Right: the ability to form an alpha helix in a protein. Red, yellow and blue indicate amino acids with high, medium and low hydrophobicity (left) or a correspondingly high, medium or low ability to form an alpha helix (right).

The property of blockiness and regularity can also be explained by the fact that during the period of formation the GC captured new codons, which were minimally different from those already used.

Codons with the same first base (codon prefix) encode amino acids with similar biosynthetic pathways. The codons of amino acids of the shikimate, pyruvate, aspartate and glutamate families have U, G, A and C as prefixes, respectively. On the ancient pathways of amino acid biosynthesis and their connection with the properties of the modern code, see "Ancient doublet genetic code was predetermined by the pathways of amino acid synthesis." Based on these data, some researchers conclude that the formation of the code was strongly influenced by the biosynthetic relationships between amino acids. However, similar biosynthetic pathways by no means imply similar physicochemical properties.

11. Noise immunity. In the most general terms, the noise immunity of the GC means that random point mutations and translation errors do not change the physicochemical properties of the encoded amino acids very much.

Replacement of one nucleotide in a triplet in most cases either does not lead to a change in the encoded amino acid, or leads to a change to an amino acid with the same polarity.

One of the mechanisms that ensures the noise immunity of the GC is its degeneracy. The average degeneracy equals the total number of codons divided by the number of encoded signals, where the encoded signals are the 20 amino acids and the translation termination signal: 64/21 ≈ 3. So, averaged over all amino acids and the termination signal, there are about three codons per encoded signal.
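The average degeneracy can be checked against the standard code, taking it as the number of codons divided by the number of encoded signals. This sketch also tallies how many signals have 1, 2, 3, 4 or 6 codons.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons in UCAG order (third base varies fastest); "*" = stop.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}

# Number of codons per signal (20 amino acids plus the stop signal "*").
degeneracy = Counter(CODE.values())

print(f"{len(CODE)} codons / {len(degeneracy)} signals = {len(CODE) / len(degeneracy):.2f}")
# How many signals have 1, 2, 3, 4 or 6 codons: (codons per signal, number of signals).
print(sorted(Counter(degeneracy.values()).items()))
# [(1, 2), (2, 9), (3, 2), (4, 5), (6, 3)]
```

Two signals have a single codon (Met, Trp), nine have two, two have three (Ile and stop), five have four, and three have six (Leu, Ser, Arg).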

In order to quantify noise immunity, we introduce two concepts. Nucleotide substitutions that do not change the class of the encoded amino acid are called conservative. Nucleotide substitutions that change the class of the encoded amino acid are called radical.

Each triplet allows 9 single substitutions, and there are 61 amino-acid-coding triplets in total. Therefore, the number of possible nucleotide substitutions over all sense codons is 61 × 9 = 549. Of these:

23 nucleotide substitutions result in stop codons.
134 substitutions do not change the encoded amino acid.
230 substitutions do not change the class of the encoded amino acid.
162 substitutions change the amino acid class, i.e. are radical.

Each codon position accounts for 61 × 3 = 183 of these substitutions:
Of the 183 substitutions of the 1st nucleotide, 9 produce translation terminators, 114 are conservative and 60 are radical.
Of the 183 substitutions of the 2nd nucleotide, 7 produce terminators, 74 are conservative and 102 are radical.
Of the 183 substitutions of the 3rd nucleotide, 7 produce terminators and 176 are conservative.

From these counts, the noise immunity of the code can be quantified as the ratio of the number of conservative substitutions to the number of radical ones: (134 + 230)/162 = 364/162 ≈ 2.25.
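The totals in the list can be recomputed from the standard code. The sketch below (helper name hypothetical) counts all single substitutions from sense codons, those producing stop codons at each position, and those changing the amino acid. The split of the amino-acid-changing substitutions into 230 conservative and 162 radical ones depends on an amino-acid classification that is not spelled out here, so the sketch reproduces only their sum and then evaluates the ratio with the text's numbers.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons in UCAG order (third base varies fastest); "*" = stop.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}


def substitution_counts(code):
    """Single-nucleotide substitutions from the 61 sense codons, split by outcome."""
    total, to_stop, changed = 0, [0, 0, 0], 0
    for codon, aa in code.items():
        if aa == "*":
            continue  # only sense codons are mutated
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                total += 1
                new = code[codon[:pos] + base + codon[pos + 1:]]
                if new == "*":
                    to_stop[pos] += 1
                elif new != aa:
                    changed += 1
    return total, to_stop, changed


total, to_stop, changed = substitution_counts(CODE)
print(total, sum(to_stop), to_stop)             # 549 23 [9, 7, 7]
print(changed, total - sum(to_stop) - changed)  # 392 134
print(round((134 + 230) / 162, 2))              # 2.25, using the text's class split
```

The 392 amino-acid-changing substitutions are the 230 same-class plus 162 radical ones from the list.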

For a realistic assessment of the contribution of degeneracy to noise immunity, one must take into account the frequencies of amino acids in proteins, which vary between species.

What is the reason for the noise immunity of the code? Most researchers believe that this property is a consequence of selection among alternative GCs.

Stephen Freeland and Laurence Hurst generated such codes at random and found that only one in a hundred alternative codes was at least as noise-resistant as the universal code.
An even more interesting fact emerged when these researchers introduced an additional constraint to account for real-world trends in DNA mutation patterns and translation errors. Under such conditions, ONLY ONE CODE OUT OF A MILLION POSSIBLE turned out to be better than the canonical code.
This unprecedented robustness of the genetic code is most easily explained by its having formed through natural selection. Perhaps at some point the biological world had many codes, each with its own sensitivity to errors. Organisms that coped with errors better had a better chance of surviving, and the canonical code simply won the struggle for existence. This assumption seems quite realistic, since we know that alternative codes really exist. For more about noise immunity, see S. Freeland, L. Hurst, "Coded evolution" // In the World of Science, 2004, No. 7.
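The random-code experiment can be sketched roughly as follows. This is not Freeland and Hurst's exact procedure. As assumptions, it keeps the block structure and stop codons of the standard code, shuffles which amino acid occupies each block, uses the Kyte-Doolittle hydropathy scale as a stand-in property, and scores a code by the unweighted mean squared hydropathy change over all single-base substitutions between sense codons. Its output depends on these choices and is not expected to reproduce the figures quoted above; the helper names are hypothetical.

```python
import random
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons in UCAG order (third base varies fastest); "*" = stop.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}

# Kyte-Doolittle hydropathy, used here as a stand-in amino-acid property.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def cost(code):
    """Mean squared hydropathy change over single-base substitutions between sense codons."""
    total, n = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                new = code[codon[:pos] + base + codon[pos + 1:]]
                if new != "*":
                    total += (HYDROPATHY[aa] - HYDROPATHY[new]) ** 2
                    n += 1
    return total / n


def shuffled_code(code, rng):
    """Keep synonymous blocks and stop codons; randomly reassign amino acids to blocks."""
    amino_acids = sorted(set(code.values()) - {"*"})
    relabel = dict(zip(amino_acids, rng.sample(amino_acids, len(amino_acids))))
    relabel["*"] = "*"
    return {codon: relabel[aa] for codon, aa in code.items()}


def share_at_least_as_good(code, trials=1000, seed=0):
    """Fraction of random block-shuffled codes whose cost is <= the cost of `code`."""
    rng = random.Random(seed)
    reference = cost(code)
    hits = sum(cost(shuffled_code(code, rng)) <= reference for _ in range(trials))
    return reference, hits / trials


reference, share = share_at_least_as_good(CODE)
print(f"standard code cost: {reference:.2f}")
print(f"random codes at least as good: {share:.3f}")
```

Swapping in another property table, weighting substitutions by position or by transition/transversion bias, or raising `trials` are the natural knobs for exploring how sensitive the comparison is.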

In conclusion, I propose to count the number of possible genetic codes that can be generated for the 20 canonical amino acids. For some reason I haven't come across this number anywhere. So, we require that every generated GC contain the 20 amino acids and a stop signal, each encoded by AT LEAST ONE CODON.

Let's mentally number the codons in some order and reason as follows. If we had exactly 21 codons, each amino acid and the stop signal would occupy exactly one codon. In this case there would be $21!$ possible GCs.

If there are 22 codons, exactly one of the 21 meanings is carried by two codons. That meaning can be chosen in 21 ways, its two codons can occupy any $\binom{22}{2} = 231$ pairs of the 22 places, and the remaining 20 codons receive the remaining 20 meanings in $20!$ ways. This gives $21 \times 231 \times 20! = 21! \times 231$ GCs. (Treating one codon as an "extra" codon with 21 possible meanings in any of 22 places, i.e. $21! \times 21 \times 22$, counts every code twice, because either codon of the duplicated pair can be taken as the extra one.)

With more codons, the ways of distributing the surplus codons multiply quickly, and the general answer is the number of surjective maps from $n$ codons onto the 21 meanings, given by inclusion-exclusion:

$$N(n) = \sum_{k=0}^{21} (-1)^k \binom{21}{k} (21-k)^n = 21! \, S(n, 21),$$

where $S(n, 21)$ is a Stirling number of the second kind. For $n = 22$ this reproduces $21! \times 231$.

If there are 64 codons, the number of possible GCs is $N(64) \approx 1.5 \times 10^{84}$. This is necessarily smaller than $21^{64} \approx 4.2 \times 10^{84}$, the number of all ways of assigning 21 meanings to 64 codons without requiring every meaning to appear.
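Counting codes in which each of the 21 meanings (20 amino acids plus stop) is used at least once is the same as counting surjective maps from the codons onto the meanings, which inclusion-exclusion gives exactly. A sketch (helper names are hypothetical) that checks the formula by brute force on a tiny case and on the 21- and 22-codon cases before evaluating 64 codons:

```python
from itertools import product
from math import comb, factorial


def count_codes(n_codons: int, n_meanings: int = 21) -> int:
    """Assignments of meanings to codons that use every meaning at least once
    (surjections), counted by inclusion-exclusion."""
    return sum((-1) ** k * comb(n_meanings, k) * (n_meanings - k) ** n_codons
               for k in range(n_meanings + 1))


def brute_force(n_codons: int, n_meanings: int) -> int:
    """Direct enumeration, feasible only for tiny cases; used as a cross-check."""
    return sum(1 for f in product(range(n_meanings), repeat=n_codons)
               if len(set(f)) == n_meanings)


assert count_codes(5, 3) == brute_force(5, 3) == 150
assert count_codes(21) == factorial(21)                # one codon per meaning
assert count_codes(22) == factorial(21) * comb(22, 2)  # one meaning has two codons

n64 = count_codes(64)
print(f"{n64:.2e} possible codes; upper bound 21**64 = {21 ** 64:.2e}")
```

Python's arbitrary-precision integers keep the alternating sum exact; the result comes out near 1.5 × 10^84, below the trivial bound of 21^64 assignments.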