What is a code in biology. The concept of a gene, genetic code

The substances responsible for storing and transmitting genetic information are nucleic acids (DNA and RNA).

All functions of cells and of the body as a whole are determined by a set of proteins that provide for

  • formation of cellular structures,
  • synthesis of all other substances (carbohydrates, fats, nucleic acids),
  • the course of life processes.

The genome contains information about the sequence of amino acids in all proteins of the body. This information is called genetic information.

Due to gene regulation, the time of protein synthesis, their quantity, and location in the cell or in the body as a whole are regulated. Regulatory sections of DNA are largely responsible for this, enhancing and weakening gene expression in response to certain signals.

Information about a protein can be recorded in a nucleic acid in only one way: as a sequence of nucleotides. DNA is built from 4 types of nucleotides (A, T, G, C), and proteins are made from 20 types of amino acids. Thus, the problem arises of translating the four-letter record of information in DNA into the twenty-letter record of proteins. The set of correspondences on which this translation is based is called the genetic code.

The outstanding physicist George Gamow was the first to consider the problem of the genetic code theoretically. The genetic code has a certain set of properties, which will be discussed below.

Why is a genetic code necessary?

Earlier we said that all reactions in living organisms are carried out by enzymes, and that it is the ability of enzymes to couple reactions that allows cells to synthesize biopolymers using the energy of ATP hydrolysis. In the case of simple linear homopolymers, that is, polymers consisting of identical units, one enzyme is sufficient for such synthesis. To synthesize a polymer consisting of two alternating monomers, two enzymes are needed; for three monomers, three enzymes, and so on. If the polymer is branched, additional enzymes are needed to form bonds at the branching points. Thus, in the synthesis of some complex polymers, more than ten enzymes are involved, each of which is responsible for adding a specific monomer in a specific place and with a specific bond.

However, when synthesizing irregular heteropolymers (that is, polymers without repeating regions) with a unique structure, such as proteins and nucleic acids, such an approach is in principle impossible. The enzyme can attach a specific amino acid, but cannot determine where in the polypeptide chain it should be placed. This is the main problem of protein biosynthesis, the solution of which is impossible using a conventional enzymatic apparatus. An additional mechanism is needed that uses some source of information about the order of amino acids in the chain.

To solve this problem, Koltsov proposed a matrix (template) mechanism of protein synthesis. He believed that a protein molecule itself serves as the template for the synthesis of identical molecules, i.e., opposite each amino acid residue in the polypeptide chain the same amino acid is placed in the new molecule being synthesized. This hypothesis reflected the level of knowledge of that era, when all functions of living things were attributed to proteins.

However, it later became clear that genetic information is stored in nucleic acids.

PROPERTIES OF THE GENETIC CODE

COLINEARITY (linearity)

First, we'll look at how the nucleotide sequence records the sequence of amino acids in proteins. It is logical to assume that since the sequences of nucleotides and amino acids are linear, there is a linear correspondence between them, i.e., adjacent nucleotides in DNA correspond to adjacent amino acids in the polypeptide. This is also indicated by the linear nature of genetic maps. Proof of such a linear correspondence, or collinearity, is the coincidence of the linear arrangement of mutations on the genetic map and amino acid substitutions in the proteins of mutant organisms.

triplet nature

When considering the properties of the code, the question that caused the least difficulty was the code number, that is, the number of nucleotides that encode one amino acid. Twenty amino acids have to be encoded with four nucleotides. Obviously, one nucleotide cannot encode one amino acid, since then only 4 amino acids could be encoded. Combinations of two nucleotides give 16 different variants ($4^2 = 16$), which is still not enough. Combinations of three nucleotides give 64 variants ($4^3 = 64$), which is more than needed. Combinations of a larger number of nucleotides could also be used, but for reasons of simplicity and economy they are unlikely, i.e. the code is triplet.
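The counting argument above can be checked with a few lines of Python. This is only an illustrative sketch; the function names `combinations` and `smallest_codon_length` are our own and do not come from the text.

```python
def combinations(alphabet_size: int, word_length: int) -> int:
    """Number of distinct words of a given length over an alphabet."""
    return alphabet_size ** word_length


def smallest_codon_length(alphabet_size: int, n_amino_acids: int) -> int:
    """Smallest word length whose combinations cover all amino acids."""
    length = 1
    while combinations(alphabet_size, length) < n_amino_acids:
        length += 1
    return length


# Four nucleotides, words of one, two and three letters.
for n in (1, 2, 3):
    print(n, combinations(4, n))  # 1 4, then 2 16, then 3 64

print(smallest_codon_length(4, 20))  # 3
```

The loop stops at the first length that is sufficient, which mirrors the "simplicity and economy" argument: longer words would also work, but three letters is the minimum.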

degeneracy and unambiguity

With 64 combinations available, the question arises whether all combinations encode amino acids or whether each amino acid corresponds to only one triplet of nucleotides. In the second case, most of the triplets would be meaningless, and nucleotide substitutions caused by mutations would lead to loss of the protein in two thirds of cases. This is not consistent with the observed frequencies of protein loss due to mutations, which indicates that all or almost all triplets are used. Later it was found that there are three triplets that do not code for amino acids. They mark the end of a polypeptide chain and are called stop codons. The remaining 61 triplets encode the 20 amino acids, i.e. one amino acid can be encoded by several triplets. This property of the genetic code is called degeneracy. Degeneracy exists only in the direction from amino acids to nucleotides; in the opposite direction the code is unambiguous, i.e. each triplet codes for one specific amino acid.

punctuation marks

An important question, which proved impossible to resolve theoretically, is how triplets encoding neighboring amino acids are separated from each other, i.e., whether there are punctuation marks in the genetic text.

Missing commas - experiments

Ingenious experiments by Crick and Brenner made it possible to find out whether there are “commas” in genetic texts. During these experiments, scientists used mutagenic substances (acridine dyes) to cause the occurrence of a certain type of mutation - the loss or insertion of 1 nucleotide. It turned out that the loss or insertion of 1 or 2 nucleotides always causes a breakdown of the encoded protein, but the loss or insertion of 3 nucleotides (or a multiple of 3) has virtually no effect on the function of the encoded protein.

Let's imagine that we have a genetic text built from a repeating triplet of ABC nucleotides (Fig. 1, a). If there are no punctuation marks, inserting one additional nucleotide will lead to complete distortion of the text (Fig. 1, a). Bacteriophage mutations were obtained that were located close to each other on the genetic map. When crossing two phages carrying such mutations, a hybrid arose that carried two single-letter inserts (Fig. 1, b). It is clear that the meaning of the text was lost in this case as well. If you introduce another one-letter insert, then after a short incorrect section the meaning will be restored and there is a chance to obtain a functioning protein (Fig. 1, c). This is true for triplet code in the absence of punctuation. If the code number is different, then the number of insertions necessary to restore the meaning will be different. If there are punctuation marks in the code, then the insertion will disrupt the reading of only one triplet, and the rest of the protein will be synthesized correctly and will remain active. Experiments have shown that single-letter insertions always lead to the disappearance of the protein, and restoration of function occurs when the number of insertions is a multiple of 3. Thus, the triplet nature of the genetic code and the absence of internal punctuation marks were proven.
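The reasoning above can be reproduced on the same toy text built from the repeating triplet ABC. The sketch below is only an illustration: the inserted letter `X`, the insertion positions and the function names are arbitrary choices of ours.

```python
def read_triplets(text: str) -> list[str]:
    """Split a text into consecutive non-overlapping triplets (no commas)."""
    usable = len(text) - len(text) % 3
    return [text[i:i + 3] for i in range(0, usable, 3)]


def insert_letters(text: str, positions: list[int], letter: str = "X") -> str:
    """Insert `letter` before each listed position of the original text."""
    out = []
    for i, ch in enumerate(text):
        if i in positions:
            out.append(letter)
        out.append(ch)
    return "".join(out)


original = "ABC" * 6
cases = [("none", []), ("one", [4]), ("two", [4, 6]), ("three", [4, 6, 8])]
for label, positions in cases:
    mutated = insert_letters(original, positions)
    print(f"{label:5s}", " ".join(read_triplets(mutated)))

# none  ABC ABC ABC ABC ABC ABC
# one   ABC AXB CAB CAB CAB CAB
# two   ABC AXB CXA BCA BCA BCA
# three ABC AXB CXA BXC ABC ABC ABC
```

With one or two insertions every triplet after the insertion point is shifted; with three insertions only a short stretch is wrong and the original frame returns.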

non-overlapping

Gamow assumed that the code was overlapping, i.e. the first, second and third nucleotides coded for the first amino acid, the second, third and fourth - for the second amino acid, the third, fourth and fifth - for the third, etc. This hypothesis created the appearance of solving spatial difficulties, but it created another problem. With this coding, a given amino acid could not be followed by any other, since in the triplet encoding it, the first two nucleotides had already been determined, and the number of possible triplets was reduced to four. Analysis of amino acid sequences in proteins showed that all possible pairs of neighboring amino acids occur, i.e. the code should be non-overlapping.

universality

deciphering the code

When the basic properties of the genetic code had been studied, work began on deciphering it, and the meanings of all triplets were determined (see figure). A triplet encoding a specific amino acid is called a codon. As a rule, codons are written as they appear in mRNA, and sometimes as they appear in the sense strand of DNA (the same codons, but with U replaced by T). For some amino acids, such as methionine, there is only one codon. Others have two codons (phenylalanine, tyrosine). There are amino acids that are encoded by three, four and even six codons. Codons of one amino acid are similar to each other and, as a rule, differ only in the last nucleotide. This makes the genetic code more robust, since replacing the last nucleotide of a codon by mutation often does not change the amino acid in the protein. Knowing the genetic code and the sequence of nucleotides in a gene, we can deduce the sequence of amino acids in the protein, which is widely used in modern research.
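As an illustration of deducing a protein sequence from a nucleotide sequence, here is a minimal sketch of a translator based on the standard codon table. The example sequence is invented for the demonstration, and the function names are our own; one-letter amino acid symbols are used, with `*` standing for a stop codon.

```python
from itertools import product

# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def sense_dna_to_mrna(dna: str) -> str:
    """Sense-strand DNA carries the same codons as mRNA, with T instead of U."""
    return dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read non-overlapping triplets from the first letter until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGUUUUAUUGGUAA"))                     # MFYW
print(translate(sense_dna_to_mrna("ATGTTTTATTGGTAA")))  # MFYW
```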

Previously, we emphasized that nucleotides have a feature that was important for the emergence of life on Earth: in the presence of one polynucleotide chain in solution, a second (complementary) chain forms spontaneously through the complementary pairing of related nucleotides. The same number of nucleotides in both chains and their chemical affinity are indispensable conditions for this type of reaction. However, during protein synthesis, when information from mRNA is realized in the structure of a protein, the principle of complementarity cannot apply. In mRNA and in the synthesized protein not only is the number of monomers different but, more importantly, there is no structural similarity between them (nucleotides on the one hand, amino acids on the other). Clearly, a different principle is needed for accurately translating information from a polynucleotide into the structure of a polypeptide. In the course of evolution such a principle arose, and its basis is the genetic code.

The genetic code is a system for recording hereditary information in nucleic acid molecules, based on a certain alternation of nucleotide sequences in DNA or RNA, forming codons corresponding to amino acids in a protein.

The genetic code has several properties.

    Triplet nature.

    Degeneracy or redundancy.

    Unambiguity.

    Polarity.

    Non-overlapping.

    Compactness.

    Universality.

It should be noted that some authors also propose other properties of the code related to the chemical characteristics of the nucleotides included in the code or the frequency of occurrence of individual amino acids in the body’s proteins, etc. However, these properties follow from those listed above, so we will consider them there.

A. Triplet nature. The genetic code, like many complexly organized systems, has a smallest structural unit and a smallest functional unit. A triplet is the smallest structural unit of the genetic code; it consists of three nucleotides. A codon is the smallest functional unit of the genetic code. As a rule, triplets of mRNA are called codons. A codon can perform one of several functions. Firstly, its main function is to encode a single amino acid. Secondly, a codon may not code for an amino acid, in which case it performs another function (see below). Thus, a triplet is the elementary structural unit of the genetic code (three nucleotides), whereas a codon is its elementary semantic unit: three nucleotides that determine the attachment of one amino acid to the polypeptide chain.

The elementary structural unit was first deduced theoretically, and then its existence was confirmed experimentally. Indeed, 20 amino acids cannot be encoded with one or two nucleotides, because there are only 4 types of nucleotide. Combinations of three nucleotides give $4^3 = 64$ variants, which more than covers the number of amino acids found in living organisms (see Table 1).

The 64 nucleotide combinations presented in Table 1 have two features. Firstly, of the 64 triplet variants, only 61 are codons that encode an amino acid; they are called sense codons. Three triplets do not encode amino acids and serve as stop signals indicating the end of translation.

Table 1. Messenger RNA codons and corresponding amino acids (table body not recovered; surviving cells: three nonsense codons, Met, Val).

The three stop triplets are UAA, UAG and UGA; they are also called “meaningless” (nonsense) codons. A mutation that replaces one nucleotide in a triplet with another can turn a sense codon into a nonsense codon. This type of mutation is called a nonsense mutation. If such a stop signal arises inside a gene (in its coding part), protein synthesis will be interrupted at this point every time, and only the first part of the protein (up to the stop signal) will be synthesized. A person with this pathology will lack the protein and will show symptoms associated with the deficiency. For example, a mutation of this kind was identified in the gene encoding the hemoglobin beta chain. A shortened, inactive hemoglobin chain is synthesized and quickly destroyed. As a result, a hemoglobin molecule devoid of a beta chain is formed. Such a molecule clearly cannot fully perform its function. A serious disease arises that develops as hemolytic anemia (beta-zero thalassemia, from the Greek “thalassa”, sea: the disease was first described in the Mediterranean region).
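The effect of a nonsense mutation can be sketched on a toy example. The mRNA below is hypothetical and is not a fragment of the globin gene; it only shows how a single substitution that creates UAG cuts the product short. The helper names are our own.

```python
from itertools import product

# Standard genetic code; "*" marks the stop codons UAA, UAG and UGA.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate triplet by triplet, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def substitute(mrna: str, position: int, base: str) -> str:
    """Replace the nucleotide at a 0-based position with another base."""
    return mrna[:position] + base + mrna[position + 1:]


normal = "AUGCAGGAAUGGUUUUAA"       # hypothetical mRNA: AUG CAG GAA UGG UUU UAA
mutant = substitute(normal, 3, "U")  # CAG (Gln) becomes UAG (stop)

print(translate(normal))  # MQEWF
print(translate(mutant))  # M
```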

The mechanism of action of stop codons differs from the mechanism of action of sense codons. This follows from the fact that for all codons encoding amino acids, corresponding tRNAs have been found. No tRNAs were found for nonsense codons. Consequently, tRNA does not take part in the process of stopping protein synthesis.

The codon AUG (and sometimes GUG in bacteria) not only encodes an amino acid (methionine and valine, respectively) but also serves as the initiator of translation.

B. Degeneracy or redundancy.

61 of the 64 triplets encode 20 amino acids. This three-fold excess of triplets over amino acids suggests two possible ways of encoding. In the first, only 20 of the 64 codons would be involved in encoding the 20 amino acids; in the second, an amino acid could be encoded by several codons. Research has shown that nature used the second option.

The advantage of this option is obvious. If only 20 of the 64 triplets were involved in encoding amino acids, then 44 triplets would remain non-coding, i.e. meaningless (nonsense codons). Previously, we pointed out how dangerous it is for the cell when a mutation turns a coding triplet into a nonsense codon: protein synthesis is terminated prematurely, which ultimately leads to the development of disease. Currently, three codons in our genome are nonsense; now imagine what would happen if the number of nonsense codons increased about 15-fold. Clearly, in such a situation the frequency with which normal codons turn into nonsense codons would be far higher.

A code in which one amino acid is encoded by several triplets is called degenerate or redundant. Almost every amino acid has several codons. Thus, the amino acid leucine can be encoded by six triplets: UUA, UUG, CUU, CUC, CUA and CUG. Valine is encoded by four triplets, phenylalanine by two, and only tryptophan and methionine are encoded by a single codon each. The property of recording the same information with different symbols is called degeneracy.
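The codon counts quoted above can be verified directly from the standard codon table. The following is a small sketch; the variable and function names are our own, and amino acids are given as one-letter symbols (L leucine, V valine, F phenylalanine, W tryptophan, M methionine).

```python
from collections import Counter
from itertools import product

# Standard genetic code; "*" marks stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons per amino acid, stop codons excluded.
degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")


def synonymous_codons(amino_acid: str) -> list[str]:
    """All codons that encode the given amino acid, in alphabetical order."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


print(synonymous_codons("L"))  # ['CUA', 'CUC', 'CUG', 'CUU', 'UUA', 'UUG']
print(degeneracy["V"], degeneracy["F"], degeneracy["W"], degeneracy["M"])  # 4 2 1 1
print(sum(degeneracy.values()), len(degeneracy))  # 61 20
```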

The number of codons designated for one amino acid correlates well with the frequency of occurrence of the amino acid in proteins.

And this is most likely not accidental. The higher the frequency of occurrence of an amino acid in a protein, the more often the codon of this amino acid is represented in the genome, the higher the likelihood of its damage by mutagenic factors. Therefore, it is clear that a mutated codon has a greater chance of encoding the same amino acid if it is highly degenerate. From this perspective, the degeneracy of the genetic code is a mechanism that protects the human genome from damage.

It should be noted that the term degeneracy is also used in molecular genetics in another sense. The bulk of the information in a codon is contained in the first two nucleotides; the base in the third position of the codon turns out to be of little importance. This phenomenon is called “degeneracy of the third base.” This feature minimizes the effect of mutations.

For example, the main function of red blood cells is to transport oxygen from the lungs to the tissues and carbon dioxide from the tissues to the lungs. This function is performed by the respiratory pigment hemoglobin, which fills the entire cytoplasm of the erythrocyte. It consists of a protein part, globin, which is encoded by the corresponding gene. In addition to protein, the hemoglobin molecule contains heme, which contains iron. Mutations in globin genes lead to the appearance of different variants of hemoglobin. Most often, mutations involve the replacement of one nucleotide with another and the appearance of a new codon in the gene, which may encode a new amino acid in the hemoglobin polypeptide chain. In a triplet, any nucleotide can be replaced as a result of mutation: the first, the second or the third. Several hundred mutations are known that affect the integrity of the globin genes. About 400 of them are associated with the replacement of single nucleotides in a gene and a corresponding amino acid replacement in the polypeptide. Of these, only 100 replacements lead to instability of hemoglobin and to diseases of various kinds, from mild to very severe. 300 (approximately 64%) substitution mutations do not affect hemoglobin function and do not lead to pathology. One of the reasons for this is the above-mentioned “degeneracy of the third base”: a replacement of the third nucleotide in a triplet encoding serine, leucine, proline, arginine and some other amino acids produces a synonymous codon encoding the same amino acid. Such a mutation will not manifest itself phenotypically. In contrast, a replacement of the first or second nucleotide in a triplet almost always leads to the appearance of a new hemoglobin variant. But even in this case there may be no severe phenotypic disorder. The reason is the replacement of an amino acid in hemoglobin with another that is similar to it in physicochemical properties, for example, when one hydrophilic amino acid is replaced by another hydrophilic amino acid.

Hemoglobin consists of the iron-porphyrin group heme (oxygen and carbon dioxide molecules are attached to it) and a protein, globin. Adult hemoglobin (HbA) contains two identical α-chains and two β-chains. The α-chain contains 141 amino acid residues and the β-chain 146; the α- and β-chains differ in many amino acid residues. The amino acid sequence of each globin chain is encoded by its own gene. The gene encoding the α-chain is located in the short arm of chromosome 16, and the β-gene in the short arm of chromosome 11. Substitution of the first or second nucleotide in the gene encoding the β-chain of hemoglobin almost always leads to the appearance of a new amino acid in the protein, disruption of hemoglobin function and serious consequences for the patient. For example, replacing “C” in one of the CAU triplets (histidine) with “U” produces a new triplet, UAU, encoding another amino acid, tyrosine. Phenotypically this manifests itself as a severe disease. Such a substitution of histidine by tyrosine at position 63 of the β-chain destabilizes hemoglobin, and the disease methemoglobinemia develops. Replacement, as a result of mutation, of glutamic acid with valine at position 6 of the β-chain is the cause of a most severe disease, sickle cell anemia. Let us not continue the sad list. Let us only note that when one of the first two nucleotides is replaced, an amino acid with physicochemical properties similar to the previous one may appear. Thus, replacing the second nucleotide in one of the triplets encoding glutamic acid (GAA) in the β-chain with “U” produces a new triplet (GUA) encoding valine, while replacing the first nucleotide with “A” forms the triplet AAA, encoding the amino acid lysine. Glutamic acid and lysine are similar in physicochemical properties: both are hydrophilic. Valine is a hydrophobic amino acid. Therefore, replacing hydrophilic glutamic acid with hydrophobic valine significantly changes the properties of hemoglobin, which ultimately leads to the development of sickle cell anemia, while replacing hydrophilic glutamic acid with hydrophilic lysine changes the function of hemoglobin to a lesser extent, and patients develop a mild form of anemia. When the third base is replaced, the new triplet can encode the same amino acid as the previous one. For example, if uracil in the CAU triplet is replaced by cytosine and a CAC triplet appears, practically no phenotypic changes will be detected. This is understandable, because both triplets code for the same amino acid, histidine.
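The codon changes discussed above can be classified mechanically with the standard codon table. The sketch below is illustrative only: the function names are ours, and the small `PROPERTY` dictionary lists just the three residues whose character is stated in the text (E glutamic acid, K lysine, V valine).

```python
from itertools import product

# Standard genetic code; "*" marks stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Side-chain character of the residues named in the text (assumed minimal set).
PROPERTY = {"E": "hydrophilic", "K": "hydrophilic", "V": "hydrophobic"}


def classify(before: str, after: str) -> str:
    """Classify a codon change by its effect on the encoded protein."""
    aa_before, aa_after = CODON_TABLE[before], CODON_TABLE[after]
    if aa_before == aa_after:
        return "synonymous"
    if aa_after == "*":
        return "nonsense"
    return "missense"


def keeps_property(before: str, after: str) -> bool:
    """True if both residues are listed in PROPERTY with the same character."""
    p_before = PROPERTY.get(CODON_TABLE[before])
    p_after = PROPERTY.get(CODON_TABLE[after])
    return p_before is not None and p_before == p_after


for before, after in [("CAU", "UAU"), ("GAA", "GUA"), ("GAA", "AAA"), ("CAU", "CAC")]:
    print(before, "->", after, ":",
          CODON_TABLE[before], "->", CODON_TABLE[after], classify(before, after))

# CAU -> UAU : H -> Y missense
# GAA -> GUA : E -> V missense
# GAA -> AAA : E -> K missense
# CAU -> CAC : H -> H synonymous
```

Here `keeps_property("GAA", "AAA")` is true and `keeps_property("GAA", "GUA")` is false, matching the contrast drawn between the lysine and valine substitutions.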

In conclusion, it is appropriate to emphasize that, from a general biological point of view, the degeneracy of the genetic code and the degeneracy of the third base are protective mechanisms that evolution has built into the unique structure of DNA and RNA.

C. Unambiguity.

Each triplet (except the nonsense triplets) encodes only one amino acid. Thus, in the direction codon → amino acid the genetic code is unambiguous, whereas in the direction amino acid → codon it is ambiguous (degenerate).

And in this case, the need for unambiguity in the genetic code is obvious. In another option, when translating the same codon, different amino acids would be inserted into the protein chain and, as a result, proteins with different primary structures and different functions would be formed. Cell metabolism would switch to the “one gene – several polypeptides” mode of operation. It is clear that in such a situation the regulatory function of genes would be completely lost.

D. Polarity

Information is read from DNA and mRNA in one direction only. Polarity is important for defining higher-order structures (secondary, tertiary, etc.). Earlier we discussed how lower-order structures determine higher-order structures. Tertiary and higher-order structures begin to form as soon as the synthesized RNA chain leaves the DNA molecule or the polypeptide chain leaves the ribosome. While the free end of the RNA or polypeptide is acquiring its tertiary structure, the other end of the chain is still being synthesized on DNA (if RNA is being transcribed) or on the ribosome (if a polypeptide is being translated).

Therefore, the unidirectional reading of information (during the synthesis of RNA and protein) is essential not only for determining the sequence of nucleotides or amino acids in the synthesized substance, but also for the strict determination of secondary, tertiary and higher structures.

E. Non-overlapping.

The code may be overlapping or non-overlapping. Most organisms have a non-overlapping code. Overlapping code is found in some phages.

The essence of a non-overlapping code is that a nucleotide of one codon cannot simultaneously be a nucleotide of another codon. If the code were overlapping, then the sequence of seven nucleotides (GCUGCUG) could encode not two amino acids (alanine-alanine) (Fig. 33, A) as in the case of a non-overlapping code, but three (if there is one nucleotide in common) (Fig. 33, B) or five (if two nucleotides are common) (see Fig. 33, C). In the last two cases, a mutation of any nucleotide would lead to a violation in the sequence of two, three, etc. amino acids.
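The three readings of the seven-nucleotide sequence can be reproduced by sliding a three-letter window with different step sizes. This is a sketch with names of our own choosing: a step of 3 corresponds to the non-overlapping code, a step of 2 to one shared nucleotide, and a step of 1 to two shared nucleotides.

```python
from itertools import product

# Standard genetic code; "*" marks stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def read_codons(seq: str, step: int) -> list[str]:
    """Read three-letter windows, moving `step` nucleotides each time."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, step)]


def codons_hit(seq: str, step: int, position: int) -> int:
    """How many windows contain the nucleotide at a 0-based position."""
    starts = range(0, len(seq) - 2, step)
    return sum(1 for s in starts if s <= position < s + 3)


seq = "GCUGCUG"
for step in (3, 2, 1):
    codons = read_codons(seq, step)
    peptide = "".join(CODON_TABLE[c] for c in codons)
    print(step, codons, peptide, codons_hit(seq, step, 2))

# 3 ['GCU', 'GCU'] AA 1
# 2 ['GCU', 'UGC', 'CUG'] ACL 2
# 1 ['GCU', 'CUG', 'UGC', 'GCU', 'CUG'] ALCAL 3
```

The last column counts the codons that contain the third nucleotide: one for the non-overlapping reading, two and three for the overlapping ones, so a single substitution would alter one, two or three amino acids respectively.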

However, it has been established that a mutation of one nucleotide always disrupts the inclusion of one amino acid in a polypeptide. This is a significant argument that the code is non-overlapping.

Let us explain this with Figure 34. Bold lines show the triplets encoding amino acids in the case of non-overlapping and overlapping codes. Experiments have clearly shown that the genetic code is non-overlapping. Without going into the details of the experiment, we note what happens if the third nucleotide in the sequence, U (marked with an asterisk, see Fig. 34), is replaced by some other nucleotide:

1. With a non-overlapping code, the protein controlled by this sequence would have a substitution of one (first) amino acid (marked with asterisks).

2. With an overlapping code in option A, a substitution would occur in two (first and second) amino acids (marked with asterisks). Under option B, the replacement would affect three amino acids (marked with asterisks).

However, numerous experiments have shown that when one nucleotide in DNA is disrupted, the disruption in the protein always affects only one amino acid, which is typical for a non-overlapping code.

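A. Non-overlapping code: GCUGCUG is read as GCU GCU, giving Ala - Ala.

B. Overlapping code (one shared nucleotide): GCUGCUG is read as GCU UGC CUG, giving Ala - Cys - Leu.

C. Overlapping code (two shared nucleotides): GCUGCUG is read as GCU CUG UGC GCU CUG, giving Ala - Leu - Cys - Ala - Leu.

Fig. 34. A diagram explaining the presence of a non-overlapping code in the genome (explanation in the text). Asterisks in the figure mark the triplets and amino acids affected by the substitution.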

The non-overlapping nature of the genetic code is associated with another property: the reading of information begins from a definite point, the initiation signal. In mRNA this initiation signal is the methionine codon AUG.

It should be noted that humans nevertheless have a small number of genes that deviate from the general rule and overlap.

F. Compactness.

There is no punctuation between codons. In other words, triplets are not separated from each other, for example, by one meaningless nucleotide. The absence of “punctuation marks” in the genetic code has been proven in experiments.

G. Universality.

The code is the same for all organisms living on Earth. Direct evidence of the universality of the genetic code was obtained by comparing DNA sequences with the corresponding protein sequences. It turned out that all bacterial and eukaryotic genomes use the same set of codon assignments. There are exceptions, but not many.

The first exceptions to the universality of the genetic code were found in the mitochondria of some animal species. They concerned the terminator codon UGA, which is read in the same way as the codon UGG, i.e. as the amino acid tryptophan. Other, rarer deviations from universality were also found.


Nucleotides line up in chains and thus produce sequences of genetic letters.

Genetic code

The proteins of almost all living organisms are built from only 20 types of amino acids. These amino acids are called canonical. Each protein is a chain or several chains of amino acids connected in a strictly defined sequence. This sequence determines the structure of the protein, and therefore all its biological properties.

Codon table (fragment; first base C, second base U):

CUU (Leu/L) Leucine
CUC (Leu/L) Leucine
CUA (Leu/L) Leucine
CUG (Leu/L) Leucine

In some proteins, nonstandard amino acids, such as selenocysteine and pyrrolysine, are inserted by a ribosome reading a stop codon, depending on sequences in the mRNA. Selenocysteine is now considered to be the 21st, and pyrrolysine the 22nd, amino acid that makes up proteins.

Despite these exceptions, the genetic codes of all living organisms share common features: a codon consists of three nucleotides, of which the first two are decisive, and codons are translated by tRNA and ribosomes into an amino acid sequence.

Deviations from the standard genetic code.

Example | Codon | Normal meaning | Read as
Some yeasts of the genus Candida | CUG | Leucine | Serine
Mitochondria, in particular in Saccharomyces cerevisiae | CU(U, C, A, G) | Leucine | Serine
Mitochondria of higher plants | CGG | Arginine | Tryptophan
Mitochondria (in all studied organisms without exception) | UGA | Stop | Tryptophan
Mitochondria in mammals, Drosophila, S. cerevisiae and many protozoa | AUA | Isoleucine | Methionine = Start
Prokaryotes | GUG | Valine | Start
Eukaryotes (rare) | CUG | Leucine | Start
Eukaryotes (rare) | GUG | Valine | Start
Prokaryotes (rare) | UUG | Leucine | Start
Eukaryotes (rare) | ACG | Threonine | Start
Mammalian mitochondria | AGC, AGU | Serine | Stop
Drosophila mitochondria | AGA | Arginine | Stop
Mammalian mitochondria | AG(A, G) | Arginine | Stop
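A variant code can be modelled as the standard table plus a handful of overrides. The sketch below applies three rows of the table above that concern mammalian mitochondria (UGA, AUA, and AGA/AGG); the mRNA is a made-up sequence chosen to contain those codons, and the names `MITO_OVERRIDES` and `translate` are our own.

```python
from itertools import product

# Standard genetic code; "*" marks stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Reassignments taken from the table of deviations (mammalian mitochondria).
MITO_OVERRIDES = {"UGA": "W", "AUA": "M", "AGA": "*", "AGG": "*"}
MITOCHONDRIAL = {**STANDARD, **MITO_OVERRIDES}


def translate(mrna: str, table: dict[str, str]) -> str:
    """Translate non-overlapping triplets with the given code table."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = table[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = "AUGAUAUGAGCUAGAUUU"  # hypothetical: AUG AUA UGA GCU AGA UUU

print(translate(mrna, STANDARD))       # MI
print(translate(mrna, MITOCHONDRIAL))  # MMWA
```

The same message gives different products: under the standard code it ends at UGA, while under the mitochondrial reassignments UGA is read as tryptophan and synthesis stops only at AGA.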

History of ideas about the genetic code

In the early 1960s, new data revealed the inconsistency of the “code without commas” hypothesis. Experiments then showed that codons considered meaningless by Crick could direct protein synthesis in vitro, and by 1965 the meaning of all 64 triplets had been established. It turned out that the code is redundant, that is, a whole series of amino acids are encoded by two, four or even six triplets.


Wikimedia Foundation. 2010.

Chemical composition and structural organization of the DNA molecule.

Nucleic acid molecules are very long chains consisting of hundreds to millions of nucleotides. Any nucleic acid contains only four types of nucleotides. The functions of nucleic acid molecules depend on their structure: the nucleotides they contain, the number of nucleotides in the chain, and the order in which they are joined.

Each nucleotide consists of three components: a nitrogenous base, a carbohydrate and a phosphoric acid residue. Each DNA nucleotide contains one of four nitrogenous bases (adenine - A, thymine - T, guanine - G or cytosine - C), together with the carbohydrate deoxyribose and a phosphoric acid residue.

Thus, DNA nucleotides differ only in the type of nitrogenous base.
The DNA molecule consists of a huge number of nucleotides connected in a chain in a certain sequence. Each type of DNA molecule has its own number and sequence of nucleotides.

DNA molecules are very long. For example, writing down in letters the sequence of nucleotides in the DNA molecules of one human cell (46 chromosomes) would require a book of about 820,000 pages. The alternation of four types of nucleotides can produce a practically unlimited number of variants of DNA molecules. These structural features allow DNA molecules to store a huge amount of information about all the characteristics of organisms.

In 1953, the American biologist J. Watson and the English physicist F. Crick proposed a model of the structure of the DNA molecule. They established that each DNA molecule consists of two interconnected chains twisted into a spiral, forming a double helix. In each chain, four types of nucleotides alternate in a specific sequence.

The nucleotide composition of DNA varies among different species of bacteria, fungi, plants and animals, but it does not change with age and depends little on environmental changes. Nucleotides are paired: in any DNA molecule the number of adenine nucleotides equals the number of thymine nucleotides (A-T), and the number of cytosine nucleotides equals the number of guanine nucleotides (C-G). This is because the two chains of a DNA molecule are joined according to a strict rule: adenine in one chain is always connected by two hydrogen bonds only to thymine in the other chain, and guanine is connected by three hydrogen bonds to cytosine. In other words, the two nucleotide chains of one DNA molecule are complementary to each other.
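The pairing rule above can be checked with a short sketch. The sequence and the names `PAIR` and `complement_strand` are illustrative assumptions, not taken from the text. The sketch builds the partner chain, then counts bases and hydrogen bonds in the resulting duplex.

```python
from collections import Counter

# Pairing rule from the text: A pairs with T (2 hydrogen bonds), G pairs with C (3).
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement_strand(strand: str) -> str:
    """Return the partner chain, listed position by position against the input."""
    return "".join(PAIR[base] for base in strand)

strand = "ATGCGGATTC"            # hypothetical example sequence
partner = complement_strand(strand)

# Base counts over both chains of the duplex.
counts = Counter(strand + partner)

# Each A-T pair contributes 2 hydrogen bonds, each G-C pair contributes 3.
hydrogen_bonds = sum(2 if base in "AT" else 3 for base in strand)

print(partner)
print(counts["A"] == counts["T"], counts["G"] == counts["C"])
print(hydrogen_bonds)
```

Whatever sequence is chosen for one chain, the duplex as a whole has equal numbers of A and T and equal numbers of G and C, because every base is matched by its partner.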



Nucleic acid molecules - DNA and RNA - are made up of nucleotides. DNA nucleotides include a nitrogenous base (A, T, G, C), the carbohydrate deoxyribose and a phosphoric acid residue. The DNA molecule is a double helix consisting of two chains connected by hydrogen bonds according to the principle of complementarity. The function of DNA is to store hereditary information.

Properties and functions of DNA.

DNA is the carrier of genetic information, which is recorded as a sequence of nucleotides using the genetic code. Two fundamental properties of living organisms are associated with DNA molecules: heredity and variability. During the process called DNA replication, two copies of the original molecule are formed. They are inherited by the daughter cells when the cell divides, so that the resulting cells are genetically identical to the original.

Genetic information is realized during gene expression in the processes of transcription (synthesis of RNA molecules on a DNA template) and translation (synthesis of proteins on an RNA template).

The sequence of nucleotides “encodes” information about different types of RNA: messenger, or template (mRNA), ribosomal (rRNA) and transfer (tRNA). All these types of RNA are synthesized on DNA during transcription. Their roles in protein biosynthesis (translation) differ. Messenger RNA contains information about the sequence of amino acids in a protein. Ribosomal RNA forms the basis of ribosomes (complex nucleoprotein complexes whose main function is to assemble proteins from individual amino acids according to the mRNA). Transfer RNAs deliver amino acids to the site of protein assembly, the active center of the ribosome, which “crawls” along the mRNA.

Genetic code, its properties.

The genetic code is the method, common to all living organisms, of encoding the amino acid sequence of proteins in a sequence of nucleotides. Properties:

  1. Triplet nature - the meaningful unit of the code is a combination of three nucleotides (a triplet, or codon).
  2. Continuity - there are no punctuation marks between triplets, that is, the information is read continuously.
  3. Non-overlapping - the same nucleotide cannot simultaneously be part of two or more triplets (this does not hold for some overlapping genes of viruses, mitochondria and bacteria, which encode several proteins read with a frameshift).
  4. Uniqueness (specificity) - a specific codon corresponds to only one amino acid (however, in Euplotes crassus the UGA codon encodes two amino acids, cysteine and selenocysteine).
  5. Degeneracy (redundancy) - several codons can correspond to the same amino acid.
  6. Universality - the genetic code works the same way in organisms of different levels of complexity, from viruses to humans (genetic engineering methods are based on this; there are a number of exceptions, described in the “Variations” section below).
  7. Noise immunity - the code is resistant to some mutations: nucleotide substitutions that do not change the class of the encoded amino acid are called conservative, whereas substitutions that change the class of the encoded amino acid are called radical.
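Several of these properties can be seen in a minimal sketch that builds the standard codon table and reads an mRNA triplet by triplet. The compact 64-letter string follows the common U, C, A, G ordering of the standard table. The function name `translate` and the example mRNA are assumptions made for illustration.

```python
from itertools import product
from collections import Counter

BASES = "UCAG"
# One-letter amino acid codes for the 64 codons in UCAG order; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(mrna: str) -> str:
    """Read non-overlapping triplets from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):   # step of 3: continuous, non-overlapping
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

# Degeneracy: how many codons correspond to each amino acid (or to stop).
degeneracy = Counter(CODON_TABLE.values())

print(translate("AUGGCCUUUUAA"))
print(degeneracy["L"], degeneracy["W"], degeneracy["*"])
```

Each codon maps to exactly one table entry (specificity), while leucine is reached from six codons and tryptophan from only one (degeneracy).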

5. Autoreproduction of DNA. Replicon and its functioning.

Replication is the process of self-reproduction of nucleic acid molecules, accompanied by the inheritance (from cell to cell) of exact copies of genetic information. It is carried out with the participation of a set of specific enzymes (helicase, which controls the unwinding of the DNA molecule; DNA polymerases I and III; DNA ligase) and proceeds in a semi-conservative manner with the formation of a replication fork. On one strand (the leading strand) the complementary chain is synthesized continuously, while on the other (the lagging strand) it is synthesized as Okazaki fragments. Replication is a high-precision process whose error rate does not exceed 10^-9. In eukaryotes it can proceed at several points of one DNA molecule at once. The rate of replication is about 100 nucleotides per second in eukaryotes and about 1000 nucleotides per second in bacteria.
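The semi-conservative scheme can be illustrated with a toy model. It is only a sketch: the duplex is represented as a pair of strings, strand direction, enzymes and Okazaki fragments are ignored, and the names `complement` and `replicate` are hypothetical.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand: str) -> str:
    """Build the complementary chain position by position."""
    return "".join(PAIR[base] for base in strand)

def replicate(duplex):
    """Semi-conservative copy: each daughter keeps one parental chain
    and receives one newly synthesized complementary chain."""
    old_a, old_b = duplex
    daughter_1 = (old_a, complement(old_a))   # parental chain A + new partner
    daughter_2 = (complement(old_b), old_b)   # new partner + parental chain B
    return daughter_1, daughter_2

parent = ("ATGCCGTA", complement("ATGCCGTA"))   # hypothetical example duplex
d1, d2 = replicate(parent)

print(d1 == parent, d2 == parent)
```

Both daughter duplexes carry the same sequence as the parent, and each contains exactly one of the two original chains.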

6. Levels of eukaryotic genome organization.

In eukaryotic organisms, the mechanism of transcription regulation is much more complex. As a result of cloning and sequencing of eukaryotic genes, specific sequences involved in transcription and translation were discovered.
A eukaryotic cell is characterized by:
1. The presence of introns and exons in the DNA molecule.
2. Maturation of mRNA - excision of introns and joining of exons.
3. The presence of regulatory elements that control transcription, such as:
a) promoters - three types, each recognized by a specific RNA polymerase. Pol I transcribes ribosomal RNA genes, Pol II transcribes protein-coding structural genes, and Pol III transcribes genes encoding small RNAs. The Pol I and Pol II promoters lie upstream of the transcription initiation site, whereas the Pol III promoter lies within the structural gene;
b) modulators - DNA sequences that increase the level of transcription;
c) enhancers - sequences that increase the level of transcription and act regardless of their position relative to the coding part of the gene and the start point of RNA synthesis;
d) terminators - specific sequences that stop transcription.
These sequences differ from prokaryotic sequences in their primary structure and location relative to the start codon, and bacterial RNA polymerase does not “recognize” them. Thus, for the expression of eukaryotic genes in prokaryotic cells, the genes must be under the control of prokaryotic regulatory elements. This circumstance must be taken into account when constructing expression vectors.
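The maturation step listed above (excision of introns and joining of exons) can be sketched as simple string slicing. The pre-mRNA sequence, the exon coordinates and the function name `splice` are made up for illustration. Real splice sites are recognized by the splicing machinery, not supplied as index pairs.

```python
def splice(pre_mrna: str, exons) -> str:
    """Join exon segments given as (start, end) half-open index pairs, dropping the introns."""
    return "".join(pre_mrna[start:end] for start, end in exons)

# Hypothetical transcript: exon 1 + intron + exon 2.
exon_1 = "AUGGCC"
intron = "GUAAGUUUUCAG"
exon_2 = "UUUUAA"
pre_mrna = exon_1 + intron + exon_2

# Exon coordinates within the pre-mRNA (assumed known for this sketch).
exons = [(0, 6), (18, 24)]
mature_mrna = splice(pre_mrna, exons)

print(mature_mrna)
```

A prokaryotic host has no such maturation step, which is one more reason a eukaryotic gene usually has to be adapted before it can be expressed in a prokaryotic cell.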

7. Chemical and structural composition of chromosomes.

Chemical composition of chromosomes: DNA - about 40%, histone proteins - about 40%, non-histone proteins - about 20%, plus some RNA, lipids, polysaccharides and metal ions.

Chemically, a chromosome is a complex of nucleic acids with proteins, carbohydrates, lipids and metals. The chromosome regulates gene activity and restores it in the event of chemical or radiation damage.

Chromosomes are nucleoprotein structural elements of the cell nucleus. They contain DNA, which carries the hereditary information of the organism, are capable of self-reproduction, and have a structural and functional individuality that they retain over generations.

The following feature of the structural organization of chromosomes is observed in the mitotic cycle: chromosomes exist in mitotic and interphase forms, which transform into each other during the cycle. These are functional and physiological transformations.

8. Levels of packaging of hereditary material in eukaryotes.

Structural and functional levels of organization of hereditary material of eukaryotes

Heredity and variability provide:

1) individual (discrete) inheritance and change of individual characteristics;

2) reproduction in individuals of each generation of the entire complex of morphofunctional characteristics of organisms of a particular biological species;

3) redistribution of hereditary factors during reproduction in sexually reproducing species, as a result of which the descendant has a combination of characteristics different from the combination in the parents. The patterns of inheritance and variability of traits and their sets follow from the principles of the structural and functional organization of genetic material.

There are three levels of organization of the hereditary material of eukaryotic organisms: gene, chromosomal and genomic (genotype level).

The elementary structure of the gene level is the gene. The transfer of genes from parents to offspring is necessary for the development of certain characteristics. Although several forms of biological variability are known, only a violation of the structure of genes changes the meaning of hereditary information, in accordance with which specific characteristics and properties are formed. Thanks to the presence of the gene level, individual, separate (discrete) and independent inheritance and changes in individual characteristics are possible.

Genes in eukaryotic cells are distributed in groups along chromosomes. Chromosomes are structures of the cell nucleus characterized by individuality and by the ability to reproduce themselves while preserving their individual structural features over generations. The presence of chromosomes defines the chromosomal level of organization of hereditary material. The placement of genes on chromosomes affects how traits are inherited relative to one another and makes it possible for the function of a gene to be influenced by its immediate genetic environment, the neighboring genes. The chromosomal organization of hereditary material is a necessary condition for the redistribution of the parents' hereditary factors in the offspring during sexual reproduction.

Although genes are distributed over different chromosomes, the entire set of genes behaves functionally as a whole, forming a single system that represents the genomic (genotypic) level of organization of the hereditary material. At this level there is extensive interaction and mutual influence of hereditary factors located both in the same chromosome and in different chromosomes. The result is the mutual correspondence of the genetic information of different hereditary factors and, consequently, the development of traits balanced in time, place and intensity during ontogenesis. The functional activity of genes, the mode of replication and mutational changes in the hereditary material also depend on the characteristics of the genotype of the organism or cell as a whole. This is evidenced, for example, by the relativity of the property of dominance.

Eu- and heterochromatin.

Some chromosomes appear condensed and intensely stained during cell division. Such differences were called heteropyknosis, and the term “heterochromatin” was introduced for the regions concerned. A distinction is made between euchromatin, the main part of mitotic chromosomes, which undergoes the usual cycle of compaction and decompaction during mitosis, and heterochromatin, regions of chromosomes that are constantly in a compact state.

In most species of eukaryotes, chromosomes contain both eu- and heterochromatic regions, the latter making up a significant part of the genome. Heterochromatin is located in pericentromeric and sometimes in peritelomeric regions. Heterochromatic regions have also been found in the euchromatic arms of chromosomes, where they look like inclusions (intercalations) of heterochromatin in euchromatin. Such heterochromatin is called intercalary.

Chromatin compaction. Euchromatin and heterochromatin differ in their compaction cycles. Euchromatin goes through a full cycle of compaction and decompaction from interphase to interphase, whereas heterochromatin maintains a state of relative compactness.

Differential stainability. Different areas of heterochromatin are stained by different dyes, some areas by one dye and others by several. By using various dyes and chromosomal rearrangements that break up heterochromatic regions, it has been possible to characterize many small regions in Drosophila whose affinity for the stains differs from that of neighboring regions.

10. Morphological features of the metaphase chromosome.

The metaphase chromosome consists of two longitudinal strands of deoxyribonucleoprotein, the chromatids, connected to each other in the region of the primary constriction, the centromere. A centromere is a specially organized region of a chromosome that is common to both sister chromatids. The centromere divides the chromosome body into two arms. Depending on the location of the primary constriction, the following types of chromosomes are distinguished: equal-armed (metacentric), when the centromere is located in the middle and the arms are approximately equal in length; unequal-armed (submetacentric), when the centromere is displaced from the middle of the chromosome and the arms are of unequal length; rod-shaped (acrocentric), when the centromere is shifted to one end of the chromosome and one arm is very short. There are also point (telocentric) chromosomes, which lack one arm, but they are not present in the human karyotype (chromosome set). Some chromosomes have secondary constrictions that separate a region called a satellite from the chromosome body.

Each living organism has its own set of proteins. The nucleotides of the DNA molecule and their sequence form the genetic code, which carries information about the structure of proteins. Genetics long accepted the concept that one gene corresponds to one enzyme (polypeptide). Research on nucleic acids and proteins has been carried out over a fairly long period. Later in the article we will take a closer look at the genetic code and its properties, and give a brief chronology of the research.

Terminology

The genetic code is a way of encoding the amino acid sequence of proteins in a sequence of nucleotides. This method of recording information is characteristic of all living organisms. Proteins are natural organic substances of high molecular weight that are present in all living organisms. They consist of 20 types of amino acids, which are called canonical. The amino acids are joined in a chain in a strictly defined sequence, which determines the structure of the protein and its biological properties. A protein may also contain several amino acid chains.

DNA and RNA

Deoxyribonucleic acid is a macromolecule. It is responsible for the storage, transmission and implementation of hereditary information. DNA uses four nitrogenous bases: adenine, guanine, cytosine and thymine. RNA consists of the same nucleotides, except that it does not contain thymine. In its place there is a nucleotide containing uracil (U). RNA and DNA molecules are nucleotide chains. Thanks to this structure, sequences of letters of the “genetic alphabet” are formed.

Implementation of information

The protein encoded by a gene is synthesized in two steps. First, mRNA is synthesized on a DNA template (transcription). Then the genetic code is translated into an amino acid sequence, that is, the polypeptide chain is synthesized on the mRNA (translation). Three nucleotides are enough to encode all the amino acids and the signal for the end of the protein sequence. Such a group of three nucleotides is called a triplet.
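The claim that three nucleotides are enough follows from simple counting, sketched below. With 4 nucleotides, words of length n give 4^n combinations, and at least 21 are needed (20 amino acids plus one end signal). The names `NEEDED` and `min_codon_length` are illustrative assumptions.

```python
NUCLEOTIDES = 4          # A, U, G, C in mRNA
NEEDED = 20 + 1          # 20 canonical amino acids plus one signal for the end of the chain

def min_codon_length(alphabet_size: int, needed: int) -> int:
    """Smallest word length n such that alphabet_size ** n >= needed."""
    n = 1
    while alphabet_size ** n < needed:
        n += 1
    return n

# Number of distinct words of length 1, 2 and 3.
capacities = {n: NUCLEOTIDES ** n for n in (1, 2, 3)}

print(capacities)
print(min_codon_length(NUCLEOTIDES, NEEDED))
```

Pairs of nucleotides give only 16 combinations, which is too few. Triplets give 64, which is more than enough and leaves room for the redundancy of the code.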

History of the study

Proteins and nucleic acids have been studied for a long time, but the first ideas about the nature of the genetic code appeared only in the middle of the 20th century. In 1953, it was discovered that some proteins consist of sequences of amino acids. At that time their exact number could not yet be determined, and there were numerous disputes about it. In 1953, Watson and Crick published two papers. The first described the secondary structure of DNA. The second discussed how DNA could be copied by template synthesis and emphasized that a specific sequence of bases is a code carrying hereditary information. The American physicist of Soviet origin George Gamow took up the coding hypothesis and found a method for testing it. In his paper published in 1954, he proposed establishing correspondences between amino acid side chains and diamond-shaped “holes” and using this as a coding mechanism. This code was later called the rhombic (diamond) code. In explaining his work, Gamow allowed that the genetic code could be a triplet code. His work was one of the first to be regarded as close to the truth.

Classification

Over the years, various models of the genetic code have been proposed. They fall into two types: overlapping and non-overlapping. Overlapping codes allow one nucleotide to belong to several codons. They include the triangular, sequential and major-minor codes. Non-overlapping codes include the combination code and the “code without commas”. In the combination code, an amino acid is encoded by a triplet of nucleotides, and only the composition of the triplet matters. According to the “code without commas”, certain triplets correspond to amino acids, but others do not. It was assumed that if meaningful triplets are arranged one after another, the triplets read in any other reading frame are meaningless. Scientists believed that it was possible to select a set of triplets that satisfies these requirements, and that it contains exactly 20 triplets.

Although Gamow and his co-authors questioned this model, it was considered the most correct for the next five years. In the early 1960s, new data revealed shortcomings in the “code without commas”. It was found that codons previously considered meaningless are capable of inducing protein synthesis in vitro. By about 1965, the meaning of all 64 triplets had been established. As a result, the redundancy of the code was discovered: many amino acids are encoded by several triplets.

Distinctive features

The properties of the genetic code include:

Variations

The first deviation of the genetic code from the standard was discovered in 1979 in a study of human mitochondrial genes. Other variants were subsequently identified, including many alternative mitochondrial codes. One example is the UGA codon, a stop codon in the standard code, which is read as tryptophan in mycoplasmas. In archaea and bacteria, GUG and UUG are often used as start codons. Sometimes a gene begins with a start codon that differs from the one normally used by the species. In addition, in some proteins the nonstandard amino acids selenocysteine and pyrrolysine are inserted by the ribosome when it reads a stop codon. Whether this happens depends on sequences in the mRNA. Selenocysteine is currently regarded as the 21st and pyrrolysine as the 22nd amino acid found in proteins.
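A variant code can be modeled as the standard table with a few entries overridden. The sketch below applies only the one reassignment mentioned above (UGA read as tryptophan in mycoplasmas). The names `make_variant` and `MYCOPLASMA` and the example mRNA are assumptions for illustration.

```python
from itertools import product

BASES = "UCAG"
# Standard code in UCAG order; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def make_variant(overrides: dict) -> dict:
    """Copy the standard table and replace the listed codon assignments."""
    table = dict(STANDARD)
    table.update(overrides)
    return table

# Only the reassignment named in the text: UGA codes for tryptophan (W).
MYCOPLASMA = make_variant({"UGA": "W"})

def translate(mrna: str, table: dict) -> str:
    """Read non-overlapping triplets until a stop codon of the given table."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = table[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

mrna = "AUGUGAGGCUAA"   # hypothetical message containing an internal UGA
print(translate(mrna, STANDARD))
print(translate(mrna, MYCOPLASMA))
```

The same message is cut short after the first amino acid under the standard code but is read through under the variant. This is why a gene moved between organisms with different codes may yield a different protein.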

General features of the genetic code

However, all these exceptions are rare. In living organisms the genetic code has a number of common features: a codon consists of three nucleotides (the first two of which are the most decisive), and codons are translated into an amino acid sequence by tRNA and ribosomes.