Genetic code: description, characteristics, history of research. How the genetic code affects character and destiny

Each living organism has its own set of proteins. The nucleotides of the DNA molecule and the order in which they are arranged form the genetic code, which carries information about the structure of proteins. Genetics long accepted the concept that one gene corresponds to one enzyme (polypeptide). Research on nucleic acids and proteins has been carried out over a fairly long period. Later in the article we take a closer look at the genetic code and its properties, and give a brief chronology of the research.

Terminology

The genetic code is a way of encoding the amino acid sequence of proteins using a sequence of nucleotides. This method of recording information is characteristic of all living organisms. Proteins are natural organic substances of high molecular weight that are present in every living organism. They consist of 20 types of amino acids, which are called canonical. The amino acids are joined into a chain in a strictly established sequence, which determines the structure of the protein and its biological properties. A protein may also contain several amino acid chains.

DNA and RNA

Deoxyribonucleic acid (DNA) is a macromolecule. It is responsible for the storage, transmission and implementation of hereditary information. DNA uses four nitrogenous bases: adenine, guanine, cytosine and thymine. RNA consists of the same nucleotides, except for the one that contains thymine. In its place RNA has a nucleotide containing uracil (U). RNA and DNA molecules are chains of nucleotides. Thanks to this structure, sequences are formed that make up the “genetic alphabet”.

Realization of information

Synthesis of the protein encoded by a gene begins with the synthesis of mRNA on a DNA template (transcription). The genetic code is then translated into an amino acid sequence: the polypeptide chain is synthesized on the mRNA (translation). To encode all the amino acids and the signal for the end of the protein sequence, 3 nucleotides per amino acid are enough. Such a group of three is called a triplet.
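The two steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names transcribe and translate and the template sequence are hypothetical, amino acids are written in one-letter codes, and the table is the standard genetic code with "*" marking a stop codon.

```python
from itertools import product

# Standard genetic code in the usual U, C, A, G table order; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def transcribe(template_dna):
    """Build mRNA (5'->3') from a DNA template strand written 3'->5'."""
    pairing = {"A": "U", "T": "A", "G": "C", "C": "G"}
    return "".join(pairing[base] for base in template_dna)

def translate(mrna):
    """Read the mRNA triplet by triplet until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

template = "TACAAACCGATT"      # hypothetical template strand, written 3'->5'
mrna = transcribe(template)    # transcription
protein = translate(mrna)      # translation
print(mrna, protein)
```

Real transcription and translation involve promoters, ribosomes and tRNA; the sketch keeps only the letter-by-letter logic of the code.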

History of the study

Proteins and nucleic acids have been studied for a long time. The first ideas about the nature of the genetic code appeared in the middle of the 20th century. In 1953 it was discovered that some proteins consist of sequences of amino acids. At that time the exact number of amino acids could not yet be determined, and there were numerous disputes about it. Also in 1953, Watson and Crick published two papers. The first described the secondary structure of DNA; the second discussed how DNA might be copied by template synthesis. They also emphasized that a specific sequence of bases is a code that carries hereditary information. The Soviet and American physicist George Gamow accepted the coding hypothesis and proposed a way to test it. In 1954 he published a paper proposing that the correspondence between amino acid side chains and diamond-shaped “holes” serves as the coding mechanism. This code was later called rhombic. In explaining his work, Gamow allowed that the genetic code could be triplet. His work was one of the first to be considered close to the truth.

Classification

Over the years, various models of the genetic code were proposed. They fall into two types: overlapping and non-overlapping. In overlapping codes one nucleotide belongs to several codons; these include the triangular, sequential and major-minor codes. Non-overlapping codes are of two kinds: the combination code and the comma-free code. In the combination code an amino acid is encoded by a triplet of nucleotides, and what matters is the triplet's composition. In the comma-free code ("code without commas") certain triplets correspond to amino acids and the others do not. It was assumed that if meaningful triplets were arranged one after another, the triplets that appear in any other reading frame would be meaningless. Scientists believed that a set of triplets satisfying these requirements could be chosen, and that it contained exactly 20 triplets.

Although Gamow and his co-authors questioned this model, it was considered the most correct for the next five years. At the beginning of the second half of the 20th century, new data revealed shortcomings in the “code without commas”. Codons were found to be capable of inducing protein synthesis in vitro. By about 1965 the meaning of all 64 triplets had been established. As a result, the redundancy of the code was discovered: some amino acids are encoded by several triplets.

Distinctive features

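The properties of the genetic code include its triplet nature, degeneracy, unambiguity, non-overlapping and continuous reading, and universality. Each of them is discussed later in the article.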

Variations

The first deviation of the genetic code from the standard was discovered in 1979 during the study of human mitochondrial genes. More variants were identified later, including many alternative mitochondrial codes. One example is the stop codon UGA, which is read as tryptophan in mycoplasmas. In archaea and bacteria, GUG and UUG are often used as start codons. Sometimes a gene encodes a protein beginning with a start codon that differs from the one normally used by the species. In addition, in some proteins the nonstandard amino acids selenocysteine and pyrrolysine are inserted by the ribosome as it reads a stop codon; whether this happens depends on sequences found in the mRNA. Selenocysteine is currently considered the 21st, and pyrrolysine the 22nd, amino acid present in proteins.
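A variant code of this kind can be pictured as the standard table with a few entries overridden. The sketch below is a simplified illustration, not a complete model of any organism: it changes only the UGA entry mentioned above, and the names STANDARD, UGA_AS_TRP and translate, as well as the test sequence, are hypothetical.

```python
from itertools import product

# Standard genetic code in U, C, A, G order; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Variant table: identical to the standard one, except that UGA means tryptophan (W).
UGA_AS_TRP = {**STANDARD, "UGA": "W"}

def translate(mrna, table):
    """Translate triplet by triplet with the given table, stopping at '*'."""
    protein = ""
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = table[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein += amino_acid
    return protein

mrna = "AUGUGAUUUUAA"  # hypothetical message: AUG UGA UUU UAA
print(translate(mrna, STANDARD))    # synthesis stops at UGA
print(translate(mrna, UGA_AS_TRP))  # UGA is read through as tryptophan
```

The same message therefore yields a shorter or a longer chain depending on which table the cell uses.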

General features of the genetic code

However, all these exceptions are rare. In living organisms the genetic code has a number of common characteristics. These include the composition of a codon, which consists of three nucleotides (the first two are the determining ones), and the translation of codons into an amino acid sequence by tRNA and ribosomes.

Through transcription and translation, information in the cell is transferred from DNA to protein: DNA - mRNA - protein. The genetic information in DNA and mRNA is contained in the sequence of nucleotides in these molecules. How is information transferred from the “language” of nucleotides to the “language” of amino acids? This translation is carried out using the genetic code. A code, or cipher, is a system of symbols for translating one form of information into another. The genetic code is a system for recording information about the sequence of amino acids in proteins using the sequence of nucleotides in messenger RNA. How important the order of the same elements (four nucleotides in RNA) is for understanding and preserving the meaning of information can be seen in a simple example: by rearranging the letters in the word "code" we get a word with a different meaning, "coed". What properties does the genetic code have?

1. The code is triplet. RNA consists of 4 nucleotides: A, G, C, U. If we tried to designate one amino acid with one nucleotide, 16 of the 20 amino acids would remain unencoded. A two-letter code would encode only 16 amino acids (four nucleotides give 16 different combinations of two nucleotides each). Nature has created a three-letter, or triplet, code. This means that each of the 20 amino acids is encoded by a sequence of three nucleotides, called a triplet or codon. From 4 nucleotides you can create 64 different combinations of 3 nucleotides each (4 × 4 × 4 = 64). This is more than enough to encode 20 amino acids, and it would seem that 44 codons are superfluous. However, this is not so.

2. The code is degenerate. This means that each amino acid is encoded by more than one codon (from two to six). The exceptions are the amino acids methionine and tryptophan, each of which is encoded by only one triplet. (This can be seen in the genetic code table.) The fact that methionine is encoded by the single triplet AUG has a special meaning that will become clear later.

3. The code is unambiguous. Each codon codes for only one amino acid. In all healthy people, in the gene carrying information about the beta chain of hemoglobin, the triplet GAA or GAG in the sixth position encodes glutamic acid. In patients with sickle cell anemia, the second nucleotide of this triplet is replaced by U. As can be seen from the table, the resulting triplets GUA or GUG encode the amino acid valine. You already know what such a replacement leads to from the section on DNA.

4. There are “punctuation marks” between genes. In printed text there is a period at the end of each phrase, and several related phrases make up a paragraph. In the language of genetic information, such a paragraph is an operon and the mRNA complementary to it. Each gene in the operon encodes one polypeptide chain, a phrase. Since in some cases several different polypeptide chains are synthesized one after another from the same mRNA template, they must be separated from each other. For this purpose the genetic code has three special triplets, UAA, UAG and UGA, each of which signals the end of synthesis of one polypeptide chain. These triplets thus function as punctuation marks. They are found at the end of every gene.

5. There are no “punctuation marks” inside the gene. Since the genetic code is similar to a language, let us analyze this property using a phrase composed of three-letter words: the big red fox ate the fat hen. The meaning is clear even without punctuation marks. If we remove one letter from the first word (one nucleotide in the gene) but keep reading in triplets of letters, the result is nonsense: heb igr edf oxa tet hef ath en. The meaning is lost in the same way when one or two nucleotides are lost from a gene. The protein read from such a damaged gene will have nothing in common with the protein encoded by the normal gene.

6. The code is universal. The genetic code is the same for all creatures living on Earth. In bacteria and fungi, wheat and cotton, fish and worms, frogs and humans, the same triplets code for the same amino acids.
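Several of the properties listed above can be checked directly against the standard codon table. The following Python sketch is an illustration only: the helper names substitute and read_in_triplets are hypothetical, amino acids are written in one-letter codes (E is glutamic acid, V is valine, M is methionine, W is tryptophan), and the English phrase of three-letter words is an assumed stand-in for a gene.

```python
from collections import Counter
from itertools import product

# Standard genetic code in U, C, A, G order; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Degeneracy: how many codons stand for each amino acid.
codons_per_aa = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
single_codon = sorted(aa for aa, n in codons_per_aa.items() if n == 1)
print("encoded by one codon only:", single_codon)
print("largest number of codons:", max(codons_per_aa.values()))

# Unambiguity: each codon has exactly one meaning, so replacing the
# second base of GAA or GAG with U changes the amino acid.
def substitute(codon, position, new_base):
    """Return the codon with the base at a 0-based position replaced."""
    return codon[:position] + new_base + codon[position + 1:]

for normal in ("GAA", "GAG"):
    mutant = substitute(normal, 1, "U")
    print(normal, CODON_TABLE[normal], "->", mutant, CODON_TABLE[mutant])

# No punctuation inside a gene: losing one letter shifts every later triplet.
def read_in_triplets(letters):
    """Split a string into consecutive non-overlapping groups of three."""
    return [letters[i:i + 3] for i in range(0, len(letters), 3)]

phrase = "thebigredfoxatethefathen"  # assumed phrase of three-letter words
print(" ".join(read_in_triplets(phrase)))
print(" ".join(read_in_triplets(phrase[1:])))
```

The last two printed lines show the same text read in frame and after the loss of its first letter.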

When a cell needs to synthesize proteins, it faces a serious problem: information in DNA is stored as a sequence written with 4 characters (nucleotides), while proteins are built from 20 different characters (amino acids). If the characters are used two at a time to encode amino acids, only 16 combinations are available, while there are 20 proteinogenic amino acids. That is not enough.
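The arithmetic behind this problem is easy to reproduce. The short Python sketch below counts the code words available for a given word length and finds the shortest length that covers 20 amino acids; the function names codeword_count and min_codeword_length are hypothetical and chosen only for this illustration.

```python
def codeword_count(alphabet_size, length):
    """Number of different ordered words of a given length."""
    return alphabet_size ** length

def min_codeword_length(alphabet_size, meanings):
    """Shortest word length that gives at least `meanings` different words."""
    length = 1
    while codeword_count(alphabet_size, length) < meanings:
        length += 1
    return length

for length in (1, 2, 3):
    print(length, "letters per word:", codeword_count(4, length), "words")

print("shortest word length for 20 amino acids:", min_codeword_length(4, 20))
```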

There is an example of brilliant thinking on this matter:

"Take, for example, a deck of playing cards, in which we pay attention only to the suit of the card. How many triplets of a single suit can you get? Four, of course: three hearts, three diamonds, three spades and three clubs. How many triplets are there with two cards of the same suit and one of a different suit? We have four choices for the pair and three choices for the third card. Therefore we have 4 x 3 = 12 possibilities. In addition we have four triplets with all three cards of different suits. So, 4 + 12 + 4 = 20, and this is the exact number of amino acids that we wanted to get" (George Gamow, 1904-1968, Soviet and American theoretical physicist, astrophysicist and popularizer of science).

Indeed, experiments have shown that for each amino acid there are two mandatory nucleotides and a third that is variable and less specific (the “wobble effect”). If the four characters are taken three at a time, there are 64 combinations, which greatly exceeds the number of amino acids. Thus any amino acid is encoded by three nucleotides. Such a trio is called a codon. As already mentioned, there are 64 of them. Three do not code for any amino acid; these are the so-called “nonsense codons” (from French non-sens, "nonsense") or “stop codons”.
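Gamow's card count quoted above treats a triplet as an unordered hand, while the real code reads triplets in order. The two counts can be compared with a small Python sketch; the variable names are hypothetical, and the suits stand in for the four nucleotides as in the quotation.

```python
from itertools import combinations_with_replacement, product

SUITS = ("hearts", "diamonds", "spades", "clubs")

# Unordered triplets: only the composition of the hand matters.
hands = list(combinations_with_replacement(SUITS, 3))

# Group the hands by how many different suits they contain.
by_distinct_suits = {1: 0, 2: 0, 3: 0}
for hand in hands:
    by_distinct_suits[len(set(hand))] += 1

# Ordered triplets: the order of the three positions matters.
ordered = list(product(SUITS, repeat=3))

print("one suit:", by_distinct_suits[1])
print("two suits:", by_distinct_suits[2])
print("three suits:", by_distinct_suits[3])
print("unordered total:", len(hands), "ordered total:", len(ordered))
```

The unordered count reproduces 4 + 12 + 4 = 20, and the ordered count gives the 64 codons of the actual code.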

Genetic code

The genetic (biological) code is a way of encoding information about the structure of proteins in the form of a nucleotide sequence. It is designed to translate the four-character language of nucleotides (A, G, U, C) into the twenty-character language of amino acids. It has characteristic features:

  • Triplet nature – three nucleotides form a codon that codes for an amino acid. There are a total of 61 sense codons.
  • Specificity (or unambiguity) – each codon corresponds to only one amino acid.
  • Degeneracy – one amino acid can correspond to several codons.
  • Universality – the biological code is the same for all types of organisms on Earth (however, there are exceptions in the mitochondria of mammals).
  • Collinearity – the sequence of codons corresponds to the sequence of amino acids in the encoded protein.
  • Non-overlapping – triplets do not overlap each other, being located next to each other.
  • No punctuation – there are no additional nucleotides or any other signals between the triplets.
  • Unidirectionality – during protein synthesis, codons are read sequentially, without skipping or going back.

However, it is clear that the biological code cannot be expressed without additional molecules that act as intermediaries, or adapters.

Adapter role of transfer RNAs

Transfer RNAs are the only intermediary between the 4-letter nucleic acid sequence and the 20-letter protein sequence.

Each transfer RNA has a specific triplet sequence in the anticodon loop (the anticodon) and can attach only the amino acid that matches this anticodon. It is the presence of one or another anticodon in tRNA that determines which amino acid will be included in the protein molecule, because neither the ribosome nor the mRNA recognizes the amino acid.

Thus, the adapter role of tRNA lies:

  1. in specific binding to amino acids,
  2. in specific, according to codon-anticodon interaction, binding to mRNA,
  3. and, as a result, in the incorporation of amino acids into the protein chain in accordance with the information in the mRNA.
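Codon-anticodon pairing can be sketched as a reverse complement, because the two RNA strands pair in opposite directions. This is a simplified illustration that assumes strict A-U and G-C pairing and ignores the looser pairing allowed in the third position; the function name anticodon_for is hypothetical.

```python
# Strict complementary pairing between RNA bases (an assumption of this sketch).
PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}

def anticodon_for(codon):
    """Anticodon (written 5'->3') that pairs with an mRNA codon (written 5'->3')."""
    return "".join(PAIRS[base] for base in reversed(codon))

for codon in ("AUG", "UUU", "GAG"):
    print(codon, "pairs with anticodon", anticodon_for(codon))
```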

An amino acid is attached to tRNA by the enzyme aminoacyl-tRNA synthetase, which is specific for two compounds at once: a particular amino acid and its corresponding tRNA. The reaction consumes two high-energy bonds of ATP. The amino acid attaches to the 3′ end of the tRNA acceptor stem through its α-carboxyl group, and the bond between the amino acid and the tRNA is a high-energy one. The α-amino group remains free.

Aminoacyl-tRNA synthesis reaction

Since there are about 60 different tRNAs, some amino acids have two or more tRNAs. Different tRNAs that accept the same amino acid are called isoacceptor tRNAs.

Today it is no secret that the life program of every living organism is written in the DNA molecule. The easiest way to picture a DNA molecule is as a long ladder. The vertical posts of this ladder are made up of sugar molecules and phosphate groups (phosphorus and oxygen). All the important working information in the molecule is written on the rungs of the ladder. Each rung consists of two molecules, each attached to one of the vertical posts. These molecules, the nitrogenous bases, are called adenine, guanine, thymine and cytosine, but they are usually designated simply by the letters A, G, T and C. The shape of these molecules allows them to form bonds (complete rungs) only of a certain type: between the bases A and T and between the bases G and C (a pair formed in this way is called a "base pair"). No other types of bonds between bases occur in a DNA molecule.
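The pairing rule for the rungs of the ladder can be written down as a small lookup table. In the Python sketch below the names is_valid_rung and partner_strand are hypothetical, and the example strand is made up for illustration.

```python
# The only pairs that form a complete rung: A with T, and G with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def is_valid_rung(left, right):
    """True if the two bases can form a base pair."""
    return PAIRS.get(left) == right

def partner_strand(strand):
    """Bases that sit opposite each base of the strand, rung by rung."""
    return "".join(PAIRS[base] for base in strand)

strand = "ATGCCGTA"  # hypothetical sequence along one post of the ladder
print(strand)
print(partner_strand(strand))
print(is_valid_rung("A", "T"), is_valid_rung("A", "G"))
```

Because each base has exactly one partner, either strand is enough to reconstruct the other.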

Going down the rungs along one strand of a DNA molecule gives a sequence of bases. It is this message, in the form of a sequence of bases, that determines the course of chemical reactions in the cell and, consequently, the characteristics of the organism that carries this DNA. According to the central dogma of molecular biology, the DNA molecule encodes information about proteins, which in turn, acting as enzymes (see Catalysts and enzymes), regulate all chemical reactions in living organisms.

The strict correspondence between the sequence of base pairs in a DNA molecule and the sequence of amino acids that make up proteins is called the genetic code. The genetic code was deciphered soon after the discovery of the double-stranded structure of DNA. It was known that the newly discovered molecule, messenger (or template) RNA (mRNA), carries the information written in DNA. Biochemists Marshall W. Nirenberg and J. Heinrich Matthaei of the National Institutes of Health in Bethesda, near Washington, D.C., conducted the first experiments that provided the key to the genetic code.

They began by synthesizing artificial mRNA molecules consisting only of the repeating nitrogenous base uracil (which is an analogue of thymine, "T", and forms bonds only with adenine, "A", from the DNA molecule). They added these mRNAs to test tubes with a mixture of amino acids, and in each tube only one of the amino acids was labeled with a radioactive label. The researchers discovered that the mRNA they artificially synthesized initiated protein formation in only one test tube, which contained the labeled amino acid phenylalanine. So they established that the sequence “—U—U—U—” on the mRNA molecule (and, therefore, the equivalent sequence “—A—A—A—” on the DNA molecule) encodes a protein consisting only of the amino acid phenylalanine. This was the first step towards deciphering the genetic code.

Today it is known that three base pairs of a DNA molecule (such a triplet is called a codon) code for one amino acid in a protein. By performing experiments similar to the one described above, geneticists eventually deciphered the entire genetic code, in which each of the 64 possible codons corresponds to a specific amino acid or to a signal that ends synthesis.

In any cell and organism, all anatomical, morphological and functional features are determined by the structure of the proteins they contain. The hereditary property of the body is the ability to synthesize certain proteins. The order of amino acids in a polypeptide chain determines its biological characteristics.
Each cell has its own sequence of nucleotides in the polynucleotide chain of DNA. This is the genetic code of DNA. It records information about the synthesis of certain proteins. This article describes what the genetic code is, its properties, and genetic information.

A little history

The idea that there might be a genetic code was formulated by G. Gamow and A. Dounce in the mid-twentieth century. They suggested that the nucleotide sequence responsible for the synthesis of a particular amino acid contains at least three units. Later the exact number of three nucleotides was proved; this unit of the genetic code was called a triplet or codon. There are sixty-four codons in total, because a nucleic acid molecule is made up of four different nucleotide residues (4 × 4 × 4 = 64).

What is genetic code

The method of encoding the amino acid sequence of proteins by means of a sequence of nucleotides is characteristic of all living cells and organisms. This is what the genetic code is.
There are four nucleotides in DNA:

  • adenine - A;
  • guanine - G;
  • cytosine - C;
  • thymine - T.

They are denoted by capital Latin or (in Russian-language literature) Russian letters.
RNA also contains four nucleotides, but one of them is different from DNA:

  • adenine - A;
  • guanine - G;
  • cytosine - C;
  • uracil - U.

All nucleotides are arranged in chains: DNA forms a double helix, while RNA is single-stranded.
Proteins are built from twenty amino acids, and the sequence in which the amino acids are arranged determines the biological properties of the protein.

Properties of the genetic code

Triplet nature. A unit of the genetic code consists of three letters; that is, the code is triplet. This means that each of the twenty amino acids is encoded by a specific group of three nucleotides, called a codon or triplet. Sixty-four combinations can be created from four nucleotides. This is more than enough to encode twenty amino acids.
Degeneracy. Each amino acid corresponds to more than one codon, with the exception of methionine and tryptophan.
Unambiguity. One codon codes for one amino acid. For example, in a healthy person's gene carrying information about the beta chain of hemoglobin, the triplets GAG and GAA encode glutamic acid; in everyone who has sickle cell disease, one nucleotide of this triplet is changed.
Collinearity. The sequence of amino acids always corresponds to the sequence of nucleotides that the gene contains.
The genetic code is continuous and compact, which means that it has no punctuation marks. That is, starting at a certain codon, reading proceeds without interruption. For example, AUGGUGCUUAAUGUG is read as AUG, GUG, CUU, AAU, GUG, and not as AUG, UGG and so on, or in any other way.
Universality. It is the same for absolutely all terrestrial organisms, from humans to fish, fungi and bacteria.
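Continuous, non-overlapping reading can be shown on the sequence formed by the codons listed above, with cytosine written as C. This Python sketch is only an illustration; the function names read_codons and overlapping_triplets are hypothetical.

```python
def read_codons(mrna, start=0):
    """Non-overlapping triplets read without gaps from position `start`."""
    return [mrna[i:i + 3] for i in range(start, len(mrna) - 2, 3)]

def overlapping_triplets(mrna):
    """Every window of three bases: the way the code is NOT read."""
    return [mrna[i:i + 3] for i in range(len(mrna) - 2)]

mrna = "AUGGUGCUUAAUGUG"
print("read by the cell:", read_codons(mrna))
print("overlapping reading would start:", overlapping_triplets(mrna)[:2])
```

Each nucleotide belongs to exactly one codon in the first reading, whereas in the overlapping reading it would belong to up to three.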

Table

Not all known amino acids are included in the table. Hydroxyproline, hydroxylysine, phosphoserine, iodine derivatives of tyrosine, cystine and some others are absent, since they are derivatives of other amino acids that are encoded by mRNA and are formed when proteins are modified after translation.
From the properties of the genetic code it is known that one codon encodes one amino acid. The exception is codons that perform an additional function while encoding valine and methionine. Located at the beginning of the mRNA, such a codon binds a tRNA that carries formylmethionine. Upon completion of synthesis, the formyl residue is cleaved off, leaving a methionine residue. Thus these codons initiate the synthesis of the polypeptide chain. If they are not at the beginning, they are no different from the others.

Genetic information

This concept means a program of properties that is passed down from ancestors. It is embedded in heredity as a genetic code.
The genetic code is realized during protein synthesis with the participation of:

  • messenger RNA;
  • ribosomal RNA (rRNA).

Information is transmitted through a forward flow (DNA-RNA-protein) and through feedback (environment-protein-DNA).
Organisms can receive, store and transmit it, and use it with the greatest effect.
Passed on by inheritance, information determines the development of a particular organism. But through interaction with the environment the organism's response is altered, and this drives evolution and development. In this way, new information is introduced into the body.


The discovery of the laws of molecular biology and of the genetic code showed the need to combine genetics with Darwin's theory. On this basis the synthetic theory of evolution emerged, a non-classical biology.
Darwin's heredity, variation and natural selection are complemented by genetically determined selection. Evolution is realized at the genetic level through random mutations and the inheritance of the traits that are best adapted to the environment.

Decoding the human code

In the 1990s the Human Genome Project was launched, and as a result, in the 2000s, genome fragments containing 99.99% of human genes were deciphered. Fragments that are not involved in protein synthesis and do not code for anything remain poorly studied. Their role is unknown for now.

Chromosome 1, the last to be deciphered (in 2006), is the longest in the genome. More than three hundred and fifty diseases, including cancer, arise from disorders and mutations in it.

The role of such studies cannot be overestimated. Once it was discovered what the genetic code is, it became known according to what patterns development occurs, and how the morphological structure, psyche, predisposition to certain diseases, metabolism and defects of individuals are formed.