Coding of biological information. Genetic code as a way to record hereditary information

Nikitin A.V.

Challenges in Understanding the DNA Coding System


Yes, I must admit that I was wrong. Biologists do care about the coding of DNA information. Moreover, there is a technocratic approach to this problem. It may not be exactly what I wanted, but there is an interest in finding the truth, and that is the main point.

Petr Petrovich Garyaev sent me his latest monograph to study, for which I am especially grateful.

But along with new information, new questions arose. I will try to talk about some of them in this article.

We write two, one in our minds...

We have already noted that triplets are followed only loosely during protein translation. P.P. Garyaev is exploring the same question. Here is the apparent contradiction:

“The accuracy of encoding protein amino acid sequences in this model strangely coexists with a double degeneracy of the proposed “code”: an excess of transfer RNAs (tRNA) relative to the number of amino acids, and an ambiguous codon-anticodon correspondence, in which precise pairing with the tRNA anticodon is required for only two (not three) nucleotides of each mRNA triplet, while at the third nucleotide nature allows incorrect pairing, the so-called “wobble”, according to F. Crick’s hypothesis. This means that some anticodons can “recognize” more than one codon, depending on which base is at the 1st position of the anticodon, which corresponds to the 3rd position of the codon given their antiparallel complementary interaction. “Recognition” of this kind is “wrong” if we follow the paradigm of the genetic code, since non-canonical base pairs such as “Adenine-Guanine”, “Uracil-Cytosine” and others with energetically unfavorable hydrogen bonds arise. The “code,” especially the mitochondrial one, becomes so degenerate, and the arbitrariness that logically follows in the inclusion of amino acids in the peptide chain is so great, that the very concept of genetic coding seems to disappear.”


The question is posed:

“The accuracy of protein synthesis is evolutionarily conserved and high, but can it be achieved by this kind of “secret writing”, when the “sign” (codon) and the “signified” (amino acid) are not always isomorphic, not unambiguous? If we adhere to the old dogma of the genetic code, it is logical to think that two different amino acids, encoded by the same two nucleotides of the mRNA codons (the third being unimportant), will be included in the peptide chain with equal probability, i.e. at random. And there are six such paired ambiguities even in the non-mitochondrial code, not counting two more at stop codons (also called “nonsense” or meaningless codons). So, is there an “indulgence” permitting frequent and random amino acid substitutions during protein synthesis? However, it is known that such random substitutions in most cases have the most negative consequences for the body (sickle cell anemia, thalassemia, etc.). There is an obvious contradiction: accuracy (unambiguity) of the “sign-signified” (codon-amino acid) relationship is needed, but the code invented by people does not provide it.”

Explanation of the essence of the contradictions and the proposed solution:

“It can be seen that pairs of different amino acids are encoded by identical significant doublets of codon nucleotides (the “wobbling” nucleotides, of little significance according to Crick and not read at all according to Lagerkvist, are moved to the index). In linguistic terms, this phenomenon is called homonymy, when the same words have different meanings (for example, the Russian words for “bow” and “braid”, or the English “box”, “ring”, etc.). On the other hand, redundant different codons designating the same amino acids have long been considered synonymous.”

“...For greater clarity, we present the table of the genetic code given by Lagerkvist and rearranged by him into codon families, focusing on the first two working nucleotides:

From Table 1 it can be seen that the same amino acid can be encoded by a whole family of four codons. For example, the four codons of the CU family encode leucine. The four of the GU family encode valine, UC – serine, CC – proline, AC – threonine, GC – alanine, CG – arginine, GG – glycine. This is a fact of degeneracy lying on the surface and noticed immediately, i.e. the information redundancy of the code. If we borrow the concepts and terms of linguistics for the protein code, which has long been universally and easily accepted, then the degeneracy of the code can be understood as synonymy. This, too, was unanimously adopted. In other words, the same object, for example an amino acid, has several codes: codons. Synonymy does not pose any danger to the accuracy of protein biosynthesis. On the contrary, such redundancy is good because it increases the reliability of the translational ribosomal “machine”.”

I added a little color variation to the table to make it clear what we're talking about. Synonymous fours are highlighted in yellow. There are 8 such fours in total. Homonymous fours had to be divided into three categories, according to the degree of diversity. Further:

“... However, Table 1 also shows another, fundamental, genolinguistic phenomenon, seemingly unnoticed or ignored. This phenomenon is revealed in the fact that in some codon families the four codons, or more precisely their significant identical pairs of nucleotides, encode not one but two different amino acids, as well as stop codons. Thus, the UU doublet family encodes phenylalanine and leucine, AU – isoleucine and methionine, UA – tyrosine and the ochre (Och) and amber (Amb) stop codons, CA – histidine and glutamine, AA – asparagine and lysine, GA – aspartic acid and glutamic acid, UG – cysteine, tryptophan and the umber (Umb) stop codon, AG – serine and arginine. Continuing the linguistic analogies, let's call this phenomenon HOMONYMY of the first two coding nucleotides in some codon families.

Unlike synonymy, homonymy is potentially dangerous, as Lagerkvist noted, although he did not introduce the term “homonymy” as applied to the protein code. This situation, it seems, should really lead to ambiguity in the coding of amino acids and stop signals: the same codon doublet, within some of the families identified by Lagerkvist, encodes two different amino acids or different stop signals.

It is fundamentally important to understand: if code synonymy is a blessing (excess information), then homonymy is a potential evil (uncertainty, ambiguity of information). But this is an imaginary evil, since the protein synthesizing apparatus easily bypasses this difficulty, which will be discussed below. If you automatically follow the table (model) of the genetic code, then evil becomes not imaginary, but real. And then it is obvious that the homonymous code vector leads to errors in protein synthesis, since the ribosomal protein synthesizing apparatus, each time encountering one or another homonymous doublet and guided by the “two out of three” reading rule, must select one and only one amino acid from two different ones, but encoded by ambiguously identical homonym doublets.

Consequently, the 3'-nucleotides in codons and the 5'-nucleotides in anticodons paired with them do not have a gene-sign character and play the role of “steric crutches” filling the “empty spaces” in codon-anticodon pairs. In short, the 5'-nucleotides in anticodons are random, “wobble” - from the English “wobble” (swing, oscillation, wobble). This is the essence of the Wobble hypothesis.”

The essence is stated quite clearly. No translation required. The problem is clear.
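The division into synonymous and homonymous families can be checked mechanically. Below is a minimal sketch, assuming the standard (non-mitochondrial) genetic code; the names `CODON_TABLE` and `doublet_families` are illustrative and do not come from the monograph.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered U, C, A, G at each position; "*" = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def doublet_families():
    """Group codons by their first two bases and collect what each family encodes."""
    families = {}
    for codon, amino in CODON_TABLE.items():
        families.setdefault(codon[:2], set()).add(amino)
    return families

families = doublet_families()
# A family is "synonymous" if all four codons mean the same thing,
# and "homonymous" if the same doublet stands for several meanings.
synonymous = sorted(d for d, aas in families.items() if len(aas) == 1)
homonymous = sorted(d for d, aas in families.items() if len(aas) > 1)

print("synonymous:", synonymous)
print("homonymous:", {d: sorted(families[d]) for d in homonymous})
```

Under this assumption the sketch yields eight synonymous doublets (CU, GU, UC, CC, AC, GC, CG, GG) and eight homonymous ones, which matches the two lists quoted above.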

Stop and start codons (highlighted in bold in the table) also do not always work unambiguously; they depend on something else: on the context, as biologists believe.

“Let us continue our analysis of the seminal work of Crick and Nirenberg, which postulates the concept of the genetic code.

P.142 -143: “... so far all experimental data have been in good agreement with the general assumption that information is read in triplets of bases, starting from one end of the gene. However, we would get the same results if the information was read in groups of four or even more bases” or “...groups containing a multiple of three bases.” This position is almost forgotten or not understood, but it is here that doubt is visible whether the code is necessarily triplet. And no less important, it predicts a future understanding of DNA and RNA texts as semantic fractal formations akin to natural languages, as demonstrated in our research.”

With 4 different bases in the DNA code system, reading groups can only be 3 or 4 bases long. Reading the 4 bases in pairs gives only 16 possible combinations, which is not enough. But whether the reading group holds 3 or 4 bases cannot be established mathematically, because all possible combinations will be used one way or another: either 64 for a triplet or 256 for a tetraplet.

Enlarging the reading area to “groups containing a multiple of three bases” increases the number of possible code combinations without limit. But what does this give us? If you focus on the coding of amino acids, then... nothing. And this is in no way compatible with the doublet approach of biologists.
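The arithmetic behind these numbers is simply the alphabet size raised to the group length. A short sketch (the function name `code_words` is an illustrative assumption):

```python
BASES = "ACGU"
AMINO_ACIDS = 20

def code_words(group_length, alphabet_size=len(BASES)):
    """Number of distinct reading groups of the given length."""
    return alphabet_size ** group_length

for n in (2, 3, 4):
    total = code_words(n)
    verdict = "enough" if total >= AMINO_ACIDS else "not enough"
    print(f"{n} bases per group: {total} combinations ({verdict} for {AMINO_ACIDS} amino acids)")
# 2 bases per group: 16 combinations (not enough for 20 amino acids)
# 3 bases per group: 64 combinations (enough for 20 amino acids)
# 4 bases per group: 256 combinations (enough for 20 amino acids)
```

The count alone shows only that pairs are too short; it cannot choose between triplets and tetraplets.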

But, most importantly, in this quote for the first time, although implicitly, a “reading zone” of information that does not correspond to the triplet appeared. A triplet is one thing, but a reading zone is another. And one may not coincide with the other. Very important note.

In fact, the “swing” (wobble) theory proposes that only the first two bases be considered the codon reading zone. That is, in this case we are asked to accept that the reading area is smaller than the encoding area.

Now let's consider the reverse approach:

“Some mRNAs contain signals to change the reading frame. Some mRNAs contain stop codons in the translated region, but these codons are successfully bypassed by changing the reading frame before or directly on them. The frame can shift by -1, +1 and +2. There are special signals in mRNA that change the reading frame. Thus, a translation frameshift of -1 on retroviral RNA occurs at a specific heptanucleotide sequence in front of the hairpin structure in the mRNA (Fig. 5c). For a +1 frameshift on the bacterial termination factor RF-2 mRNA, the nucleotide sequence at the shift site (UGA codon), the subsequent codon, and the preceding sequence complementary to the 3'-terminal sequence of the ribosomal RNA (analogous to the Shine-Dalgarno sequence) are important (Fig. 5, d).”

The quote has already been given earlier, but now let’s look at its content more carefully. What is meant by the term reading frame? This concept comes from the hoary antiquity of computer technology, when the area for reading information from a punched tape or punched card was limited by an opaque frame in order to reduce the risk of errors when reading information with a light flux onto a photodetector through holes in the card or tape, marking lines knocked out in the right places. The principle of reading is long gone, but the term remains. Since the concept of a reading frame is clear to all biologists, it apparently means the reading zone of only one base from a triplet. And by “reading frame shift” we must understand that at +1, the base following the last element of the triplet is read, and -1, that the base before the first element of the same triplet is read. Which base pair remains the basis in the read triplet? This is not specified...

But it seems that not everyone understands the reading frame, as in this case. If the concept of a reading frame is understood as a frame delimiting 3 bases, then with a shift of +2, 1 element remains from the readable triplet, and two from the neighboring one.

So what reading frame are we talking about? Well, yes, okay, let it remain unclear for now...

But in any case, then these bases, already read by the frame, will be read again when the frame returns to its place and the ribosome moves on to reading the next triplet... but what about the non-overlapping code?

In this case, biologists' mechanistic approach to estimating changes in triplet readout positions does not take into account the actual size of what they are talking about. The terminology is clearly misleading. How they themselves figure it out later is unclear. Obviously, no “frame” is moving anywhere...

The selection of the required positions in the reading area moves. And if we add the maximum reading frame shifts listed above with the length of the readable codon, we get: 2+3+2 = 7. Thus, the total width of the ribosome reading zone is already 7 bases. The ribosome selects a triplet from 7 possible bases. How? This is another question...
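The 2+3+2 estimate can be illustrated with a small sketch. It assumes, as the sum above does, that the frame may shift by up to two bases in either direction (the quoted shifts are -1, +1 and +2); the sequence and the function names `reading_zone` and `candidate_triplets` are made up for illustration.

```python
def reading_zone(mrna, codon_start, max_shift=2, codon_len=3):
    """Bases reachable if the triplet may start up to max_shift bases earlier or later."""
    lo = max(0, codon_start - max_shift)
    hi = min(len(mrna), codon_start + codon_len + max_shift)
    return mrna[lo:hi]

def candidate_triplets(mrna, codon_start, max_shift=2, codon_len=3):
    """Map each allowed shift to the triplet that would be read at that shift."""
    out = {}
    for shift in range(-max_shift, max_shift + 1):
        start = codon_start + shift
        if start >= 0 and start + codon_len <= len(mrna):
            out[shift] = mrna[start:start + codon_len]
    return out

mrna = "AUGGCUUGACCA"          # hypothetical sequence
zone = reading_zone(mrna, codon_start=3)
print(zone, len(zone))         # UGGCUUG 7
print(candidate_triplets(mrna, codon_start=3))
# {-2: 'UGG', -1: 'GGC', 0: 'GCU', 1: 'CUU', 2: 'UUG'}
```

Five different triplets fit inside the same seven-base zone, which is exactly the selection problem raised above.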

But something else is more important to us. Now we can estimate that the zone for reading information from RNA can be larger than a triplet and consist of 7 or more bases, while only three bases are fixed as the required reading positions. What are the other positions? Perhaps this is the very “context” that changes the options for reading the triplet: the homonymous ones, in P.P. Garyaev's terminology.

Of course, this is only one of many special cases of understanding the multifaceted concept of context. But... at least it allows us to understand something without resorting to higher philosophical generalizations. At a very real level of mechanistic understanding.

About the alphabet of cellular texts.

The question is, of course, interesting...

The understanding of DNA bases as letters of some cellular alphabet has been adopted by biologists for a long time. Hence the emergence of the concept of semantic context in the assessment of triplet coding, and the search for a meaningful approach of the cell to this coding, and the gradual transition to the Higher Mind, which wrote this Book of Life...

Only now, with the exact indication of the letters of this alphabet, disagreements arise all the time. What are the letters? Bases (A, T, C, G), codons composed of them, or amino acids in the composition of the protein obtained during translation?

There are 4 bases, 20 amino acids, 64 codons, what should we take as a basis?

Everyone talks about the need for linguistic evaluation of sequences of DNA, RNA and protein molecules, regardless of their understanding of the letters of the cellular alphabet. Biologists are required to approach DNA information as a semantic text with an understanding of the context applicable for literary evaluation. Thus, it is assumed that the language under study has all the attributes of a developed literary language and an appropriate approach is needed to assess its multi-semantic information content.

Wonderful. And yet, where are the letters? How was this literary text written that requires such close attention from linguists? So far, within the framework of the same mechanistic approach...

Bases or nucleotides? It seems not. Most biologists agree with this. Four bases are not enough to create a literary text, especially given the continuity of the sequence along the entire DNA.

With the codon, as a letter of this alphabet, difficulties arise immediately. Where is it, this codon, on DNA and RNA, how to find it? Only a ribosome can do this, and then only through direct contact. And what kind of compound letters are these, from triplets? Difficult to understand. Nevertheless, this understanding of codons as letters of the cellular alphabet has many supporters.

Take amino acids as the letters of the alphabet? Yes, the majority agrees with this. But then protein, not DNA, becomes the Book of Life. In a protein there is a semantic context, but in DNA, it turns out, maybe not? Or there is one, but a different one, unlike that of protein...

And therefore, there is a requirement to evaluate both DNA and protein from the standpoint of semantic context, but there is no clarification of what and how to evaluate.

In this situation, P.P. Garyaev proposed, including linguistically, to evaluate not DNA and protein, but their holographic three-dimensional “portraits.” A very strong position, I must admit. And very productive...

But as for the alphabet of the cell, under the already familiar mechanistic approach it remains completely unclear. Does it exist, or does it not exist at all, and is the concept only an allegory?

Biologists do not give clarifications. But they stubbornly continue to apply this concept. Everyone has their own understanding...

About the original coding system.

This is about the original system, the one that perhaps existed when cells diverged into prokaryotes and eukaryotes. Today it is hidden under numerous overlays and deviations in both. Millions of years of evolution have left their mark.

But still…

DNA was not always a repository of information; previously, RNA could play this role. It completely replaced protein at some stage. Numerous studies show this. And DNA and RNA bases were not always 4, but we are not talking about that now...

But at some stage of development, an information encoding system appeared, which then fully satisfied all the requirements of the information and logical structure for controlling cell processes.

The same classic that everyone points to and immediately begins to refute...

Information array – DNA, RNA. A sequence consisting of a combination of 4 nucleotides: A,T(U),C,G.

The information reading step is 1 nucleotide.

The method of reading information is sequential.

The volume of a single reading is a triplet.

No logical system can count. But it can count to one; anything beyond that is already “many”. It can also tell apart different units in two neighboring positions. And if an axis of symmetry really exists, the system is quite capable of determining the logical states of the neighboring positions relative to that axis. But at that stage it was apparently very difficult to enlarge the reading area any further without counting.

And therefore, at that stage, a triplet is the maximum possible form of the system's information unit: a digit on the axis of symmetry, a digit to the right, and a digit to the left.

Three different accounting units...even for step-by-step reading...that's a lot.

The DNA and RNA information coding system uses 4 possible logical states, triplet reading. The complexity for the cell is extreme.

How to prove that a code is triplet? I have already shown this more than once. Let’s write it again: Bases – 4, amino acids – 20, codons or triplets – 64.

The math is simple: 64/3 ≈ 21

This number of non-overlapping triplets can be obtained with a fixation step through one base. These are 20 triplets for amino acids and one STOP codon.

On the other hand, 4^3 = 64. This is the same 21 × 3 = 63, that is, 60 triplet combinations and 3 stop codons, plus a start codon that closes the variational set. This is just mathematics, but it shows that initially three bases in a row were actually read: a codon, at a step of 1 base. This determined the number of amino acids used: 20. Thus, the code is still a triplet one.

In this case, the degeneracy of the amino acid code in the triplet is clear. It arose from code overlap.

We misunderstand the emergence of codon degeneracy. This is not an expansion of the system’s capabilities in encoding information, but “mistakes of its past.” This is an echo of the original coding system...

Information on the topic:

“P.153: “... one amino acid is encrypted by several codons. Such a code is called degenerate... this kind of degeneracy does not indicate any uncertainty in the construction of the protein molecule... it only means that a certain amino acid can be directed to the appropriate place in the chain of the protein molecule using a few code words.”

Of course, to encode any amino acid in DNA bases, one code triplet is sufficient. Moreover, with non-overlapping coding. Repeat one codon as many times as you like, and get as many molecules of the desired amino acid in the protein. It’s easy, simple, understandable, and energy costs are minimal.

The degeneracy of a triplet code is a necessary measure, directly related to the original method of reading the code. It just happened in the course of evolution.

The mechanism for the appearance of code degeneracy looks like this:

When reading triplets in a step of 1 base, only one sign of the triplet changes at each step, and two signs of the triplet remain constant. Only their positions shift synchronously. With two steps, the information of only one sign of the triplet remains unchanged, but it passes sequentially through all display positions.

Why do we need this?

With 3 coding characters, 2 characters are repeated at each step. And only one changes. In the next step, the second sign will also change. And one sign will remain unchanged along the path traveled. A complete change of signs will occur only after the third step. Only now the new triplet combination will not have the influence of previous combinations.
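The dependence between neighboring triplets under step-by-step reading can be shown in a few lines. This is a sketch with a made-up sequence; `triplets` and `shared_bases` are illustrative names.

```python
def triplets(seq, step):
    """Read triplets from seq, advancing the window by `step` bases each time."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, step)]

def shared_bases(offset, size=3):
    """How many sequence positions two windows share when their starts differ by `offset`."""
    return max(0, size - abs(offset))

seq = "AUGGCUUGA"                      # hypothetical sequence
print(triplets(seq, step=1))           # ['AUG', 'UGG', 'GGC', 'GCU', 'CUU', 'UUG', 'UGA']
print(triplets(seq, step=3))           # ['AUG', 'GCU', 'UGA']

for offset in (1, 2, 3):
    print(f"windows {offset} step(s) apart share {shared_bases(offset)} base(s)")
# windows 1 step(s) apart share 2 base(s)
# windows 2 step(s) apart share 1 base(s)
# windows 3 step(s) apart share 0 base(s)
```

Only every third window is free of the previous one, so the same nine bases give seven dependent triplets but only three independent ones.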

With a triplet step, each new triplet in formation does not depend on the previous one, but... such a step for such a reading system was then impossible.

And the formed DNA triplets turned out to be dependent on each other during reading.

Such a smooth flow of one triplet into another leads to a limitation in the ability to quickly use all permutations in the triplet. For the possible use of all 64 triplet variants, 64 * 3 = 192 single steps of reading DNA triplets are required. And vice versa, out of 64 steps of reading possible combinations, with sequential step-by-step reading of all codons, from the first to the 64th, there will be 42 repeats, and no more than 1/3 = 21 combinations will be unique. And another 1/3….

This is the answer why there are only 20 amino acids. It could be more, but the system for encoding and reading information does not allow it.

So the cell began to use additional codes from the existing 42 repetitions. It could not do otherwise, because gaps in translation are unacceptable. Whatever the code, the ribosome must perform the translation operation. The transitional variants between one independent triplet code and the next quickly came to serve the same 20 amino acids, distributed according to frequency of use. One amino acid has 6 codes, while for another a single code is enough. We register this as code degeneracy.

It is clear that with the use of dependent codons, the pool of transfer RNAs (tRNAs) must also expand. And so it happened. In a full-scale system, the number of codons on the mRNA must match the number of anticodons on the tRNA. So the large number of tRNAs only indicates that the system was originally formed in this way.

As we can see, the original coding system, at the stage when 4 nucleotides appeared in DNA, is clearly visible. Next came the layers of later evolutionary processes. And today we have... what we have.
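How unevenly the 61 sense codons are spread over the 20 amino acids can be counted directly. The sketch below assumes the standard genetic code and one-letter amino acid symbols; it says nothing about why the distribution arose.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered U, C, A, G at each position; "*" = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Number of codons assigned to each amino acid (stop codons excluded).
codons_per_amino = Counter(a for a in CODON_TABLE.values() if a != "*")

# Group amino acids by how many codons they have.
by_degeneracy = {}
for amino, n in codons_per_amino.items():
    by_degeneracy.setdefault(n, []).append(amino)

for n in sorted(by_degeneracy, reverse=True):
    print(n, "codons:", "".join(sorted(by_degeneracy[n])))
# 6 codons: LRS
# 4 codons: AGPTV
# 3 codons: I
# 2 codons: CDEFHKNQY
# 1 codons: MW
```

Leucine, arginine and serine have six codons each, while methionine and tryptophan have one.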

Initial basic amino acid codes.

On the other hand, if you follow this path, then out of the 64 possible, you can choose some 21 combinations and apply them as the main ones. But which ones?

How could a cell choose? The simplest answer is based on the maximum symmetry of the triplet.

Let's apply the principle of symmetry in searching for the necessary combinations and check how correctly we understand the way amino acids are naturally coded in DNA. To do this, let's collect all the variants of symmetric codes in Table 2. An excellent result: the 16 possible symmetric codes cover 15 amino acids.

But, there are still 5 amino acids left and STOP.
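This count can be reproduced against the standard genetic code. The sketch below treats a codon as symmetric when its first and third bases coincide, which is an assumption about how Table 2 was built; amino acids are given in one-letter symbols.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered U, C, A, G at each position; "*" = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Symmetric (palindromic) codons: first base equals third base.
symmetric = [c for c in CODON_TABLE if c[0] == c[2]]
covered = {CODON_TABLE[c] for c in symmetric}
missing = sorted(set(AMINO) - {"*"} - covered)

print(len(symmetric), "symmetric codons cover", len(covered), "amino acids")
# 16 symmetric codons cover 15 amino acids
print("not covered:", missing)
# not covered: ['D', 'M', 'N', 'Q', 'W']

# The additional codons named in the text and what they encode.
extra = ["CAA", "AAC", "UGG", "UAA", "GAC", "AUG"]
print({c: CODON_TABLE[c] for c in extra})
# {'CAA': 'Q', 'AAC': 'N', 'UGG': 'W', 'UAA': '*', 'GAC': 'D', 'AUG': 'M'}
```

Arginine appears twice among the symmetric codons (CGC and AGA), which is why 16 codons give only 15 amino acids. The five left over are aspartic acid, methionine, asparagine, glutamine and tryptophan.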

Apparently Nature walked the same path... and stumbled in the same place. All symmetrical options have been used, there is no room to expand the system, and there are not enough codes. What next option did she use to continue searching for codes?

Now repetitions and one additional element...

Here they are: CAA, AAC, UGG, and the main Stop codon, UAA.

There are two more codons left to find...

GAC and AUG. The latter became the Start codon...

And the total number of main combinations used in DNA and RNA became 21. Table 2 reflects the search path for the main code designations.

But here, too, the evolutionary logic of development shows an interesting example. Only complete symmetries are used to the end and immediately. The remaining options were not used immediately and not completely. For example, for the amino acid Gly, the main codon GGG was used, and then GGU was added from an unused reserve...

The created coding reserves worked until the last minute. Today, all reserves have long been used up and the time has come to combine functions where possible, for example for the Start codon. The search began for new ways to expand the capabilities of triplet coding of amino acids in RNA. This is probably how the selection of the main codes went: by symmetry and the simplest permutations...

table 2

The logic of action is clear. We may have made a mistake in the sequence of actions, but this is not so important for now. Of course, these are just my variations on the theme; professionals probably know better whether this was how it really was or not, but still... it turned out interesting.

Ends don't meet...

Strange, ... symmetric codes can only be used with triplet reading, without overlap. This point forces us to take another look at the above mathematics of obtaining 20 amino acids for use in triplet coding. Obviously, one does not correspond to the other.

Mathematics shows the objective reality of the element-by-element movement of a ribosome along RNA. But such a widespread use of symmetries in amino acid coding also cannot be accidental, and points to triplets of independent reading.

It is possible that element-by-element reading of RNA information existed before triplet coding and for some time together with the appearance of triplets. It determined the amount of amino acids used.

But at some stage there was a leap in development. The coding system has been completely revised. Triplet independent reading forced us to re-encode the amino acids used based on symmetry. But evolution does not know how to discard old options...

There are already additional codes; we had to redistribute them among amino acids depending on the frequency of their use.

And a paradoxical picture emerged. The readout seems to be non-overlapping, and one codon is enough to encode an amino acid, but all 64 variants were used. The potential redundancy of coding is covered by the degeneracy of the codes. There is a calculated reserve, but in fact there is not. We have already seen how this happened.

Most likely, the rapid development of cellular ribosomes was a factor in the revision of the system. Ultimately, they determine the entire coding system and its application in the cellular organism.

It can be assumed that the information reading zone of the ribosome has long exceeded three digits and has gone far beyond these limits. It became possible to select and remember the information of the desired codon within a large information reading area. This made it possible to leave the ribosome with an element-by-element step, but the possibility of triplet reading in an independent mode was also realized. The ribosome somewhere acquired RAM.

The information reading zone for the ribosome, even in prokaryotes, as we see, has reached 7 nucleotides. And this is not the limit. If we take as a basis that ribosomes have two centers for translation or information reading, then their total area for information reading by one ribosome has already reached 14 nucleotides. Some sections of the codes are taken as triplets, and the rest constitutes the context...

And now…

And now everything is completely confused. According to biologists, the counting occurs in triplets, although no one explains how this happens. The immediate context is not taken into account. Comparing the RNA code sequence and the protein obtained from it is a very difficult task, and it is apparently impossible to clearly understand how the system has changed and what is taken into account during translation.

Moreover, biologists focus not on systematization but on finding deviations from the system, thereby enlarging the already vast variety of facts and creating a puzzling unsolved problem for themselves. The confusion is compounded by the mixing of various deviations in the triplet-reading mechanisms of prokaryotes and eukaryotes into one big crossword puzzle, in which biologists themselves seem to have become lost.

Why? They have different tasks. They work with biological objects, as is customary in their science. Therefore, the conclusions on the issues of RNA coding were reflected in the “swing” theory, and not in the system of principles of information reading and coding theory. They can be understood, but a way out must be found...

The technocratic approach to the problem of understanding DNA coding, proposed by biologists themselves, has not yet exhausted its capabilities. In fact, it hasn’t really been used yet. Only the terminology was used, but not the approach.

Perhaps the time has come to use machine analysis of DNA sequences, taking into account the expanded information reading area in relation to the coding triplet. Then the mechanism of action of the coding context closest to the triplet reading, and possibly also the programming elements of the protein translation process, memorized by the ribosome, will become clear. Such analysis is especially important for studying the untranslated regions of RNA and DNA. Since it is already clear that these are software elements of the coding system. All processes depend on them, including protein translation. The name “garbage” clearly doesn’t suit them at all...

And there cannot be “garbage” in the arrays of strategically important information stored in DNA. No information system can afford this.
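As a starting point for such machine analysis, one could tabulate each codon together with the bases that flank it. The sketch below is only an illustration under stated assumptions: a 7-base window (2 + 3 + 2), a fixed reading frame, a made-up sequence, and hypothetical function names `codon_contexts` and `tally_contexts`. It is not an established method.

```python
from collections import defaultdict

def codon_contexts(mrna, flank=2, frame=0):
    """Yield (codon, left_context, right_context) for each in-frame codon with full flanks."""
    for start in range(frame, len(mrna) - 2, 3):
        left, right = start - flank, start + 3 + flank
        if left < 0 or right > len(mrna):
            continue  # skip codons too close to either end of the sequence
        yield mrna[start:start + 3], mrna[left:start], mrna[start + 3:right]

def tally_contexts(mrna, flank=2, frame=0):
    """Count how often each codon occurs in each flanking context."""
    table = defaultdict(lambda: defaultdict(int))
    for codon, left, right in codon_contexts(mrna, flank, frame):
        table[codon][(left, right)] += 1
    return table

mrna = "AUGGCUGCUUGA"  # hypothetical sequence
for codon, contexts in tally_contexts(mrna).items():
    print(codon, dict(contexts))
# GCU {('UG', 'GC'): 1, ('CU', 'UG'): 1}
```

On real data, such a table could be compared with the amino acid actually found at each position to see whether the flanking bases correlate with how a homonymous codon is read.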

The current level of development of computer technology makes it possible to solve these problems. Build an information management system in the cellular structure, clarify communication channels, establish key control elements and a signal system. Then at least the approximate level of technical complexity of this control system will be clear. So far, the only thing that is clear is that the ribosome plays a key role in it, but how technically complex is this universal cellular automaton? How does the technical complexity of the rest of the cell's executive mechanisms look against its background?

I haven't found any answers yet...

Literature:

  1. Garyaev P.P. Tertyshny G.G. Leonova E.A. Mologin A.V. Wave biocomputer functions of DNA. http://nature.web.ru/db/msg.html?mid=1157645&s
  2. Nikitin A.V., Reading and processing DNA information // “Academy of Trinitarianism”, M., El No. 77-6567, pub.16147, 08.11.2010

Nikitin A.V., Problems of understanding the DNA coding system // “Academy of Trinitarianism”, M., El No. 77-6567, pub.16181, 11.27.2010


Nucleotides of DNA and RNA
  1. Purines: adenine, guanine
  2. Pyrimidines: cytosine, thymine (uracil)

Codon: a triplet of nucleotides encoding a specific amino acid.

Table 1. Amino acids that are commonly found in proteins
No.  Name            Abbreviation
1.   Alanine         Ala
2.   Arginine        Arg
3.   Asparagine      Asn
4.   Aspartic acid   Asp
5.   Cysteine        Cys
6.   Glutamic acid   Glu
7.   Glutamine       Gln
8.   Glycine         Gly
9.   Histidine       His
10.  Isoleucine      Ile
11.  Leucine         Leu
12.  Lysine          Lys
13.  Methionine      Met
14.  Phenylalanine   Phe
15.  Proline         Pro
16.  Serine          Ser
17.  Threonine       Thr
18.  Tryptophan      Trp
19.  Tyrosine        Tyr
20.  Valine          Val

The genetic code, also called the amino acid code, is a system for recording information about the sequence of amino acids in a protein using the sequence of nucleotide residues in DNA that contain one of 4 nitrogenous bases: adenine (A), guanine (G), cytosine (C) and thymine (T). However, since the double-stranded DNA helix is not directly involved in the synthesis of the protein that is encoded by one of these strands (i.e., RNA), the code is written in RNA language, which contains uracil (U) instead of thymine. For the same reason, it is customary to say that a code is a sequence of nucleotides, and not pairs of nucleotides.

The genetic code is represented by certain code words, called codons.

The first code word was deciphered by Nirenberg and Matthaei in 1961. They obtained an extract from E. coli containing ribosomes and other factors necessary for protein synthesis. The result was a cell-free system for protein synthesis, which could assemble proteins from amino acids if the necessary mRNA was added to the medium. By adding synthetic RNA consisting only of uracils to the medium, they found that a protein was formed consisting only of phenylalanine (polyphenylalanine). Thus, it was established that the nucleotide triplet UUU (a codon) corresponds to phenylalanine. Over the next 5-6 years, all codons of the genetic code were determined.

The genetic code is a kind of dictionary that translates text written with four nucleotides into protein text written with 20 amino acids. The remaining amino acids found in protein are modifications of one of the 20 amino acids.

Properties of the genetic code

The genetic code has the following properties.

  1. Triplet nature - each amino acid corresponds to a triplet of nucleotides. It is easy to calculate that there are 4^3 = 64 codons. Of these, 61 are sense codons and 3 are nonsense (termination, or stop) codons.
  2. Continuity (no separating marks between codons) - the absence of intragenic punctuation marks.

    Within a gene, each nucleotide is part of a meaningful codon. In 1961, Francis Crick, Sydney Brenner and their colleagues experimentally demonstrated the triplet nature of the code and its continuity (compactness).

    The essence of the experiment: a "+" mutation is the insertion of one nucleotide; a "-" mutation is the loss of one nucleotide.

    A single mutation ("+" or "-") near the beginning of a gene, or a double mutation of the same sign (two "+" or two "-"), spoils the entire gene.

    A triple mutation (three "+" or three "-") near the beginning of a gene spoils only part of the gene, because the reading frame is restored after the third change.

    A quadruple "+" or "-" mutation again spoils the entire gene.

    The experiment was carried out on two adjacent phage genes and showed that

    1. the code is triplet and there is no punctuation inside the gene
    2. there are punctuation marks between genes
  3. Presence of intergenic punctuation marks - among the triplets there are initiating codons (they begin protein biosynthesis) and terminator codons (they mark the end of protein biosynthesis).

    Conventionally, the AUG codon, the first after the leader sequence, is also counted among the punctuation marks. It functions as a capital letter. In this position it encodes formylmethionine (in prokaryotes).

    At the end of each gene encoding a polypeptide there is at least one of the 3 stop codons, or stop signals: UAA, UAG, UGA. They terminate translation.

  4. Colinearity - correspondence between the linear sequence of codons in mRNA and the sequence of amino acids in the protein.
  5. Specificity - each amino acid corresponds only to certain codons, which cannot be used for another amino acid.
  6. Unidirectionality - codons are read in one direction, from the first nucleotide to the subsequent ones (5' to 3' along the mRNA).
  7. Degeneracy, or redundancy - one amino acid can be encoded by several triplets (there are 20 amino acids and 64 possible triplets, 61 of them sense codons, so on average each amino acid corresponds to about 3 codons); the exceptions are methionine (Met) and tryptophan (Trp).

    The reason for the degeneracy of the code is that the main semantic load is carried by the first two nucleotides of the triplet, and the third is less important. Hence the code degeneracy rule: if two codons have the same first two nucleotides and their third nucleotides belong to the same class (purine or pyrimidine), then they code for the same amino acid.

    However, there are two exceptions to this ideal rule. These are the AUA codon, which by the rule should correspond not to isoleucine but to methionine, and the UGA codon, which is a stop codon, whereas by the rule it should correspond to tryptophan. The degeneracy of the code obviously has an adaptive significance.

  8. Universality - all of the above properties of the genetic code are characteristic of all living organisms.

    Codon   Universal code   Mitochondrial codes
                             Vertebrates   Invertebrates   Yeast   Plants
    UGA     STOP             Trp           Trp             Trp     STOP
    AUA     Ile              Met           Met             Met     Ile
    CUA     Leu              Leu           Leu             Thr     Leu
    AGA     Arg              STOP          Ser             Arg     Arg
    AGG     Arg              STOP          Ser             Arg     Arg

    The principle of code universality was shaken by Barrell's discovery in 1979 of the "ideal" code of human mitochondria, in which the code degeneracy rule is satisfied. In the mitochondrial code, the UGA codon corresponds to tryptophan and AUA to methionine, as the code degeneracy rule requires.

    Perhaps at the beginning of evolution, all simple organisms had the same code as mitochondria, and then it underwent slight deviations.

  9. Non-overlapping - the triplets of the genetic text are independent of one another, and each nucleotide belongs to only one triplet. The figure shows the difference between an overlapping and a non-overlapping code.

    In 1976, the DNA of phage φX174 was sequenced. It has single-stranded circular DNA consisting of 5375 nucleotides. The phage was known to encode 9 proteins. For 6 of them, genes located one after another were identified.

    It turned out that there is an overlap. Gene E lies entirely within gene D; its start codon arises from a frame shift of one nucleotide. Gene J begins where gene D ends: the start codon of gene J overlaps the stop codon of gene D with a shift of two nucleotides. Such an arrangement is called a reading frameshift, i.e. a shift by a number of nucleotides that is not a multiple of three. To date, overlap has been shown only for a few phages.

  10. Noise immunity - characterized by the ratio of the number of conservative substitutions to the number of radical substitutions.

    Nucleotide substitution mutations that do not lead to a change in the class of the encoded amino acid are called conservative. Nucleotide substitution mutations that lead to a change in the class of the encoded amino acid are called radical.

    Since the same amino acid can be encoded by different triplets, some substitutions in triplets do not lead to a change in the encoded amino acid (for example, UUU -> UUC leaves phenylalanine). Some substitutions change an amino acid to another from the same class (non-polar, polar, basic, acidic), other substitutions also change the class of the amino acid.

    In each triplet, 9 single substitutions can be made: there are three ways to choose which position to change (1st, 2nd or 3rd), and the selected letter (nucleotide) can be changed to 4 - 1 = 3 other letters. The total number of possible nucleotide substitutions in the sense codons is 61 × 9 = 549.

    By direct calculation using the genetic code table, you can verify that of these: 23 nucleotide substitutions lead to the appearance of codons - translation terminators. 134 substitutions do not change the encoded amino acid. 230 substitutions do not change the class of the encoded amino acid. 162 substitutions lead to a change in amino acid class, i.e. are radical. Of the 183 substitutions of the 3rd nucleotide, 7 lead to the appearance of translation terminators, and 176 are conservative. Of the 183 substitutions of the 1st nucleotide, 9 lead to the appearance of terminators, 114 are conservative and 60 are radical. Of the 183 substitutions of the 2nd nucleotide, 7 lead to the appearance of terminators, 74 are conservative, 102 are radical.
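The counts that do not depend on how amino acids are grouped into classes can be rechecked with a short script. The following is a minimal sketch, not the method used by the source: the names `CODE`, `MITO`, `rule_exceptions` and `count_substitutions` are hypothetical, the standard code is written in the usual compact UCAG order with one-letter amino acid symbols, and the vertebrate mitochondrial variant is built from the table under property 8. The conservative/radical split (230 and 162) is not reproduced, because it depends on a class assignment the text does not spell out.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons enumerated in UCAG order; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Vertebrate mitochondrial variant (assumption: only the changes listed in the table above).
MITO = dict(CODE, UGA="W", AUA="M", AGA="*", AGG="*")

def rule_exceptions(code):
    """Codon pairs that break the degeneracy rule: same first two bases,
    third bases of the same class (U/C or A/G), but different meaning."""
    bad = []
    for first, second in product(BASES, repeat=2):
        for third_class in ("UC", "AG"):  # pyrimidines, purines
            pair = tuple(first + second + third for third in third_class)
            if code[pair[0]] != code[pair[1]]:
                bad.append(pair)
    return bad

def count_substitutions(code):
    """Count single-nucleotide substitutions made in sense codons."""
    total = to_stop = synonymous = 0
    stops_by_position = [0, 0, 0]
    for codon, amino in code.items():
        if amino == "*":
            continue  # substitutions are counted from sense codons only
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                total += 1
                if code[mutant] == "*":
                    to_stop += 1
                    stops_by_position[pos] += 1
                elif code[mutant] == amino:
                    synonymous += 1
    return total, to_stop, synonymous, stops_by_position

print(rule_exceptions(CODE))      # [('UGA', 'UGG'), ('AUA', 'AUG')]
print(rule_exceptions(MITO))      # []
print(count_substitutions(CODE))  # (549, 23, 134, [9, 7, 7])
```

Under these assumptions the script finds the same two exceptions to the degeneracy rule (UGA and AUA), none in the vertebrate mitochondrial code, and the same totals of 549 substitutions, 23 terminators (9, 7 and 7 by position) and 134 substitutions that leave the amino acid unchanged.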


In the body's metabolism, the leading role belongs to proteins and nucleic acids.
Proteins form the basis of all vital cell structures, have an unusually high reactivity, and are endowed with catalytic functions.
Nucleic acids are part of the most important organelle of the cell, the nucleus, and are also found in the cytoplasm, ribosomes, mitochondria, etc. Nucleic acids play a primary role in heredity, in the variability of the organism, and in protein synthesis.

The plan for protein synthesis is stored in the cell nucleus, while the synthesis itself takes place outside the nucleus, so a delivery service is needed to carry the encoded plan from the nucleus to the site of synthesis. This delivery service is performed by RNA molecules.

The process starts in the cell nucleus: part of the DNA "ladder" unwinds and opens. The RNA letters then form bonds with the exposed DNA letters of one of the DNA strands, and an enzyme joins the RNA letters into a strand. This is how the letters of DNA are "rewritten" into the letters of RNA. The newly formed RNA chain separates, and the DNA "ladder" twists up again. The process of reading information from DNA and synthesizing RNA on the DNA template is called transcription, and the synthesized RNA is called messenger RNA, or mRNA.

After further modifications, this encoded mRNA is ready. The mRNA leaves the nucleus and travels to the site of protein synthesis, where its letters are deciphered. Each set of three mRNA letters forms a "word" that stands for one specific amino acid.

Another type of RNA finds this amino acid, captures it with the help of an enzyme, and delivers it to the site of protein synthesis. This RNA is called transfer RNA, or tRNA. As the mRNA message is read and translated, the chain of amino acids grows. This chain twists and folds into a unique shape, creating one type of protein. Even the protein folding process is remarkable: a computer would need 10^27 (!) years to calculate all the folding options for an average-sized protein of 100 amino acids. Yet in the body it takes no more than one second to form a chain of 20 amino acids, and this process occurs continuously in all cells of the body.

Genes, genetic code and its properties.

About 7 billion people live on Earth. Apart from the 25-30 million pairs of identical twins, all people are genetically different: everyone is unique and has unique hereditary characteristics, character traits, abilities, and temperament.

These differences are explained by differences in genotypes, the sets of genes of each organism; each one is unique. The genetic characteristics of a particular organism are embodied in its proteins, so the structure of one person's proteins differs, although only slightly, from that of another person's.

This does not mean that no two people share exactly the same proteins. Proteins that perform the same functions may be identical or differ from each other only slightly, by one or two amino acids. But there are no two people on Earth (with the exception of identical twins) in whom all proteins are the same.

Information about the primary structure of a protein is encoded as a sequence of nucleotides in a section of a DNA molecule, a gene, which is the unit of hereditary information of an organism. Each DNA molecule contains many genes. The totality of all the genes of an organism constitutes its genotype. Thus,

A gene is a unit of hereditary information of an organism, corresponding to a distinct section of DNA.

Hereditary information is encoded using the genetic code, which is universal for all organisms; what differs is only the order of the nucleotides that form the genes and encode the proteins of specific organisms.

The genetic code consists of triplets of DNA nucleotides combined in different sequences (AAT, GCA, ACG, TGC, etc.), each of which encodes a specific amino acid (which will be built into the polypeptide chain).

Strictly speaking, the code is taken to be the sequence of nucleotides in an mRNA molecule, because mRNA takes the information off the DNA (the process of transcription) and translates it into a sequence of amino acids in the molecules of the synthesized proteins (the process of translation).
mRNA is composed of the nucleotides A, C, G and U, whose triplets are called codons: the DNA triplet CGT becomes the mRNA triplet GCA, and the DNA triplet AAG becomes the triplet UUC. It is in mRNA codons that the genetic code is conventionally written.
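The two examples above follow directly from base complementarity. A minimal sketch, assuming the DNA triplet is given on the template strand and, as in the text, ignoring strand direction (the name `transcribe_template` is hypothetical):

```python
# Complementary RNA base for each base of the DNA template strand.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe_template(dna):
    """Return the mRNA letters complementary to a DNA template, position by position."""
    return "".join(DNA_TO_RNA[base] for base in dna.upper())

print(transcribe_template("CGT"))  # GCA
print(transcribe_template("AAG"))  # UUC
```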

Thus, the genetic code is a unified system for recording hereditary information in nucleic acid molecules in the form of a sequence of nucleotides. The genetic code uses an alphabet of only four letters (nucleotides), which differ in their nitrogenous bases: A, T, G, C.

Basic properties of the genetic code:

1. The genetic code is triplet. A triplet (codon) is a sequence of three nucleotides encoding one amino acid. Since proteins contain 20 amino acids, a single nucleotide obviously cannot encode each of them: there are only four types of nucleotides in DNA, so 16 amino acids would remain uncoded. Two nucleotides are not enough either, since 4^2 = 16 combinations can encode only 16 amino acids. The smallest number of nucleotides encoding one amino acid is therefore three, which gives 4^3 = 64 possible triplets.

2. Redundancy (degeneracy) of the code follows from its triplet nature and means that one amino acid can be encoded by several triplets (since there are 20 amino acids and 64 triplets); the exceptions are methionine and tryptophan, each encoded by a single triplet. In addition, some triplets perform specific functions: in an mRNA molecule, the triplets UAA, UAG and UGA are stop codons, i.e. stop signals that end the synthesis of the polypeptide chain. The triplet corresponding to methionine (AUG), when located at the beginning of the coding sequence, also performs the function of initiating reading.

3. Unambiguity - alongside its redundancy, the code is unambiguous: each codon corresponds to only one specific amino acid.

4. Collinearity - the nucleotide sequence in a gene corresponds exactly to the sequence of amino acids in the protein.

5. The genetic code is non-overlapping and compact, i.e. it contains no "punctuation marks". This means that codons (triplets) cannot overlap during reading: starting at a certain codon, reading proceeds continuously, triplet after triplet, until a stop signal (stop codon) is reached.

6. The genetic code is universal, i.e. the nuclear genes of all organisms encode information about proteins in the same way, regardless of the level of organization and systematic position of these organisms.

There are genetic code tables for decoding mRNA codons and constructing the chains of protein molecules.
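The properties listed above can be illustrated with such a table. The following is a minimal sketch under stated assumptions: the standard code is written in compact UCAG order with one-letter amino acid symbols (F for Phe, M for Met, W for Trp, and so on) rather than the three-letter abbreviations used earlier, and the names `CODON_TABLE`, `min_codon_length` and `translate` are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons enumerated in UCAG order; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def min_codon_length(n_amino_acids=20, alphabet_size=4):
    """Smallest codon length n for which alphabet_size**n covers all amino acids."""
    n = 1
    while alphabet_size ** n < n_amino_acids:
        n += 1
    return n

def translate(mrna):
    """Read an mRNA from the first AUG, triplet by triplet, until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino = CODON_TABLE[mrna[i:i + 3]]
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)

counts = Counter(CODON_TABLE.values())
print(min_codon_length())                           # 3
print(len(CODON_TABLE), counts["*"])                # 64 3
print(sorted(a for a, n in counts.items() if n == 1))  # ['M', 'W']
print(translate("GGAUGUUUGGCUAAUU"))                # MFG
```

Each codon is a dictionary key, so it maps to exactly one amino acid (unambiguity), while several keys may share a value (degeneracy). Reading in steps of three from the start codon to the first stop codon reflects the non-overlapping, continuous character of the code.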

Matrix synthesis reactions.

Living systems carry out reactions that are unknown in inanimate nature: matrix (template) synthesis reactions.

In technology, the term "matrix" denotes a mold used for casting coins, medals, and typographic fonts: the hardened metal exactly reproduces all the details of the mold. Matrix synthesis resembles casting in a mold: new molecules are synthesized in exact accordance with the plan laid down in the structure of existing molecules.

The matrix principle underlies the most important synthetic reactions of the cell, such as the synthesis of nucleic acids and proteins. These reactions ensure the exact, strictly specific sequence of monomer units in the synthesized polymers.

Here the monomers are drawn in a directed way to a specific place in the cell: to the molecules that serve as the matrix on which the reaction takes place. If such reactions occurred as a result of random collisions of molecules, they would proceed infinitely slowly. The synthesis of complex molecules on the template principle is carried out quickly and accurately. In matrix reactions, the role of the matrix is played by macromolecules of the nucleic acids DNA or RNA.

The monomer molecules from which the polymer is synthesized (nucleotides or amino acids) are positioned and fixed on the matrix in a strictly defined order, in accordance with the principle of complementarity.

Then the monomer units are "cross-linked" into a polymer chain, and the finished polymer is released from the matrix.

After that, the matrix is ready for the assembly of a new polymer molecule. Clearly, just as only one kind of coin or one letter can be cast in a given mold, only one kind of polymer can be "assembled" on a given matrix molecule.

The matrix type of reaction is a specific feature of the chemistry of living systems. These reactions are the basis of the fundamental property of all living things: the ability to reproduce their own kind.

Template synthesis reactions

1. DNA replication (from Latin replicatio - renewal) - the process of synthesizing a daughter molecule of deoxyribonucleic acid on the template of the parent DNA molecule. During the subsequent division of the mother cell, each daughter cell receives one copy of a DNA molecule that is identical to the DNA of the original mother cell. This process ensures that genetic information is accurately passed on from generation to generation. DNA replication is carried out by a multi-enzyme complex of 15-20 different proteins, called the replisome. The material for synthesis is free nucleotides present in the cytoplasm of cells. The biological meaning of replication lies in the accurate transfer of hereditary information from the mother molecule to the daughter molecules, which normally occurs during the division of somatic cells.

A DNA molecule consists of two complementary strands. These chains are held together by weak hydrogen bonds that can be broken by enzymes. The DNA molecule is capable of self-duplication (replication), and on each old half of the molecule a new half is synthesized.
In addition, an mRNA molecule can be synthesized on a DNA molecule, which then transfers the information received from DNA to the site of protein synthesis.

Information transfer and protein synthesis proceed according to a matrix principle, comparable to the operation of a printing press in a printing house. Information from DNA is copied many times. If errors occur during copying, they will be repeated in all subsequent copies.

True, some errors made when information is copied by the DNA molecule can be corrected; this process of error elimination is called repair. The first of the reactions in the process of information transfer is the replication of the DNA molecule and the synthesis of new DNA chains.

2. Transcription (from Latin transcriptio - rewriting) - the process of RNA synthesis using DNA as a template, occurring in all living cells. In other words, it is the transfer of genetic information from DNA to RNA.

Transcription is catalyzed by the enzyme DNA-dependent RNA polymerase. RNA polymerase moves along the DNA template strand in the 3' → 5' direction. Transcription consists of the stages of initiation, elongation and termination. The unit of transcription is an operon, a fragment of a DNA molecule consisting of a promoter, a transcribed part and a terminator. mRNA consists of a single chain and is synthesized on DNA in accordance with the rule of complementarity, with the participation of an enzyme that activates the beginning and the end of the synthesis of the mRNA molecule.

The finished mRNA molecule enters the cytoplasm onto ribosomes, where the synthesis of polypeptide chains occurs.

3. Translation (from Latin translatio - transfer, movement) - the process of protein synthesis from amino acids on a template of messenger RNA (mRNA), carried out by the ribosome. In other words, this is the process of translating the information contained in the nucleotide sequence of mRNA into the sequence of amino acids in the polypeptide.

4. Reverse transcription is the process of forming double-stranded DNA based on information from single-stranded RNA. It is called reverse transcription because the transfer of genetic information occurs in the "reverse" direction relative to transcription. The idea of reverse transcription was initially very unpopular because it contradicted the central dogma of molecular biology, which assumed that DNA is transcribed into RNA and then translated into proteins.

However, in 1970 Temin and Baltimore independently discovered an enzyme called reverse transcriptase (revertase), and the possibility of reverse transcription was finally confirmed. In 1975, Temin and Baltimore were awarded the Nobel Prize in Physiology or Medicine. Some viruses (such as the human immunodeficiency virus, which causes HIV infection) are able to transcribe RNA into DNA. HIV has an RNA genome that is copied into DNA, so the viral DNA can be combined with the genome of the host cell. The main enzyme responsible for synthesizing DNA from RNA is revertase. One of its functions is to create complementary DNA (cDNA) from the viral genome. The associated ribonuclease activity cleaves the RNA, and revertase then completes the DNA double helix on the cDNA. This DNA is integrated into the host cell genome by integrase. As a result, the host cell synthesizes viral proteins, which form new viruses. In the case of HIV, apoptosis (cell death) of T-lymphocytes is also triggered. In other cases, the cell may remain a distributor of viruses.

The sequence of matrix reactions during protein biosynthesis can be represented in the form of a diagram.

Thus, protein biosynthesis is one of the types of plastic metabolism, in which the hereditary information encoded in the genes of DNA is realized as a specific sequence of amino acids in protein molecules.

Protein molecules are essentially polypeptide chains made up of individual amino acids. But amino acids are not active enough to combine with each other on their own. Therefore, before they can combine and form a protein molecule, amino acids must be activated. This activation occurs under the action of special enzymes.

As a result of activation, the amino acid becomes more labile and, under the action of the same enzyme, binds to tRNA. Each amino acid corresponds to a strictly specific tRNA, which finds "its" amino acid and transfers it to the ribosome.

Consequently, various activated amino acids, each bound to its own tRNA, arrive at the ribosome. The ribosome is like a conveyor that assembles a protein chain from the various amino acids supplied to it.

Simultaneously with the tRNA on which its amino acid "sits", the ribosome receives a "signal" from the DNA contained in the nucleus. In accordance with this signal, one or another protein is synthesized in the ribosome.

The directing influence of DNA on protein synthesis is exerted not directly but through a special intermediary, matrix or messenger RNA (mRNA), which is synthesized in the nucleus under the influence of DNA, so its composition reflects the composition of DNA. The RNA molecule is like a cast of the DNA mold. The synthesized mRNA enters the ribosome and, as it were, hands this structure a plan: the order in which the activated amino acids entering the ribosome must be joined so that a specific protein is synthesized. In other words, the genetic information encoded in DNA is transferred to mRNA and then to protein.

The mRNA molecule enters the ribosome and threads through it. The segment of it that is located in the ribosome at a given moment, a codon (triplet), interacts in a completely specific manner with the complementary triplet (anticodon) of the transfer RNA that brought the amino acid into the ribosome.

The transfer RNA with its amino acid matches a specific codon of the mRNA and binds to it; at the next, neighboring section of the mRNA another tRNA with a different amino acid is added, and so on until the entire mRNA chain has been read and all the amino acids have been joined in the appropriate order, forming a protein molecule. The tRNA that delivered its amino acid to a specific part of the polypeptide chain is freed from the amino acid and leaves the ribosome.

Then, back in the cytoplasm, the appropriate amino acid can again attach to it, and the tRNA again carries it to the ribosome. In protein synthesis, not one but several ribosomes (polyribosomes) are involved simultaneously.
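The codon-anticodon interaction described above can be sketched in a few lines. This is a simplified illustration, assuming strict complementary pairing of all three bases and antiparallel orientation, so it ignores the wobble at the third codon position and any modified tRNA bases; the name `anticodon` is hypothetical. Both the codon and the returned anticodon are written 5' to 3'.

```python
# Complementary base pairs in RNA.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def anticodon(codon):
    """Return the strictly complementary anticodon (5'->3') for an mRNA codon (5'->3')."""
    # The strands pair antiparallel, so the codon is read in reverse.
    return "".join(RNA_COMPLEMENT[base] for base in reversed(codon))

print(anticodon("AUG"))  # CAU
print(anticodon("UUU"))  # AAA
```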

The main stages of the transfer of genetic information:

1. Synthesis of mRNA on DNA as a template (transcription).
2. Synthesis of a polypeptide chain in ribosomes according to the program contained in mRNA (translation).

The stages are universal for all living beings, but the temporal and spatial relationships of these processes differ in pro- and eukaryotes.

In prokaryotes, transcription and translation can occur simultaneously because the DNA is located in the cytoplasm. In eukaryotes, transcription and translation are strictly separated in space and time: the synthesis of the various RNAs occurs in the nucleus, after which the RNA molecules must leave the nucleus by passing through the nuclear membrane. The RNAs are then transported in the cytoplasm to the site of protein synthesis.

  • 1. Coding and implementation of biological information in the cell. DNA and protein code system.

    2. Genetic engineering. Biotechnology. Objectives, methods. Achievements, prospects.

    3. Definition of the science of ecology. Environment as an ecological concept, environmental factors. Ecosystem, biogeocenosis, anthropocenosis. Specifics of people's living environment.

    1. The diversity of life is determined primarily by the diversity of protein molecules that perform various biological functions in cells. The structure of proteins is determined by the set and order of amino acids in their peptide chains. It is this sequence of amino acids in peptide chains that is encoded in DNA molecules using the biological (genetic) code. To encode 20 different amino acids, a sufficient number of nucleotide combinations can be provided only by a triplet code, in which each amino acid is encoded by three adjacent nucleotides.

    Genetic code is a system for recording information about the sequence of amino acids in proteins using the sequential arrangement of nucleotides in mRNA.

    Properties of the genetic code:

    1) The code is triplet. This means that each of the 20 amino acids is encoded by a sequence of 3 nucleotides, called a triplet or codon.

    2) The code is degenerate. This means that each amino acid is encoded by more than one codon (the exceptions are methionine and tryptophan).

    3) The code is unambiguous - each codon encrypts only 1 amino acid

    4) Between genes there are “punctuation marks” (UAA, UAG, UGA), each of which means the cessation of synthesis and stands at the end of each gene.

    5) There is no punctuation inside the gene.

    6) The code is universal. The genetic code is the same for all living creatures on earth.

    Transcription is the process of reading information from DNA into RNA, carried out by RNA polymerase. DNA is the carrier of all genetic information in a cell and does not directly participate in protein synthesis. An information intermediary, able to pass through the pores of the nuclear membrane, is sent from the nucleus to the ribosomes, the sites of protein assembly. This intermediary is mRNA. It is read off from DNA according to the principle of complementarity, with the participation of an enzyme called RNA polymerase. The transcription process can be divided into 4 stages:

    1) Binding of RNA polymerase to the promoter,

    2) initiation – the beginning of synthesis. It consists in the formation of the first phosphodiester bond between ATP or GTP and the second nucleotide of the mRNA molecule being synthesized,

    3) elongation – growth of the RNA chain, i.e. sequential addition of nucleotides to each other in the order in which complementary nucleotides appear in the transcribed DNA strand,

    4) Termination – completion of mRNA synthesis. The promoter is the landing site for RNA polymerase. An operon is a section of DNA that is transcribed as a single unit.

    DNA (deoxyribonucleic acid) is a biological polymer consisting of two polynucleotide chains connected to each other. The monomers that make up each of the DNA chains are complex organic compounds that include one of four nitrogenous bases: adenine (A) or thymine (T), cytosine (C) or guanine (G); the five-carbon sugar (pentose) deoxyribose, after which DNA itself is named; and a phosphoric acid residue. These compounds are called nucleotides.

    2. GENETIC ENGINEERING, or recombinant DNA technology, is the modification of chromosomal material, the main hereditary substance of cells, by biochemical and genetic techniques. Chromosomal material consists of deoxyribonucleic acid (DNA). Biologists isolate certain sections of DNA, join them in new combinations and transfer them from one cell to another. As a result, it is possible to make changes in the genome that would hardly have occurred naturally. A number of drugs have already been obtained by genetic engineering, including human insulin and the antiviral drug interferon. Although this technology is still being developed, it promises enormous advances in both medicine and agriculture. In medicine, for example, it is a very promising way to create and produce vaccines. In agriculture, recombinant DNA can be used to produce varieties of cultivated plants that are resistant to drought, cold, diseases, insect pests and herbicides.

    Genetic engineering methods:

    Sequencing - determination of the nucleotide sequence of DNA;

    Reverse transcription - synthesis of DNA on an RNA template;

    Amplification (copying) of individual DNA fragments.

    Modern biotechnology is a new scientific and technical field that arose in the 1960s-1970s. It began to develop especially rapidly in the mid-1970s, after the first successes of genetic engineering experiments. Biotechnology is, in essence, the use of cell cultures of bacteria, yeast, animals or plants whose metabolism and biosynthetic capabilities ensure the production of specific substances. Based on the knowledge and methods of biochemistry, genetics and chemical engineering, biotechnology has made it possible to obtain, from easily accessible, renewable resources, substances that are important for life and well-being.

    3. Ecology is the science of the relationships between living organisms and their environment. The natural surroundings in which a living organism lives are its habitat. The components of the environment that affect the organism are called environmental factors:

      abiotic factors – factors of inanimate nature (temperature, light, humidity);

      biotic factors – relationships between individuals in a population and between populations in a natural community;

      anthropogenic factor – human activity leading to changes in the habitat of living organisms.

    Photoperiodism, the response of organisms to day length, is an important general adaptation. For example, the lengthening days of spring stimulate the activity of the gonads.

    In 1935, the English botanist A. Tansley introduced the concept of the “ecosystem”: a historically established, open yet integral and stable system of living and non-living components that has a one-way flow of energy and an internal and external cycling of substances, and that is able to regulate all these processes.

    In 1942, the Soviet academician V.N. Sukachev formulated the concept of “biogeocenosis”: an open natural system consisting of living and non-living components, occupying an area with a relatively homogeneous plant community and characterized by a definite flow of energy, cycling of substances, movement and development.

    A forest, a field or a meadow is an ecosystem. But when the forest is characterized and its type is specified by a particular plant community (blueberry spruce forest, lingonberry pine forest), it is a biogeocenosis.

    The human environment is an interweaving of interacting natural and anthropogenic environmental factors, the set of which varies in different natural-geographical and economic regions of the planet.

    The genetic code is a system for recording hereditary information in nucleic acid molecules, based on a certain alternation of nucleotide sequences in DNA or RNA, forming codons corresponding to amino acids in a protein.

    Properties of the genetic code.

    The genetic code has several properties.

      Triplet nature.

      Degeneracy or redundancy.

      Unambiguity.

      Polarity.

      Non-overlapping.

      Compactness.

      Universality.

    It should be noted that some authors also propose other properties of the code related to the chemical characteristics of the nucleotides included in the code or the frequency of occurrence of individual amino acids in the body’s proteins, etc. However, these properties follow from those listed above, so we will consider them there.

    a. Triplet nature. The genetic code, like many complexly organized systems, has a smallest structural unit and a smallest functional unit. The triplet is the smallest structural unit of the genetic code; it consists of three nucleotides. The codon is the smallest functional unit of the genetic code. Typically, the triplets of mRNA are called codons. In the genetic code a codon performs several functions. Firstly, its main function is to encode a single amino acid. Secondly, a codon may not code for an amino acid, in which case it performs another function (see below). As the definitions show, the triplet characterizes the elementary structural unit of the genetic code (three nucleotides), whereas the codon characterizes the elementary semantic unit of the genome: three nucleotides determine the attachment of one amino acid to the polypeptide chain.

    The elementary structural unit was first deduced theoretically, and then its existence was confirmed experimentally. Indeed, 20 amino acids cannot be encoded by single nucleotides or by pairs of nucleotides, because with only 4 nucleotides these give just 4 and 4^2 = 16 variants. Triplets of four nucleotides give 4^3 = 64 variants, which more than covers the number of amino acids found in living organisms (see Table 1).
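
    The counting argument above can be checked with a few lines of Python. This is only an illustrative sketch; the variable names are arbitrary.

```python
from itertools import product

BASES = "UCAG"      # the four RNA nucleotides
AMINO_ACIDS = 20    # number of amino acids that must be encoded

# Number of distinct "words" of length 1, 2 and 3 over a 4-letter alphabet.
capacity = {length: len(BASES) ** length for length in (1, 2, 3)}
for length, variants in capacity.items():
    verdict = "enough" if variants >= AMINO_ACIDS else "not enough"
    print(f"word length {length}: {variants} variants, {verdict}")

# Enumerate all triplets explicitly as a cross-check of 4^3.
triplets = ["".join(p) for p in product(BASES, repeat=3)]
print(len(triplets))  # 64
```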

    The 64 nucleotide combinations presented in the table have two features. Firstly, of the 64 triplet variants only 61 are codons that encode an amino acid; they are called sense codons. Three triplets do not encode amino acids but are stop signals indicating the end of translation. These three triplets, UAA, UAG and UGA, are also called “meaningless” (nonsense) codons. A mutation that replaces one nucleotide in a triplet with another can turn a sense codon into a nonsense codon. This type of mutation is called a nonsense mutation. If such a stop signal arises inside a gene (in its coding part), protein synthesis will be interrupted at this point every time: only the first part of the protein (up to the stop signal) will be synthesized. A person with this pathology will lack the protein and will experience symptoms associated with the deficiency. For example, a mutation of this kind has been identified in the gene encoding the hemoglobin beta chain. A shortened, inactive hemoglobin chain is synthesized and quickly destroyed. As a result, a hemoglobin molecule devoid of the beta chain is formed. Such a molecule clearly cannot fully perform its function. A serious disease develops that takes the form of hemolytic anemia (beta-zero thalassemia, from the Greek word “thalassa”, sea, because the disease was first discovered around the Mediterranean).
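
    The effect of a nonsense mutation can be sketched in Python. The codon table below is the standard genetic code written in compact form; the function name `translate` and both mRNA sequences are made up for illustration and are not taken from a real globin gene.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in U, C, A, G order of each position; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna: str) -> str:
    """Read codons from the first nucleotide and stop at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


normal = "AUGGCUCAGGAAUUGUAA"              # hypothetical message
mutant = normal[:6] + "U" + normal[7:]     # C -> U turns CAG (Gln) into UAG (stop)

print(translate(normal))  # MAQEL
print(translate(mutant))  # MA
```

    The single substitution leaves the rest of the message intact, yet everything after the new stop codon is never translated, which is why only a shortened chain is produced.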

    The mechanism of action of stop codons differs from the mechanism of action of sense codons. This follows from the fact that for all codons encoding amino acids, corresponding tRNAs have been found. No tRNAs were found for nonsense codons. Consequently, tRNA does not take part in the process of stopping protein synthesis.

    The codon AUG (and sometimes GUG in bacteria) not only encodes an amino acid (methionine and valine, respectively) but also serves as the translation initiator.

    b. Degeneracy or redundancy.

    61 of the 64 triplets encode 20 amino acids. This three-fold excess of triplets over amino acids suggests that two coding options were possible for the transfer of information. Firstly, only 20 of the 64 codons could be involved in encoding the 20 amino acids; secondly, each amino acid could be encoded by several codons. Research has shown that nature used the latter option.

    The advantage of this option is obvious. If only 20 of the 64 triplets were involved in encoding amino acids, then 44 triplets would remain non-coding, i.e. meaningless (nonsense codons). We pointed out earlier how dangerous it is for the life of a cell when a mutation turns a coding triplet into a nonsense codon: this significantly disrupts normal protein synthesis and ultimately leads to the development of diseases. Currently three codons in our genome are nonsense codons; now imagine what would happen if their number increased about 15-fold. Clearly, in such a situation the frequency with which normal codons turn into nonsense codons would be immeasurably higher.

    A code in which one amino acid is encoded by several triplets is called degenerate or redundant. Almost every amino acid has several codons. Thus, the amino acid leucine can be encoded by six triplets: UUA, UUG, CUU, CUC, CUA and CUG. Valine is encoded by four triplets, phenylalanine by two, and only tryptophan and methionine are each encoded by a single codon. This property of recording the same information with different symbols is called degeneracy.
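
    The degree of degeneracy can be read directly off the standard codon table. The sketch below groups codons by the amino acid they encode, using one-letter amino acid symbols; the variable names are illustrative.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
# Standard genetic code in U, C, A, G order of each position; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Group codons by the amino acid (or stop signal) they stand for.
codons_for = defaultdict(list)
for codon, amino_acid in CODON_TABLE.items():
    codons_for[amino_acid].append(codon)

sense_codons = sum(len(c) for aa, c in codons_for.items() if aa != "*")
print(sense_codons)                      # 61 sense codons
print(codons_for["L"])                   # the six leucine codons
for symbol, name in [("V", "valine"), ("F", "phenylalanine"),
                     ("W", "tryptophan"), ("M", "methionine")]:
    print(name, len(codons_for[symbol]))
```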

    The number of codons designated for one amino acid correlates well with the frequency of occurrence of the amino acid in proteins.

    And this is most likely not accidental. The higher the frequency of occurrence of an amino acid in a protein, the more often the codon of this amino acid is represented in the genome, the higher the likelihood of its damage by mutagenic factors. Therefore, it is clear that a mutated codon has a greater chance of encoding the same amino acid if it is highly degenerate. From this perspective, the degeneracy of the genetic code is a mechanism that protects the human genome from damage.

    It should be noted that the term degeneracy is also used in molecular genetics in another sense. The bulk of the information in a codon is contained in its first two nucleotides; the base in the third position turns out to be of little importance. This phenomenon is called “degeneracy of the third base.” This feature minimizes the effect of mutations. For example, the main function of red blood cells is to transport oxygen from the lungs to the tissues and carbon dioxide from the tissues to the lungs. This function is performed by the respiratory pigment hemoglobin, which fills the entire cytoplasm of the erythrocyte. Hemoglobin consists of a protein part, globin, which is encoded by the corresponding gene, and of heme, which contains iron.

    Mutations in the globin genes lead to the appearance of different hemoglobin variants. Most often a mutation replaces one nucleotide with another and creates a new codon in the gene, which may encode a new amino acid in the hemoglobin polypeptide chain. Any nucleotide of a triplet can be replaced as a result of mutation: the first, the second or the third. Several hundred mutations that affect the integrity of the globin genes are known. About 400 of them involve the replacement of a single nucleotide in a gene and a corresponding amino acid replacement in the polypeptide. Of these, only 100 replacements lead to instability of hemoglobin and to diseases of various kinds, from mild to very severe. Some 300 (approximately 64%) of the substitution mutations do not affect hemoglobin function and do not lead to pathology. One of the reasons for this is the above-mentioned “degeneracy of the third base”: replacing the third nucleotide in a triplet encoding serine, leucine, proline, arginine and some other amino acids yields a synonymous codon encoding the same amino acid. Such a mutation will not manifest itself phenotypically.

    In contrast, any replacement of the first or second nucleotide in a triplet leads in 100% of cases to the appearance of a new hemoglobin variant. But even in this case there may be no severe phenotypic disorders. The reason is that the amino acid in hemoglobin is replaced by another one with similar physicochemical properties, for example when a hydrophilic amino acid is replaced by another hydrophilic amino acid.

    Hemoglobin consists of the iron-porphyrin group heme (to which oxygen and carbon dioxide molecules attach) and the protein globin. Adult hemoglobin (HbA) contains two identical α-chains and two β-chains. The α-chain contains 141 amino acid residues and the β-chain 146; the α- and β-chains differ in many amino acid residues. The amino acid sequence of each globin chain is encoded by its own gene. The gene encoding the α-chain is located in the short arm of chromosome 16, and the β-gene in the short arm of chromosome 11. Replacing the first or second nucleotide of a codon in the gene encoding the β-chain of hemoglobin almost always leads to the appearance of a new amino acid in the protein, disruption of hemoglobin function and serious consequences for the patient. For example, replacing “C” in one of the CAU triplets (histidine) with “U” produces a new triplet, UAU, which encodes another amino acid, tyrosine. Phenotypically this manifests itself as a severe disease. Such a replacement of histidine by tyrosine at position 63 of the β-chain destabilizes hemoglobin, and the disease methemoglobinemia develops. The replacement, as a result of mutation, of glutamic acid with valine at position 6 of the β-chain is the cause of a most severe disease, sickle cell anemia. Let us not continue the sad list.

    Let us only note that when one of the first two nucleotides is replaced, an amino acid with physicochemical properties similar to the previous one may appear. Thus, replacing the 2nd nucleotide in one of the triplets encoding glutamic acid (GAA) in the β-chain with “U” produces a new triplet (GUA) encoding valine, whereas replacing the first nucleotide with “A” forms the triplet AAA, encoding the amino acid lysine. Glutamic acid and lysine are similar in physicochemical properties: both are hydrophilic. Valine is a hydrophobic amino acid. Therefore, replacing hydrophilic glutamic acid with hydrophobic valine significantly changes the properties of hemoglobin, which ultimately leads to the development of sickle cell anemia, while replacing hydrophilic glutamic acid with hydrophilic lysine changes the function of hemoglobin to a lesser extent: patients develop a mild form of anemia. When the third base is replaced, the new triplet can encode the same amino acid as the previous one. For example, if uracil in the CAU triplet is replaced by cytosine and a CAC triplet appears, practically no phenotypic changes will be detected in the person. This is understandable, because both triplets code for the same amino acid, histidine.
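
    The substitutions discussed above can be reproduced with a short Python sketch that classifies a single-nucleotide replacement in a codon as synonymous, missense (a different amino acid) or nonsense (a stop codon). The function name `classify_substitution` is hypothetical; the codon table is the standard genetic code with one-letter amino acid symbols.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in U, C, A, G order of each position; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_substitution(codon: str, position: int, new_base: str):
    """Replace the nucleotide at a 1-based position and classify the result."""
    mutated = codon[:position - 1] + new_base + codon[position:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if after == before:
        kind = "synonymous"
    elif after == "*":
        kind = "nonsense"
    else:
        kind = "missense"
    return mutated, after, kind


print(classify_substitution("GAA", 2, "U"))  # Glu -> Val
print(classify_substitution("GAA", 1, "A"))  # Glu -> Lys
print(classify_substitution("CAU", 1, "U"))  # His -> Tyr
print(classify_substitution("CAU", 3, "C"))  # His -> His
```

    The sketch says nothing about how severe a missense change is; that depends on the physicochemical similarity of the two amino acids, as described in the text.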

    In conclusion, it is appropriate to emphasize that, from a general biological point of view, the degeneracy of the genetic code and the degeneracy of the third base are protective mechanisms that evolution has built into the unique structure of DNA and RNA.

    c. Unambiguity.

    Each triplet (except nonsense) encodes only one amino acid. Thus, in the direction codon - amino acid the genetic code is unambiguous, in the direction amino acid - codon it is ambiguous (degenerate).

    Codon -> amino acid: unambiguous

    Amino acid -> codon: degenerate

    Here, too, the need for unambiguity in the genetic code is obvious. Otherwise, different amino acids would be inserted into the protein chain when the same codon is translated, and proteins with different primary structures and different functions would be formed. Cell metabolism would switch to a “one gene – several polypeptides” mode of operation. It is clear that in such a situation the regulatory function of genes would be completely lost.

    d. Polarity.

    Information is read from DNA and mRNA in one direction only. Polarity is important for determining higher-order structures (secondary, tertiary, etc.). Earlier we discussed how lower-order structures determine higher-order ones. Tertiary and higher-order structures begin to form as soon as the synthesized RNA chain leaves the DNA molecule or the polypeptide chain leaves the ribosome. While the free end of the RNA or polypeptide acquires its tertiary structure, the other end of the chain is still being synthesized on DNA (if RNA is being transcribed) or on the ribosome (if a polypeptide is being translated).

    Therefore, the unidirectional process of reading information (during the synthesis of RNA and protein) is essential not only for determining the sequence of nucleotides or amino acids in the synthesized substance, but also for the strict determination of secondary, tertiary and higher structures.

    e. Non-overlapping.

    The code may be overlapping or non-overlapping. Most organisms have a non-overlapping code. Overlapping code is found in some phages.

    The essence of a non-overlapping code is that a nucleotide of one codon cannot simultaneously be a nucleotide of another codon. If the code were overlapping, then the sequence of seven nucleotides (GCUGCUG) could encode not two amino acids (alanine-alanine) (Fig. 33, A) as in the case of a non-overlapping code, but three (if there is one nucleotide in common) (Fig. 33, B) or five (if two nucleotides are common) (see Fig. 33, C). In the last two cases, a mutation of any nucleotide would lead to a violation in the sequence of two, three, etc. amino acids.

    However, it has been established that a mutation of one nucleotide always disrupts the inclusion of one amino acid in a polypeptide. This is a significant argument that the code is non-overlapping.

    Let us explain this using Figure 34. The figure shows the triplets encoding amino acids in the cases of a non-overlapping and an overlapping code. Experiments have clearly shown that the genetic code is non-overlapping. Without going into the details of the experiments, we note what happens if the third nucleotide, U (marked with an asterisk), in the nucleotide sequence (see Fig. 34) is replaced by some other nucleotide:

    1. With a non-overlapping code, the protein controlled by this sequence would have a substitution of one (first) amino acid (marked with asterisks).

    2. With an overlapping code, in variant B a substitution would occur in two amino acids (the first and second; marked with asterisks), and in variant C the replacement would affect three amino acids (marked with asterisks).

    However, numerous experiments have shown that when one nucleotide in DNA is disrupted, the disruption in the protein always affects only one amino acid, which is typical for a non-overlapping code.

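    Sequence: G C U* G C U G (the third nucleotide, U, is marked with an asterisk)

    Variant   Code                                       Triplets read           Amino acids
    A         Non-overlapping                            GCU GCU                 Ala* - Ala
    B         Overlapping (one nucleotide in common)     GCU UGC CUG             Ala* - Cys* - Leu
    C         Overlapping (two nucleotides in common)    GCU CUG UGC GCU CUG     Ala* - Leu* - Cys* - Ala - Leu

    Fig. 34. A diagram explaining the presence of a non-overlapping code in the genome (explanation in the text). Asterisks mark the amino acids that change when the third nucleotide is replaced.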
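
    The three reading schemes for the sequence GCUGCUG can also be sketched in Python. The helper names `read_triplets` and `affected_triplets` are hypothetical; the parameter `shared` is the number of nucleotides that neighbouring triplets have in common (0 for a non-overlapping code).

```python
def read_triplets(seq: str, shared: int):
    """Return (start index, triplet) pairs; neighbouring triplets share `shared` bases."""
    step = 3 - shared
    return [(start, seq[start:start + 3]) for start in range(0, len(seq) - 2, step)]


def affected_triplets(seq: str, shared: int, index: int):
    """Triplets that contain the nucleotide at a 0-based index."""
    return [t for start, t in read_triplets(seq, shared) if start <= index < start + 3]


SEQ = "GCUGCUG"
for shared in (0, 1, 2):
    triplets = [t for _, t in read_triplets(SEQ, shared)]
    hit = affected_triplets(SEQ, shared, 2)   # index 2 is the third nucleotide, U
    print(shared, triplets, len(hit))
```

    For the third nucleotide the number of affected triplets grows from one to two to three as the overlap increases, which is the distinction the experiments rely on.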

    The non-overlapping nature of the genetic code is associated with another property: reading of the information begins from a definite point, the initiation signal. In mRNA this initiation signal is the codon AUG, which encodes methionine.

    It should be noted that humans nevertheless have a small number of genes that deviate from the general rule and overlap.

    f. Compactness.

    There is no punctuation between codons. In other words, triplets are not separated from each other, for example, by one meaningless nucleotide. The absence of “punctuation marks” in the genetic code has been proven in experiments.

    g. Universality.

    The code is the same for all organisms living on Earth. Direct evidence of the universality of the genetic code was obtained by comparing DNA sequences with the corresponding protein sequences. It turned out that all bacterial and eukaryotic genomes use the same set of codon assignments. There are exceptions, but not many.

    The first exceptions to the universality of the genetic code were found in the mitochondria of some animal species. They concerned the terminator codon UGA, which in these mitochondria is read in the same way as the codon UGG, i.e. as the amino acid tryptophan. Other, rarer deviations from universality have also been found.
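
    The consequence of reading UGA as tryptophan can be shown with a small Python sketch. The variant table below changes only the UGA assignment mentioned in the text and is therefore a simplification, not a complete mitochondrial code; the names `VARIANT_TABLE` and `translate` and the example message are made up for illustration.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in U, C, A, G order of each position; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Simplified variant: only the UGA -> tryptophan reassignment from the text.
VARIANT_TABLE = dict(STANDARD_TABLE)
VARIANT_TABLE["UGA"] = "W"


def translate(mrna: str, table: dict) -> str:
    """Read codons from the first nucleotide and stop at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = table[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


message = "AUGUGAUGGUAA"                    # hypothetical message
print(translate(message, STANDARD_TABLE))   # M
print(translate(message, VARIANT_TABLE))    # MWW
```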

    DNA code system.

    The genetic code of DNA consists of 64 triplets of nucleotides. These triplets are called codons. Of these, 61 codons each code for one of the 20 amino acids used in protein synthesis, and three serve as stop signals. This gives some redundancy in the code: most amino acids are coded for by more than one codon.
    One codon, AUG, performs two interrelated functions: it signals the beginning of translation and encodes the inclusion of the amino acid methionine (Met) in the growing polypeptide chain. The DNA coding system is designed so that the genetic code can be expressed either as RNA codons or as DNA codons. RNA codons are found in mRNA, and it is these codons that are read during the synthesis of polypeptides (a process called translation). Each mRNA molecule, in turn, acquires its nucleotide sequence by transcription from the corresponding gene.

    All but two amino acids (Met and Trp) can be encoded by 2 to 6 different codons. However, the genomes of most organisms show that certain codons are favored over others. In humans, for example, alanine is encoded by GCC four times more often than by GCG. This probably reflects a higher efficiency of the translation apparatus (for example, the ribosome) with some codons.
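
    Codon preference of this kind is measured by counting how often each synonymous codon occurs in coding sequences. The Python sketch below shows the counting step only; the function name `codon_usage` is hypothetical and the example sequence is made up, so the resulting fractions illustrate the method and say nothing about real human codon usage.

```python
from collections import Counter

ALANINE_CODONS = ("GCU", "GCC", "GCA", "GCG")


def codon_usage(mrna: str) -> Counter:
    """Count codons in a message read in frame from its first nucleotide."""
    return Counter(mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3))


example = "GCCGCUGCCGCGGCCGCA"   # made-up run of alanine codons
usage = codon_usage(example)

# Fraction of each synonymous codon among all alanine codons in the message.
total = sum(usage[codon] for codon in ALANINE_CODONS)
fractions = {codon: usage[codon] / total for codon in ALANINE_CODONS}
print(fractions)
```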

    The genetic code is almost universal. The same codons are assigned to the same amino acids, and the start and stop signals are overwhelmingly the same in animals, plants and microorganisms. However, some exceptions have been found. Most involve assigning one or two of the three stop codons to an amino acid.