Genetic algorithm: a visual implementation

    About four years ago, at university, I heard about an optimization method called the genetic algorithm. Everywhere, exactly two things were said about it: it's cool, and it doesn't work. Or rather, it works, but slowly and unreliably, and shouldn't be used anywhere. But it can beautifully demonstrate the mechanisms of evolution. In this article I will show a beautiful way to watch the processes of evolution live, using this simple method as an example. All you need is a little math, a little programming, and some imagination.

    Briefly about the algorithm

    So what is a genetic algorithm? First of all, it is a multidimensional optimization method, i.e. a method for finding the minimum of a multidimensional function. Potentially this method can be used for global optimization, but that comes with difficulties, which I will describe later.

    The essence of the method is that we model the evolutionary process: we have a population (a set of vectors) that reproduces, is affected by mutations, and undergoes natural selection based on minimizing the objective function. Let's take a closer look at these processes.

    So, first of all, our population must reproduce. The basic principle of reproduction is that the offspring resembles its parents: that is, we must define some kind of inheritance mechanism, and it is better if it includes an element of chance. But the rate of development of such a system on its own is very low: genetic diversity decreases and the population degenerates, i.e. the value of the function ceases to decrease.

    To solve this problem, a mutation mechanism was introduced, which consists of random changes in some individuals. This mechanism lets us introduce something new into the genetic diversity.
    The next important mechanism is selection. As was said, selection is the choice of individuals (it can be from those just born, or from the whole population; practice shows that this does not play a decisive role) that better minimize the function. Usually as many individuals are selected as there were before reproduction, so that from epoch to epoch the population size stays constant. It is also customary to select "lucky ones": a certain number of individuals that may not minimize the function well, but will introduce diversity into subsequent generations.

    These three mechanisms are most often not enough to minimize the function. The population degenerates anyway: sooner or later a local minimum swamps the entire population with its value. When this happens, a process called a shake-up is carried out (in nature the analogy is a global disaster), in which almost the entire population is destroyed and new (random) individuals are added.
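    The four mechanisms described above (reproduction, mutation, selection, shake-up) can be sketched as a minimal loop. This is an illustrative Python sketch on a toy bit-vector problem, not the article's Octave code; all names (`fitness`, `POP_SIZE`, the 70%/30% split) are my own choices, roughly mirroring the parameters mentioned later.

```python
import random

random.seed(0)  # reproducible toy run

POP_SIZE, DIM, GENS = 30, 20, 200
TARGET = [1] * DIM  # stand-in for the flattened target image

def fitness(ind):
    """Distance to the target: the number of differing 'pixels' (lower is better)."""
    return sum(a != b for a, b in zip(ind, TARGET))

def breed(mother, father):
    """Offspring takes each gene from a randomly chosen parent."""
    return [m if random.random() < 0.5 else f for m, f in zip(mother, father)]

pop = [[random.randint(0, 1) for _ in range(DIM)] for _ in range(POP_SIZE)]
best_history = []
for gen in range(GENS):
    # reproduction: the population grows with offspring of random pairs
    pop += [breed(random.choice(pop), random.choice(pop)) for _ in range(3 * POP_SIZE)]
    # mutation: flip a random gene in random individuals
    for _ in range(POP_SIZE):
        victim = random.choice(pop)
        victim[random.randrange(DIM)] ^= 1
    # selection: keep the best 70%, fill the rest with random "lucky ones"
    pop.sort(key=fitness)
    survivors = pop[: int(0.7 * POP_SIZE)]
    lucky = random.sample(pop, POP_SIZE - len(survivors))
    pop = survivors + lucky
    best_history.append(fitness(pop[0]))
    # shake-up: on stagnation, replace almost everyone with fresh random individuals
    if len(best_history) >= 5 and len(set(best_history[-5:])) == 1 and best_history[-1] != 0:
        pop[1:] = [[random.randint(0, 1) for _ in range(DIM)] for _ in range(POP_SIZE - 1)]

print("best distance after evolution:", best_history[-1])
```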

    That is a description of the classic genetic algorithm; it is easy to implement, and there is plenty of room for imagination and research.

    Formulation of the problem

    So, once I had decided that I wanted to try implementing this legendary (albeit unsuccessful) algorithm, the question became: what should I minimize? Usually some scary multidimensional function with sines, cosines, etc. is taken, but that is neither interesting nor illustrative. One simple idea came up: an image, where each value corresponds to a pixel's brightness, is perfect for displaying a multidimensional vector. Thus we can introduce a simple function: the distance to a target image, measured as the difference in pixel brightness. For simplicity and speed, I took images with brightness values of 0 or 255.
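    In code, this objective function is just the count of differing pixels between a candidate and the target. A minimal Python sketch (the article's own code is in Octave; images are assumed flattened into 0/1 vectors):

```python
def image_distance(candidate, target):
    """Number of pixels where the two binary images differ (lower = closer)."""
    return sum(c != t for c, t in zip(candidate, target))

target    = [0, 1, 1, 0, 1, 0, 0, 1]  # a flattened 0/1 target image
candidate = [0, 1, 0, 0, 1, 1, 0, 1]
print(image_distance(candidate, target))  # 2 pixels differ
```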

    From a mathematical point of view, such optimization is a mere trifle. The graph of such a function is a huge multidimensional "hole" (like the three-dimensional paraboloid in the figure), into which you will inevitably slide if you follow the gradient. The only local minimum is the global one.

    The only problem is that close to the minimum the number of paths leading downward shrinks, while overall we have as many directions as there are dimensions (i.e. the number of pixels). Obviously it is not worth solving this problem with a genetic algorithm, but we can watch the interesting processes occurring in our population.

    Implementation

    All the mechanisms described in the first section were implemented. Reproduction was carried out by simply mixing random pixels from the "mother" and the "father". Mutation changed the value of a random pixel in a random individual to the opposite. And a shake-up was carried out if the minimum did not change over five steps: then an "extreme mutation" occurs, in which replacement happens more intensively than usual.
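    The three operators just described can be written down in a few lines. This is a Python paraphrase of the idea, not the attached Octave source; the function names and the "half the pixels" intensity of the extreme mutation are my own assumptions:

```python
import random

def crossover(mother, father):
    """Child takes each pixel from a randomly chosen parent."""
    return [m if random.random() < 0.5 else f for m, f in zip(mother, father)]

def mutate(individual, flips=1):
    """Flip `flips` random pixels to the opposite value."""
    ind = individual[:]
    for _ in range(flips):
        i = random.randrange(len(ind))
        ind[i] = 1 - ind[i]
    return ind

def extreme_mutate(individual):
    """The shake-up case: mutate far more intensely than usual (intensity assumed)."""
    return mutate(individual, flips=max(1, len(individual) // 2))

child = crossover([0] * 8, [1] * 8)
print(child)
print(mutate([0] * 8, flips=1))
```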

    I took nonograms ("Japanese crosswords") as the initial pictures, but honestly you could just take black squares; it makes absolutely no difference. Below are the results for several images. Here, for everything except the "house", the number of mutations averaged 100 per individual, there were 100 individuals in the population, and during reproduction the population grew 4 times. Each epoch kept 30% "lucky ones". For the house, smaller values were chosen (30 individuals in the population, 50 mutations per individual).




    Experimentally, I found that using "lucky ones" in selection slows the rate at which the population approaches the minimum, but helps it get out of stagnation; without the "lucky ones" the stagnation would be permanent. This can be seen in the graphs: the left graph shows the development of the "pharaoh" population with lucky ones, the right one without.


    Thus, we see that this algorithm solves the problem, albeit over a very long time. Too many shake-ups, in the case of large images, can be cured by a larger number of individuals in the population. I leave the optimal selection of parameters for different dimensions outside the scope of this post.

    Global optimization

    As was said, local optimization is a fairly trivial task even in multidimensional cases. It is much more interesting to see how the algorithm copes with global optimization. But to do that, we first need a function with many local minima. In our case this is not hard: it is enough to take the minimum of the distances to several images (house, dinosaur, fish, boat). Then the original algorithm will "slide" into some random hole, and you can simply run it several times.

    But there is a more interesting solution to this problem: once we realize we have slipped into a local minimum, we perform a strong shake-up (or even reinitialize the individuals) and then add penalties for approaching a known minimum. As you can see, the pictures then alternate. Note that we have no right to touch the original function, but we can remember local minima and add the penalties ourselves.
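    The penalized multi-minimum objective can be sketched like this: the base value is the minimum distance to any of the target images, and each already-found minimum contributes a penalty that grows steeply as the candidate approaches it. A Python illustration using the article's 0.000001 * sum^4 penalty; the helper names and toy 4-pixel "images" are my own:

```python
def distance(v, target):
    return sum(a != b for a, b in zip(v, target))

def fitness(v, targets, found_minima):
    """Min distance to any target image, plus a steep penalty for
    resembling a minimum we have already discovered."""
    base = min(distance(v, t) for t in targets)
    penalty = sum(0.000001 * sum(a == b for a, b in zip(v, m)) ** 4
                  for m in found_minima)
    return base + penalty

house, fish = [1, 1, 0, 0], [0, 0, 1, 1]
print(fitness([1, 1, 0, 0], [house, fish], []))       # 0: sits in the "house" minimum
print(fitness([1, 1, 0, 0], [house, fish], [house]))  # penalized: "house" is known
```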

    This picture shows the result when, upon reaching a local minimum (strong stagnation), the population simply dies out.

    Here the population dies out and a small penalty is added (equivalent to the normal distance to a known minimum). This greatly reduces the likelihood of repetitions.

    It is more interesting when the population does not die out, but simply begins to adapt to new conditions (next figure). This is achieved with a penalty of 0.000001 * sum ^ 4. In this case, the new images become a little noisy:

    This noise is eliminated by limiting the penalty to max(0.000001 * sum ^ 4, 20). But we see that the fourth local minimum (dinosaur) cannot be reached - most likely because it is too close to some other one.

    Biological interpretation


    What conclusions can we draw from this (dare I say it) modeling? First of all, we see that sexual reproduction is the most important engine of development and adaptability. But it alone is not enough: the role of random, small changes is extremely important. They are what ensure the emergence of new animal species in evolution, and in our case they ensure the diversity of the population.

    The most important role in the evolution of the Earth was played by natural disasters and mass extinctions (of dinosaurs, insects, etc.; there were about ten major ones, see the diagram below). This was confirmed by our simulation as well. And the selection of "lucky ones" showed that the weakest organisms of today are capable of becoming the basis for subsequent generations in the future.

    As they say, everything is just like in life. This "do-it-yourself evolution" clearly shows interesting mechanisms and their role in development. Of course, there are many more serious evolutionary models (based, naturally, on differential equations) that account for more factors and are closer to life. And of course there are more effective optimization methods.

    P.S.

    I wrote the program in Matlab (or rather, in Octave), because here everything is bare matrices, and there are tools for working with images. The source code is attached.

    Source

    function res = genetic(file)
      %generating
      global A B;
      im2line(file);
      dim = length(A(1,:));
      count = 100;   %individuals in the population
      reprod = 4;    %population growth factor during reproduction
      mut = 100;     %average mutations per individual
      select = 0.7;  %share of the best kept by selection
      stagn = 0.8;   %share of the population hit by the shake-up

      pop = round(rand(count, dim));
      res = [];
      B = [];
      localmin = 0;
      localcount = 0;
      for k = 1:300
        %reproduction (reconstructed: cross two random parents)
        for j = 1:count * reprod
          pop = [pop; crossingover(pop(floor(rand()*count)+1,:), pop(floor(rand()*count)+1,:))];
        end
        %mutation: idx intensifies it tenfold during stagnation ("extreme mutation")
        idx = 10 * (length(res) > 5 && std(res(1:5)) == 0) + 1;
        for j = 1:count * mut * idx
          a = floor(rand() * count) + 1;
          b = floor(rand() * dim) + 1;
          pop(a,b) = ~pop(a,b);
        end
        %selection
        val = func(pop);
        val(1:count) = val(1:count) * 10;
        npop = zeros(count, dim);
        [s, i] = sort(val);
        res = [s(1), res];
        opt = pop(i(1),:);
        fn = sprintf("result/%05d-%d.png", k, s(1));
        line2im(opt, fn);
        if (s(1) == 0 || localcount > 10)
          localmin = 0;
          localcount = 0;
          B = [B; opt];  %remember the found minimum for penalties
          % pop = round(rand(count, dim));
          continue;
          % break;
        end
        for j = 1:floor(count * select)
          npop(j,:) = pop(i(j),:);
        end
        %adding luckers
        for j = (floor(count*select)+1) : count
          npop(j,:) = pop(floor(rand() * count) + 1,:);
        end
        %fixing stagnation
        if (length(res) > 5 && std(res(1:5)) == 0)
          if (localmin == res(1))
            localcount = localcount + 1;
          else
            localcount = 1;
          end
          localmin = res(1);
          for j = 1:count*stagn
            a = floor(rand() * count) + 1;
            npop(a,:) = crossingover(npop(a,:), rand(1, dim));
          end
        end
        pop = npop;
      end
      res = res(length(res):-1:1);
    end

    function res = crossingover(a, b)
      x = round(rand(size(a)));
      res = a .* x + b .* (~x);
    end

    function res = func(v)
      global A B;
      res = inf;
      for i = 1:size(A,1)
        res = min(res, sum(v ~= A(i,:), 2));
      end
      for i = 1:size(B,1)
        res = res + max(0.000001 * sum(v == B(i,:), 2) .^ 4, 20);
      end
    end

    function im2line(files)
      global A sz;
      A = [];
      files = cellstr(files);
      for i = 1:size(files,1)
        imorig = imread(char(files(i,:)));
        sz = size(imorig);
        A = [A; reshape(imorig, 1, prod(sz))];
      end
      A = A / 255;
    end

    function line2im(im, file)
      global sz;
      imwrite(uint8(reshape(im * 255, sz)), file);
    end

    Tags:

    • genetic algorithm
    • multidimensional optimization
    • evolution
    • matlab

    An insufficient level of *cut out by censorship* pushed back the publication date, and only now, after shameful, tedious begging on my part, has this article gotten the chance to show itself to the world. In that time at least three articles on a similar topic have appeared (that's how many caught my eye), so it is likely that some of what is written below will not be new to you. I suggest such readers not frown at yet another attempt by an inexperienced youth to explain GA in a popular-science way, but move on to the next exhibit, the second part, which describes the creation of a GA-based bot for the programming game Robocode. That, according to the latest intelligence, has not yet been seen on the hub.

    Part one. Life and work of a genetic algorithm.

    Let's start from afar. There is a certain set of problems that need solving. Our goal is to find a sequence of actions that can transform the Given (the initial conditions of the problem) into the Answer (the target state).

    If the situation is simple, and the solution can be calculated directly from the conditions with the help of that matan of yours, then great: everything here is fine without our wisdom, nothing to see here, we all go our separate ways. For example, when solving a quadratic equation, the answer (the values x1, x2) is obtained from the initial conditions (the coefficients a, b, c) by applying the formula we all learned in school. But what to do in the sadder case, when the required formula is not in the textbook? You can try brainstorming to solve the problem. Analytically. With numerical methods. By desperately searching through functions. After a while, the student's dreamy "if only it would solve itself" will be heard. Yeah, this is where we come out from behind the curtains. So, the goal is to write a program that would find a function (a program) that receives input data and returns valid numbers. The power of metaprogramming, into battle!

    Hmm, how will we achieve this goal? Let's make a sacrifice at the fire to the gods of recursion: we'll write a program that will write a program that would find a function (program)... No, that trick won't work a second time. Better to take a cue from nature, casting our gaze on phenomena such as the mechanism of evolution and natural selection. Everything is like in life: our programs will live, mate, give birth and die under the yoke of more adapted individuals, passing their best qualities on to their descendants. It sounds crazy, but it's worth a closer look.

    The god of our software world is our task. The programs must believe in it, mate for its sake, light a candle in church for it, and live with the sole purpose of finding the meaning of life: solving this task. The one best adapted to the environment (closest to solving the problem) becomes the alpha male, survives and produces strong offspring. A loser who spent his whole life captive to online games and never experienced success in solving the problem has very little chance of producing offspring. The gene pool will be cleared of the contribution of these pimply comrades, and the whole society of programs will move toward the bright future of the solved problem. In general outline it's already clear; now we need to work out the nuances: first, how do you imagine programs mating? second, where will we get the first generation of programs? third, by what characteristic will we determine the fitness of individuals, and how will it influence crossing? fourth, it is worth deciding on the conditions for ending the algorithm's work, i.e. when to stop this whole orgy.

    The Art of Program Pairing

    I think many of us sometimes feel a burning desire to apply sexual violence to programs. Here we must warn in advance that we do not encourage such interspecies deviations. Everything is as the Catholic Church would wish: a program with a program, only after marriage, and no changing partners, even if that languid guy bought you a cocktail at the bar. Although no, I'm lying: harem-style polygamy is thriving. Oh, and also, despite the use of words like "father" or "son" below, our programs are hermaphrodites. Well, incest too... Ugh, and I also mentioned the church *facepalm*. Okay, more on that later.

    The question of crossing programs is not so simple. Randomly exchanging functions, lines, or variables will earn you a fat stream of scary words from the compiler/interpreter, not a new program. That is, we need to find a way to cross programs correctly. Smart guys found a way out. And the smart boys and girls who have studied the structure of compilers have already guessed it. Yes, yes, it's the syntax tree.

    Let me immediately curb your ardor: our beard is not very thick yet, so we will use the simplest kinds of programs. Those who wish can wander into the valley of programming's countless riches, but here everything is simple: a program consists of expressions, which in turn consist of simple functions of some arity, variables and constants. Each expression computes one of the values returned by the program.

    For example: here is an individual, the program square, consisting of two expressions that try (not very successfully) to solve a quadratic equation:
    function square(a, b, c) { x1 = min(sin(b)*(a+1), 0); x2 = 3 + exp(log(b*a)); return (x1, x2); }
    We've decided on the representation; now we need to sort out storage. Since there is still a lot of dancing around these very programs, including transferring them from one part of the system to another (the parts, generally speaking, in my case were even written in different languages), storing our specimens as trees is not very convenient. To represent them in a more convenient way (ideally as a set of strings over some finite alphabet), our individual-program-set-of-trees will have to learn to encode/decode itself.

    It looks like a tree, but it isn't
    So, we need to represent a tree as a string. Here the power of Karva trees will help us out. First, you should decide on the set of functions, variables and constants that may appear in the tree. Variables and constants correspond to the leaves of the tree and are called terminals; functions correspond to the remaining (internal) nodes and are called non-terminals. It is also worth noting that functions can have different numbers of arguments, so we will definitely need that knowledge ("arity", the experts quietly mouth). The result is an encoding table, for example like this:

    Here n, +, *, if are functions; 2 is a constant; a and b are variables. In real problems the table is beefier; with a set like this you can't even solve a quadratic equation. You also need to keep in mind that, to avoid division by zero and other apocalypse scenarios, all functions must be defined on the entire set of real numbers (or whatever set you use in the problem). Otherwise you'll have to stand guard, catching logarithms of zero and then figuring out what to do with them. We are not proud people; we'll take the easy way and exclude such options.

    So, with such a table, converting functions from tree to row and back is no problem. For example, suppose we received the following line to decode:

    Using the table, we identify each element, and also remember about arity:

    Now, using arity, we place references to function arguments:

    Please note that the last 3 elements of the list turned out to be of no use to anyone, and their values do not affect the result of the function in any way. This happens because the number of involved list elements (the number of tree nodes) constantly floats depending on the arities involved. So it's better to stock up on a reserve than to suffer with an invalid tree later.

    Now if we pull it up by the first element, then the expression tree will dangle in our hand:

    The value of the function can be calculated by recursively traversing the tree; it turns out to be like this:
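    The decode-then-evaluate procedure just described can be sketched in Python (the article's system used C and Java; the breadth-first decoding and the tiny function table here are my own illustration of the Karva idea):

```python
# arity table: terminals (variables, constants) have arity 0
ARITY = {'+': 2, '*': 2, 'n': 1, 'a': 0, 'b': 0, '2': 0}

def decode(gene):
    """Breadth-first (Karva) decoding: each symbol claims its arguments
    from the next unclaimed positions in the list. The gene must carry
    enough trailing terminals; unused tail symbols are simply ignored."""
    nodes = [[sym, []] for sym in gene]
    next_free = 1
    for node in nodes:
        for _ in range(ARITY[node[0]]):
            node[1].append(nodes[next_free])
            next_free += 1
        if next_free >= len(nodes):
            break
    return nodes[0]

def evaluate(node, env):
    """Recursive traversal of the decoded tree."""
    sym, args = node
    vals = [evaluate(c, env) for c in args]
    if sym == '+': return vals[0] + vals[1]
    if sym == '*': return vals[0] * vals[1]
    if sym == 'n': return -vals[0]
    return env.get(sym, float(sym) if sym.isdigit() else 0)

# the gene "+*2ab" decodes to (a*b) + 2
tree = decode(list('+*2ab'))
print(evaluate(tree, {'a': 3, 'b': 4}))  # 14.0
```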

    My dad's eyes are like this
    Let's return to the spiciest part: crossing. We impose the following conditions on the program-crossing operation: first, two mating individuals produce two offspring (so the population size stays constant); second, as a result of crossing, the descendants must to a certain extent possess the characteristics of both parents (i.e., the apple should not fall too far from the tree). We have now learned how a program is represented: as a set of strings or trees. Accordingly, they can be crossed either as strings or as trees.

    Crossing trees means exchanging randomly selected branches. Crossing strings can be implemented in several ways: single-point recombination (piecewise gluing), two-point recombination, element-by-element exchange, etc. They can be described in long, complex sentences with participial phrases, but one glance at the diagram is enough to understand what's what:

    One need only note that the gluing points in recombination are chosen randomly, just as in element-by-element crossing the exchange occurs with some probability. Crossing trees looks more promising in terms of heredity, but is harder to implement.
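    The three string-recombination schemes can be written down compactly. A hedged Python sketch (each operator takes two parents and returns two offspring, so the population size stays constant, as required above):

```python
import random

def one_point(a, b):
    """Single-point recombination: swap the tails after a random cut."""
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def two_point(a, b):
    """Two-point recombination: swap the middle segment."""
    i, j = sorted(random.sample(range(1, len(a)), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

def elementwise(a, b, p=0.5):
    """Element-by-element exchange: swap each position with probability p."""
    c, d = list(a), list(b)
    for i in range(len(a)):
        if random.random() < p:
            c[i], d[i] = d[i], c[i]
    return ''.join(c), ''.join(d)

random.seed(2)
print(one_point('AAAAAA', 'BBBBBB'))
print(two_point('AAAAAA', 'BBBBBB'))
print(elementwise('AAAAAA', 'BBBBBB'))
```

    Note that all three preserve the combined gene pool of the pair: every symbol of both parents ends up in exactly one of the two children.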

    Hey, this girl is with me!

    We have dealt with the most intimate part of the process (many of you have already sensed, through this article, how meager the author's personal life is). Now let's move from the relationship of a pair of individuals to the foundations of society.

    Individuals are divided into generations. The new generation consists of the children of individuals of the previous generation. So there is the current generation of sons and daughters, the generation of fathers and mothers, grandparents, great-grandparents, and so on back to generation zero: the ancestors of the whole proud people. After birth, each individual of a new generation tries to solve the problem, its actions are assessed by some divine fitness function, and depending on its assessment of the youngster's activity, the individual receives some chance of producing offspring, that is, of getting into the class of the generation's best representatives selected for procreation. Our world is harsh and cruel, and by all the canons of dystopia (or by the Fuhrer's designs, if you prefer), useless retired parents, having completed their mission of producing offspring, take a trip in a gas van, freeing up living space for a couple of their children. The children follow in their parents' footsteps, and so on from generation to generation.

    That same fitness function, which issues the mating quotas, must adequately assess an individual's ability to solve the problem and produce a numerical expression of that fitness (the higher the value, the better the fitness). For example, in the case of that same quadratic equation, this could be a measure of how close the value of the left side of the equation is to zero when the values x1, x2 computed by the individual program are substituted in.

    The fitness function gives each individual of the generation a number indicating its usefulness and fitness. This value influences the selection procedure: the higher an individual's value, the more likely it is to find a mate for crossing (possibly more than one). In practice, after calculating the fitness of all individuals in a generation, we normalize the values (so that the individuals' fitness sums to 1), and for each mating slot a lot is drawn (a random number from 0 to 1) to determine the lucky one. The alpha male may win several slots, while the loser gets nothing and is left alone with a worn 1994 calendar featuring Pamela. This selection method is called "roulette selection", and schematically it looks something like this:
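    Roulette selection as just described (normalize, build a cumulative wheel, draw a lot per slot) fits in a dozen lines. A hedged Python sketch with made-up names and fitness values:

```python
import random

def roulette(population, fitnesses, n_picks):
    """Pick n_picks individuals with probability proportional to fitness."""
    total = sum(fitnesses)
    wheel, acc = [], 0.0
    for f in fitnesses:
        acc += f / total
        wheel.append(acc)   # cumulative shares summing to 1
    wheel[-1] = 1.0         # guard against floating-point rounding
    picks = []
    for _ in range(n_picks):
        r = random.random() # the "lot" from 0 to 1
        for ind, bound in zip(population, wheel):
            if r <= bound:
                picks.append(ind)
                break
    return picks

random.seed(3)
pop = ['alpha', 'average', 'loser']
fit = [70.0, 25.0, 5.0]
chosen = roulette(pop, fit, 1000)
print(chosen.count('alpha'), chosen.count('loser'))  # the alpha wins far more slots
```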

    There are other selection methods, but they all follow the general rule: the greater an individual's fitness, the more it should participate in crossing. You can also add elitism to the process, where the best representative of a generation receives a bonus in the form of extra years of life: it passes into the next generation unchanged, although it can produce children at the same time. This keeps us from losing a very good solution that might otherwise be destroyed during crossing.

    Let's also mention mutation here. This operation randomly, with some small probability, changes a fragment of an individual, which makes it possible to diversify the gene pool. It's a useful thing: maybe this mutation will help digest lactose! And if not, and there's a spare hand to boot, then you'll have to suffer with it for the rest of your days; there's little chance of producing offspring anyway.

    Creation and Apocalypse

    We have figured out how to move from generation to generation; now the question is: "what was the first cause, where did it all begin?" Unlike in this world of yours, to explain such things we don't need to invent tricks like the "Big Bang" or "7 days". Here the answer is perfectly clear: it all started with generation zero, which was created randomly. Yes, yes, we simply generate random strings/trees. The only requirement is that the individual be well-formed; no one cares how flawed it is, selection will do its job.

    Our world exists as long as we need it. We either set a fitness level that satisfies us, and stop the process when a sufficiently cool individual appears, or we check how much the individuals of a generation differ from each other. It is logical that if the whole generation consists of identical twins, further mating will add nothing new to the gene pool, and hoping for a lone mutation is naive. You can also set a time limit.
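    Both stopping conditions (a good-enough champion, or a generation of identical twins) are easy to check. A Python sketch; the function name and the `min_diversity` threshold are my own assumptions:

```python
def should_stop(population, fitnesses, target_fitness=None, min_diversity=2):
    """Stop if a good-enough individual appeared, or if the generation
    has collapsed into (near-)identical twins."""
    if target_fitness is not None and max(fitnesses) >= target_fitness:
        return True  # the chosen one has been found
    distinct = {tuple(ind) for ind in population}
    return len(distinct) < min_diversity  # everyone is a twin: no point mating

print(should_stop([[0, 1], [0, 1], [0, 1]], [3, 3, 3]))           # True: all twins
print(should_stop([[0, 1], [1, 1]], [3, 90], target_fitness=80))  # True: champion found
print(should_stop([[0, 1], [1, 1]], [3, 40], target_fitness=80))  # False: keep evolving
```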

    Hey, you! Enough steaming our brains! What's the bottom line?

    Let's take a break from this fascinating verbiage and look back (that is, up). To summarize, a genetic algorithm looks like this:

    We learn to represent a solution to the problem as an individual of the genetic algorithm: a list of fixed length over some alphabet. After that, we select a fitness function that can evaluate individuals, and randomly generate generation zero. Then the cycle of free love begins: the fitness of the generation's individuals is calculated, pairs are formed based on this data (losers are thrown out, and alpha males are not limited to one pair), the rest mate, give birth to a couple of children (each with a mutation attached) and commit suicide. This continues until the chosen one is found, or the changes stop pleasing us, or we get tired of the whole thing. Well, how could I manage without a diagram:

    Part two. The role of the genetic algorithm in the making of a Robocode bot.

    The first part dragged on somewhat, we're all tired, so I won't repeat myself. We will also omit some implementation details.
    You can find out what Robocode is here: habrahabr.ru/blogs/programmers_games/59784 (the pictures are lost, though). In short, it's a programming game, originally created to study the features of the Java language, which lets participants create their own robot bots and stage battles between them. Each participant writes Java code that controls a small tank and fights other such tanks.

    Our task is as follows: using a genetic algorithm, develop an automated control system for a tank bot. The robot must be created and modified automatically, i.e. in the course of its evolution it must "adapt" to a specific, pre-selected opponent in 1-on-1 battles.

    How to represent the solution to a problem as an individual

    First, let's determine the tank's capabilities. The list of basic actions a robot can perform during battle is limited to four items: turn the gun, turn the body, shoot, move. We excluded a fifth action, rotating the radar, from consideration, implementing it trivially as constant rotation (so the tank always has up-to-date information about the enemy's position).

    Obviously, for successful combat these actions should not be performed chaotically, but should depend on the situation (state) on the battlefield: the positions of the tanks, their speeds, energies and other parameters. Thus, controlling the tank comes down to performing the above actions based on the state of the battle. The law that determines the tank's behavior (its actions) based on the battlefield situation we will call the control function, and it will be the individual of our genetic algorithm.

    Since the control function must return 4 values (shot energy, turret rotation angle, hull rotation angle, tank movement), then, as explained in the previous part, it will consist of four expressions, i.e. of four strings/trees.

    To build the encoding table, we need to decide on the set of basic functions, variables and constants.

    Functions:
    +(x, y) = x + y
    ++(x, y, z) = x + y + z
    n(x) = -x
    *(x, y) = x * y
    **(x, y, z) = x * y * z
    min(x, y) = x > y? y: x
    s(x) = 1/(1+exp(-x))
    if(x, y, z, w) = x > y? z:w

    Variables:
    x, y - coordinates of the opponent's tank relative to our tank;
    dr - the distance our tank still has to travel;
    tr - the angle our tank still has to turn;
    w - the distance from our tank to the edge of the field;
    dh - the angle between the direction to the enemy tank and our tank's gun;
    GH - the rotation angle of our tank's gun;
    h - the direction of movement of the opponent's tank;
    d - the distance between our tank and the opponent's tank;
    e - the energy of the opponent's tank;
    E - the energy of our tank.

    Well, constants: 0.5, 0, 1, 2, 10
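    The function set above is deliberately total on the reals (no division, no raw logarithm), so any tree evaluates safely. For concreteness, here is the same table expressed in Python as (arity, implementation) pairs; this is my own rendering of the list, not code from the authors' system:

```python
import math

FUNCS = {
    '+':   (2, lambda x, y: x + y),
    '++':  (3, lambda x, y, z: x + y + z),
    'n':   (1, lambda x: -x),
    '*':   (2, lambda x, y: x * y),
    '**':  (3, lambda x, y, z: x * y * z),
    'min': (2, lambda x, y: y if x > y else x),
    's':   (1, lambda x: 1 / (1 + math.exp(-x))),   # sigmoid, defined everywhere
    'if':  (4, lambda x, y, z, w: z if x > y else w),
}
CONSTANTS = [0.5, 0, 1, 2, 10]

arity, f = FUNCS['if']
print(arity, f(10, 2, 1, 0))  # 4 1
print(FUNCS['s'][1](0))       # 0.5
```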

    Fitness function

    Let us describe how the fitness function was chosen. Robocode produces battle results based on many nuances: not only the number of victories, but also all kinds of points for activity, for survival, for hitting the opponent, etc. In the end, Robocode ranks the robots by a "total score" parameter that accounts for all the subtleties described above. We will use it when calculating an individual's fitness: the final fitness equals the percentage of our tank's points out of the sum of both tanks' points, and ranges from 0 to 100. Accordingly, if the fitness value is above 50, our robot scored more points than the opponent and is therefore stronger. Note that under this scoring system, first place does not always go to the one who won the most rounds of the battle. Well, here we shrug with the usual phrase about the motor scooter: the creators defined the criteria, we follow them.

    Generally speaking, calculating an individual's fitness involves holding a series of fights! That is, such a seemingly insignificant point as computing fitness involves this whole dance with a tambourine:
    1) Our system saves the encoded chromosomes of an individual into the file chromosome.dat;
    2) For each individual, the “Robocode” environment is launched, which organizes a duel. We provide it with a file in the .battle format, which describes the conditions of the battle - the list of fighting tanks, the size of the field, the number of rounds, etc.;
    3) For battle, Robocode loads tanks, our shell robot reads the chromosome.dat file with encoded behavior, interprets it into a set of actions and conducts the battle according to them;
    4) At the end of the fight, the Robocode environment writes the result of the battle to the results.txt file and ends its work;
    5) Our system picks up this file, parses it, and extracts the total score values of our tank and the opponent. Simple arithmetic gives us the fitness value.
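    The five steps above can be sketched as follows. This is a hedged Python sketch, not the authors' C code: the `robocode` command-line flags and the `parse_total_scores` helper are assumptions (the real results-file format depends on the Robocode version); only the score arithmetic of step 5 is taken directly from the text.

```python
import subprocess

def fitness_from_scores(our_total, opponent_total):
    """Step 5's arithmetic: our percentage of the combined total score (0..100)."""
    return 100.0 * our_total / (our_total + opponent_total)

def parse_total_scores(path):
    """Hypothetical parser for results.txt; the exact format depends on the
    Robocode version, so this is left as a stub."""
    raise NotImplementedError

def evaluate_individual(chromosome_line, battle_file="evolution.battle"):
    """Steps 1-4: hand the genome to the shell robot, run one duel, score it."""
    with open("chromosome.dat", "w") as f:                 # 1) save the chromosomes
        f.write(chromosome_line)
    subprocess.run(["robocode", "-battle", battle_file,    # 2-4) run the duel (flags assumed)
                    "-nodisplay", "-results", "results.txt"], check=True)
    ours, theirs = parse_total_scores("results.txt")       # 5) extract both totals
    return fitness_from_scores(ours, theirs)

print(fitness_from_scores(600, 400))  # 60.0: our tank took 60% of the points
```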

    Like ours, right?

    Let's summarize our design. The system consists of two parts (programs). The first, based on the genetic algorithm, assembles an individual and saves it as a set of strings; the second (the robot's code) interprets it (unpacking it into expression trees) and controls the tank (computing the values of the expression trees by recursive traversal for the given variables, i.e. the current battle state). The first program is written in C, the second in Java.

    When implementing the genetic algorithm, the number of individuals in the population was set to 51 (25 pairs plus one elite individual). One step of evolution (a population change) takes about ten minutes, so overall the matter drags on for several hours.

    As a result, here are the outcomes of evolving an opponent for the Walls and Crazy robots:




    In the first case, we stopped the process after one of the individuals reached a fitness level of 70; in the second, it was enough for us that the average fitness of individuals in a generation exceeded 50.

    After contemplation, rinse your eyes with alcohol

    If anyone is not afraid to cry tears of blood in convulsions from contemplating truly ugly code (the robot code will make your hair move in particular - Java and I share a mutual hatred), then I attach the sources.

    A basic (classical) genetic algorithm (also called an elementary or simple genetic algorithm) consists of the following steps:

    1) initialization, or selection of the initial population of chromosomes;

    2) assessment of the fitness of chromosomes in the population;

    3) checking the algorithm stopping condition;

    4) selection of chromosomes;

    5) application of genetic operators;

    6) formation of a new population;

    7) selection of the “best” chromosome.

    The block diagram of the main genetic algorithm is shown in Fig. 4.3. Let's look at the specific steps of this algorithm in more detail using the additional details presented in Fig. 4.4.

    Fig. 4.3. Flowchart of the genetic algorithm.

    Fig. 4.4. Execution diagram of the genetic algorithm.

    Initialization, i.e. the formation of the initial population consists of randomly selecting a given number of chromosomes (individuals), represented by binary sequences of a fixed length.

    Assessing the fitness of chromosomes in a population consists of calculating the fitness function for each chromosome in that population. The greater the value of this function, the higher the “quality” of the chromosome. The form of the fitness function depends on the nature of the problem being solved. It is assumed that the fitness function always takes non-negative values ​​and, in addition, that to solve the optimization problem it is necessary to maximize this function. If the original form of the fitness function does not satisfy these conditions, then the corresponding transformation is performed (for example, the function minimization problem can be easily reduced to the maximization problem).

    Checking the algorithm stopping condition. Determining the stopping condition for a genetic algorithm depends on its specific application. In optimization problems, if the maximum (or minimum) value of the fitness function is known, then the algorithm can stop after reaching the expected optimal value, possibly with a given accuracy. Stopping the algorithm can also occur when its execution does not lead to an improvement in the value already achieved. The algorithm can be stopped after a certain execution time or after completing a specified number of iterations. If the stopping condition is met, then the transition to the final stage of selecting the “best” chromosome is made. Otherwise, selection is performed in the next step.

    Chromosome selection consists of selecting (according to the fitness function values calculated at the second stage) those chromosomes that will participate in the creation of descendants for the next population, i.e. for the next generation. This choice is made according to the principle of natural selection, by which the chromosomes with the highest values of the fitness function have the best chance of participating in the creation of new individuals. There are various selection methods. The most popular is the so-called roulette wheel selection method, which received its name by analogy with the famous game of chance. Each chromosome can be associated with a sector of the roulette wheel whose size is set proportional to the value of the fitness function of that chromosome. Therefore, the greater the value of the fitness function, the larger the sector on the roulette wheel. The entire roulette wheel corresponds to the sum of the fitness function values of all chromosomes of the population in question. Each chromosome ch_i for i = 1, 2, ..., N (where N denotes the population size) corresponds to a sector of the wheel v(ch_i), expressed as a percentage according to the formula

    v(ch_i) = p_s(ch_i) · 100%, (4.2)

    p_s(ch_i) = F(ch_i) / Σ_{j=1..N} F(ch_j), (4.3)

    where F(ch_i) is the value of the fitness function of chromosome ch_i, and p_s(ch_i) is the probability of selecting that chromosome. Chromosome selection can be thought of as the result of spinning a roulette wheel, since the "winning" (i.e. selected) chromosome belongs to the sector of the wheel on which it lands. Obviously, the larger the sector, the more likely the "victory" of the corresponding chromosome. Therefore, the probability of choosing a given chromosome turns out to be proportional to the value of its fitness function. If the entire circumference of the roulette wheel is represented as the numeric interval [0, 100], then the choice of a chromosome can be identified with the choice of a number from an interval [a_i, b_i], where a_i and b_i respectively indicate the beginning and the end of the fragment of the circle corresponding to that sector of the wheel; obviously, b_i − a_i = v(ch_i). In this case, choosing with a roulette wheel comes down to drawing a number from [0, 100] that corresponds to a specific point on the circumference of the wheel. Other selection methods will be discussed in paragraph 4.8.1.

    As a result of the selection process, a parent population, also called a mating pool, is created with a size equal to the size of the current population.

    The application of genetic operators to chromosomes selected through selection leads to the formation of a new population of descendants from the parent population created in the previous step.

    The classical genetic algorithm uses two main genetic operators: the crossover operator and the mutation operator. However, it should be noted that the mutation operator plays a clearly secondary role compared to the crossover operator. This means that crossover in a classical genetic algorithm is performed almost always, while mutation happens quite rarely. The probability of crossover p_c is, as a rule, quite high (typically 0.5 ≤ p_c ≤ 1), while the probability of mutation p_m is set to be very small (most often 0 ≤ p_m ≤ 0.1). This follows from an analogy with the world of living organisms, where mutations occur extremely rarely.

    In a genetic algorithm, chromosome mutation can be performed on a population of parents before crossing or on a population of offspring resulting from crossing.

    Crossover operator. In the first stage of crossover, pairs of chromosomes are selected from the parent population (the mating pool) - a temporary population consisting of chromosomes selected by the selection step and intended for further transformation by the crossover and mutation operators in order to form a new population of descendants. At this stage, chromosomes from the parent population are combined into pairs randomly, according to the crossover probability. Next, for each pair of parents selected in this way, the position of a gene (locus) in the chromosome is drawn; it determines the so-called crossing point. If the chromosome of each parent consists of L genes, then obviously the crossing point l_k is a natural number less than L. Therefore, fixing the crossing point reduces to randomly drawing a number from the interval [1, L − 1]. As a result of crossing a pair of parental chromosomes, the following pair of offspring is obtained:

    1) a descendant whose chromosome at positions 1 to l_k consists of the genes of the first parent, and at positions l_k + 1 to L of the genes of the second parent;

    2) a descendant whose chromosome at positions 1 to l_k consists of the genes of the second parent, and at positions l_k + 1 to L of the genes of the first parent.

    The operation of the crossing operator will be illustrated by examples 4.4 and 4.5 (clauses 4.5 and 4.6).

    The mutation operator changes the value of a gene in a chromosome to the opposite with probability p_m (i.e., from 0 to 1 or vice versa). For example, if the gene at position 7 of a chromosome undergoes mutation, its value 1 is changed to 0, producing a new chromosome. As mentioned above, the probability of mutation is usually very small, and it determines whether a given gene mutates or not. Mutation with probability p_m can be emulated, for example, by drawing a random number from the interval [0, 1] for each gene and selecting for mutation those genes for which the drawn number is less than or equal to p_m.

    Formation of a new population. Chromosomes resulting from the application of genetic operators to the chromosomes of the temporary parent population are included in the new population. This becomes the so-called current population for the given iteration of the genetic algorithm. At each subsequent iteration, the values of the fitness function are calculated for all chromosomes of this population, after which the stopping condition of the algorithm is checked and either the result is recorded in the form of the chromosome with the highest fitness function value, or the algorithm proceeds to the next step, i.e. to selection. In a classical genetic algorithm, the entire previous population of chromosomes is replaced by a new population of descendants of the same size.

    Choosing the “best” chromosome. If the condition for stopping the algorithm is met, then the result of the work should be displayed, i.e. present the desired solution to the problem. The best solution is considered to be the chromosome with the highest fitness function value.

    In conclusion, it should be recognized that genetic algorithms have inherited the properties of the natural evolutionary process, which consists of genetic changes in populations of organisms over time.

    The main factor of evolution is natural selection, which leads to the fact that among genetically different individuals of the same population, only those best adapted to the environment survive and leave offspring. Genetic algorithms likewise have a selection stage, at which the individuals with the highest fitness function values are selected from the current population and included in the parent population. The next stage, which is sometimes called evolution, applies the genetic operators of crossover and mutation to perform the recombination of genes in chromosomes.

    The crossover operation involves the exchange of chain fragments between two parent chromosomes. Pairs of parents for crossover are selected randomly from the mating pool so that the probability of selecting a particular chromosome for crossover equals the crossover probability p_c. Mutation, performed in the manner presented in the description of the corresponding operator, results in the values of the selected genes being inverted from 0 to 1 and back; its probability p_m is usually very small, so only a small number of genes undergo mutation. Crossover is the key operator of genetic algorithms, determining their capabilities and efficiency. Mutation plays a more limited role: it introduces some diversity into the population and prevents the losses that could occur if a significant gene were excluded as a result of crossover.

    The basic (classical) genetic algorithm is known in the literature as a tool in which three types of operations are distinguished: reproduction, crossover and mutation. The terms selection and reproduction in this context are used as synonyms. At the same time, reproduction in this case is associated rather with the creation of copies of the chromosomes of the parental pool, while the more common meaning of this concept denotes the process of formation of new individuals descending from specific parents (see Section 4.1). If we accept this interpretation, then the crossover and mutation operators can be considered reproduction operators, and selection can be considered the selection of individuals (chromosomes) for reproduction.

    Genetic algorithms (GA) are stochastic, heuristic optimization methods first proposed by John Holland in 1975. They are based on the idea of evolution by natural selection. Besides finding an extremum comparatively quickly, genetic algorithms have the positive property of finding the "global" extremum. In problems where the objective function has a significant number of local extrema, genetic algorithms, unlike gradient methods, do not "get stuck" at local extremum points but allow one to find the "global" minimum.

    Genetic algorithms work with a collection of individuals - a population, where each individual represents a possible solution to a given problem. It is assessed by the measure of its “adaptability” according to how “good” the solution to the problem corresponding to it is. In nature, this is equivalent to assessing how efficient an organism is when competing for resources. The fittest individuals are able to “reproduce” offspring through “crossbreeding” with other individuals of the population. This leads to the emergence of new individuals that combine some of the characteristics they inherit from their parents. The least fit individuals are less likely to reproduce, so whatever traits they possess will gradually disappear from the population over the course of evolution. Sometimes mutations, or spontaneous changes in genes, occur.

    Thus, from generation to generation, good characteristics spread throughout the population. Crossing the fittest individuals means that the most promising areas of the search space are explored. Eventually the population converges to an optimal solution of the problem. The advantage of a GA is that it finds approximately optimal solutions in a relatively short time.
    GA operates with the following terminology:

    • Chromosome - a solution to the problem under consideration, a carrier of hereditary information. The set of chromosomes (the values of the parameters of the objective function) characterizes an individual. A chromosome consists of genes.
    • Genes - elements encoding hereditary information (the parameters of the objective function). Most often, genes are a bit encoding of the information.
    • Individual - a set of chromosomes (the set of parameters for which the value of the objective function is sought).
    • Fitness of an individual - the value of the objective function for the given set of parameters, relative to the required value.

    The GA performs the following actions on individuals

    Initially, the GA generates a certain number of possible solutions (individuals), and then calculates the fitness of each - its proximity to the truth. These solutions produce offspring (a crossover operation is performed). Fitter solutions have a greater chance to reproduce, while "weak" individuals gradually "die off." Thus, the process of evolution takes place. At certain stages of this process, spontaneous gene changes (mutations and inversions) occur. Useful changes that increase the fitness of an individual are passed on to offspring, while "useless" changes "die off." After crossover, mutations and inversions, the fitness of the individuals of the new generation is determined again. The process is repeated until a solution is found or a sufficient approximation to it is obtained.

    As an example of the use of a genetic algorithm, consider the problem of numerical search for a solution discussed in this article.

    The objective function will have the form f(x) = sin(πx/180) − 1/x, and we search for a root of the equation f(x) = 0 on a given interval.

    As a crossover function, we will use the operation of finding the arithmetic mean of the two points under consideration. Several points with the best solutions (i.e. with objective function values closest to zero) are selected for crossover.

    A mutation will be the operation of generating a new random number for the population under consideration.

    The inversion will change the value of the chromosome by some small amount, thus searching in the vicinity of the point with the best solution.
    Implementation in C++

    #define _USE_MATH_DEFINES
    #include <iostream>
    #include <cmath>
    #include <cstdlib>
    #include <ctime>
    using namespace std;
    double func(double x) // objective function: we search for its root
    {
    return sin(M_PI * x / 180) - 1 / x;
    }
    double mutation(double x0, double x1) // mutation: generating a random individual in [x0, x1]
    {
    return x0 + (x1 - x0) * rand() / (double)RAND_MAX;
    }
    double inversion(double x, double eps) // inversion: search in the vicinity of a point
    {
    static int sign = 0;
    sign++;
    sign %= 2;
    if (sign == 0) return x - eps;
    else return x + eps;
    }
    void crossover(double *x, double eps, double x0, double x1) // crossover: arithmetic mean
    {
    int k = 99;
    for (int i = 0; i < 8; i++) // pairwise averages of the 8 best individuals
    for (int j = i + 1; j < 8; j++)
    {
    x[k] = (x[i] + x[j]) / 2;
    k--;
    }
    for (int i = 0; i < 8; i++) // inversions around the 8 best individuals
    {
    x[k] = inversion(x[i], eps); k--;
    }
    for (int i = 8; i <= k; i++) // the rest of the population is filled randomly (mutation)
    x[i] = mutation(x0, x1);
    }
    void sort(double *x, double *y) // sorting individuals by |f(x)|, best first
    {
    for (int i = 0; i < 100; i++)
    for (int j = i + 1; j < 100; j++)
    if (fabs(y[j]) < fabs(y[i])) {
    double temp = y[i];
    y[i] = y[j];
    y[j] = temp;
    temp = x[i];
    x[i] = x[j];
    x[j] = temp;
    }
    }
    double genetic(double x0, double x1, double eps) // search for a solution using GA
    {
    double population[100];
    double f[100];
    int iter = 0;
    for (int i = 0; i < 100; i++) // formation of the initial population
    {
    population[i] = mutation(x0, x1);
    f[i] = func(population[i]);
    }
    sort(population, f);
    do {
    iter++;
    crossover(population, eps, x0, x1);
    for (int i = 0; i < 100; i++)
    f[i] = func(population[i]);
    sort(population, f);
    } while (fabs(f[0]) > eps && iter < 20000);
    cout << iter << " iterations" << endl;
    return population[0];
    }
    int main()
    {
    srand(time(NULL));
    cout << genetic(1.0, 10.0, 0.000001);
    cin.get();
    return 0;
    }

    Execution result

    The use of genetic algorithms does not always give better results compared to other methods. However, this method has an undeniable advantage when solving multidimensional problems of searching for a global extremum, containing a significant number of local extrema.

    The idea of genetic algorithms (GAs) appeared quite a long time ago (1950-1975), but they became a real object of study only in recent decades. The pioneer in this field is considered to be J. Holland, who borrowed much from genetics and adapted it for computers. GAs are widely used in artificial intelligence systems, neural networks, and optimization problems.

    Evolutionary search

    Models of genetic algorithms were created based on evolution in nature and random search methods. In this case, random search is the implementation of the simplest function of evolution - random mutations and subsequent selection.

    From a mathematical point of view, evolutionary search means nothing more than the transformation of an existing finite set of solutions into a new one. The function responsible for this process is genetic search. The main difference between such an algorithm and a random search is the active use of information accumulated during iterations (repetitions).

    Why are genetic algorithms needed?

    GAs pursue the following goals:

    • to explain the mechanisms of adaptation both in the natural environment and in artificial (intellectual research) systems;
    • to model evolutionary functions and apply them to the effective search for solutions to various problems, mainly optimization ones.

    At the moment, the essence of genetic algorithms and their modified versions can be called the search for effective solutions, taking into account the quality of the result. In other words, finding the best balance between performance and accuracy. This happens due to the well-known paradigm of “survival of the fittest individual” in uncertain and unclear conditions.

    Features of GA

    Let us list the main differences between GA and most other methods for finding an optimal solution.

    • working with task parameters encoded in a certain way, and not directly with them;
    • the search for a solution proceeds not by refining a single initial approximation, but within a set of possible solutions;
    • using only the target function, without resorting to its derivatives and modifications;
    • the use of a probabilistic approach to analysis, instead of a strictly deterministic one.

    Performance criteria

    Genetic algorithms make calculations based on two conditions:

    1. Execute a specified number of iterations.
    2. The quality of the solution found meets the requirements.

    If one of these conditions is met, the genetic algorithm stops performing further iterations. In addition, because GAs work with solutions from different regions of the solution space, they are much better at finding new solutions with more suitable values of the objective function.

    Basic terminology

    Due to the fact that GAs are based on genetics, most of the terminology corresponds to it. Any genetic algorithm works from initial information. The set of initial values is the population P_t = (n_1, n_2, ..., n_n), where n_i = (g_1, ..., g_v). Let's take a closer look:

    • t is the iteration number. t = 1, ..., k means iterations of the algorithm from number 1 to k; at each iteration a new population of solutions is created.
    • n is the size of the population P_t.
    • n_1, ..., n_n are the chromosomes (individuals, or organisms). A chromosome, or chain, is an encoded sequence of genes, each of which has a serial number. Keep in mind that a chromosome can be a special case of an individual (organism).
    • g_1, ..., g_v are the genes that make up the encoded solution.
    • A locus is the serial number of a gene in a chromosome. An allele is the value of a gene, which can be either numerical or functional.

    What does "encoded" mean in the context of GA? Typically, any value is encoded based on some alphabet. The simplest example is the conversion of numbers from the decimal number system to binary representation. Thus, the alphabet is represented as the set {0, 1}, and the number 157_10 will be encoded by the chromosome 10011101_2, consisting of eight genes.

    Parents and descendants

    Parents are elements that are selected according to a given condition. For example, often such a condition is randomness. The selected elements, through the operations of crossing and mutation, generate new ones, which are called descendants. Thus, during the implementation of one iteration of the genetic algorithm, parents create a new generation.

    Finally, evolution in this context is an alternation of generations, each new one of which has a different set of chromosomes for the sake of better fitness, that is, better compliance with the given conditions. An individual's overall genetic makeup in the process of evolution is called its genotype, and the manifestation of that genotype in its connection with the external environment is called its phenotype.

    Fitness function

    The magic of the genetic algorithm is in the fitness function. Each individual has its own value, which can be determined through the fitness function. Its main task is to evaluate these values for different alternative solutions and to select the best one. In other words, the fittest.

    In optimization problems, the fitness function is called the target function, in control theory it is called the error, in game theory - the cost function, etc. What exactly will be presented as the fitness function depends on the problem being solved.

    Ultimately, we can conclude that genetic algorithms analyze a population of individuals, organisms or chromosomes, each of which is represented by a combination of genes (a set of some values), and search for an optimal solution by transforming the individuals of the population through artificial evolution among them.

    Deviations of individual elements in one direction or another generally follow the normal distribution law. At the same time, a GA ensures the heredity of traits, the fittest of which become fixed, thereby producing a better population.

    Basic genetic algorithm

    Let's break down the simplest (classical) GA step by step.

    1. Initialization of initial values, that is, determination of the primary population, the set of individuals with which evolution will occur.
    2. Establishing the primary fitness of each individual in a population.
    3. Checking the conditions for stopping iterations of the algorithm.
    4. Using the selection function.
    5. Application of genetic operators.
    6. Creation of a new population.
    7. Steps 2-6 are repeated in a cycle until the necessary condition is met, after which the most adapted individual is selected.

    Let's briefly go through the less obvious parts of the algorithm. There can be two conditions for termination of work:

    1. Number of iterations.
    2. Quality of the solution.

    The genetic operators are the mutation operator and the crossover operator. A mutation changes random genes with a certain probability. Typically, the probability of mutation has a low numerical value. Let's discuss the "crossover" procedure of the genetic algorithm in more detail. It happens according to the following principle:

    1. For each pair of parents containing L genes, a crossing point c_i is randomly selected.
    2. The first descendant is formed by joining the genes [1; c_i] of the first parent to the genes [c_i + 1; L] of the second parent.
    3. The second descendant is constructed the opposite way: the genes [1; c_i] of the second parent are joined to the genes [c_i + 1; L] of the first parent.

    Trivial example

    Let's solve the problem using a genetic algorithm using the example of searching for an individual with the maximum number of units. Let an individual consist of 10 genes. Let's set a primary population of eight individuals. Obviously, the best individual should be 1111111111. Let’s create a GA to solve.

    • Initialization. Let's select 8 random individuals:

    The table shows that individuals 3 and 7 have the largest number of ones, and are therefore the most suitable members of the population for solving the problem. Since at the moment there is no solution of the required quality, the algorithm continues to work. It is necessary to carry out selection of individuals. For simplicity of explanation, let selection occur randomly, giving a sample of individuals (n_7, n_3, n_1, n_7, n_3, n_7, n_4, n_2) - these are the parents of the new population.

    • Using genetic operators. Again, for simplicity, we assume that the probability of mutation is 0. In other words, all 8 individuals pass on their genes as they are. To carry out the crossover, we pair the individuals randomly: (n_2, n_7), (n_1, n_7), (n_3, n_4) and (n_3, n_7). Crossing points for each pair are also selected randomly:
    • Creation of a new population consisting of descendants:

    Further actions are obvious. The most interesting property of GAs shows up when we estimate the average number of ones in each population. In the first population there were, on average, 5.375 ones per individual; in the population of descendants, 6.25 ones per individual. And this property will be observed even if, during mutations and crossovers, the individual with the largest number of ones in the first population is lost.

    Implementation plan

    Creating a genetic algorithm is a rather complex task. First, we list the plan in the form of steps, after which we will analyze each of them in more detail.

    1. Definition of representation (alphabet).
    2. Definition of random change operators.
    3. Determination of survival of individuals.
    4. Generation of the primary population.

    The first stage states that the alphabet into which all elements of a set of solutions or a population will be encoded must be flexible enough to simultaneously allow the necessary operations of random permutations and assess the fitness of elements, both primary and those that have gone through changes. It has been mathematically established that it is impossible to create an ideal alphabet for these purposes, therefore its compilation is one of the most difficult and critical stages to ensure the stable operation of the GA.

    No less difficult is the definition of the modification and creation operators. There are many operators that are capable of performing the required actions. For example, it is known from biology that each species can reproduce in two ways: sexually (crossover) and asexually (mutations). In the first case, parents exchange genetic material; in the second, mutations occur, determined by the internal mechanisms of the organism and external influences. In addition, reproduction models that do not exist in nature can be used - for example, using the genes of three or more parents. As with crossover, a variety of mechanisms can be built into the mutation operator of a genetic algorithm.

    Choosing a method of survival can be extremely deceptive. There are many selection schemes in genetic algorithms, and, as practice shows, the rule "survival of the fittest" is not always the best. When solving complex technical problems, it often turns out that the best solution emerges from many average or even worse ones. Therefore, a probabilistic approach is often used, which states that the better solution has the greater chance of survival.

    The last stage provides flexibility to the algorithm that no other one has. The primary population of solutions can be set either based on any known data or completely randomly by simply rearranging genes within individuals and creating random individuals. However, it is always worth remembering that the effectiveness of the algorithm largely depends on the initial population.

    Efficiency

    The effectiveness of a genetic algorithm depends entirely on the correct implementation of the steps described in the plan. A particularly influential point here is the creation of a primary population. There are many approaches to this. Let's describe a few:

    1. Creation of a complete population that will include all possible variants of individuals in some given area.
    2. Randomly generate individuals based on all valid values.
    3. Pointwise random creation of individuals, where a range for generation is selected among the acceptable values.
    4. Combining the first three methods of creating a population.

    Thus, we can conclude that the effectiveness of genetic algorithms largely depends on the programmer’s experience in this matter. This is both a disadvantage of genetic algorithms and their advantage.