Understanding Evolutionary and Genetic Algorithms and Optimisation Problems

Lewis Watson included in Machine Learning

2023-01-08 2330 words 11 minutes

Contents

Evolutionary Computing

A Evolutionary Algorithm (EA) is a subset of Artificial Intelligence (AI), drawing inspiration from natural selection and biological evolution. These algorithms are particularly useful for solving complex optimisation problems (see P, NP, NP-Complete and NP-Hard Problems in Computer Science by Baeldung) where traditional methods may fall short. Often referred to as Genetic Algorithms (GAs), a specific type of evolutionary algorithm, they mimic the process of natural evolution to optimise towards a solution.

EAs are an example of Artificial Intelligence that generally does not fall under the ‘Machine Learning’ category - however, there have been some projects using EA in place of gradient decent, sometimes referred to as “neuroevolution”. See NEAT, HyperNEAT, OpenAI’s Evolution Strategies.

In this blog post, we will explore the basics of genetic algorithms, their applications, why they are an important tool in the field of computer science and finally we will solve a challenge using a genetic algorithm.

How do Genetic Algorithms (GAs) work?

Generic algorithms are a family of optimisation algorithms. They use mechanisms inspired by biological evolution, let’s start with some commonly used definitions in this field.

Chromosome (sometimes referred to as an Individual) - A chromosome can be thought of as a complete encoded solution to a problem. This does not mean the solution is a good solution. As an example, think of a robot with the task of exploring the ocean floor, if the robot just spins in a circle then although a solution, it’s not a good solution.

Technically speaking in complex problems an individual can be made of multiple chromosomes, but for simplicity this post will use these terms interchangeably.

Gene - Component in a solution, think of this as a single step taken to carry out a task. In our example moving forward/back/left/right could all be examples of a gene.

Population - A group of potential solutions (chromosomes).

Fitness/Fitness Function - The fitness is a calculable metric for how well a solution has performed its task. In our example the fitness could be calculated by how many discoveries have been made by the robot, i.e identified fish species, ground types, coral etc.

Note: For some problem types we are trying to maximise the fitness value and in other problems minimise i.e the aim of the algorithm is to find the smallest fitness such as in the travelling salesman problem.

Let’s expand on our example problem…

![[myplot 2.png]]

Let’s say, we have a robot (R) on the bottom of the sea, there are 5 coral. The corals’ positions are (1, 3), (4, 7), (6, 2), (8, 8), and (9, 5).

The robot’s task is to collect a sample from each of the corals whilst minimising the distance it has to travel. We can represent a potential solution as [C1,C2,C3,C4,C5] representing the order [(1, 3), (4, 7), (6, 2), (8, 8), (9, 5)].

The fitness of this solution can be calculated by summing the distances between each point, Assuming the starting position of the R as position (2,1) and moving through each coral in the order specified. In this case, our potential solution’s fitness value would be 22.11 units - remember we want to minimise this distance. In python this would look something like:

distance = math.sqrt((next_position[0] - current_position[0]) ** 2 +(next_position[1] - current_position[1]) ** 2)

The Genetic Algorithm Process

Now that we’ve set up our problem, let’s walk through the steps of how a genetic algorithm would approach solving it.

Initialisation

The first step in any genetic algorithm is to create an initial population of potential solutions. In our coral sampling problem, this would involve generating a set of random orderings of the coral positions.

For example, if we have a population size of 4, our initial population might look like this:

[(1, 3), (4, 7), (6, 2), (8, 8), (9, 5)]
[(6, 2), (9, 5), (1, 3), (8, 8), (4, 7)]
[(8, 8), (1, 3), (9, 5), (4, 7), (6, 2)]
[(4, 7), (6, 2), (8, 8), (1, 3), (9, 5)]

Each of these represents a possible path for our robot to follow when collecting coral samples. in a real world application our population size would be much larger.

Evaluation

Once we have our initial population, we need to evaluate how good each solution is. This is done using our fitness function. For our problem, we’ll calculate the total distance traveled for each path, starting from the robot’s initial position (0, 0).

Let’s say our fitness function calculates the following distances:

22.11 units
29.34 units
36.84 units
34.88 units

Remember, in this case, a lower distance is better so, solution 1 is currently our best.

Selection

Now that we know the fitness of each solution, we need to select which solutions will be used to create the next generation. There are several methods for doing this, but two common ones are:

K-Tournament Selection: Randomly select a K-sized group of solutions and choose the best one from that group, repeat until we have desired population size.
Roulette Wheel Selection: Solutions with better fitness have a higher chance of being selected, proportional to their fitness.

Using tournament selection with groups of 2, we might end up selecting solutions 1 and 4 to be “parents” for the next generation.

Selection Pressure

Selection pressure is a crucial concept in evolutionary algorithms, referring to the intensity with which the better solutions are chosen over the weaker ones. It plays a vital role in balancing broad exploration of the search area and exploitation within the algorithm.

High selection pressure means that only the top-performing solutions are chosen for reproduction, which can lead to faster convergence but might also cause the algorithm to get stuck in local optima. Conversely, low selection pressure allows for a more diverse range of solutions to be selected, fostering greater exploration but potentially drastically slowing down convergence.

In our coral sampling problem, if we apply a high selection pressure, we might see rapid improvements initially, but risk missing out on potentially better solutions that require more generations to evolve.

Crossover

Crossover is the process of combining parts of two parent solutions to create new offspring solutions. There are many ways to perform crossover, but for our problem, we might use a method called “One-Point Crossover”.

Let’s say we’re creating a new solution from parents 1 and 4:

Parent 1: [(1, 3), (4, 7), (6, 2), (8, 8), (9, 5)] Parent 4: [(4, 7), (6, 2), (8, 8), (1, 3), (9, 5)]

We select one point in Parent 1, everything after that point should be filled with Parent 4’s values that aren’t in the solution. Let’s select point 3.

New solution: [(1, 3), (4, 7), (6, 2),_ ,_]

Then we fill in the remaining positions with the elements from Parent 4 that aren’t already in our new solution, in the order they appear in Parent 4:

New solution: [(1, 3), (4, 7), (6, 2), (8, 8), (9, 5)]

Mutation

Mutation introduces small random changes to solutions. This helps maintain genetic diversity and prevents the algorithm from getting stuck in ’local optima’. A local optimum is a solution that is the best within a neighbouring set of solutions, but may not be the best possible solution overall, we call the best solution possible the global optimum. In our case, mutation might involve swapping the positions of two randomly chosen corals in a solution.

For example, our solution after crossover might mutate to:

[(1, 3), (9, 5), (6, 2), (8, 8), (4, 7)]

Here, (9, 5) and (4, 7) have swapped positions.

Other more random mutations can also be used but we risk losing a coral or point. For example if we randomly replace the (8, 8) gene with a (1, 3) gene then we would have an invalid solution as the (8, 8) gene is no longer present.

To prevent this we can either, check if the chromosome is invalid in our fitness function (then setting it to a low performing value) or we could implement a ‘repair function’, where we somehow fix an invalid function and make it valid against our task’s constraints. In our example we could find the first duplicate and replace it with the first ‘missing’ gene - then loop. This does risk lessening selection pressure which could lead to getting stuck in a local optima. A large part of GA design is balancing our task constraints with effective selection pressure.

Replacement

After creating new solutions through selection, crossover, and mutation, we need to form the next generation. There are different strategies for this, but a common one is to replace the entire old population with the new solutions. Some algorithms also use “elitism”, where the best solutions from the old population are guaranteed to survive to the next generation.

Termination

The process of evaluation, selection, crossover, mutation, and replacement is repeated for many generations. The algorithm terminates when it meets certain criteria, such as:

A fixed number of generations have passed
A solution with a satisfactory fitness has been found
The population has converged (solutions are very similar)
No improvement has been seen for a certain number of generations

In our coral sampling problem, we might stop when we find a path that’s shorter than a certain threshold, or after a fixed number of generations if we’re working with limited computational resources.

Practical Challenge: stayFit

The next part of this post is going to involve solving a Capture the Flag challenge, which is a challenge in which the goal is to get a flag or phrase/password that proves you have found the solution. I wrote a Genetic Algorithm (GA) challenge for ENUSEC’s LTDH 2024 CTF. You can also see my other challenge write ups here if you are interested.

The Challenge Description:

Welcome to EVOLVE, Darwin’s favourite gym, where our flags are exactly 30 random characters [a-z,A-Z,0-9] long! (NOT including the ’ltdh{’ # and ‘}’! The lower the rep count, the better! Can you find the correct flag? Connection Info: NC IP Port

In order to be able to solve this problem we need to create a genetic algorithm to solve it. We can manually write all the code for a evolutionary algorithm (and if you’re interested, take a look at my EA in C project on GitHub) but often it is easier to use a library like DEAP for Python or Jenetics for Java. For the ‘solution’ to the challenge I will be using Python.

The challenge description is written to hint at a few things. ‘Evolve’, ‘Darwin’, ‘Fitness’ were all hints to suggest needing to use genetic algorithms. Next we also know we are looking for 30 random characters in the [a-z,A-Z,0-9] set. Users were also given an IP and Port where they can connect to using netcat - this is common for CTF challenges so it is expected players would understand this. Upon connecting to the server using netcat, users are presented with the following:

Enter the flag:

Once you enter a 30 character (plus the ltdh{} flag format) string you are sent back a number, the closer to the flag the higher the value you get in response (aiming for higher reps). This is effectively our fitness function.

I recommend reading the DEAP documentation and trying out the challenge for yourself however, I will provide the full solution code below. So we can run it offline I will be calling a verify function from another python file as opposed to calling a remote server, however, it is the exact same as the code that ran on the remote server for the CTF at LTDH. The ‘verify’/fitness function is as follows:

def fitness(input,target="ltdh{c29faG93X2RvX3lvdV9zdGF5X2ZpdA}"):  
    if len(target) > len(input):  
        return -1  
    fitness = 0  
    for i in range(len(target)):  
        if target[i] == input[i]:  
            fitness += 1  
    return fitness / len(target) * 100

The full solution code:

import random  
import verify  
from deap import base, creator, tools, algorithms  
import numpy  
  
# Allowed characters in the flags  
ALLOWED_CHARS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789{}"  
LENGTH = 36  
  
# Define the fitness function  
def fitness(individual):  
    value = verify.fitness(individual)  
    return value,  
  
# Define the individual initialization function  
def individual():  
    return [random.choice(ALLOWED_CHARS) for _ in range(LENGTH)]  
  
# Setting up the DEAP framework for our problem  
creator.create("FitnessMax", base.Fitness, weights=(1.0,))  
creator.create("Individual", list, fitness=creator.FitnessMax)  
  
toolbox = base.Toolbox()  
toolbox.register("individual", tools.initIterate, creator.Individual, individual)  
toolbox.register("population", tools.initRepeat, list, toolbox.individual)  
toolbox.register("evaluate", fitness)  
toolbox.register("mate", tools.cxTwoPoint)  
  
# Custom mutation function to change a random character  
def mutate(individual):  
    index = random.randrange(len(individual))  
    individual[index] = random.choice(ALLOWED_CHARS)  
    return individual,  
  
toolbox.register("mutate", mutate)  
toolbox.register("select", tools.selTournament, tournsize=3)  
  
# Genetic Algorithm constants  
POPULATION_SIZE = 100  
GENERATIONS = 700  
  
def main():  
    random.seed(36)  
    pop = toolbox.population(n=POPULATION_SIZE)  
    hof = tools.HallOfFame(1)  
  
    stats = tools.Statistics(lambda ind: ind.fitness.values)  
    stats.register("avg", numpy.mean)  
    stats.register("min", numpy.min)  
    stats.register("max", numpy.max)  
  
    algorithms.eaSimple(pop, toolbox, cxpb=0.6, mutpb=0.15, ngen=GENERATIONS,  
                        stats=stats, halloffame=hof, verbose=True)  
  
    return pop, stats, hof  
  
if __name__ == "__main__":  
    population, stats, hof = main()  
    print("Best individual is:", "".join(hof[0]), hof[0].fitness.values)

Process Overview:

Initialisation: We start with a population of 100 random solutions.
Evaluation: Each solution is evaluated based on how close it is to the target flag.
Selection: The best solutions are selected to be parents for the next generation using tournament selection.
Crossover: Selected parents are crossed over to produce new offspring.
Mutation: Offspring undergo mutation to introduce variability.
Replacement: The old population is replaced with the new offspring.
Termination: The process repeats for 700 generations or until a satisfactory solution is found.

After this has run we get back the correct flag: ltdh{c29faG93X2RvX3lvdV9zdGF5X2ZpdA} which is then verified by the CTFd website and the players who solved it were awarded points.

Conclusion

Evolutionary Algorithms (EAs) find applications across numerous fields beyond theoretical optimisation problems. Such as: Aerospace Engineering, Employed in the optimisation of airfoil designs to improve aerodynamic efficiency (see Airfoil Optimisation). Robotics, Autonomous Robot Evolution (see ARE Project). Video Games, NPC (Non-Player Character) behaviour optimisation (see EvolvingBehavior).

By leveraging the power of evolutionary algorithms and the DEAP library, we can effectively solve complex optimisation problems like the EVOLVE challenge. This approach demonstrates the flexibility and robustness of genetic algorithms in finding solutions where traditional methods may struggle.

Interested in seeing some more CTF challenge write ups? Click here.