This article was published as part of the Data Science Blogathon
Genetic algorithms are search algorithms inspired by Darwin's theory of evolution in nature.
-
By simulating the process of natural selection, reproduction and mutation, genetic algorithms can produce high-quality solutions to various problems, including search and optimization.
-
Through the effective use of the Theory of Evolution, genetic algorithms can overcome the problems faced by traditional algorithms.
According to Darwin's theory of evolution, an evolution maintains a population of individuals that vary from each other (variation). Those who are better adapted to their environment have a better chance of survival, reproduce and pass on their traits to the next generation (survival of the fittest).
How do genetic algorithms work?
Before entering into the operation of a genetic algorithm, Let's dive into the basic terminology of genetic algorithms.
Chromosome / individual
A chromosome is a collection of genes. For instance, a chromosome can be represented as a binary string where each bit is a gene.
Image source: practical genetic algorithm with Python, Eyal Wirsansky
population
Since an individual is represented as a chromosome, a population is a collection of such chromosomes.
Image source: practical genetic algorithm with Python, Eyal Wirsansky
Fitness function
In each iteration, individuals are assessed based on their fitness scores that are calculated using the fitness function.
People who achieve a better fitness score represent better solutions and are more likely to be chosen to cross over and pass on to the next generation.
For instance, whether genetic algorithms are used for feature selection, then the precision of the model with those selected characteristics would be the adequacy function if it were a classification problem.
selection
After calculating the fitness of each individual in the population, a selection process is used to determine which of the individuals in the population will be able to reproduce and create the offspring that will form the next generation.
Types of selection methods available,
- Roulette wheel selection
- Tournament selection
- Range-based selection
Transversal
Generally, two individuals of the current generation are chosen and their genes are exchanged between two individuals to create a new individual that represents the offspring. This process is also called mating or crossing..
Types of cross methods available,
- Crossing a point
- Two-point crossover
- Uniform crossover
Mutation
Mutation is a random change on a chromosome to introduce new patterns on a chromosome. For instance, flip a bit in a binary string.
Types of mutation methods available,
- Flip bit mutation
- Gaussian mutation
- Exchange mutation
General workflow of a simple genetic algorithm
Image source: practical genetic algorithms with Python, Eyal Wirsansky
How are genetic algorithms different from traditional algorithms?
-
A search space is a set of all possible solutions to the problem. Traditional algorithms keep only one set in a search space, whereas genetic algorithms use multiple sets in a search space (feature selection using RFE versus genetic algorithms).
-
Traditional algorithms require more information to perform a search, whereas genetic algorithms only require an objective function to calculate the fitness of an individual.
-
Traditional algorithms cannot work in parallel, while genetic algorithms can work in parallel (the calculation of the fitness of individuals is independent).
-
A big difference in genetic algorithms is that instead of directly operating on candidate solutions, genetic algorithms operate on their representations (or encoding), often called chromosomes.
-
Traditional algorithms can only produce a solution in the end, while in genetic algorithms multiple optimal solutions of different generations can be obtained.
-
Traditional algorithms are no more likely to produce global optimal solutions, genetic algorithms are not guaranteed to produce global optimal solutions as well, but they are more likely to produce global optimal solutions due to genetic operators such as crossover and mutation.
-
Traditional algorithms are deterministic in nature, while genetic algorithms are probabilistic and stochastic in nature.
-
Real world problems are multimodal (contain multiple optimal solutions locally), traditional algorithms do not handle these problems well while genetic algorithms, con la configuración de parametersThe "parameters" are variables or criteria that are used to define, measure or evaluate a phenomenon or system. In various fields such as statistics, Computer Science and Scientific Research, Parameters are critical to establishing norms and standards that guide data analysis and interpretation. Their proper selection and handling are crucial to obtain accurate and relevant results in any study or project.... correcta, can handle these problems very well due to the large solution space.
Advantages of genetic algorithms
-
Parallelism
-
Global optimization
-
A larger set of space for solutions
-
Requires less information
-
Provides multiple optimal solutions
-
Probabilist in nature
-
Genetic representations using chromosomes
Disadvantages of genetic algorithms
Use cases of genetic algorithms in machine learning
TPOT(Tree-based pipeline optimization) is an Auto-ML framework that uses genetic algorithms to optimize machine learning pipelines using the genetic algorithm framework called DEAP (Distributed evolutionary algorithms in Python).
Selection of characteristics using genetic algorithms
Using genetic algorithms, let's perform feature selection on a wine dataset available in sklearn.
import random import pandas as pd from sklearn.base import clone from deap.algorithms import eaSimple from deap import base, creator, tools from sklearn.datasets import load_wine from sklearn.metrics import accuracy_score from sklearn.ensemble import RandomForestClassifier from sklearn.model_selection import train_test_split data = load_wine() features = pd.DataFrame( data.data, columns=data.feature_names ) target = pd.Series(data.target) # split the data into train and test x_train, x_test, y_train, y_test = train_test_split( features, target, test_size=0.2, stratify=target ) model = RandomForestClassifier()
Divida los datos en características y destino y luego divida las características y el destino para trainingTraining is a systematic process designed to improve skills, physical knowledge or abilities. It is applied in various areas, like sport, Education and professional development. An effective training program includes goal planning, regular practice and evaluation of progress. Adaptation to individual needs and motivation are key factors in achieving successful and sustainable results in any discipline.... y prueba.
def compute_fitness_score(individual): """ Select the features from the individual, train and compute the accuracy_score. Example: individual = [0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1] The 1 represents the presence of features and 0 represents the absence of features """ column_support = pd.Series(individual).astype(bool) global x_train, y_train, x_test, y_test, model x_train_ = x_train[x_train.columns[column_support]] x_test_ = x_test[x_test.columns[column_support]] model.fit(x_train_, y_train) y_pred = model.predict(x_test_) score = accuracy_score(y_test, y_pred) return score,
Define the fitness score. compute_fitness_score takes an individual as input, for instance, let's consider the following individual [0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1], on this list, 1 represents the presence of that particular characteristic and 0 represents the absence of that characteristic. The columns are selected based on the individual and then the model is fitted and the precision score is calculated. This accuracy score is the physical fitness score of an individual.
n_genes = x_train.shape[1] n_generations = 10 n_population = 10 crossover_probability = 0.6 mutation_probability = 0.2 def setup_toolbox(): # container for individuals creator.create('FitnessMax', base.Fitness, weights=(1.0,)) creator.create('Individual', list, fitness=creator.FitnessMax) toolbox = base.Toolbox() toolbox.register( 'individual_generator_function', random.randint, 0, 1 ) # method to populate individual toolbox.register( 'individual_generator', tools.initRepeat, creator.Individual, toolbox.individual_generator_function, n_genes ) # method to create population toolbox.register( 'population_generator', tools.initRepeat, list, toolbox.individual_generator ) # fitness calculation toolbox.register( 'evaluate', compute_fitness_score ) # selection toolbox.register( 'select', tools.selTournament, tournsize=3 ) # crossover toolbox.register('mate', tools.cxOnePoint) # mutation toolbox.register( 'mutate', tools.mutFlipBit, indpb=mutation_probability ) return toolbox
n_genes represent the number of genes in an individual that is equal to the number of characteristics, n_generations represents the number of generations that is equal to 10 and also n_population which represents the population number. The crossover probability is set to 0,6 and the mutation probability is set to 0,2.
Toolbox is a class in a deap framework that provides us with the necessary utilities to define individuals, populations and much more.
The steps performed in the setup_toolbox method are,
- First, we define an individual who has an aptitude score associated with him
- Then, we define a function that generates genes of an individual, in our case 0 O 1 in an individual
- We define a function that generates individuals from a population, for instance, [0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1] is an individual in a population.
- Then, we define a function that generates a population, for instance, [[0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1],
[0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1]]is a population of size 2.
-
Then, we define a function for evaluation, in our case we use the compute_fitness_score function.
-
We choose the tournament selection method for the selection process, a crossover point for the crossover process and a shift bit mutation for the mutation process.
# setup deap toolbox toolbox = setup_toolbox() # create a population population = toolbox.population_generator(n_population) # A simple evolutionary algorithm final_population, logbook = eaSimple( population, toolbox, crossover_probability, mutation_probability, n_generations )
We first generate a population of individuals of size 10. Then we call the eaSimple method in depth, that performs a simple genetic algorithm.
The arguments of the eaSimple function,
- The first argument is the population
- toolbox is the instance of the Toolbox class from deap
- The probability of crossing denotes the probability that a pair of individuals will cross
- Mutation probability denotes the probability that an individual will mutate
- n_generations denotes the number of generations to evolve.
The eaSimple algorithm works as follows,
- The fitness of individuals in a population is calculated
- Select the fittest individual using the tournament selection method.
- Crossing of individuals using the one-point crossing method
- Mutate individuals using the flip bit mutation
- Recalculate the fitness of individuals.
- Follow the steps 2 a 5 until reaching the desired number of generations.
Thus, can perform feature selection using a genetic algorithm.
References
[1] Practical Genetic Algorithms with Python, Eyal Wirsansky
Thanks!
The media shown in this article is not the property of DataPeaker and is used at the author's discretion.