search.ipynb

       "</head>\n",
       "<body>\n",
       "<h2></h2>\n",
       "\n",
       "<div class=\"highlight\"><pre><span></span><span class=\"k\">def</span> <span class=\"nf\">init_population</span><span class=\"p\">(</span><span class=\"n\">pop_number</span><span class=\"p\">,</span> <span class=\"n\">gene_pool</span><span class=\"p\">,</span> <span class=\"n\">state_length</span><span class=\"p\">):</span>\n",
       "    <span class=\"sd\">&quot;&quot;&quot;Initializes population for genetic algorithm</span>\n",
       "<span class=\"sd\">    pop_number  :  Number of individuals in population</span>\n",
       "<span class=\"sd\">    gene_pool   :  List of possible values for individuals</span>\n",
       "<span class=\"sd\">    state_length:  The length of each individual&quot;&quot;&quot;</span>\n",
       "    <span class=\"n\">g</span> <span class=\"o\">=</span> <span class=\"nb\">len</span><span class=\"p\">(</span><span class=\"n\">gene_pool</span><span class=\"p\">)</span>\n",
       "    <span class=\"n\">population</span> <span class=\"o\">=</span> <span class=\"p\">[]</span>\n",
       "    <span class=\"k\">for</span> <span class=\"n\">i</span> <span class=\"ow\">in</span> <span class=\"nb\">range</span><span class=\"p\">(</span><span class=\"n\">pop_number</span><span class=\"p\">):</span>\n",
       "        <span class=\"n\">new_individual</span> <span class=\"o\">=</span> <span class=\"p\">[</span><span class=\"n\">gene_pool</span><span class=\"p\">[</span><span class=\"n\">random</span><span class=\"o\">.</span><span class=\"n\">randrange</span><span class=\"p\">(</span><span class=\"mi\">0</span><span class=\"p\">,</span> <span class=\"n\">g</span><span class=\"p\">)]</span> <span class=\"k\">for</span> <span class=\"n\">j</span> <span class=\"ow\">in</span> <span class=\"nb\">range</span><span class=\"p\">(</span><span class=\"n\">state_length</span><span class=\"p\">)]</span>\n",
       "        <span class=\"n\">population</span><span class=\"o\">.</span><span class=\"n\">append</span><span class=\"p\">(</span><span class=\"n\">new_individual</span><span class=\"p\">)</span>\n",
       "\n",
       "    <span class=\"k\">return</span> <span class=\"n\">population</span>\n",
       "</pre></div>\n",
       "</body>\n",
       "</html>\n"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "psource(init_population)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The function takes as input the number of individuals in the population, the gene pool and the length of each individual/state. It creates individuals with random genes and returns the population when done."
   ]
  },
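  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see the idea in isolation, here is a minimal stand-alone sketch. The name `sketch_init_population` is hypothetical, and the body uses `random.choice` rather than indexing with `random.randrange`, so it illustrates the technique rather than reproducing the code from `search.py`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import random\n",
    "\n",
    "def sketch_init_population(pop_number, gene_pool, state_length):\n",
    "    # Hypothetical stand-in: each individual is a list of randomly chosen genes\n",
    "    return [[random.choice(gene_pool) for _ in range(state_length)]\n",
    "            for _ in range(pop_number)]\n",
    "\n",
    "# Three individuals of length 5 drawn from a two-letter gene pool\n",
    "sketch_population = sketch_init_population(3, ['R', 'G'], 5)\n",
    "print(sketch_population)"
   ]
  },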
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Explanation\n",
    "\n",
    "Before we solve problems using the genetic algorithm, we will explain how to intuitively understand the algorithm using a trivial example.\n",
    "\n",
    "#### Generating Phrases\n",
    "\n",
    "In this problem, we use a genetic algorithm to generate a particular target phrase from a population of random strings. This is a classic example that helps build intuition about how to use this algorithm in other problems as well. Before we break the problem down, let us try to brute force the solution. Let us say that we want to generate the phrase \"genetic algorithm\". The phrase is 17 characters long. We can use any character from the 26 lowercase characters and the space character. To generate a random phrase of length 17, each space can be filled in 27 ways. So the total number of possible phrases is\n",
    "\n",
    "$$ 27^{17} = 2153693963075557766310747 $$\n",
    "\n",
    "which is a massive number. If we wanted to generate the phrase \"Genetic Algorithm\", we would also have to include all the 26 uppercase characters into consideration thereby increasing the sample space from 27 characters to 53 characters and the total number of possible phrases then would be\n",
    "\n",
    "$$ 53^{17} = 205442259656281392806087233013 $$\n",
    "\n",
    "If we wanted to include punctuations and numerals into the sample space, we would have further complicated an already impossible problem. Hence, brute forcing is not an option. Now we'll apply the genetic algorithm and see how it significantly reduces the search space. We essentially want to *evolve* our population of random strings so that they better approximate the target phrase as the number of generations increase. Genetic algorithms work on the principle of Darwinian Natural Selection according to which, there are three key concepts that need to be in place for evolution to happen. They are:\n",
    "\n",
    "* **Heredity**: There must be a process in place by which children receive the properties of their parents. <br> \n",
    "For this particular problem, two strings from the population will be chosen as parents and will be split at a random index and recombined as described in the `recombine` function to create a child. This child string will then be added to the new generation.\n",
    "\n",
    "\n",
    "* **Variation**: There must be a variety of traits present in the population or a means with which to introduce variation. <br>If there is no variation in the sample space, we might never reach the global optimum. To ensure that there is enough variation, we can initialize a large population, but this gets computationally expensive as the population gets larger. Hence, we often use another method called mutation. In this method, we randomly change one or more characters of some strings in the population based on a predefined probability value called the mutation rate or mutation probability as described in the `mutate` function. The mutation rate is usually kept quite low. A mutation rate of zero fails to introduce variation in the population and a high mutation rate (say 50%) is as good as a coin flip and the population fails to benefit from the previous recombinations. An optimum balance has to be maintained between population size and mutation rate so as to reduce the computational cost as well as have sufficient variation in the population.\n",
    "\n",
    "\n",
    "* **Selection**: There must be some mechanism by which some members of the population have the opportunity to be parents and pass down their genetic information and some do not. This is typically referred to as \"survival of the fittest\". <br>\n",
    "There has to be some way of determining which phrases in our population have a better chance of eventually evolving into the target phrase. This is done by introducing a fitness function that calculates how close the generated phrase is to the target phrase. The function will simply return a scalar value corresponding to the number of matching characters between the generated phrase and the target phrase."
   ]
  },
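  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The three concepts above can be sketched in a few lines. The functions below (`sketch_select`, `sketch_recombine`, `sketch_mutate`) are hypothetical stand-ins written for illustration; the `select`, `recombine` and `mutate` functions in `search.py` may differ in detail. In particular, the `+ 1` added to each selection weight and the choice to mutate exactly one gene are assumptions of this sketch."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import random\n",
    "\n",
    "def sketch_select(r, population, fitness_fn):\n",
    "    # Selection: sample r parents with probability proportional to fitness.\n",
    "    # Assumption: + 1 keeps every weight positive even if all fitnesses are 0.\n",
    "    weights = [fitness_fn(ind) + 1 for ind in population]\n",
    "    return random.choices(population, weights=weights, k=r)\n",
    "\n",
    "def sketch_recombine(x, y):\n",
    "    # Heredity: split both parents at one random index and join the halves\n",
    "    c = random.randrange(0, len(x))\n",
    "    return x[:c] + y[c:]\n",
    "\n",
    "def sketch_mutate(x, gene_pool, pmut):\n",
    "    # Variation: with probability pmut, overwrite one randomly chosen gene\n",
    "    if random.random() >= pmut:\n",
    "        return x\n",
    "    i = random.randrange(0, len(x))\n",
    "    return x[:i] + [random.choice(gene_pool)] + x[i + 1:]\n",
    "\n",
    "sketch_parents = sketch_select(2, [list('abcd'), list('wxyz')], lambda ind: ind.count('a'))\n",
    "sketch_child = sketch_mutate(sketch_recombine(*sketch_parents), list('abcdwxyz'), 0.1)\n",
    "print(''.join(sketch_child))"
   ]
  },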
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before solving the problem, we first need to define our target phrase."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "target = 'Genetic Algorithm'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "We then need to define our gene pool, i.e the elements which an individual from the population might comprise of. Here, the gene pool contains all uppercase and lowercase letters of the English alphabet and the space character."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# The ASCII values of uppercase characters ranges from 65 to 91\n",
    "u_case = [chr(x) for x in range(65, 91)]\n",
    "# The ASCII values of lowercase characters ranges from 97 to 123\n",
    "l_case = [chr(x) for x in range(97, 123)]\n",
    "\n",
    "gene_pool = []\n",
    "gene_pool.extend(u_case) # adds the uppercase list to the gene pool\n",
    "gene_pool.extend(l_case) # adds the lowercase list to the gene pool\n",
    "gene_pool.append(' ')    # adds the space character to the gene pool"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We now need to define the maximum size of each population. Larger populations have more variation but are computationally more  expensive to run algorithms on."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "max_population = 100"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As our population is not very large, we can afford to keep a relatively large mutation rate."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "mutation_rate = 0.07 # 7%"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Great! Now, we need to define the most important metric for the genetic algorithm, i.e the fitness function. This will simply return the number of matching characters between the generated sample and the target phrase."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def fitness_fn(sample):\n",
    "    # initialize fitness to 0\n",
    "    fitness = 0\n",
    "    for i in range(len(sample)):\n",
    "        # increment fitness by 1 for every matching character\n",
    "        if sample[i] == target[i]:\n",
    "            fitness += 1\n",
    "    return fitness"
   ]
  },
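  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick sanity check of this scoring rule, the sketch below counts matches with `zip`. The helper `count_matches` is a hypothetical name that takes the target as an explicit argument instead of reading the global `target`. Unlike `fitness_fn`, which would raise an `IndexError` for a sample longer than the target, it simply stops at the shorter of the two inputs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def count_matches(sample, target_phrase):\n",
    "    # zip pairs the characters position by position\n",
    "    return sum(s == t for s, t in zip(sample, target_phrase))\n",
    "\n",
    "print(count_matches('Genetic Algorithm', 'Genetic Algorithm'))  # 17, a perfect match\n",
    "print(count_matches('genetic algorithm', 'Genetic Algorithm'))  # 15, 'g' and 'a' differ in case"
   ]
  },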
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before we run our genetic algorithm, we need to initialize a random population. We will use the `init_population` function to do this. We need to pass in the maximum population size, the gene pool and the length of each individual, which in this case will be the same as the length of the target phrase."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "population = init_population(max_population, gene_pool, len(target))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will now define how the individuals in the population should change as the number of generations increases. First, the `select` function will be run on the population to select *two* individuals with high fitness values. These will be the parents which will then be recombined using the `recombine` function to generate the child."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "parents = select(2, population, fitness_fn) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# The recombine function takes two parents as arguments, so we need to unpack the previous variable\n",
    "child = recombine(*parents)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we need to apply a mutation according to the mutation rate. We call the `mutate` function on the child with the gene pool and mutation rate as the additional arguments."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "child = mutate(child, gene_pool, mutation_rate)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The above lines can be condensed into\n",
    "\n",
    "`child = mutate(recombine(*select(2, population, fitness_fn)), gene_pool, mutation_rate)`\n",
    "\n",
    "And, we need to do this `for` every individual in the current population to generate the new population."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "population = [mutate(recombine(*select(2, population, fitness_fn)), gene_pool, mutation_rate) for i in range(len(population))]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The individual with the highest fitness can then be found using the `max` function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "current_best = max(population, key=fitness_fn)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's print this out"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['j', 'F', 'm', 'F', 'N', 'i', 'c', 'v', 'm', 'j', 'V', 'o', 'd', 'r', 't', 'V', 'H']\n"
     ]
    }
   ],
   "source": [
    "print(current_best)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We see that this is a list of characters. This can be converted to a string using the join function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "jFmFNicvmjVodrtVH\n"
     ]
    }
   ],
   "source": [
    "current_best_string = ''.join(current_best)\n",
    "print(current_best_string)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We now need to define the conditions to terminate the algorithm. This can happen in two ways\n",
    "1. Termination after a predefined number of generations\n",
    "2. Termination when the fitness of the best individual of the current generation reaches a predefined threshold value.\n",
    "\n",
    "We define these variables below"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "ngen = 1200 # maximum number of generations\n",
    "# we set the threshold fitness equal to the length of the target phrase\n",
    "# i.e the algorithm only terminates whne it has got all the characters correct \n",
    "# or it has completed 'ngen' number of generations\n",
    "f_thres = len(target)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "To generate `ngen` number of generations, we run a `for` loop `ngen` number of times. After each generation, we calculate the fitness of the best individual of the generation and compare it to the value of `f_thres` using the `fitness_threshold` function. After every generation, we print out the best individual of the generation and the corresponding fitness value. Lets now write a function to do this."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def genetic_algorithm_stepwise(population, fitness_fn, gene_pool=[0, 1], f_thres=None, ngen=1200, pmut=0.1):\n",
    "    for generation in range(ngen):\n",
    "        population = [mutate(recombine(*select(2, population, fitness_fn)), gene_pool, pmut) for i in range(len(population))]\n",
    "        # stores the individual genome with the highest fitness in the current population\n",
    "        current_best = ''.join(max(population, key=fitness_fn))\n",
    "        print(f'Current best: {current_best}\\t\\tGeneration: {str(generation)}\\t\\tFitness: {fitness_fn(current_best)}\\r', end='')\n",
    "        \n",
    "        # compare the fitness of the current best individual to f_thres\n",
    "        fittest_individual = fitness_threshold(fitness_fn, f_thres, population)\n",
    "        \n",
    "        # if fitness is greater than or equal to f_thres, we terminate the algorithm\n",
    "        if fittest_individual:\n",
    "            return fittest_individual, generation\n",
    "    return max(population, key=fitness_fn) , generation       "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The function defined above is essentially the same as the one defined in `search.py` with the added functionality of printing out the data of each generation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01//EN\"\n",
       "   \"http://www.w3.org/TR/html4/strict.dtd\">\n",
       "\n",
       "<html>\n",
       "<head>\n",
       "  <title></title>\n",
       "  <meta http-equiv=\"content-type\" content=\"text/html; charset=None\">\n",
       "  <style type=\"text/css\">\n",
       "td.linenos { background-color: #f0f0f0; padding-right: 10px; }\n",
       "span.lineno { background-color: #f0f0f0; padding: 0 5px 0 5px; }\n",
       "pre { line-height: 125%; }\n",
       "body .hll { background-color: #ffffcc }\n",
       "body  { background: #f8f8f8; }\n",
       "body .c { color: #408080; font-style: italic } /* Comment */\n",
       "body .err { border: 1px solid #FF0000 } /* Error */\n",
       "body .k { color: #008000; font-weight: bold } /* Keyword */\n",
       "body .o { color: #666666 } /* Operator */\n",
       "body .ch { color: #408080; font-style: italic } /* Comment.Hashbang */\n",
       "body .cm { color: #408080; font-style: italic } /* Comment.Multiline */\n",
       "body .cp { color: #BC7A00 } /* Comment.Preproc */\n",
       "body .cpf { color: #408080; font-style: italic } /* Comment.PreprocFile */\n",
       "body .c1 { color: #408080; font-style: italic } /* Comment.Single */\n",
       "body .cs { color: #408080; font-style: italic } /* Comment.Special */\n",
       "body .gd { color: #A00000 } /* Generic.Deleted */\n",
       "body .ge { font-style: italic } /* Generic.Emph */\n",
       "body .gr { color: #FF0000 } /* Generic.Error */\n",
       "body .gh { color: #000080; font-weight: bold } /* Generic.Heading */\n",
       "body .gi { color: #00A000 } /* Generic.Inserted */\n",
       "body .go { color: #888888 } /* Generic.Output */\n",
       "body .gp { color: #000080; font-weight: bold } /* Generic.Prompt */\n",
       "body .gs { font-weight: bold } /* Generic.Strong */\n",
       "body .gu { color: #800080; font-weight: bold } /* Generic.Subheading */\n",
       "body .gt { color: #0044DD } /* Generic.Traceback */\n",
       "body .kc { color: #008000; font-weight: bold } /* Keyword.Constant */\n",
       "body .kd { color: #008000; font-weight: bold } /* Keyword.Declaration */\n",
       "body .kn { color: #008000; font-weight: bold } /* Keyword.Namespace */\n",
       "body .kp { color: #008000 } /* Keyword.Pseudo */\n",
       "body .kr { color: #008000; font-weight: bold } /* Keyword.Reserved */\n",
       "body .kt { color: #B00040 } /* Keyword.Type */\n",
       "body .m { color: #666666 } /* Literal.Number */\n",
       "body .s { color: #BA2121 } /* Literal.String */\n",
       "body .na { color: #7D9029 } /* Name.Attribute */\n",
       "body .nb { color: #008000 } /* Name.Builtin */\n",
       "body .nc { color: #0000FF; font-weight: bold } /* Name.Class */\n",
       "body .no { color: #880000 } /* Name.Constant */\n",
       "body .nd { color: #AA22FF } /* Name.Decorator */\n",
       "body .ni { color: #999999; font-weight: bold } /* Name.Entity */\n",
       "body .ne { color: #D2413A; font-weight: bold } /* Name.Exception */\n",
       "body .nf { color: #0000FF } /* Name.Function */\n",
       "body .nl { color: #A0A000 } /* Name.Label */\n",
       "body .nn { color: #0000FF; font-weight: bold } /* Name.Namespace */\n",
       "body .nt { color: #008000; font-weight: bold } /* Name.Tag */\n",
       "body .nv { color: #19177C } /* Name.Variable */\n",
       "body .ow { color: #AA22FF; font-weight: bold } /* Operator.Word */\n",
       "body .w { color: #bbbbbb } /* Text.Whitespace */\n",
       "body .mb { color: #666666 } /* Literal.Number.Bin */\n",
       "body .mf { color: #666666 } /* Literal.Number.Float */\n",
       "body .mh { color: #666666 } /* Literal.Number.Hex */\n",
       "body .mi { color: #666666 } /* Literal.Number.Integer */\n",
       "body .mo { color: #666666 } /* Literal.Number.Oct */\n",
       "body .sa { color: #BA2121 } /* Literal.String.Affix */\n",
       "body .sb { color: #BA2121 } /* Literal.String.Backtick */\n",
       "body .sc { color: #BA2121 } /* Literal.String.Char */\n",
       "body .dl { color: #BA2121 } /* Literal.String.Delimiter */\n",
       "body .sd { color: #BA2121; font-style: italic } /* Literal.String.Doc */\n",
       "body .s2 { color: #BA2121 } /* Literal.String.Double */\n",
       "body .se { color: #BB6622; font-weight: bold } /* Literal.String.Escape */\n",
       "body .sh { color: #BA2121 } /* Literal.String.Heredoc */\n",
       "body .si { color: #BB6688; font-weight: bold } /* Literal.String.Interpol */\n",
       "body .sx { color: #008000 } /* Literal.String.Other */\n",
       "body .sr { color: #BB6688 } /* Literal.String.Regex */\n",
       "body .s1 { color: #BA2121 } /* Literal.String.Single */\n",
       "body .ss { color: #19177C } /* Literal.String.Symbol */\n",
       "body .bp { color: #008000 } /* Name.Builtin.Pseudo */\n",
       "body .fm { color: #0000FF } /* Name.Function.Magic */\n",
       "body .vc { color: #19177C } /* Name.Variable.Class */\n",
       "body .vg { color: #19177C } /* Name.Variable.Global */\n",
       "body .vi { color: #19177C } /* Name.Variable.Instance */\n",
       "body .vm { color: #19177C } /* Name.Variable.Magic */\n",
       "body .il { color: #666666 } /* Literal.Number.Integer.Long */\n",
       "\n",
       "  </style>\n",
       "</head>\n",
       "<body>\n",
       "<h2></h2>\n",
       "\n",
       "<div class=\"highlight\"><pre><span></span><span class=\"k\">def</span> <span class=\"nf\">genetic_algorithm</span><span class=\"p\">(</span><span class=\"n\">population</span><span class=\"p\">,</span> <span class=\"n\">fitness_fn</span><span class=\"p\">,</span> <span class=\"n\">gene_pool</span><span class=\"o\">=</span><span class=\"p\">[</span><span class=\"mi\">0</span><span class=\"p\">,</span> <span class=\"mi\">1</span><span class=\"p\">],</span> <span class=\"n\">f_thres</span><span class=\"o\">=</span><span class=\"bp\">None</span><span class=\"p\">,</span> <span class=\"n\">ngen</span><span class=\"o\">=</span><span class=\"mi\">1000</span><span class=\"p\">,</span> <span class=\"n\">pmut</span><span class=\"o\">=</span><span class=\"mf\">0.1</span><span class=\"p\">):</span>\n",
       "    <span class=\"sd\">&quot;&quot;&quot;[Figure 4.8]&quot;&quot;&quot;</span>\n",
       "    <span class=\"k\">for</span> <span class=\"n\">i</span> <span class=\"ow\">in</span> <span class=\"nb\">range</span><span class=\"p\">(</span><span class=\"n\">ngen</span><span class=\"p\">):</span>\n",
       "        <span class=\"n\">population</span> <span class=\"o\">=</span> <span class=\"p\">[</span><span class=\"n\">mutate</span><span class=\"p\">(</span><span class=\"n\">recombine</span><span class=\"p\">(</span><span class=\"o\">*</span><span class=\"n\">select</span><span class=\"p\">(</span><span class=\"mi\">2</span><span class=\"p\">,</span> <span class=\"n\">population</span><span class=\"p\">,</span> <span class=\"n\">fitness_fn</span><span class=\"p\">)),</span> <span class=\"n\">gene_pool</span><span class=\"p\">,</span> <span class=\"n\">pmut</span><span class=\"p\">)</span>\n",
       "                      <span class=\"k\">for</span> <span class=\"n\">i</span> <span class=\"ow\">in</span> <span class=\"nb\">range</span><span class=\"p\">(</span><span class=\"nb\">len</span><span class=\"p\">(</span><span class=\"n\">population</span><span class=\"p\">))]</span>\n",
       "\n",
       "        <span class=\"n\">fittest_individual</span> <span class=\"o\">=</span> <span class=\"n\">fitness_threshold</span><span class=\"p\">(</span><span class=\"n\">fitness_fn</span><span class=\"p\">,</span> <span class=\"n\">f_thres</span><span class=\"p\">,</span> <span class=\"n\">population</span><span class=\"p\">)</span>\n",
       "        <span class=\"k\">if</span> <span class=\"n\">fittest_individual</span><span class=\"p\">:</span>\n",
       "            <span class=\"k\">return</span> <span class=\"n\">fittest_individual</span>\n",
       "\n",
       "\n",
       "    <span class=\"k\">return</span> <span class=\"n\">argmax</span><span class=\"p\">(</span><span class=\"n\">population</span><span class=\"p\">,</span> <span class=\"n\">key</span><span class=\"o\">=</span><span class=\"n\">fitness_fn</span><span class=\"p\">)</span>\n",
       "</pre></div>\n",
       "</body>\n",
       "</html>\n"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "psource(genetic_algorithm)"
   ]
  },
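  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Both versions delegate the early-exit check to `fitness_threshold`, whose source is not shown here. Judging only from how it is called, a minimal sketch could look like the following. The name `sketch_fitness_threshold` and its exact behavior are assumptions, not the implementation in `search.py`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def sketch_fitness_threshold(fitness_fn, f_thres, population):\n",
    "    # Assumption: without a threshold the algorithm never stops early\n",
    "    if f_thres is None:\n",
    "        return None\n",
    "    fittest = max(population, key=fitness_fn)\n",
    "    # Hand back the fittest individual only if it reaches the threshold\n",
    "    return fittest if fitness_fn(fittest) >= f_thres else None\n",
    "\n",
    "sketch_pop = [list('ab'), list('aa')]\n",
    "print(sketch_fitness_threshold(lambda ind: ind.count('a'), 2, sketch_pop))  # ['a', 'a']\n",
    "print(sketch_fitness_threshold(lambda ind: ind.count('a'), 3, sketch_pop))  # None"
   ]
  },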
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We have defined all the required functions and variables. Let's now create a new population and test the function we wrote above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Current best: Genetic Algorithm\t\tGeneration: 472\t\tFitness: 17\r"
     ]
    }
   ],
   "source": [
    "population = init_population(max_population, gene_pool, len(target))\n",
    "solution, generations = genetic_algorithm_stepwise(population, fitness_fn, gene_pool, f_thres, ngen, mutation_rate)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The genetic algorithm was able to converge!\n",
    "We implore you to rerun the above cell and play around with `target, max_population, f_thres, ngen` etc parameters to get a better intuition of how the algorithm works. To summarize, if we can define the problem states in simple array format and if we can create a fitness function to gauge how good or bad our approximate solutions are, there is a high chance that we can get a satisfactory solution using a genetic algorithm. \n",
    "- There is also a better GUI version of this program `genetic_algorithm_example.py` in the GUI folder for you to play around with."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Usage\n",
    "\n",
    "Below we give two example usages for the genetic algorithm, for a graph coloring problem and the 8 queens problem.\n",
    "\n",
    "#### Graph Coloring\n",
    "\n",
    "First we will take on the simpler problem of coloring a small graph with two colors. Before we do anything, let's imagine how a solution might look. First, we have to represent our colors. Say, 'R' for red and 'G' for green. These make up our gene pool. What of the individual solutions though? For that, we will look at our problem. We stated we have a graph. A graph has nodes and edges, and we want to color the nodes. Naturally, we want to store each node's color. If we have four nodes, we can store their colors in a list of genes, one for each node. A possible solution will then look like this: ['R', 'R', 'G', 'R']. In the general case, we will represent each solution with a list of chars ('R' and 'G'), with length the number of nodes.\n",
    "\n",
    "Next we need to come up with a fitness function that appropriately scores individuals. Again, we will look at the problem definition at hand. We want to color a graph. For a solution to be optimal, no edge should connect two nodes of the same color. How can we use this information to score a solution? A naive (and ineffective) approach would be to count the different colors in the string. So ['R', 'R', 'R', 'R'] has a score of 1 and ['R', 'R', 'G', 'G'] has a score of 2. Why that fitness function is not ideal though? Why, we forgot the information about the edges! The edges are pivotal to the problem and the above function only deals with node colors. We didn't use all the information at hand and ended up with an ineffective answer. How, then, can we use that information to our advantage?\n",
    "\n",
    "We said that the optimal solution will have all the edges connecting nodes of different color. So, to score a solution we can count how many edges are valid (aka connecting nodes of different color). That is a great fitness function!\n",
    "\n",
    "Let's jump into solving this problem using the `genetic_algorithm` function."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First we need to represent the graph. Since we mostly need information about edges, we will just store the edges. We will denote edges with capital letters and nodes with integers:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "edges = {\n",
    "    'A': [0, 1],\n",
    "    'B': [0, 3],\n",
    "    'C': [1, 2],\n",
    "    'D': [2, 3]\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Edge 'A' connects nodes 0 and 1, edge 'B' connects nodes 0 and 3 etc.\n",
    "\n",
    "We already said our gene pool is 'R' and 'G', so we can jump right into initializing our population. Since we have only four nodes, `state_length` should be 4. For the number of individuals, we will try 8. We can increase this number if we need higher accuracy, but be careful! Larger populations need more computating power and take longer. You need to strike that sweet balance between accuracy and cost (the ultimate dilemma of the programmer!)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[['R', 'G', 'G', 'R'], ['R', 'G', 'R', 'R'], ['G', 'R', 'G', 'R'], ['R', 'G', 'R', 'G'], ['G', 'R', 'R', 'G'], ['G', 'R', 'G', 'R'], ['G', 'R', 'R', 'R'], ['R', 'G', 'G', 'G']]\n"
     ]
    }
   ],
   "source": [
    "population = init_population(8, ['R', 'G'], 4)\n",
    "print(population)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We created and printed the population. You can see that the genes in the individuals are random and there are 8 individuals each with 4 genes.\n",
    "\n",
    "Next we need to write our fitness function. We previously said we want the function to count how many edges are valid. So, given a coloring/individual `c`, we will do just that:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def fitness(c):\n",
    "    return sum(c[n1] != c[n2] for (n1, n2) in edges.values())"
   ]
  },
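  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the scoring concrete, the sketch below evaluates a few hand-picked colorings. It uses its own copy of the graph (`demo_edges`) and a hypothetical helper `count_valid_edges` that takes the edges as an argument, so it does not touch the `edges` and `fitness` names defined above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "demo_edges = {'A': [0, 1], 'B': [0, 3], 'C': [1, 2], 'D': [2, 3]}\n",
    "\n",
    "def count_valid_edges(coloring, graph_edges):\n",
    "    # An edge is valid when its two endpoints have different colors\n",
    "    return sum(coloring[n1] != coloring[n2] for n1, n2 in graph_edges.values())\n",
    "\n",
    "print(count_valid_edges(['R', 'G', 'R', 'G'], demo_edges))  # 4, every edge is valid\n",
    "print(count_valid_edges(['R', 'R', 'G', 'R'], demo_edges))  # 2, edges 'A' and 'B' are invalid\n",
    "print(count_valid_edges(['R', 'R', 'R', 'R'], demo_edges))  # 0, no edge is valid"
   ]
  },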
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Great! Now we will run the genetic algorithm and see what solution it gives."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['R', 'G', 'R', 'G']\n"
     ]
    }
   ],
   "source": [
    "solution = genetic_algorithm(population, fitness, gene_pool=['R', 'G'])\n",
    "print(solution)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The algorithm converged to a solution. Let's check its score:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "4\n"
     ]
    }
   ],
   "source": [
    "print(fitness(solution))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The solution has a score of 4, which means it is optimal: our graph has exactly 4 edges, so every edge is valid!\n",
    "\n",
    "*NOTE: Because the algorithm is non-deterministic, a different run may return a different solution. If we are very unlucky, it might even return a coloring that is not optimal!*"
   ]
  },
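  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because this instance is tiny (2^4 = 16 possible colorings), one way to double-check a result is to enumerate every coloring. The sketch below does that under an assumption: the graph is taken to be a four-node cycle stored in a dict named `edges`, in the shape the `fitness` function above expects. That dict and the helper name `score` are illustrative and may differ from the definitions used earlier in this notebook.\n",
    "\n",
    "```python\n",
    "import itertools\n",
    "\n",
    "# Assumed edge list for illustration: a cycle over nodes 0..3.\n",
    "edges = {'A': (0, 1), 'B': (0, 3), 'C': (1, 2), 'D': (2, 3)}\n",
    "\n",
    "def score(c):\n",
    "    # Count the edges whose two endpoints have different colors.\n",
    "    return sum(c[n1] != c[n2] for (n1, n2) in edges.values())\n",
    "\n",
    "# Enumerate all 2**4 = 16 colorings and keep the best ones.\n",
    "colorings = list(itertools.product('RG', repeat=4))\n",
    "best = max(score(c) for c in colorings)\n",
    "optimal = [c for c in colorings if score(c) == best]\n",
    "\n",
    "print(best)     # 4\n",
    "print(optimal)  # [('R', 'G', 'R', 'G'), ('G', 'R', 'G', 'R')]\n",
    "```\n",
    "\n",
    "Brute force is only feasible here because the search space is so small; it is meant as a sanity check, not as a replacement for the genetic algorithm."
   ]
  },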
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Eight Queens\n",
    "\n",
    "Let's take a look at a more complicated problem.\n",
    "\n",
    "In the *Eight Queens* problem, we are tasked with placing eight queens on an 8x8 chessboard without any queen threatening the others (i.e. no two queens should be in the same row, column or diagonal). In its general form the problem is defined as placing *N* queens on an NxN chessboard without any conflicts.\n",
    "\n",
    "First we need to think about the representation of each solution. We can go the naive route of representing the whole chessboard with the queens' placements on it. That is definitely one way to go about it, but for the purpose of this tutorial we will do something different. We have eight queens, so we will have a gene for each of them. The gene pool will be numbers from 0 to 7, for the different columns. The *position* of the gene in the state will denote the row the particular queen is placed in.\n",
    "\n",
    "For example, we can have the state \"03304577\". Here the first gene, with a value of 0, means \"the queen at row 0 is placed at column 0\", the second gene means \"the queen at row 1 is placed at column 3\", and so forth.\n",
    "\n",
    "We now need to think about the fitness function. In the graph coloring problem we counted the valid edges. The same thought process can be applied here. Instead of edges though, we have pairs of queens. If two queens are not threatening each other, we say they are in a \"non-attacking\" positioning. We can, therefore, count how many such positionings there are.\n",
    "\n",
    "Let's dive right in and initialize our population:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[0, 2, 7, 1, 7, 3, 2, 4], [2, 7, 5, 4, 4, 5, 2, 0], [7, 1, 6, 0, 1, 3, 0, 2], [0, 3, 6, 1, 3, 0, 5, 4], [0, 4, 6, 4, 7, 4, 1, 6]]\n"
     ]
    }
   ],
   "source": [
    "population = init_population(100, range(8), 8)\n",
    "print(population[:5])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We have a population of 100 and each individual has 8 genes. The gene pool is the integers from 0 to 7. Above you can see the first five individuals.\n",
    "\n",
    "Next we need to write our fitness function. Remember, queens threaten each other if they are in the same row, column or diagonal.\n",
    "\n",
    "Since positionings are mutual, we must take care not to count them twice. Therefore, for each queen, we will only check for conflicts with the queens after her.\n",
    "\n",
    "A gene's value in an individual `q` denotes the queen's column, and the position of the gene denotes its row. Each position holds exactly one gene, so two queens can never share a row; we only need to check whether two genes have the same value (same column). We also need to check for diagonals. A queen *a* is in the diagonal of another queen, *b*, if the difference of the rows between them is equal to either their difference in columns (for the diagonal on the right of *a*) or equal to the negative difference of their columns (for the left diagonal of *a*). The fitness function is given below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def fitness(q):\n",
    "    \"\"\"Count the pairs of queens in q that do not attack each other.\"\"\"\n",
    "    non_attacking = 0\n",
    "    for row1 in range(len(q)):\n",
    "        # Only compare with later rows so each pair is counted once\n",
    "        for row2 in range(row1+1, len(q)):\n",
    "            col1 = int(q[row1])\n",
    "            col2 = int(q[row2])\n",
    "            row_diff = row1 - row2\n",
    "            col_diff = col1 - col2\n",
    "\n",
    "            # Different column and not on either diagonal\n",
    "            if col1 != col2 and row_diff != col_diff and row_diff != -col_diff:\n",
    "                non_attacking += 1\n",
    "\n",
    "    return non_attacking"
   ]
  },
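  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the representation and the conflict checks concrete, here is a small sketch that lists the attacking pairs for the example state `03304577` mentioned earlier. The helper name `attacking_pairs` is hypothetical and exists only for this illustration; it uses `abs()` to fold the two diagonal tests into one, and `int()` so that both strings and lists of integers are accepted.\n",
    "\n",
    "```python\n",
    "def attacking_pairs(q):\n",
    "    # Return (row1, row2) for every pair of queens that attack each other.\n",
    "    pairs = []\n",
    "    for row1 in range(len(q)):\n",
    "        for row2 in range(row1 + 1, len(q)):\n",
    "            col1, col2 = int(q[row1]), int(q[row2])\n",
    "            same_column = col1 == col2\n",
    "            same_diagonal = abs(row1 - row2) == abs(col1 - col2)\n",
    "            if same_column or same_diagonal:\n",
    "                pairs.append((row1, row2))\n",
    "    return pairs\n",
    "\n",
    "state = '03304577'\n",
    "conflicts = attacking_pairs(state)\n",
    "total_pairs = len(state) * (len(state) - 1) // 2\n",
    "\n",
    "print(conflicts)\n",
    "print(total_pairs - len(conflicts))  # 18\n",
    "```\n",
    "\n",
    "The last line subtracts the conflicts from the total number of pairs, which should agree with what `fitness` returns for the same state."
   ]
  },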
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that the best score achievable is 28. That is because for each queen we only check the queens after her: for the first queen we check 7 other queens, for the second queen 6 others, and so on. In short, the number of checks we make is the sum 7+6+5+...+1, which is equal to 7\\*(7+1)/2 = 28.\n",
    "\n",
    "Because finding a perfect solution is hard and can take a long time, we will set the fitness threshold at 25. If we find an individual with a score greater than or equal to that, we will halt. Let's see how the genetic algorithm will fare."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[5, 0, 6, 3, 7, 4, 1, 3]\n",
      "26\n"
     ]
    }
   ],
   "source": [
    "solution = genetic_algorithm(population, fitness, f_thres=25, gene_pool=range(8))\n",
    "print(solution)\n",
    "print(fitness(solution))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Above you can see the solution and its fitness score, which should be no less than 25."
   ]
  },
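  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A score of 28 would mean that no pair of queens is in conflict. As a sketch of how one might test for that directly, the check below verifies that the columns and both diagonal indices are all distinct. The function name `is_perfect` and the first sample placement are assumptions chosen for illustration; the second call uses the individual shown in the stored output above.\n",
    "\n",
    "```python\n",
    "def is_perfect(q):\n",
    "    # Queens are conflict-free when no two share a column or a diagonal.\n",
    "    n = len(q)\n",
    "    columns = set(q)\n",
    "    diag_down = {row - col for row, col in enumerate(q)}\n",
    "    diag_up = {row + col for row, col in enumerate(q)}\n",
    "    return len(columns) == len(diag_down) == len(diag_up) == n\n",
    "\n",
    "print(8 * 7 // 2)                            # 28 pairs of queens in total\n",
    "print(is_perfect([0, 4, 7, 5, 2, 6, 1, 3]))  # True\n",
    "print(is_perfect([5, 0, 6, 3, 7, 4, 1, 3]))  # False\n",
    "```\n",
    "\n",
    "The second placement fails because two queens share column 3 and two others share a diagonal, which matches its score of 26 out of 28."
   ]
  },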
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With that, this tutorial on the genetic algorithm comes to an end. We hope you found this guide helpful!"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}