  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "sequential_decision_environment = GridMDP([[-0.4, -0.4, -0.4, +1],\n",
    "                                           [-0.4, None, -0.4, -1],\n",
    "                                           [-0.4, -0.4, -0.4, -0.4]],\n",
    "                                          terminals=[(3, 2), (3, 1)])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      ">   >      >   .\n",
      "^   None   ^   .\n",
      "^   >      ^   <\n"
     ]
    }
   ],
   "source": [
    "pi = best_policy(sequential_decision_environment, value_iteration(sequential_decision_environment, .001))\n",
    "from utils import print_table\n",
    "print_table(sequential_decision_environment.to_arrows(pi))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is exactly the output we expected\n",
    "![title](images/-0.4.jpg)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As the reward for each state is now more negative, life is certainly more unpleasant.\n",
    "The agent takes the shortest route to the +1 state and is willing to risk falling into the -1 state by accident."
   ]
  },
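  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see the trade-off behind this policy, we can inspect the utilities that `value_iteration` computes. This is a quick sketch, assuming (as in the calls above) that `value_iteration` returns a dictionary mapping `(x, y)` states to their utilities."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "U = value_iteration(sequential_decision_environment, .001)\n",
    "# Utility of the safe square on the top row vs the square\n",
    "# right next to the -1 exit:\n",
    "print(U[(2, 2)], U[(2, 1)])"
   ]
  },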
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Case 3\n",
    "---\n",
    "R(s) = -4 in all states except terminal states"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "sequential_decision_environment = GridMDP([[-4, -4, -4, +1],\n",
    "                                           [-4, None, -4, -1],\n",
    "                                           [-4, -4, -4, -4]],\n",
    "                                          terminals=[(3, 2), (3, 1)])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      ">   >      >   .\n",
      "^   None   >   .\n",
      ">   >      >   ^\n"
     ]
    }
   ],
   "source": [
    "pi = best_policy(sequential_decision_environment, value_iteration(sequential_decision_environment, .001))\n",
    "from utils import print_table\n",
    "print_table(sequential_decision_environment.to_arrows(pi))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is exactly the output we expected\n",
    "![title](images/-4.jpg)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The living reward for each state is now lower than the least rewarding terminal. Life is so _painful_ that the agent heads for the nearest exit as even the worst exit is less painful than any living state."
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Case 4\n",
    "---\n",
    "R(s) = 4 in all states except terminal states"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "collapsed": true
    },
   "outputs": [],
   "source": [
    "sequential_decision_environment = GridMDP([[4, 4, 4, +1],\n",
    "                                           [4, None, 4, -1],\n",
    "                                           [4, 4, 4, 4]],\n",
    "                                          terminals=[(3, 2), (3, 1)])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      ">   >      <   .\n",
      ">   None   <   .\n",
      ">   >      >   v\n"
     ]
    }
   ],
   "source": [
    "pi = best_policy(sequential_decision_environment, value_iteration(sequential_decision_environment, .001))\n",
    "from utils import print_table\n",
    "print_table(sequential_decision_environment.to_arrows(pi))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this case, the output we expect is\n",
    "![title](images/4.jpg)\n",
    "<br>\n",
    "As life is positively enjoyable and the agent avoids _both_ exits.\n",
    "Even though the output we get is not exactly what we want, it is definitely not wrong.\n",
    "The scenario here requires the agent to anything but reach a terminal state, as this is the only way the agent can maximize its reward (total reward tends to infinity), and the program does just that.\n",
    "<br>\n",
    "Currently, the GridMDP class doesn't support an explicit marker for a \"do whatever you like\" action or a \"don't care\" condition.\n",
    "You can however, extend the class to do so.\n",
    "<br>\n",
    "For in-depth knowledge about sequential decision problems, refer **Section 17.1** in the AIMA book."
   ]
  },
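  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cases above can also be compared side by side by rebuilding the grid for several living rewards. The sketch below reuses `GridMDP`, `value_iteration`, `best_policy` and `print_table` exactly as in the cells above; the helper name `policy_for` is just an illustrative choice, not part of the module."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from utils import print_table\n",
    "\n",
    "def policy_for(r):\n",
    "    # Rebuild the 4x3 world with living reward r and solve it.\n",
    "    grid = GridMDP([[r, r, r, +1],\n",
    "                    [r, None, r, -1],\n",
    "                    [r, r, r, r]],\n",
    "                   terminals=[(3, 2), (3, 1)])\n",
    "    return grid.to_arrows(best_policy(grid, value_iteration(grid, .001)))\n",
    "\n",
    "for r in [-0.4, -4, 4]:\n",
    "    print('R(s) =', r)\n",
    "    print_table(policy_for(r))"
   ]
  },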
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "## Appendix\n",
    "\n",
    "Surprisingly, it turns out that there are six other optimal policies for various ranges of R(s). \n",
    "You can try to find them out for yourself.\n",
    "See **Exercise 17.5**.\n",
    "To help you with this, we have a GridMDP editor in `grid_mdp.py` in the GUI folder. \n",
    "<br>\n",
    "Here's a brief tutorial about how to use it\n",
    "<br>\n",
    "Let us use it to solve `Case 2` above\n",
    "1. Run `python gui/grid_mdp.py` from the master directory.\n",
    "2. Enter the dimensions of the grid (3 x 4 in this case), and click on `'Build a GridMDP'`\n",
    "3. Click on `Initialize` in the `Edit` menu.\n",
    "4. Set the reward as -0.4 and click `Apply`. Exit the dialog. \n",
    "![title](images/ge0.jpg)\n",
    "<br>\n",
    "5. Select cell (1, 1) and check the `Wall` radio button. `Apply` and exit the dialog.\n",
    "![title](images/ge1.jpg)\n",
    "<br>\n",
    "6. Select cells (4, 1) and (4, 2) and check the `Terminal` radio button for both. Set the rewards appropriately and click on `Apply`. Exit the dialog. Your window should look something like this.\n",
    "![title](images/ge2.jpg)\n",
    "<br>\n",
    "7. You are all set up now. Click on `Build and Run` in the `Build` menu and watch the heatmap calculate the utility function.\n",
    "![title](images/ge4.jpg)\n",
    "<br>\n",
    "Green shades indicate positive utilities and brown shades indicate negative utilities. \n",
    "The values of the utility function and arrow diagram will pop up in separate dialogs after the algorithm converges."
   ]
  }
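,
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you would rather hunt for the remaining optimal policies programmatically, a brute-force scan over R(s) is a reasonable starting point. This sketch reuses `GridMDP`, `value_iteration` and `best_policy` from above; the scan range and step size are arbitrary choices, and a finer step may be needed to catch policies that hold only for narrow ranges of R(s)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "distinct = {}\n",
    "for i in range(-200, 1):\n",
    "    r = i / 100  # scan R(s) from -2.0 to 0.0 in steps of 0.01\n",
    "    grid = GridMDP([[r, r, r, +1],\n",
    "                    [r, None, r, -1],\n",
    "                    [r, r, r, r]],\n",
    "                   terminals=[(3, 2), (3, 1)])\n",
    "    pi = best_policy(grid, value_iteration(grid, .001))\n",
    "    key = tuple(sorted(pi.items()))\n",
    "    if key not in distinct:\n",
    "        distinct[key] = r  # first R(s) at which this policy appears\n",
    "print(len(distinct), 'distinct policies found')"
   ]
  }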
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
    "001e6c8ed3fc4eeeb6ab7901992314dd": {
     "views": []
    },
    "00f29880456846a8854ab515146ec55b": {
     "views": []
    },
    "010f52f7cde545cba25593839002049b": {
     "views": []
    },
    "01473ad99aa94acbaca856a7d980f2b9": {
     "views": []
    },
    "021a4a4f35da484db5c37c5c8d0dbcc2": {
     "views": []
    },
    "02229be5d3bc401fad55a0378977324a": {
     "views": []
    },
    "022a5fdfc8e44fb09b21c4bd5b67a0db": {
     "views": [
      {
      }
     ]
    },
    "025c3b0250b94d4c8d9b33adfdba4c15": {
     "views": []
    },
    "028f96abfed644b8b042be1e4b16014d": {
     "views": []
    },
    "0303bad44d404a1b9ad2cc167e42fcb7": {
     "views": []
    },
    "031d2d17f32347ec83c43798e05418fe": {
     "views": []
    },
    "03de64f0c2fd43f1b3b5d84aa265aeb7": {
     "views": []
    },
    "03fdd484675b42ad84448f64c459b0e0": {
     "views": []
    },
    "044cf74f03fd44fd840e450e5ee0c161": {
     "views": []
    },
    "054ae5ba0a014a758de446f1980f1ba5": {
     "views": []
    },
    "0675230fb92f4539bc257b768fb4cd10": {
     "views": [
      {
      }
     ]
    },
    "06c93b34e1f4424aba9a0b172c428260": {
     "views": []
    },
    "077a5ea324be46c3ad0110671a0c6a12": {
     "views": []
    },
    "0781138d150142a08775861a69beaec9": {
     "views": []
    },
    "0783e74a8c2b40cc9b0f5706271192f4": {
     "views": [
      {
      }
     ]
    },
    "07c7678b73634e728085f19d7b5b84f7": {
     "views": []
    },
    "07febf1d15a140d8adb708847dd478ec": {
     "views": []
    },
    "08299b681cd9477f9b19a125e186ce44": {
    "083af89d82e445aab4abddfece61d700": {
    "08a1129a8bd8486bbfe2c9e49226f618": {
    "08a2f800c0d540fdb24015156c7ffc15": {
    "097d8d0feccc4c76b87bbcb3f1ecece7": {
    "098f12158d844cdf89b29a4cd568fda0": {
     "views": [
      {
      }
     ]
    },
    "09e96f9d5d32453290af60fbd29ca155": {
     "views": []
    },
    "0a2ec7c49dcd4f768194483c4f2e8813": {
     "views": []
    },
    "0b1d6ed8fe4144b8a24228e1befe2084": {
     "views": []
    },
    "0b299f8157d24fa9830653a394ef806a": {
     "views": []
    },
    "0b2a4ac81a244ff1a7b313290465f8f4": {
     "views": []
    },
    "0b52cfc02d604bc2ae42f4ba8c7bca4f": {
     "views": []
    },
    "0b65fb781274495ab498ad518bc274d4": {
     "views": [
      {
      }
     ]
    },
    "0b865813de0841c49b41f6ad5fb85c6a": {
     "views": []
    },
    "0c2070d20fb04864aeb2008a6f2b8b30": {
     "views": []
    },
    "0cf5319bcde84f65a1a91c5f9be3aa28": {
     "views": []
    },
    "0d721b5be85f4f8aafe26b3597242d60": {
     "views": []
    },
    "0d9f29e197ad45d6a04bbb6864d3be6d": {
    "0e03c7e2c0414936b206ed055e19acba": {
    "0e2265aa506a4778bfc480d5e48c388b": {
    "0e4e3d0b6afc413e86970ec4250df678": {
    "0e6a5fe6423542e6a13e30f8929a8b02": {
    "0e7b2f39c94343c3b0d3b6611351886e": {
    "0eb5005fa34440988bcf3be231d31511": {
    "104703ad808e41bc9106829bb0396ece": {
    "109c376b28774a78bf90d3da4587d834": {
    "10b24041718843da976ac616e77ea522": {
    "11516bb6db8b45ef866bd9be8bb59312": {
    "1203903354fa467a8f38dbbad79cbc81": {
    "124ecbe68ada40f68d6a1807ad6bcdf9": {
    "1264becdbb63455183aa75f236a3413e": {
    "13061cc21693480a8380346277c1b877": {
    "130dd4d2c9f04ad28d9a6ac40045a329": {
    "1350a087b5a9422386c3c5f04dd5d1c9": {
    "139bd19be4a4427a9e08f0be6080188e": {
    "13f9f589d36c477f9b597dda459efd16": {
    "140917b5c77348ec82ea45da139a3045": {
    "145419657bb1401ba934e6cea43d5fd1": {
    "15d748f1629d4da1982cd62cfbcb1725": {
    "17ad015dbc744ac6952d2a6da89f0289": {
    "17b6508f32e4425e9f43e5407eb55ed3": {
    "185598d8e5fc4dffae293f270a6e7328": {
    "196473b25f384f3895ee245e8b7874e9": {
    "19c0f87663a0431285a62d4ad6748046": {
    "1a00a7b7446d4ad8b08c9a2a9ea9c852": {
    "1a97f5b88cdc4ae0871578c06bbb9965": {
    "1a9a07777b0c4a45b33e25a70ebdc290": {
    "1af711fe8e4f43f084cef6c89eec40ae": {
     "views": [
      {
      }
     ]
    },
    "1aff6a6e15b34bb89d7579d445071230": {
     "views": []
    },
    "1b1ea7e915d846aea9efeae4381b2c48": {
     "views": []
    },
    "1ba02ae1967740b0a69e07dbe95635cb": {
     "views": []
    },
    "1c5c913acbde4e87a163abb2e24e6e38": {
     "views": [
      {
      }
     ]
    },
    "1cfca0b7ef754c459e1ad97c1f0ceb3b": {
     "views": []
    },
    "1d8f6a4910e649589863b781aab4c4d4": {
     "views": []
    },
    "1e64b8f5a1554a22992693c194f7b971": {
     "views": []
    },
    "1e8f0a2bf7614443a380e53ed27b48c0": {
    "1f4e6fa4bacc479e8cd997b26a5af733": {
    "1fdf09158eb44415a946f07c6aaba620": {
     "views": []
    },
    "200e3ebead3d4858a47e2f6d345ca395": {
     "views": [
      {
      }
     ]
    },
    "2050d4b462474a059f9e6493ba06ac58": {
    "20b5c21a6e6a427ba3b9b55a0214f75e": {
    "20b99631feba4a9c98c9d5f74c620273": {
    "20bcff5082854ab89a7977ae56983e30": {
    "20d708bf9b7845fa946f5f37c7733fee": {
    "210b36ea9edf4ee49ae1ae3fe5005282": {
    "21415393cb2d4f72b5c3f5c058aeaf66": {
    "2186a18b6ed8405a8a720bae59de2ace": {
    "220dc13e9b6942a7b9ed9e37d5ede7ba": {
    "221a735fa6014a288543e6f8c7e4e2ef": {
    "2288929cec4d4c8faad411029f5e21fa": {
    "22b86e207ea6469d85d8333870851a86": {
    "23283ad662a140e3b5e8677499e91d64": {
    "23a7cc820b63454ca6be3dcfd2538ac1": {
    "240ed02d576546028af3edfab9ea8558": {
    "24678e52a0334cb9a9a56f92c29750be": {
    "247820f6d83f4dd9b68f5df77dbda4b7": {
    "24b6a837fbd942c9a68218fb8910dcd5": {
    "24ee3204f26348bca5e6a264973e5b56": {
    "262c7bb5bd7447f791509571fe74ae44": {
    "263595f22d0d45e2a850854bcefe4731": {
    "2640720aa6684c5da6d7870abcbc950b": {
    "265ca1ec7ad742f096bb8104d0cf1550": {
    "26bf66fba453464fac2f5cd362655083": {
    "29769879478f49e8b4afd5c0b4662e87": {
    "29a13bd6bc8d486ca648bf30c9e4c2a6": {
    "29c5df6267584654b76205fc5559c553": {
    "29ce25045e7248e5892e8aafc635c416": {
    "2a17207c43c9424394299a7b52461794": {
    "2a777941580945bc83ddb0c817ed4122": {
    "2ae1844e2afe416183658d7a602e5963": {
    "2afa2938b41944cf8c14e41a431e3969": {
    "2bdc5f9b161548e3aab8ea392b5af1a1": {
    "2c26b2bcfc96473584930a4b622d268e": {
    "2ca2a914a5f940b18df0b5cde2b79e4b": {
    "2ca2c532840548a9968d1c6b2f0acdd8": {
    "2d17c32bfea143babe2b114d8777b15d": {
    "2d3acd8872c342eab3484302cac2cb05": {
     "views": [
      {
      }
     ]
    },
    "2dc514cc2f5547aeb97059a5070dc9e3": {
    "2e1351ad05384d058c90e594bc6143c1": {
     "views": [
      {
      }
     ]
    },
    "2e9b80fa18984615933e41c1c1db2171": {
    "2ef17ee6b7c74a4bbbbbe9b1a93e4fb6": {
    "2f5438f1b34046a597a467effd43df11": {
     "views": [
      {
      }
     ]
    },
    "2f8d22417f3e421f96027fca40e1554f": {
    "2fb0409cfb49469d89a32597dc3edba9": {
    "303ccef837984c97b7e71f2988c737a4": {
    "3058b0808dca48a0bba9a93682260491": {
    "306b65493c28411eb10ad786bbf85dc5": {
    "30f5d30cf2d84530b3199015c5ff00eb": {
    "310b1ac518bd4079bdb7ecaf523a6809": {
    "313eca81d9d24664bcc837db54d59618": {
    "31413caf78c14548baa61e3e3c9edc55": {
    "317fbd3cb6324b2fbdfd6aa46a8d1192": {
    "319425ba805346f5ba366c42e220f9c6": {
     "views": [
      {
      }
     ]
    },
    "31fc8165275e473f8f75c6215b5184ff": {
    "329f12edaa0c44d2a619450f188e8777": {
    "32edf057582f4a6ca30ce3cb685bf971": {
    "330e74773ba148e18674cfa3e63cd6cc": {
    "332a89c03bfb49c2bb291051d172b735": {
     "views": [
      {
      }
     ]
    },
    "3347dfda0aca450f89dd9b39ca1bec7d": {
    "336e8bcfd7cc4a85956674b0c7bffff2": {
    "3376228b3b614d4ab2a10b2fd0f484fd": {
    "3380a22bc67c4be99c61050800f93395": {
    "34b5c16cbea448809c2ccbce56f8d5a5": {
    "34bb050223504afc8053ce931103f52c": {
    "34c28187175d49198b536a1ab13668c4": {
    "3521f32644514ecf9a96ddfa5d80fb9b": {
    "36511bd77ed74f668053df749cc735d4": {
    "36541c3490bd4268b64daf20d8c24124": {
    "37aa1dd4d76a4bac98857b519b7b523a": {
    "37aa3cfa3f8f48989091ec46ac17ae48": {
    "386991b0b1424a9c816dac6a29e1206b": {
    "386cf43742234dda994e35b41890b4d8": {
    "388571e8e0314dfab8e935b7578ba7f9": {
     "views": [
      {
      }
     ]
    },
    "3974e38e718547efaf0445da2be6a739": {
    "398490e0cc004d22ac9c4486abec61e1": {
    "399875994aba4c53afa8c49fae8d369e": {
    "39b64aa04b1d4a81953e43def0ef6e10": {
    "39ffc3dd42d94a27ba7240d10c11b565": {
    "3a21291c8e7249e3b04417d31b0447cf": {
     "views": [
      {
      }
     ]
    },
    "3a377d9f46704d749c6879383c89f5d3": {
    "3a44a6f1f62742849e96d957033a0039": {
    "3b22d68709b046e09fe70f381a3944cd": {
     "views": [
      {
      }
     ]
    },
    "3b329209c8f547acae1925dc3eb4af77": {
    "3c1b2ec10a9041be8a3fad9da78ff9f6": {
     "views": [
      {
      }
     ]
    },
    "3c2be3c85c6d41268bb4f9d63a43e196": {
    "3c6796eff7c54238a7b7776e88721b08": {
    "3cbca3e11edf439fb7f8ba41693b4824": {
    "3d4b6b7c0b0c48ff8c4b8d78f58e0f1c": {
    "3de1faf0d2514f49a99b3d60ea211495": {
    "3df60d9ac82b42d9b885d895629e372e": {
    "3e5b9fd779574270bf58101002c152ce": {
     "views": [
      {
      }
     ]
    },
    "3e80f34623c94659bfab5b3b56072d9a": {
    "3e8bb05434cb4a0291383144e4523840": {
     "views": [
      {
      }
     ]
    },
    "3ea1c8e4f9b34161928260e1274ee048": {
    "3f32f0915bc6469aaaf7170eff1111e3": {
    "3fe69a26ae7a46fda78ae0cb519a0f8b": {
    "4000ecdd75d9467e9dffd457b35aa65f": {
    "402d346f8b68408faed2fd79395cf3fb": {
    "402f4116244242148fdc009bb399c3bd": {
    "4049e0d7c0d24668b7eae2bb7169376e": {
    "4088c9ed71b0467b9b9417d5b04eda0e": {
    "40d70faa07654b6cb13496c32ba274b3": {
    "4146be21b7614abe827976787ec570f1": {
    "4198c08edda440dd93d1f6ce3e4efa62": {
    "42023d7d3c264f9d933d4cee4362852b": {
    "421ad8c67f754ce2b24c4fa3a8e951cf": {
    "4263fe0cef42416f8d344c1672f591f9": {
    "428e42f04a1e4347a1f548379c68f91b": {
     "views": [
      {
      }
     ]
    },
    "42a47243baf34773943a25df9cf23854": {
    "4343b72c91d04a7c9a6080f30fc63d7d": {
    "43488264fc924c01a30fa58604074b07": {
    "4379175239b34553bf45c8ef9443ac55": {
     "views": [
      {
      }
     ]
    },
    "43859798809a4a289c58b4bd5e49d357": {
    "43ad406a61a34249b5622aba9450b23d": {
    "4421c121414d464bb3bf1b5f0e86c37b": {
     "views": [
      {
      }
     ]
    },
    "445cc08b4da44c2386ac9379793e3506": {
    "447cff7e256c434e859bb7ce9e5d71c8": {
    "44af7da9d8304f07890ef7d11a9f95fe": {
    "45021b6f05db4c028a3b5572bc85217f": {
    "457768a474844556bf9b215439a2f2e9": {
    "45d5689de53646fe9042f3ce9e281acc": {
    "461aa21d57824526a6b61e3f9b5af523": {
    "472ca253aab34b098f53ed4854d35f23": {
    "4731208453424514b471f862804d9bb8": {
    "47dfef9eaf0e433cb4b3359575f39480": {
     "views": []
    },
    "48220a877d494a3ea0cc9dae19783a13": {
     "views": []
    },
    "4882c417949b4b6788a1c3ec208fb1ac": {
     "views": []
    },
    "49f5c38281984e3bad67fe3ea3eb6470": {
     "views": []
    },
    "4a0d39b43eee4e818d47d382d87d86d1": {
     "views": []
    },
    "4a470bf3037047f48f4547b594ac65fa": {
     "views": []
    },
    "4abab5bca8334dfbb0434be39eb550db": {
    "4b48e08fd383489faa72fc76921eac4e": {
    "4b9439e6445c4884bd1cde0e9fd2405e": {
    "4b9fa014f9904fcf9aceff00cc1ebf44": {
    "4bdc63256c3f4e31a8fa1d121f430518": {
    "4bebb097ddc64bbda2c475c3a0e92ab5": {
    "4c201df21ca34108a6e7b051aa58b7f6": {
    "4ced8c156fd941eca391016fc256ce40": {
    "4d281cda33fa489d86228370e627a5b0": {
     "views": [
      {
      }
     ]
    },
    "4d85e68205d94965bdb437e5441b10a1": {
     "views": []
    },
    "4e0e6dd34ba7487ba2072d352fe91bf5": {
     "views": []
    },
    "4e82b1d731dd419480e865494f932f80": {
     "views": []
    },
    "4e9f52dea051415a83c4597c4f7a6c00": {
     "views": []
    },
    "4ec035cba73647358d416615cf4096ee": {
     "views": [