probability-4e.ipynb 39.3 KB
Peter Norvig committed
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# And if you have both headache and fever, and were not vaccinated, \n",
    "# then the flu is very likely, especially if it is a high fever.\n",
    "enumeration_ask(Flu, {Headache: T, Fever: 'mild', Vaccinated: F}, flu_net)"
   "execution_count": 33,
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{F: 0.055534567434831886, T: 0.9444654325651682}"
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "enumeration_ask(Flu, {Headache: T, Fever: 'high', Vaccinated: F}, flu_net)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
    "# Entropy\n",
    "\n",
    "We can compute the entropy of a probability distribution:"
   "execution_count": 34,
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "def entropy(probdist):\n",
    "    \"The entropy of a probability distribution, in bits.\"\n",
    "    # Skip zero-probability outcomes: they contribute 0, and math.log(0, 2) would raise.\n",
    "    return - sum(p * math.log(p, 2)\n",
    "                 for p in probdist.values() if p > 0)"
   "execution_count": 35,
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1.0"
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "entropy(ProbDist(heads=0.5, tails=0.5))"
   "execution_count": 36,
    "collapsed": false
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.011397802630112312"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
    "entropy(ProbDist(yes=1000, no=1))"
   "execution_count": 37,
    "collapsed": false
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.8687212463394045"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    "entropy(P(Alarm, {Earthquake: T, Burglary: F}))"
   "execution_count": 38,
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.011407757737461138"
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "entropy(P(Alarm, {Earthquake: F, Burglary: F}))"
   ]
  },
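  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a hand check on the values above, here is the calculation written out. This is a sketch that assumes the usual definition of entropy in bits, which is what `entropy` computes with `math.log(p, 2)`:\n",
    "\n",
    "$$H(X) = -\\sum_x P(x) \\log_2 P(x)$$\n",
    "\n",
    "For the fair coin, both outcomes have probability $0.5$:\n",
    "\n",
    "$$H = -(0.5 \\log_2 0.5 + 0.5 \\log_2 0.5) = 0.5 + 0.5 = 1 \\text{ bit}$$\n",
    "\n",
    "For `ProbDist(yes=1000, no=1)`, assuming the counts are normalized to $1000/1001$ and $1/1001$:\n",
    "\n",
    "$$H = -\\left(\\tfrac{1000}{1001} \\log_2 \\tfrac{1000}{1001} + \\tfrac{1}{1001} \\log_2 \\tfrac{1}{1001}\\right) \\approx 0.00144 + 0.00996 \\approx 0.0114 \\text{ bits}$$\n",
    "\n",
    "The nearly certain outcome contributes almost nothing; most of the small total comes from the rare outcome."
   ]
  },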
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For non-Boolean variables, the entropy can be greater than 1 bit:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
    "collapsed": false
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1.5"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
    "entropy(P(Fever, {Flu: T}))"
   ]
  },
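  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To put the 1.5 bits in context: for a variable with $n$ possible outcomes, entropy is bounded by\n",
    "\n",
    "$$0 \\le H(X) \\le \\log_2 n$$\n",
    "\n",
    "with the maximum reached by the uniform distribution. That gives at most 1 bit for a Boolean variable and at most $\\log_2 3 \\approx 1.585$ bits for a three-outcome variable such as `Fever`. As an illustration only (the conditional table for `Fever` is defined earlier in the notebook and is not repeated here), a three-outcome distribution with probabilities $\\tfrac12, \\tfrac14, \\tfrac14$ would give exactly this value:\n",
    "\n",
    "$$H = -\\left(\\tfrac12 \\log_2 \\tfrac12 + \\tfrac14 \\log_2 \\tfrac14 + \\tfrac14 \\log_2 \\tfrac14\\right) = 0.5 + 0.5 + 0.5 = 1.5 \\text{ bits}$$"
   ]
  },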
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
    "# Unknown Outcomes: Smoothing\n",
    "\n",
    "So far we have dealt with discrete distributions where we know all the possible outcomes in advance. For Boolean variables, the only outcomes are `T` and `F`. For `Fever`, we modeled exactly three outcomes.  However, in some applications we will encounter new, previously unknown outcomes over time. For example, we could train a model on the distribution of words in English, and then somebody could coin a brand new word. To deal with this, we introduce\n",
    "the `DefaultProbDist` distribution, which uses the key `None` to stand as a placeholder for any unknown outcome(s)."
   "execution_count": 40,
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "class DefaultProbDist(ProbDist):\n",
    "    \"\"\"A Probability Distribution that supports smoothing for unknown outcomes (keys).\n",
    "    The default_value is the weight, before normalization, given to unknown\n",
    "    (previously unseen) keys. The key `None` stands for unknown outcomes.\"\"\"\n",
    "    def __init__(self, default_value, mapping=(), **kwargs):\n",
    "        self[None] = default_value\n",
    "        self.update(mapping, **kwargs)\n",
    "        normalize(self)\n",
    "        \n",
    "    def __missing__(self, key): return self[None]"
   "execution_count": 41,
    "collapsed": false
   "outputs": [],
    "import re\n",
    "\n",
    "def words(text): return re.findall(r'\\w+', text.lower())\n",
    "english = words('''This is a sample corpus of English prose. To get a better model, we would train on much\n",
    "more text. But this should give you an idea of the process. So far we have dealt with discrete \n",
    "distributions where we  know all the possible outcomes in advance. For Boolean variables, the only \n",
    "outcomes are T and F. For Fever, we modeled exactly three outcomes. However, in some applications we \n",
    "will encounter new, previously unknown outcomes over time. For example, when we could train a model on the \n",
    "words in this text, we get a distribution, but somebody could coin a brand new word. To deal with this, \n",
    "we introduce the DefaultProbDist distribution, which uses the key `None` to stand as a placeholder for any \n",
    "unknown outcomes. Probability theory allows us to compute the likelihood of certain events, given \n",
    "assumptions about the components of the event. A Bayesian network, or Bayes net for short, is a data \n",
    "structure to represent a joint probability distribution over several random variables, and do inference on it.''')\n",
    "\n",
    "E = DefaultProbDist(0.1, Counter(english))"
   "execution_count": 42,
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.052295177222545036"
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 'the' is a common word:\n",
    "E['the']"
   "execution_count": 43,
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.005810575246949448"
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 'possible' is a less-common word:\n",
    "E['possible']"
   "execution_count": 44,
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.0005810575246949449"
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 'impossible' was not seen in the training data, but still gets a non-zero probability ...\n",
    "E['impossible']"
   "execution_count": 45,
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.0005810575246949449"
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# ... as do other rare, previously unseen words:\n",
    "E['llanfairpwllgwyngyll']"
   ]
  },
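  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The probabilities above can be checked by hand. This is a sketch that assumes `normalize` divides every weight by the total weight. With a default weight $d$ for the `None` key, a count $c(w)$ for each seen word $w$, and $N$ word tokens in total:\n",
    "\n",
    "$$P(w) = \\frac{c(w)}{N + d}, \\qquad P(\\texttt{None}) = \\frac{d}{N + d}$$\n",
    "\n",
    "With $d = 0.1$, the outputs are consistent with $N = 172$ (a value inferred from the outputs, not counted independently):\n",
    "\n",
    "$$\\frac{9}{172.1} \\approx 0.0523, \\qquad \\frac{1}{172.1} \\approx 0.00581, \\qquad \\frac{0.1}{172.1} \\approx 0.000581$$\n",
    "\n",
    "These match `E['the']`, `E['possible']`, and the value returned for unseen words, respectively."
   ]
  },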
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that this does not mean that 'impossible' and 'llanfairpwllgwyngyll' and all the other unknown words\n",
    "*each* have probability of about 0.0006.\n",
    "Rather, it means that together, all the unknown words have a total probability of about 0.0006. With that\n",
    "interpretation, the sum of all the probabilities is still 1, as it should be. In the `DefaultProbDist`, the\n",
    "unknown words are all represented by the key `None`:"
   "execution_count": 46,
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.0005810575246949449"
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "E[None]"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}