Marko Jerkić

Learner's Deep Learning Blog

Probability Distributions Part I – Bernoulli, Multinoulli

Probability distributions are used in statistics to describe how likely a random variable is to take on each of its possible states. Random variables can be discrete or continuous.

A discrete random variable has a finite (or countably infinite) number of possible states, whereas a continuous random variable can take on any value within a continuous range.

The Bernoulli distribution

The Bernoulli distribution deals with the probability of a binary random variable, which means that it has only two possible outcomes.

The Bernoulli distribution is controlled by a single parameter $latex \phi \in [0, 1] &s=1 $, where $latex \phi &s=1 $ gives the probability that the random variable will have the value 1.

Based on that, we can state a few basic properties:

$latex P(x = 1) = \phi \\ P(x = 0) = 1 - \phi \\ \mathbb{E}[x] = \phi \\ Var_{x}(x) = \phi (1 - \phi) &s=1 $
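These properties can be checked empirically with numpy (a minimal sketch; the sample size of 100,000 and the choice $latex \phi = 0.3 &s=1 $ are arbitrary, picked only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
phi = 0.3  # arbitrary parameter choice for illustration

# A binomial draw with n=1 is exactly a Bernoulli trial
samples = rng.binomial(1, phi, size=100_000)

# The empirical mean and variance should be close to
# phi and phi * (1 - phi) respectively
print(samples.mean())  # close to 0.3
print(samples.var())   # close to 0.21
```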


We can create an experiment using numpy.

The most familiar example of the Bernoulli distribution is a coin flip. It is used constantly at the beginning of sports matches: the referee flips a coin and one of the team captains needs to call heads or tails. This assumes that the coin cannot land on its edge and stay there. In sports matches it is customary to use a coin that is “fair“, that is, one whose weight is equal on both of its sides.

So, if you ran something like this:
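A minimal sketch with numpy (assuming a fair coin, so $latex \phi = 0.5 &s=1 $, and ten flips):

```python
import numpy as np

# Flip a fair coin ten times; each flip is a single Bernoulli trial,
# which numpy models as a binomial draw with n=1
flips = np.random.binomial(1, 0.5, size=10)
print(flips)
```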

The output would probably look something like this: [1 0 1 1 0 0 0 0 1 1].

And a histogram of the probabilities would look something like this:

The multinoulli distribution

The multinoulli, or categorical, distribution is a distribution over a variable which has k possible states. Each state has its own separate probability.

It is controlled by a vector $latex \phi \in [0, 1]^{k - 1} &s=1 $, where $latex \phi_i &s=1 $ gives the probability of the $latex i &s=1 $th possible outcome, and the probability of the $latex k &s=1 $th outcome is given by $latex 1 - 1^T\phi &s=1 $.
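A sketch of such an experiment with numpy (assuming 10 equally likely states and 500 draws; `np.random.multinomial` returns how many of the draws landed in each state):

```python
import numpy as np

k = 10          # number of possible states
n_trials = 500  # number of draws

# Equal probability for each of the k states
phi = np.full(k, 1 / k)

# counts[i] is how many of the 500 draws took the i-th state
counts = np.random.multinomial(n_trials, phi)
print(counts)
```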

If you were to run the script above, you would get something like this: [ 3 69 89 74 21 61 50 19 78 36].

The element at the $latex i &s=1 $th position shows how many times the random variable took the $latex i &s=1 $th possible state. I ran the experiment 500 times with 10 possible outcomes.

A histogram of the probabilities for this distribution would look something like this:

Part II

For Part II, which is about the Gaussian (normal) and exponential distributions, go here.


  • Deep Learning (Adaptive Computation and Machine Learning series), by I. Goodfellow, Y. Bengio, A. Courville
  • StatLect
