Questions and Answers on Likelihood

Posted at 2019-04-18

Overview

There are already a number of useful resources explaining the concept of likelihood. In this post, I would like to share the questions about likelihood I had initially, together with the answers, for those who have questions such as:

Q1. When we say likelihood, should it always mean the likelihood function?
Q2. Why is the probability density function in Bayes' theorem called the likelihood?

This post is intended as a complementary resource for those who are just learning the concept of likelihood and are not yet fully comfortable with it. I would appreciate any comments or suggestions.

Likelihood

Before getting to the questions, let me first review likelihood.

Assumptions when we say likelihood

We write the probability distribution of $x$ with parameter $\theta$ as $P_{\theta}(x)$; this is the real distribution we want to know.

Whenever we say likelihood, we are already assuming the following:

  • There is a series of observations of $x$, $X^{\prime} = (x_1, x_2, \ldots)$.
  • We are trying to explain the observations with a statistical model $p_{\theta}(x)$, which is our assumption about the real probability distribution.

Likelihood (function)

The likelihood function is defined as below, assuming the observations are independent under $p_{\theta}(x)$:

L(\theta) = p_{\theta}(x_1) \cdot p_{\theta}(x_2) \cdot p_{\theta}(x_3) \cdots = \prod_{i} p_{\theta}(x_i)

Although the form of this equation is similar to that of a probability density function, all the $x$s in $L(\theta)$ are constants. $L(\theta)$ is therefore a function of $\theta$ and can be plotted on the $L$-$\theta$ plane. $\theta$ can be a vector, but we assume it is a scalar for simplicity for now.

In maximum likelihood estimation, we choose the $\theta=\theta_{best}$ that maximizes the likelihood. As is often pointed out, the integral of the likelihood function over $\theta$ does not have to be 1, since the parameter $\theta$ is not a random variable of a probability distribution.
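As a minimal sketch of this definition (assuming, purely for illustration, a Bernoulli model $p_{\theta}(x) = \theta^{x}(1-\theta)^{1-x}$ and three hypothetical observations), the likelihood can be evaluated on a grid of $\theta$ values; note that its integral over $\theta$ is generally not 1:

```python
import numpy as np

# Hypothetical observations (e.g. coin flips: 1 = heads, 0 = tails)
x_obs = np.array([1, 0, 1])

def likelihood(theta, x):
    """L(theta) = p_theta(x_1) * p_theta(x_2) * ... for a Bernoulli model."""
    return np.prod(theta ** x * (1 - theta) ** (1 - x))

thetas = np.linspace(0.0, 1.0, 101)                   # grid of candidate parameters
L = np.array([likelihood(t, x_obs) for t in thetas])  # likelihood at each theta

print("theta maximizing L:", thetas[np.argmax(L)])    # about 2/3 for these observations
print("approx. integral of L:", np.sum(L) * (thetas[1] - thetas[0]))  # not 1 in general
```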

Question 1 - When we say likelihood, should it always mean likelihood function?

When we search for likelihood, we mostly find explanations of the likelihood function, along with notes that the likelihood function is often abbreviated as likelihood. My question here was: "So when we see likelihood, should we always imagine a function?"

The answer is no. As some references do, the value of the likelihood function can also be called a likelihood. That value is a likelihood, but of course it cannot be called a likelihood function. Thus, we cannot always assume that likelihood means the likelihood function.

When $\theta$ is set to a certain value, say $\theta^{\prime}$, the function yields a single value $L(\theta^{\prime})$, which is a relative measure of how well the statistical model we assumed, $p_{\theta}(x)$, explains the fixed set of observations $X^{\prime}$.

To me, it is clearer to say:

Likelihood is a metric that relatively quantifies how well an assumed statistical model explains given observations. The parametric function used to calculate the likelihood is called the likelihood function, or also simply the likelihood.

The likelihood function and its value are thus referred to by the same word, likelihood. This is similar to how, for example, the kinetic energy of a point mass is written as a function of velocity, yet a particular value of that function is also called the kinetic energy.
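Continuing the hypothetical Bernoulli sketch above (reusing `likelihood` and `x_obs`), evaluating the function at specific parameter values gives likelihoods in the "value" sense, which only allow a relative comparison between candidate models:

```python
L_a = likelihood(0.5, x_obs)   # the likelihood (a value) of the model with theta = 0.5
L_b = likelihood(0.7, x_obs)   # the likelihood (a value) of the model with theta = 0.7
print(L_a, L_b)                # 0.125 vs ~0.147: theta = 0.7 explains x_obs a bit better
```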

Question 2 - How does the likelihood definition relate to the likelihood term in Bayes' theorem?

In Bayes' theorem,

P(\theta|D) = \frac{P(D|\theta)}{P(D)}P(\theta)

the conditional probability $P(D|\theta)$ is called the likelihood. The prior $P(\theta)$ is updated by multiplying it by the likelihood and normalizing by the evidence $P(D)$, yielding the posterior $P(\theta|D)$.

In textbook examples of Bayesian inference, $P(D|\theta)$ is sometimes treated as a known conditional probability and does not seem to be a likelihood function. In general, however, this term is unknown: we usually have access to a series of observations $(d_1, d_2, \ldots)$ but do not know the mechanism, i.e. $P(D|\theta)$, that generated them.

Thus, $P(D|\theta)$ is statistically modeled with parameters $\theta$ given the observations $D$. Because the observations are assumed to be independent, $P(D|\theta)$ factorizes into the same per-observation terms, which is consistent with the explanation of the likelihood function above.
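As a rough sketch of this update (again assuming, for illustration only, a Bernoulli model and a uniform prior discretized on a grid of $\theta$ values):

```python
import numpy as np

d_obs = np.array([1, 0, 1, 1])                  # hypothetical observations D
thetas = np.linspace(0.01, 0.99, 99)            # discretized parameter values
prior = np.ones_like(thetas) / len(thetas)      # P(theta): uniform prior

# P(D|theta): the likelihood of each candidate theta under the Bernoulli model
like = np.array([np.prod(t ** d_obs * (1 - t) ** (1 - d_obs)) for t in thetas])

evidence = np.sum(like * prior)                 # P(D): normalizing constant
posterior = like * prior / evidence             # P(theta|D): sums to 1 over the grid
```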

Maximum likelihood estimate (MLE)

In maximum likelihood estimation, the likelihood $P(D|\theta)$, viewed as a function of $\theta$, is maximized with respect to $\theta$. The specific value $\theta = \theta^{\prime}$ that maximizes the likelihood is chosen as the most plausible value of the parameter. This is essentially equivalent to calculating the posterior $P(\theta|D)$ while assuming $\frac{P(\theta)}{P(D)}$ is constant, so that a prior belief about the distribution of $\theta$ does not enter the estimate [4].
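A self-contained sketch of this idea, again with the hypothetical Bernoulli observations used above: maximizing $L(\theta)$ is equivalent to minimizing the negative log-likelihood, which here recovers the closed-form MLE, the sample mean.

```python
import numpy as np
from scipy.optimize import minimize_scalar

d_obs = np.array([1, 0, 1, 1])                  # same hypothetical observations as above

def neg_log_likelihood(theta):
    # -log L(theta); minimizing it is equivalent to maximizing L(theta)
    return -np.sum(d_obs * np.log(theta) + (1 - d_obs) * np.log(1 - theta))

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x)                                    # ~0.75, equal to d_obs.mean(), the closed-form MLE
```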

Bayesian Estimate

Bayesian estimation treats $\theta$ as a random variable (like the pips of a die) that follows a probability distribution. The value of $\theta$ is then estimated by minimizing a loss function, such as the variance [4, 5].
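Continuing the grid sketch from the Bayes' theorem section above (reusing `thetas` and `posterior`), one common choice is the posterior mean, which minimizes the expected squared-error loss:

```python
theta_bayes = np.sum(thetas * posterior)   # posterior mean over the discretized grid
print(theta_bayes)                         # ~0.67 here, versus the MLE of 0.75 above
```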

References

[1] https://en.wikipedia.org/wiki/Likelihood_function
[2] http://www.genstat.net/statistics.html (Japanese)
[3] https://qiita.com/kenmatsu4/items/b28d1b3b3d291d0cc698 (Japanese)
[4] https://stats.stackexchange.com/questions/74082/what-is-the-difference-in-bayesian-estimate-and-maximum-likelihood-estimate
[5] https://medium.com/datadriveninvestor/maximum-likelihood-estimation-v-s-bayesian-estimation-bfac171a8b85
