Pseudolikelihood

In statistical theory, a pseudolikelihood is an approximation to the joint probability distribution of a collection of random variables. In practice, it provides an approximation to the likelihood function of a set of observed data that may either yield a computationally simpler estimation problem or allow explicit estimates of model parameters to be obtained.

The pseudolikelihood approach was introduced by Julian Besag[1] in the context of analysing data having spatial dependence.

Definition

Given a set of random variables $X = X_1, X_2, \ldots, X_n$, the pseudolikelihood of $X = x = (x_1, x_2, \ldots, x_n)$ is

$$L(\theta) := \prod_{i} \mathrm{Pr}_{\theta}(X_i = x_i \mid X_j = x_j \text{ for } j \neq i) = \prod_{i} \mathrm{Pr}_{\theta}(X_i = x_i \mid X_{-i} = x_{-i})$$

in the discrete case and

$$L(\theta) := \prod_{i} p_{\theta}(x_i \mid x_j \text{ for } j \neq i) = \prod_{i} p_{\theta}(x_i \mid x_{-i}) = \prod_{i} p_{\theta}(x_i \mid x_1, \ldots, \hat{x}_i, \ldots, x_n)$$

in the continuous case. Here $X$ is a vector of variables, $x$ is a vector of values, $p_{\theta}(\cdot \mid \cdot)$ is a conditional density and $\theta = (\theta_1, \ldots, \theta_p)$ is the vector of parameters to be estimated. The expression $X = x$ above means that each variable $X_i$ in the vector $X$ takes the corresponding value $x_i$ in the vector $x$, and $x_{-i} = (x_1, \ldots, \hat{x}_i, \ldots, x_n)$ means that the coordinate $x_i$ has been omitted. The expression $\mathrm{Pr}_{\theta}(X = x)$ is the probability that the vector of variables $X$ takes values equal to the vector $x$. This probability depends on the unknown parameter $\theta$. Because situations can often be described using state variables ranging over a set of possible values, the expression $\mathrm{Pr}_{\theta}(X = x)$ can represent the probability of a certain state among all possible states allowed by the state variables.
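
As an illustration of the definition, the following worked example assumes an Ising-type model, which is not taken from the text above: spins $x_i \in \{-1, +1\}$ sit on the nodes of a graph with edge set $E$, and a single parameter $\theta$ controls the interaction. The symbols $E$, $Z(\theta)$ and $s_i$ are introduced here for the example only. The normalising constant $Z(\theta)$ cancels in each conditional probability, so the pseudolikelihood involves only local sums over neighbours.

```latex
\begin{aligned}
\mathrm{Pr}_{\theta}(X = x) &= \frac{1}{Z(\theta)} \exp\Big(\theta \sum_{(i,j) \in E} x_i x_j\Big),
\qquad Z(\theta) = \sum_{x' \in \{-1,+1\}^n} \exp\Big(\theta \sum_{(i,j) \in E} x'_i x'_j\Big), \\
\mathrm{Pr}_{\theta}(X_i = x_i \mid X_{-i} = x_{-i})
&= \frac{\exp(\theta x_i s_i)}{\exp(\theta s_i) + \exp(-\theta s_i)}
= \frac{1}{1 + \exp(-2 \theta x_i s_i)},
\qquad s_i = \sum_{j : (i,j) \in E} x_j, \\
L(\theta) &= \prod_{i} \frac{1}{1 + \exp(-2 \theta x_i s_i)}.
\end{aligned}
```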

The pseudo-log-likelihood is the logarithm of the above expression, namely (in the discrete case)

$$l(\theta) := \log L(\theta) = \sum_{i} \log \mathrm{Pr}_{\theta}(X_i = x_i \mid X_j = x_j \text{ for } j \neq i).$$
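
The discrete-case formula can be sketched in a few lines of Python. This is a minimal illustration and not a reference implementation: it assumes the joint distribution is available as an explicit table, and the function name `pseudo_log_likelihood`, the dictionary `joint` and its numbers are all made up for the example. Each conditional probability is obtained by dividing the joint probability of the observation by the sum over all values of the one coordinate being conditioned out.

```python
import math

def pseudo_log_likelihood(joint, x, values):
    """Sum over i of log Pr(X_i = x_i | X_{-i} = x_{-i}).

    joint  -- dict mapping value tuples to probabilities (hypothetical table)
    x      -- observed tuple of values
    values -- the values each variable can take
    """
    total = 0.0
    for i in range(len(x)):
        # Pr(X_{-i} = x_{-i}): sum the joint over every value of coordinate i
        denom = sum(joint[x[:i] + (v,) + x[i + 1:]] for v in values)
        total += math.log(joint[x] / denom)
    return total

# Toy joint distribution over two dependent binary variables (made-up numbers)
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
x = (0, 0)

print("pseudo-log-likelihood:", pseudo_log_likelihood(joint, x, values=(0, 1)))
print("log-likelihood:       ", math.log(joint[x]))
```

For dependent variables the two printed values generally differ, which reflects that the pseudolikelihood is an approximation and not the likelihood itself; for independent variables they coincide.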

One use of the pseudolikelihood measure is as an approximation for inference about a Markov or Bayesian network, as the pseudolikelihood of an assignment to $X_i$ can often be computed more efficiently than the likelihood, particularly when the latter requires marginalization over a large number of variables.
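
The difference in cost can be sketched on a small Markov network. The example below assumes an Ising-type model on a ring of ten sites, with spins in {-1, +1} and probability proportional to exp(theta times the sum of x_i * x_j over edges); the model, the observation `x`, the grid of candidate values and all function names are assumptions chosen for illustration. The exact log-likelihood needs a normalising constant summed over all 2**n configurations, while the pseudo-log-likelihood needs only each site's neighbours.

```python
import itertools
import math

def cycle_edges(n):
    """Edges of a ring graph 0 - 1 - ... - (n-1) - 0."""
    return [(i, (i + 1) % n) for i in range(n)]

def interaction(x, edges):
    """Sum of x_i * x_j over all edges."""
    return sum(x[i] * x[j] for i, j in edges)

def exact_log_likelihood(theta, x, edges):
    """log Pr(X = x); the normalising constant sums over all 2**n states."""
    states = itertools.product((-1, 1), repeat=len(x))
    log_z = math.log(sum(math.exp(theta * interaction(s, edges)) for s in states))
    return theta * interaction(x, edges) - log_z

def pseudo_log_likelihood(theta, x, edges):
    """Sum of log Pr(x_i | x_{-i}); needs only each site's neighbours."""
    field = [0] * len(x)
    for i, j in edges:
        field[i] += x[j]
        field[j] += x[i]
    # Pr(x_i | rest) = 1 / (1 + exp(-2 * theta * x_i * field_i))
    return -sum(math.log1p(math.exp(-2.0 * theta * xi * f)) for xi, f in zip(x, field))

x = (1, 1, 1, 1, -1, 1, 1, -1, -1, -1)   # made-up observation
edges = cycle_edges(len(x))
grid = [k / 100 for k in range(101)]      # candidate theta values in [0, 1]

# Crude grid search; a real analysis would use a numerical optimiser
theta_ml = max(grid, key=lambda t: exact_log_likelihood(t, x, edges))
theta_pl = max(grid, key=lambda t: pseudo_log_likelihood(t, x, edges))
print("maximum likelihood estimate:      ", theta_ml)
print("maximum pseudolikelihood estimate:", theta_pl)
```

With ten sites the exact computation is still feasible, but its cost doubles with every added site, whereas the pseudo-log-likelihood grows linearly with the number of edges. The two estimates need not agree, since they maximise different objective functions.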

Properties

Use of the pseudolikelihood in place of the true likelihood function in a maximum likelihood analysis can lead to good estimates, but a straightforward application of the usual likelihood techniques to derive information about estimation uncertainty, or for significance testing, would in general be incorrect.[2]

References

  1. Besag, J. (1975), "Statistical Analysis of Non-Lattice Data", The Statistician, 24 (3): 179–195, doi:10.2307/2987782, JSTOR 2987782
  2. Dodge, Y. (2003), The Oxford Dictionary of Statistical Terms, Oxford University Press, ISBN 0-19-920613-9