Appendix A. Kalman filtering for maximum likelihood estimation given corrupted observations.
Introduction
The Kalman filter is used to extend likelihood estimation to cases with hidden states, such as when observations are corrupted and the true population size is unobserved. The following algorithm is based on section 3.4 in Harvey (1989), which was used by Lindley (2003) for estimation for population processes. The Kalman filter is well-known and widely used in engineering and computer science applications. There are a multitude of books on the Kalman filter, including Harvey (1989). One of the more penetrable introductions of the Kalman filter alone (but not on maximum likelihood estimation) is chapter 1 of Maybeck (1979).
The state-space model
The diffusion approximation for a stochastic exponential growth model can be written as a linear state space model (written in the notation familiar in the engineering literature):
![]() |
(A.1)
|
![]() |
(A.2)
|
where xt = log
Nt is the true log population size and yt
= log Ot is the log observations of the population size.
B is
,
the mean population growth rate. Q is the
2,
otherwise known as the process error or environmental variability. R
is the variability associated with sampling error or other non-process error.
Only yt, is observed; the underlying parameters, B,
Q, and R, and the underlying true population size, xt,
is hidden. If we make the assumption that vt is normally
distributed, then the model is a linear Gaussian state-space model.
We can calculate the probability
of the observed time series,
,
as follows:
![]() |
(A.3)
|
where
is
the probability of yt given all the observations before time
t and
. Denote
the expected value of
as
. The conditional probability
is distributed normal
with a mean
and some variance,
denoted
, which depends
on the particular parameters,
= {B, Q, R}, that generated the data. Thus, the probability
of the time series given a particular set of parameters,
,
is
![]() |
(A.4)
|
from the probability density of
a normal with mean
and
variance
. The log likelihood
of
given the data,
, is
+
a constant. |
(A.5)
|
For Eq. A.5, we need estimates of
=
and
=
.
Observe from Eq. A.1 that
![]() |
(A.6)
|
The Kalman filter below gives optimal
estimates of
and
which
are then used in Eq. A.6 to calculate the log likelihood of
.
The maximum likelihood estimates
of B, Q, and R are found by using some type of maximization routine on Eq. A.6
to find the set of parameters
= {B,
Q, R} that maximize the likelihood. Matlab code for this algorithm is given
at the end of this appendix.
The Kalman filter
First, some notation:
![]() |
The Kalman recursion: Start
at t = 1 and step forward to T. Assume an initial
and
initial
to start the recursion.
One could let these be free variables and find the maximum likelihood values
when maximizing Eq. A.6, but that is not done here and the algorithm should
not be very sensitive to these starting values. At each step, compute:
![]() |
This is the well-known Kalman filter,
but it looks a little different than what you’ll see in engineering texts.
First generally it is assumed that yt is a series of measurements
from multiple instruments, thus the Kalman filter is always written in matrix
form. Here since yt is one measurement, it can be written
in scalar form. Second, the Kalman filter is usually presented for the model
.
In this application, A = 1, C = 1 and ut = 1,
so the filter is simplified quite a bit.
Literature cited
Harvey, A. C. 1991. Forecasting, structural time series models and the Kalman filter. Cambridge University Press, Cambridge, UK.
Lindley, S. T. 2003. Estimation of population growth and extinction parameters from noisy data. Ecological Applications 13:806813.
Maybeck, P. S. 1979. Stochastic models, estimation and control. Volume 1. Academic Press, New York, New York, USA.
Matlab code
function [mu,sigma2,sigma2np]=kalman_ests(data)
y=log(data);
%Start with some reasonable initial parameter estimates
muest=mean(y(2:end)-y(1:(end-1)));
tmp1=var(y(2:end)-y(1:(end-1)));
tmp4=var(y(5:end)-y(1:(end-4)));
sigma2est=(tmp4-tmp1)/3;
sigma2npest=(var(y(2:end)-y(1:(end-1)))-max(0.0001,sigma2est))/2;
pi1=max(0.0001,sigma2est)+max(0.0001,sigma2npest); %var of y(1)
%log transform the variances so the search algorithm doesn’t give negative
% variances
startvals=[muest log(max(0.0001,sigma2est)) log(max(0.0001,sigma2npest))];
%fminsearch is a Nelder-Mead minimization matlab function
a=fminsearch('kalman_loglik',startvals,[],y,y(1),pi1);
MLmuest=a(1);
MLsigma2est=exp(a(2));
MLsigma2npest=exp(a(3));
function negloglik = kalman_loglik(v,y,initx,V1)
T=length(y);
B = v(1); %mu
Q = exp(v(2)); %s2
R = exp(v(3)); %s2np
%initialize
xtt=zeros(1,T); Vtt=zeros(1,T); xtt1=zeros(1,T); Vtt1=zeros(1,T);
xtT=zeros(1,T); VtT=zeros(1,T); J=zeros(1,T); Vtt1T=zeros(1,T);
Ft=zeros(1,T); vt=zeros(1,T);
%forward pass gets you E[x(t) given y(1:t)]
x10=initx;
V10=V1;
for(t=1:T)
if(t==1)
xtt1(1) = initx; %denotes x_1^0
Vtt1(1) = V1; %denotes V_1^0
else
xtt1(t) = xtt(t-1) + B; %xtt1 denotes x_t^(t-1); Harvey 3.2.2a
Vtt1(t) = Vtt(t-1) + Q; %Harvey 3.2.2b
end
Kt = Vtt1(t)/(Vtt1(t)+R);
Ft(t) = Vtt1(t)+R;
vt(t) = y(t)-xtt1(t);
xtt(t) = xtt1(t) + Kt*(y(t) - xtt1(t)); %Harvey 3.2.3a
Vtt(t) = Vtt1(t)-Kt*Vtt1(t); %Harvey 3.3.3b
end
%Calculate negative log likelihood
negloglik = (1/2)*sum(vt.^2./Ft) + (1/2)*sum(log(abs(Ft))) + (T/2)*log(2*pi);