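The property of unbiasedness
(for an estimator $\hat{\theta}$ of $\theta$) is defined by

$$E(\hat{\theta}) = \theta, \quad \text{i.e.} \quad \delta = 0$$

where the bias vector
$\delta$ can be written as

$$\delta = E(\hat{\theta}) - \theta$$

and the precision
vector as

which is a positive definite symmetric K by K matrix.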
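The bias of an estimator can be approximated by simulation. The following is a minimal sketch, not part of the original text: it assumes normally distributed observations and compares the sample variance with divisor T against the one with divisor T - 1. All names, the seed and the numerical values are hypothetical.

```python
import random


def variance_divisor_t(sample):
    """Sample variance with divisor T (biased for the population variance)."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample) / len(sample)


def variance_divisor_t_minus_1(sample):
    """Sample variance with divisor T - 1 (unbiased)."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample) / (len(sample) - 1)


def simulated_bias(estimator, sigma2, sample_size, replications, rng):
    """Average of the estimator over many normal samples, minus the true variance."""
    total = 0.0
    for _ in range(replications):
        sample = [rng.gauss(0.0, sigma2 ** 0.5) for _ in range(sample_size)]
        total += estimator(sample)
    return total / replications - sigma2


rng = random.Random(12345)   # fixed seed so the run is reproducible
sigma2, T = 4.0, 5           # hypothetical population variance and sample size

bias_t = simulated_bias(variance_divisor_t, sigma2, T, 20000, rng)
bias_t_minus_1 = simulated_bias(variance_divisor_t_minus_1, sigma2, T, 20000, rng)

# Theory: the divisor-T estimator has bias -sigma2 / T, the other has bias 0.
print(f"divisor T:     simulated bias = {bias_t:+.3f} (theory {-sigma2 / T:+.3f})")
print(f"divisor T - 1: simulated bias = {bias_t_minus_1:+.3f} (theory +0.000)")
```

The simulated values only approximate the theoretical bias; the approximation improves as the number of replications grows.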
If two different estimators of the
same parameter exist, one can compute the difference between their
precision vectors (first minus second): if this difference is positive
semi-definite, the second estimator has a "smaller"
covariance matrix and can therefore be called better
than the first estimator.
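The comparison described above can be sketched in code. The example below is an illustration under stated assumptions: the two covariance matrices are made-up numbers, and positive semi-definiteness is tested by attempting a Cholesky factorisation of the difference after adding a tiny multiple of the identity matrix. The function names are hypothetical.

```python
import math


def is_positive_semidefinite(matrix, tol=1e-10):
    """Return True if all eigenvalues of the symmetric matrix exceed -tol.

    A Cholesky factorisation of matrix + tol * I exists exactly when that
    shifted matrix is positive definite, so a failed pivot means "not PSD".
    """
    k = len(matrix)
    lower = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            partial = sum(lower[i][m] * lower[j][m] for m in range(j))
            if i == j:
                pivot = matrix[i][i] + tol - partial
                if pivot <= 0.0:
                    return False
                lower[i][i] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return True


def difference(a, b):
    """Element-wise difference a - b of two equally sized matrices."""
    return [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# Hypothetical covariance matrices of two estimators of the same K = 2 parameters.
cov_first = [[2.0, 0.5], [0.5, 1.0]]
cov_second = [[1.0, 0.2], [0.2, 0.5]]

second_is_better = is_positive_semidefinite(difference(cov_first, cov_second))
first_is_better = is_positive_semidefinite(difference(cov_second, cov_first))
print("second estimator better:", second_is_better)
print("first estimator better: ", first_is_better)
```

Note that the ordering is only partial: for some pairs of matrices neither difference is positive semi-definite, in which case neither estimator can be called better in this sense.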
An estimator is said to be efficient if it is unbiased and, at the same time, no other
unbiased estimator exists with a smaller covariance matrix.
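If Y is a random vector
of T independent observations, each with a probability distribution f
depending on the unknown parameter $\theta$, then
the joint distribution can be written as

$$f(y_1, \ldots, y_T; \theta) = \prod_{t=1}^{T} f(y_t; \theta)$$
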
Regarded as a function of the unknown
parameter, for given values of the random variable, this expression is
called the likelihood
function. It has the same structure as the joint probability
function but is read as a function of the unknown parameter instead of the
random variable.
The information
matrix is defined as the negative of the expected value of the
Hessian matrix of the log likelihood function ln L

$$I(\theta) = -E\left[\frac{\partial^2 \ln L}{\partial\theta \, \partial\theta'}\right]$$

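The Cramér-Rao
lower bound is defined as the inverse of the information matrix

$$\left( -E\left[\frac{\partial^2 \ln L}{\partial\theta \, \partial\theta'}\right] \right)^{-1}$$

denoted omega ($\Omega$).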
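For a single parameter, the information and the lower bound can be checked numerically. The sketch below is an illustration only: it assumes i.i.d. normal observations with known standard deviation, so that the second derivative of the log likelihood does not depend on the data and equals its own expected value. The function names, seed and parameter values are hypothetical.

```python
import random


def log_likelihood(mu, data, sigma):
    """Log likelihood of i.i.d. N(mu, sigma^2) data, up to an additive constant."""
    return -sum((y - mu) ** 2 for y in data) / (2.0 * sigma ** 2)


def second_derivative(func, point, step=1e-3):
    """Central finite-difference approximation of the second derivative."""
    return (func(point + step) - 2.0 * func(point) + func(point - step)) / step ** 2


rng = random.Random(0)
mu_true, sigma, T = 1.0, 2.0, 50   # hypothetical values
data = [rng.gauss(mu_true, sigma) for _ in range(T)]

# In this model the Hessian is the constant -T / sigma^2, so no averaging
# over samples is needed to obtain its expected value.
hessian = second_derivative(lambda mu: log_likelihood(mu, data, sigma), mu_true)
information = -hessian
lower_bound = 1.0 / information

print(f"information: numerical {information:.4f}, analytical {T / sigma ** 2:.4f}")
print(f"lower bound: numerical {lower_bound:.4f}, analytical {sigma ** 2 / T:.4f}")
```

In models where the Hessian depends on the data, its expected value would have to be obtained analytically or by averaging over simulated samples.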
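If an estimator $\hat{\theta}$ is unbiased, then the difference between its
covariance matrix and the lower bound omega ($\Omega$),

$$\operatorname{Cov}(\hat{\theta}) - \Omega \qquad \text{(I.VI-6)}$$

is a positive semi-definite matrix. Expression (I.VI-6) is called the Cramér-Rao inequality.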
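Proof of this inequality
can be easily obtained. If we consider only one parameter, by
definition of the likelihood function we may write

$$\int L \, dy = 1$$

which can be differentiated with
respect to the parameter

$$\int \frac{\partial L}{\partial\theta} \, dy = \int \frac{\partial \ln L}{\partial\theta} L \, dy = E\left(\frac{\partial \ln L}{\partial\theta}\right) = 0$$

Differentiating a second time

$$\int \left[ \frac{\partial^2 \ln L}{\partial\theta^2} + \left(\frac{\partial \ln L}{\partial\theta}\right)^2 \right] L \, dy = 0$$

This implies that $E((\partial \ln L / \partial\theta)^2) = -E(\partial^2 \ln L / \partial\theta^2)$,
which is equivalent to the information.
If the estimator is
unbiased then

$$E(\hat{\theta}) = \int \hat{\theta} L \, dy = \theta$$

It follows from (I.VI-10)

$$\int \hat{\theta} \frac{\partial \ln L}{\partial\theta} L \, dy = E\left(\hat{\theta} \frac{\partial \ln L}{\partial\theta}\right) = 1$$

On combining (I.VI-13) with
(I.VI-12) and applying the Cauchy-Schwarz inequality we obtain

$$1 = \left[ E\left( (\hat{\theta} - \theta) \frac{\partial \ln L}{\partial\theta} \right) \right]^2 \le E\left( (\hat{\theta} - \theta)^2 \right) E\left( \left(\frac{\partial \ln L}{\partial\theta}\right)^2 \right)$$

from which the Cramér-Rao inequality follows immediately.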
Note that, according to the
Cramér-Rao lower bound, an unbiased estimator whose covariance matrix
equals omega is efficient, but
not vice versa. This is because the Cramér-Rao lower bound is not
always attainable (for unbiased estimators).
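A small Monte Carlo experiment can make the lower bound concrete. The sketch below is an illustration under stated assumptions: observations are i.i.d. normal with known variance, for which the bound on the variance of an unbiased estimator of the mean is sigma^2 / T. In this particular case the sample mean happens to attain the bound, while the sample median (also unbiased here, by symmetry) does not. Seed and parameter values are hypothetical.

```python
import random
import statistics

rng = random.Random(2024)
sigma, T, replications = 1.0, 25, 4000   # hypothetical values

means, medians = [], []
for _ in range(replications):
    sample = [rng.gauss(0.0, sigma) for _ in range(T)]
    means.append(sum(sample) / T)
    medians.append(statistics.median(sample))

# Variance of each estimator across the simulated samples.
var_mean = statistics.pvariance(means)
var_median = statistics.pvariance(medians)
bound = sigma ** 2 / T   # Cramér-Rao lower bound for the normal mean

print(f"lower bound:            {bound:.4f}")
print(f"variance sample mean:   {var_mean:.4f}")
print(f"variance sample median: {var_median:.4f}")
```

The simulated variance of the sample mean should lie close to the bound, and that of the median clearly above it; both figures carry simulation error.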
The property of sufficiency
can be formulated as

while the property of consistency is defined as

where delta is a small scalar and epsilon is a vector containing elements
with "small" values.
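The large sample properties
apply only in the limit, as the number of observations T tends to
infinity. Accordingly, we can define
large sample consistency as

$$\lim_{T \to \infty} P\left( |\hat{\theta}_T - \theta| < \epsilon \right) = 1$$

where $\hat{\theta}_T$ is the estimator based on T observations, the inequality
is understood element by element, and epsilon is "small".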
By definition we can also
use a shorter notation

$$\operatorname{plim} \hat{\theta} = \theta$$

where "plim" is the so-called "probability limit". In
this case we say that the estimator for theta converges
in probability to the population value of theta.
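Convergence in probability can be illustrated numerically. The following sketch is an assumption-laden illustration rather than a proof: it uses the sample mean of normal draws as the estimator and estimates, for several sample sizes T, the probability that it lies within epsilon of the population mean. All names, the seed and the numerical values are hypothetical.

```python
import random


def coverage_frequency(sample_size, epsilon, mu, sigma, replications, rng):
    """Fraction of simulated samples whose mean lies within epsilon of mu."""
    hits = 0
    for _ in range(replications):
        mean = sum(rng.gauss(mu, sigma) for _ in range(sample_size)) / sample_size
        if abs(mean - mu) < epsilon:
            hits += 1
    return hits / replications


rng = random.Random(7)
mu, sigma, epsilon = 5.0, 1.0, 0.2   # hypothetical values

frequencies = {}
for T in (10, 100, 1000):
    frequencies[T] = coverage_frequency(T, epsilon, mu, sigma, 400, rng)
    print(f"T = {T:5d}: estimated P(|mean - mu| < {epsilon}) = {frequencies[T]:.3f}")
```

As T increases, the estimated probability should approach 1, which is what the statement plim = theta expresses.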
A short example will
clarify the concept of large sample consistency. Let us take the
sample mean as an estimator of the population mean. Then it is
possible to prove large sample consistency by applying eq. (I.III-47)
to the sample mean:

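The standard deviation of
the sample mean is known to be

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{T}}$$

where $\sigma$ is the population standard deviation and T the number of observations.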
On combining (I.VI-20) and
(I.VI-21) we obtain


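Now it is obvious that, for any $\epsilon > 0$ (with $\bar{x}$ the sample mean,
$\mu$ the population mean and $\sigma^2$ the population variance),

$$P\left( |\bar{x} - \mu| < \epsilon \right) \ge 1 - \frac{\sigma^2}{T \epsilon^2}$$

where the RHS can be made
arbitrarily close to 1 by increasing T (the number of sample
observations). Now we may conclude

$$\operatorname{plim} \bar{x} = \mu$$
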
A sufficient, but not
necessary, condition for large
sample efficiency is

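According to Slutsky's
theorem the following holds for any continuous function g

$$\operatorname{plim} g(\hat{\theta}) = g(\operatorname{plim} \hat{\theta})$$

Other properties of plims are

$$\operatorname{plim}(\hat{\theta}_1 \hat{\theta}_2) = \operatorname{plim} \hat{\theta}_1 \cdot \operatorname{plim} \hat{\theta}_2$$

(which is true even if both estimators are dependent on each other: this is
not so with the mathematical expectation) and finally

$$\operatorname{plim}(A_T^{-1}) = (\operatorname{plim} A_T)^{-1}$$

where $A_T$ is a square
parameter matrix whose plim is nonsingular.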
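The contrast between plims and mathematical expectations for dependent estimators can be illustrated by simulation. The sketch below is an illustration under stated assumptions: it takes the sample mean of normal draws and multiplies it by itself, so the two factors are perfectly dependent. For small T the expectation of the product exceeds the product of the expectations by sigma^2 / T, whereas for large T the product itself settles near mu^2. Names, seed and values are hypothetical.

```python
import random


def sample_mean(sample_size, mu, sigma, rng):
    """Mean of sample_size independent N(mu, sigma^2) draws."""
    return sum(rng.gauss(mu, sigma) for _ in range(sample_size)) / sample_size


rng = random.Random(99)
mu, sigma = 2.0, 3.0   # hypothetical population mean and standard deviation

# Small sample: approximate E(mean * mean) by averaging over many replications.
small_T, replications = 5, 20000
mean_of_squares = sum(
    sample_mean(small_T, mu, sigma, rng) ** 2 for _ in range(replications)
) / replications

# Large sample: a single realisation of mean * mean is already close to mu^2.
large_T = 200000
squared_large = sample_mean(large_T, mu, sigma, rng) ** 2

print(f"E(mean)^2                = {mu ** 2:.3f}")
print(f"E(mean * mean),  T = {small_T}: {mean_of_squares:.3f} "
      f"(theory {mu ** 2 + sigma ** 2 / small_T:.3f})")
print(f"mean * mean, T = {large_T}: {squared_large:.3f}")
```

The gap between the first two printed figures shows that the expectation of a product of dependent estimators is not the product of their expectations, while the last figure illustrates that the plim of the product is the product of the plims.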
Note the following
definition of asymptotically distributed parameter vectors


The concept of asymptotic
efficiency can be used to compare
estimators. Formally this is written:


Finally we describe Cramér's theorem because it enables us to combine plims with
convergence in distribution. Formally this theorem states that if

