Abstract:
When data appear more dispersed than expected under a reference model, the situation
is termed overdispersion. In modelling a count variable in terms of predictor
variables, the simplest and theoretically best-established reference model is the
Poisson regression model. In the standard Poisson regression model, the variance is
equal to the mean and there is no extra parameter for dispersion. In practice, however,
the variance estimated from data often exceeds the mean, and the data are then
considered overdispersed. To address overdispersion, two common
alternative approaches are (i) fitting a more general parametric distribution and (ii) adopting a
different mean-variance relationship without fully specifying the distribution. Both
approaches include overdispersion parameters to be estimated from the data. However,
when there is no overdispersion, the Poisson regression model is preferred for its simplicity,
interpretability and theoretical basis. Therefore, a robust test for the significance
of the overdispersion parameter is important before turning to an alternative to
Poisson regression.
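As a sketch of the two approaches in standard notation (the symbols mu_i, tau and phi are assumptions, not taken from this abstract), the Poisson reference model and two commonly used overdispersed alternatives differ in their variance functions:

```latex
\begin{align*}
\text{Poisson:} \quad & \operatorname{Var}(Y_i) = \mu_i, \\
\text{negative binomial:} \quad & \operatorname{Var}(Y_i) = \mu_i + \tau \mu_i^{2}, \quad \tau \ge 0, \\
\text{proportional variance:} \quad & \operatorname{Var}(Y_i) = \phi \mu_i.
\end{align*}
```

Under these parameterizations, the absence of overdispersion corresponds to tau = 0 or phi = 1, and overdispersion to tau > 0 or phi > 1.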
In this work, we investigate tests for detecting overdispersion when a Poisson model
is used for count data. The tests discussed are derived from the partial score and are
applicable against negative binomial or, more generally, mixed Poisson alternatives. These tests
do not require fitting alternative models that incorporate overdispersion in order to check for
its absence; only the Poisson model needs to be fitted. Four test statistics
are illustrated with their distributional approximations for computing significance levels.
The test statistics are analyzed and compared on the basis of the assumptions made in
deriving them, their limiting distributions and their applicability for different sample
sizes. A simulation study was carried out to check the adequacy of the distributional
assumption for the three statistics that approximately follow a normal distribution. The study
involved generating samples of the statistics and tabulating the proportion of times each
exceeded the upper 20%, 10%, 5%, and 1% points of the standard normal distribution. From the
results, the normal approximation for one of the statistics was found to be good
for large samples but less accurate for small samples. Another of the statistics was
found to follow the standard normal distribution closely even for small samples.
Some comparisons and recommendations relating to the applicability and assumptions
of the statistics are also presented.
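The kind of simulation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's own code: the abstract does not give the formulas of the four statistics, so the sketch uses one widely cited score-type statistic for the alternative Var(Y) = mu + tau*mu^2, namely the sum of (y_i - mu_i)^2 - y_i divided by the square root of 2 times the sum of mu_i^2. The function names, the single-covariate design and the coefficients 1.0 and 0.5 are all hypothetical choices made for illustration.

```python
from statistics import NormalDist

import numpy as np


def fit_poisson_irls(X, y, n_iter=50, tol=1e-10):
    """Fit a log-link Poisson regression by iteratively reweighted least squares.

    Assumes column 0 of X is the intercept. Returns the fitted means.
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean() + 1e-8)  # start from the intercept-only fit
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu        # working response
        XtW = X.T * mu                 # IRLS weights equal mu for the log link
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return np.exp(X @ beta)


def score_statistic(y, mu):
    """Score-type overdispersion statistic (assumed form, see lead-in)."""
    return np.sum((y - mu) ** 2 - y) / np.sqrt(2.0 * np.sum(mu ** 2))


def rejection_rates(n, n_rep=2000, seed=0):
    """Proportion of simulated statistics above standard normal upper points.

    Data are generated under the null hypothesis (pure Poisson), so each
    proportion should be near its nominal level if the normal approximation
    is adequate for sample size n.
    """
    rng = np.random.default_rng(seed)
    x = np.linspace(-1.0, 1.0, n)
    X = np.column_stack([np.ones(n), x])
    true_mu = np.exp(1.0 + 0.5 * x)    # hypothetical true regression
    stats = np.empty(n_rep)
    for r in range(n_rep):
        y = rng.poisson(true_mu).astype(float)
        stats[r] = score_statistic(y, fit_poisson_irls(X, y))
    levels = [0.20, 0.10, 0.05, 0.01]
    return {a: float(np.mean(stats > NormalDist().inv_cdf(1.0 - a)))
            for a in levels}


if __name__ == "__main__":
    for n in (20, 100):
        print(n, rejection_rates(n))
```

Only the Poisson model is fitted in each replication, which mirrors the point that these tests do not require fitting an overdispersed alternative. Comparing the tabulated proportions with the nominal levels 0.20, 0.10, 0.05 and 0.01 for small and large n gives a rough check of the normal approximation.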
Description:
This thesis was submitted in partial fulfillment of the requirements for the degree of Master of Science in Applied Statistics at East West University, Dhaka, Bangladesh.