Country Meta Data Input




Data Upload Checklist



Statistical Analysis

Model Selection


Model Screening


Model Fitting

Direct estimates

Let \(y_j\) be the binary outcome of interest for the \(j^{\text{th}}\) individual in the survey and \(w_j\) be the design weight associated with this individual. For a given area \(i\), the weighted estimator is \(\hat{p}^{W}_{i} = \frac{\sum_{j \in S_i} y_j \cdot w_j}{\sum_{j \in S_i} w_j}\), where \(S_i\) is the set of indices of the individuals sampled within the \(i\)-th area.
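As a minimal sketch of the weighted estimator above, the following standalone Python computes \(\hat{p}^{W}_{i}\) for a single area. The function name and the toy data are hypothetical and purely illustrative; this is not how the surveyPrev or SUMMER packages compute the estimate internally.

```python
def weighted_prevalence(y, w):
    """Design-weighted proportion: sum(y_j * w_j) / sum(w_j) over one area."""
    if len(y) != len(w) or len(y) == 0:
        raise ValueError("y and w must be non-empty and of equal length")
    total_weight = sum(w)
    if total_weight <= 0:
        raise ValueError("design weights must sum to a positive value")
    return sum(yj * wj for yj, wj in zip(y, w)) / total_weight


# Hypothetical area with four sampled individuals.
y = [1, 0, 1, 0]            # binary outcomes y_j
w = [2.0, 1.0, 1.0, 4.0]    # design weights w_j
p_hat = weighted_prevalence(y, w)  # (2.0 + 1.0) / 8.0
```

With equal weights the estimator reduces to the ordinary sample proportion; unequal weights shift it toward the outcomes of individuals who represent more of the population.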

Direct estimates at different Admin levels are calculated using surveyPrev::directEST() in the surveyPrev package, which internally calls the SUMMER::smoothSurvey() function in the SUMMER package (Li et al. 2020).

The confidence intervals are computed on the logit scale. If \(D_i\) denotes the design-based variance of \(\hat p^{W}_i\), then the asymptotic design-based variance of \(\text{logit}(\hat p^{W}_i)\) is

$$ V_i = \frac{D_i}{(\hat{p}^{W}_i)^2 \times (1 - \hat{p}^{W}_i)^2}, $$

and the confidence interval on the probability scale is obtained by applying the inverse logit (expit) transformation to the endpoints of the logit-scale interval.
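The logit-scale interval can be sketched as follows. This is an illustrative Python version only: the function names are hypothetical, the 1.96 multiplier assumes a 95% level (the text does not state the level used for direct estimates), and the endpoints are mapped back to the probability scale with the inverse logit (expit).

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit_scale_ci(p_hat, design_var, z=1.96):
    """Return (lower, upper) on the probability scale.

    design_var is D_i, the design-based variance of p_hat. The logit-scale
    variance is V_i = D_i / (p_hat^2 * (1 - p_hat)^2).
    """
    if not 0.0 < p_hat < 1.0:
        raise ValueError("p_hat must lie strictly between 0 and 1")
    v_logit = design_var / (p_hat ** 2 * (1.0 - p_hat) ** 2)
    half_width = z * math.sqrt(v_logit)
    centre = logit(p_hat)
    return expit(centre - half_width), expit(centre + half_width)


# Hypothetical direct estimate and design-based variance.
lower, upper = logit_scale_ci(p_hat=0.5, design_var=0.0025)
```

Because the back-transformation is monotone and maps the real line into (0, 1), the resulting interval always stays inside the unit interval, which a Wald interval computed directly on the probability scale does not guarantee.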

Currently the package defaults to a two-stage stratified cluster sampling design, in which the sampling clusters (enumeration areas) are stratified by Admin-1 areas (Admin-2 areas in certain countries) and urban/rural strata. This is the most common sampling design in the DHS.

Note that under this model, the expected death counts for the same week/month over different years remain the same, so the model does not account for any across-year variation or time trend. The standard error of the expected death count \(\tilde Y_t\) is estimated by the sample standard deviation of the death counts in the same month/week during pre-pandemic years, divided by the square root of the number of observations used to compute the sample average.

Finally, the 95% confidence interval for the expected deaths is computed as the Wald-type interval $$ (\tilde Y_t - 1.96\times SE(\tilde Y_t), \tilde Y_t + 1.96\times SE(\tilde Y_t)) $$

The excess death counts are computed as $$ E_t = Y_t - \tilde Y_t $$ and the 95% confidence interval is given by $$ (Y_t - \tilde Y_t - 1.96\times SE(\tilde Y_t), Y_t - \tilde Y_t + 1.96\times SE(\tilde Y_t)) $$
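The expected-count and excess-count formulas above can be sketched as below. The function name and the historical counts are hypothetical; the sketch simply takes the sample mean of the same-period counts as \(\tilde Y_t\) and uses the sample standard deviation divided by the square root of the number of observations as its standard error, as described in the text.

```python
import math
import statistics


def expected_and_excess(historical_counts, observed, z=1.96):
    """historical_counts: deaths in the same week/month of each pre-pandemic year."""
    n = len(historical_counts)
    if n < 2:
        raise ValueError("need at least two historical observations")
    expected = statistics.mean(historical_counts)
    # Sample standard deviation (n - 1 denominator) over sqrt(n).
    se = statistics.stdev(historical_counts) / math.sqrt(n)
    excess = observed - expected
    return {
        "expected": expected,
        "se": se,
        "expected_ci": (expected - z * se, expected + z * se),
        "excess": excess,
        "excess_ci": (excess - z * se, excess + z * se),
    }


# Hypothetical counts for one calendar month in four pre-pandemic years.
result = expected_and_excess([100, 110, 90, 100], observed=130)
```

Both intervals have the same width, since the only source of uncertainty carried into the excess count is the standard error of \(\tilde Y_t\).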

Area-level (Fay-Herriot) Model

Fay-Herriot models provide smoothed estimates at the area level using the direct estimates \(\hat p^{W}_{i}\) as input. Each direct estimate is modeled as a noisy observation of the true prevalence, with the variance of the noise determined by the design-based variance. We consider the spatial Fay-Herriot model for the logit-transformed direct estimates, which is defined as follows:

$$\text{logit}(\hat p^{W}_{i})|\lambda_{i} \sim\textrm{Normal}(\lambda_{i}, V_{i}^{HT}),$$ $$\lambda_{i}= \alpha + e_i+S_i.$$

Here \(V_{i}^{HT}\) is the design-based variance of the logit-transformed direct estimate, \(\text{expit}(\lambda_{i})\) is the latent true prevalence, and \(e_i\) and \(S_i\) are unstructured and structured spatial random effects, respectively. Inference is carried out using Bayesian methods, so the model specification is completed by priors on \(\alpha\), \(e\) and \(S\), and their hyperpriors. More details of the Bayesian model setup can be found in Wakefield, Okonek, and Pedersen (2020). Area-level Fay-Herriot models are viewed as the most reliable model choice, since they acknowledge the design through the use of a weighted estimate and its associated variance. See chapters 4 to 6 of Rao and Molina (2015).

Currently the package allows only an overall intercept \(\alpha\), but future versions will allow area-level covariates to be included. The default prior for the intercept is \(N(0, 1000)\). The structured and unstructured random effects are implemented using the Besag-York-Mollié (BYM) model via the BYM2 parameterization, with default PC priors: the marginal standard deviation \(\sigma\) has a prior such that \(Prob(\sigma > 1) = 0.01\), and the proportion of variation explained by the spatial effect, \(\phi\), has a prior such that \(Prob(\phi > 0.5) = \frac{2}{3}\) (Riebler et al. 2016; Simpson et al. 2017).
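To make the tail statement for the marginal standard deviation concrete: the PC prior for a standard deviation is an exponential distribution (Simpson et al. 2017), so the condition \(Prob(\sigma > U) = a\) fixes its rate at \(-\log(a)/U\). The Python sketch below only illustrates that arithmetic; the function names are hypothetical, and it does not reproduce how the fitting software parameterizes the prior internally. The PC prior for \(\phi\) has no comparably simple closed form and is not shown.

```python
import math


def pc_prior_rate(u, tail_prob):
    """Rate of the exponential prior on sigma with P(sigma > u) = tail_prob."""
    if u <= 0 or not 0.0 < tail_prob < 1.0:
        raise ValueError("need u > 0 and 0 < tail_prob < 1")
    return -math.log(tail_prob) / u


def prob_sigma_exceeds(sigma, rate):
    """Upper tail probability of an exponential distribution with the given rate."""
    return math.exp(-rate * sigma)


# Default stated in the text: Prob(sigma > 1) = 0.01.
rate = pc_prior_rate(u=1.0, tail_prob=0.01)   # roughly 4.6
check = prob_sigma_exceeds(1.0, rate)         # recovers the stated 0.01
```

A larger rate concentrates prior mass nearer zero, which shrinks the random effects toward the simpler intercept-only model unless the data indicate otherwise.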

The app implements spatial Fay-Herriot models at different administrative levels using surveyPrev::fhModel(), which internally calls SUMMER::smoothSurvey().

A Fay-Herriot model at a fine spatial level (over Admin-2) can be fitted by treating areas without direct estimates as missing data, though this is usually not recommended due to data sparsity. In addition, numerical issues arise when the design-based variance of a direct estimate is close to zero, so that the precision on the logit scale becomes too large. Our implementation works around this issue by identifying regions whose design-based variance is too small (< 1e-30) and deleting the clusters in these regions before fitting the model. However, this step introduces additional bias in the results if the number of clusters deleted is large.
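The screening step can be pictured with the sketch below, which flags areas whose design-based variance falls under the 1e-30 threshold quoted above. The area labels, data, and function names are hypothetical, and the sketch works on area-level summaries only; the actual implementation removes the clusters belonging to the flagged regions, which is not reproduced here.

```python
def logit_precision(p_hat, design_var):
    """Precision 1 / V_i of logit(p_hat), with V_i = D_i / (p_hat^2 (1 - p_hat)^2)."""
    return (p_hat ** 2 * (1.0 - p_hat) ** 2) / design_var


def flag_degenerate_areas(estimates, threshold=1e-30):
    """estimates maps area -> (p_hat, design_var); return the flagged area names."""
    return {area for area, (_, d) in estimates.items() if d < threshold}


# Hypothetical direct estimates: area "C" has zero design-based variance,
# as can happen when every sampled outcome in the area is identical.
areas = {"A": (0.30, 4e-4), "B": (0.10, 1e-4), "C": (1.00, 0.0)}
flagged = flag_degenerate_areas(areas)
usable = {k: v for k, v in areas.items() if k not in flagged}
precisions = {k: logit_precision(p, d) for k, (p, d) in usable.items()}
```

Computing the precision only for the retained areas avoids the division by a near-zero variance that causes the numerical problem in the first place.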

Unit-level (Cluster-level) Model

Unit-level models assume smoothing models for counts of events in each cluster (Wakefield, Okonek, and Pedersen 2020; Li et al. 2020). In terms of traditional Small Area Estimation literature, cluster-level models are a type of unit-level model. Currently, the app implements an unstratified model without taking into account the urban/rural stratification in the sampling design.

Let \(Y_c\) be the number of events in cluster \(c\), and \(n_c\) be the number of individuals at risk, where \(c= 1,\dots,C\). The unstratified model assumes the hierarchical structure:

$$Y_c \mid p_c,d\sim \textrm{BetaBinomial}(n_c,p_c,d),$$ $$p_c=\textrm{expit}(\alpha+e_{i[s_c]}+S_{i[s_c]}),$$

where \(\alpha\) is the intercept, and \(i[s_c]\) indexes the area within which the cluster \(s_c\) resides. As in the area-level model, \(e_i\) and \(S_i\) are unstructured and structured spatial random effects with the same priors as before. The Beta-binomial distribution arises from a hierarchical model in which the probability follows a \(\text{Beta}(a, b)\) prior. The overdispersion parameter, \(d=\frac{1}{a+b+1}\), lies between 0 and 1 and represents the intracluster correlation between Bernoulli draws within a cluster.
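The role of the overdispersion parameter can be illustrated numerically. In the sketch below, which assumes \(\text{Beta}(a, b)\) shape parameters and uses a hypothetical function name and toy values, the mean probability is \(p = a/(a+b)\), the overdispersion is \(d = 1/(a+b+1)\), and the Beta-binomial variance equals the binomial variance inflated by the factor \(1 + (n_c - 1)d\).

```python
def beta_binomial_summary(n, a, b):
    """Mean probability, overdispersion, and count moments for BetaBinomial(n, a, b)."""
    if n < 1 or a <= 0 or b <= 0:
        raise ValueError("need n >= 1 and positive shape parameters")
    p = a / (a + b)                  # mean of the Beta(a, b) distribution
    d = 1.0 / (a + b + 1.0)          # intracluster correlation, between 0 and 1
    mean = n * p
    variance = n * p * (1.0 - p) * (1.0 + (n - 1) * d)
    return p, d, mean, variance


# Hypothetical cluster with 20 individuals at risk.
p, d, mean, variance = beta_binomial_summary(n=20, a=2.0, b=6.0)
binomial_variance = 20 * p * (1.0 - p)
```

As \(d\) approaches 0 the variance approaches the binomial variance, and as \(d\) approaches 1 the individuals in a cluster behave almost identically, so the variance approaches \(n_c^2 p(1-p)\).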

The default prior for \(d\) is \(\text{logit}(d) \sim \text{Normal}(0,0.4)\).

The app implements the unit-level model via the surveyPrev::clusterModel() function, with a BYM2 model for the spatial random effects. The app currently supports only unstratified models. Please refer to the surveyPrev package for stratified unit-level models.

Results Tabulation


Subnational Results Mapping


Comparing Multiple Maps



Click to produce plot and/or apply changes.


Subnational Estimate Comparison - Scatter Plot

Scatter plot comparing estimates from fitted models for the same Admin level


Subnational Posterior Density Plot