saeforhealth

Statistical Analysis

Model Selection

Model Screening

Model Fitting

Direct estimates

Let $y_j$ be the binary outcome of interest for the $j^{\text{th}}$ individual in the survey and $w_j$ the design weight associated with this individual. For a given area $i$, the weighted estimator is $\hat{p}^{W}_{i} = \frac{\sum_{j \in S_i} y_j \cdot w_j}{\sum_{j \in S_i} w_j}$, where $S_i$ is the set of indices of the individuals within the $i$-th area.
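The weighted estimator can be sketched in a few lines of Python. This is an illustrative sketch, not the surveyPrev implementation. The function name `weighted_direct_estimates` and the toy data are hypothetical. It returns only the point estimate $\hat p^{W}_i$, not the design-based variance, which depends on the full stratified cluster design.

```python
from collections import defaultdict

def weighted_direct_estimates(y, w, area):
    """Weighted direct estimate p_hat_i for each area i (illustrative sketch).

    y    : binary outcomes y_j (0/1) for each surveyed individual j
    w    : design weights w_j
    area : area label i for each individual, defining the sets S_i
    """
    num = defaultdict(float)  # running sum of y_j * w_j within S_i
    den = defaultdict(float)  # running sum of w_j within S_i
    for y_j, w_j, a in zip(y, w, area):
        num[a] += y_j * w_j
        den[a] += w_j
    return {a: num[a] / den[a] for a in den}

# Hypothetical toy data: three individuals in area "A", two in area "B".
y = [1, 0, 1, 1, 0]
w = [2.0, 1.0, 1.0, 1.0, 3.0]
area = ["A", "A", "A", "B", "B"]
est = weighted_direct_estimates(y, w, area)  # {"A": 0.75, "B": 0.25}
```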

Direct estimates at different Admin levels are calculated using surveyPrev::directEST() from the surveyPrev package, which internally relies on SUMMER::smoothSurvey() from the SUMMER package (Li et al. 2020).

The confidence intervals are computed on the logit scale. If $D_i$ denotes the design-based variance of $\hat p^{W}_i$, then by the delta method the asymptotic design-based variance of $\text{logit}(\hat p^{W}_i)$ is

$$ V_i = \frac{D_i}{(\hat{p}^{W}_i)^2 \times (1 - \hat{p}^{W}_i)^2}. $$

The confidence interval on the probability scale is obtained by applying the inverse-logit (expit) transformation to the endpoints of the confidence interval on the logit scale.
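A minimal Python sketch of the logit-scale interval follows. It assumes the design-based variance $D_i$ is already available (in the app it comes from the survey design). The helper names are hypothetical, and `z = 1.96` gives a nominal 95% interval.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit_scale_ci(p_hat, D, z=1.96):
    """Confidence interval for a prevalence, built on the logit scale.

    p_hat : weighted direct estimate p_hat_i (strictly between 0 and 1)
    D     : design-based variance D_i of p_hat_i
    """
    V = D / (p_hat ** 2 * (1.0 - p_hat) ** 2)  # delta-method variance of logit(p_hat)
    half_width = z * math.sqrt(V)
    # Back-transform both logit-scale endpoints with expit.
    return expit(logit(p_hat) - half_width), expit(logit(p_hat) + half_width)

# Hypothetical example: p_hat = 0.2 with design-based SE 0.02 (D = 0.0004).
lo, hi = logit_scale_ci(0.2, 0.0004)
```

Because of the back-transformation, the interval stays inside $(0, 1)$ and is asymmetric around $\hat p^{W}_i$ whenever $\hat p^{W}_i \neq 0.5$.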

Currently the package defaults to a two-stage stratified cluster sampling design, in which the sampling clusters (enumeration areas) are stratified by Admin-1 areas (Admin-2 areas in certain countries) and urban/rural strata. This is the most common sampling design in the DHS.

Note that under this model the expected death counts for the same week/month remain the same across years, so the model does not account for across-year variation or time trends. The standard error of the expected death count $\tilde Y_t$ is estimated by the sample standard deviation of the death counts in the same week/month during pre-pandemic years, divided by the square root of the number of observations used to compute the sample average.

Finally, the lower and upper limits of the 95% confidence interval for the expected deaths are computed with the Wald-type interval $$ (\tilde Y_t - 1.96\times SE(\tilde Y_t), \tilde Y_t + 1.96\times SE(\tilde Y_t)) $$

The excess death counts are computed by $$ E_t = Y_t - \tilde Y_t $$ and the 95% confidence interval is given by $$ (Y_t - \tilde Y_t - 1.96\times SE(\tilde Y_t), Y_t - \tilde Y_t + 1.96\times SE(\tilde Y_t)) $$
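The expected-death and excess-death calculations above can be sketched as follows. The sketch assumes at least two pre-pandemic years are available for the week/month in question, since the sample standard deviation needs two observations. The function name and the counts in the example are hypothetical.

```python
import math
import statistics

def expected_and_excess(observed, baseline):
    """Expected deaths, Wald 95% CI, and excess deaths for one week/month t.

    observed : Y_t, the observed death count for period t
    baseline : death counts for the same week/month in pre-pandemic years
    """
    n = len(baseline)
    expected = statistics.mean(baseline)            # tilde Y_t
    se = statistics.stdev(baseline) / math.sqrt(n)  # SE(tilde Y_t)
    expected_ci = (expected - 1.96 * se, expected + 1.96 * se)
    excess = observed - expected                    # E_t
    excess_ci = (excess - 1.96 * se, excess + 1.96 * se)
    return expected, expected_ci, excess, excess_ci

# Hypothetical counts: four pre-pandemic years, then 130 observed deaths.
expected, expected_ci, excess, excess_ci = expected_and_excess(130, [100, 110, 90, 100])
```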

Area-level (Fay-Herriot) Model

Fay-Herriot models provide smoothed estimates at the areal level using the direct estimates $\hat p^{W}_{i}$ as input. The direct estimates are modeled as noisy observations of the true prevalence, with the noise variance given by the design-based variance. We consider the spatial Fay-Herriot model for the logit-transformed direct estimates, defined as follows:

$$\text{logit}(\hat p^{W}_{i})\mid\lambda_{i} \sim\textrm{Normal}(\lambda_{i}, V_{i}^{HT}),$$

$$\lambda_{i}= \alpha + e_i+S_i.$$

Here $V_{i}^{HT}$ is the design-based variance of $\text{logit}(\hat p^{W}_{i})$, $\text{expit}(\lambda_{i})$ is the latent true prevalence, and $e_i$ and $S_i$ are unstructured and structured spatial random effects. Inference is carried out using Bayesian methods, so the model specification is completed by priors on $\alpha$, $e$ and $S$, and their hyperpriors. More details of the Bayesian model setup can be found in Wakefield, Okonek, and Pedersen (2020). Area-level Fay-Herriot models are viewed as the most reliable model choice, since they acknowledge the design through the use of a weighted estimate and its associated variance; see chapters 4 to 6 of Rao and Molina (2015).
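To illustrate how the Fay-Herriot model smooths direct estimates, the Python sketch below uses a deliberately simplified version. It drops the structured spatial effect $S_i$ and treats $\alpha$ and the random-effect variance $\sigma^2$ as known, so the posterior of each $\lambda_i$ has a closed (conjugate normal) form. This is not the fitted BYM2 model used by the app, and the function name and inputs are hypothetical. It shows that an area with a larger variance $V_i$ is pulled more strongly toward the overall level $\alpha$.

```python
def fh_shrinkage(logit_direct, V, alpha, sigma2):
    """Posterior mean and variance of lambda_i in a simplified, non-spatial
    Fay-Herriot model with known alpha and sigma2 (hypothetical inputs):

        logit(p_hat_i) | lambda_i ~ Normal(lambda_i, V_i)
        lambda_i ~ Normal(alpha, sigma2)
    """
    out = []
    for y_i, v_i in zip(logit_direct, V):
        b = sigma2 / (sigma2 + v_i)             # weight on the direct estimate
        mean = b * y_i + (1.0 - b) * alpha      # shrink toward alpha
        var = b * v_i                           # equals 1 / (1/v_i + 1/sigma2)
        out.append((mean, var))
    return out

# Two hypothetical areas: a precise one (V = 0.1) and a noisy one (V = 1.0).
post = fh_shrinkage([0.0, 2.0], [0.1, 1.0], alpha=1.0, sigma2=1.0)
```

In the example, the first area stays close to its direct estimate (posterior mean $1/11$), while the second is shrunk halfway toward $\alpha$ (posterior mean $1.5$).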

As of now the package allows only an overall intercept $\alpha$, but future versions will allow area-level covariates to be included. The default prior for the intercept is $N(0, 1000)$. The structured and unstructured random effects are implemented using the Besag-York-Mollié (BYM) model via the BYM2 parameterization, with default PC priors such that the marginal standard deviation $\sigma$ satisfies $Prob(\sigma > 1) = 0.01$ and the proportion of variation explained by the spatial effect, $\phi$, satisfies $Prob(\phi > 0.5) = \frac{2}{3}$ (Riebler et al. 2016; Simpson et al. 2017).
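The BYM2 parameterization can be made concrete by writing out the marginal covariance of $e_i + S_i$ as $\sigma^2[(1-\phi)I + \phi\,Q_*^{-}]$, where $Q_*^{-}$ is the generalized inverse of the intrinsic CAR precision matrix, scaled so that the geometric mean of its diagonal is 1 (Riebler et al. 2016). The Python sketch below assumes a connected adjacency graph and uses the Moore-Penrose pseudoinverse as the generalized inverse under the sum-to-zero constraint. It also computes the exponential rate on $\sigma$ implied by the PC prior $Prob(\sigma > 1) = 0.01$. The PC prior on $\phi$ is not reproduced here, and the function name is hypothetical.

```python
import math
import numpy as np

def bym2_covariance(W, sigma, phi):
    """Marginal covariance of the BYM2 random effect e + S (sketch).

    W     : symmetric 0/1 adjacency matrix of a connected area graph
    sigma : marginal standard deviation
    phi   : proportion of variance attributed to the structured (ICAR) part
    """
    W = np.asarray(W, dtype=float)
    Q = np.diag(W.sum(axis=1)) - W              # ICAR precision matrix (rank n - 1)
    Q_inv = np.linalg.pinv(Q)                   # generalized inverse on the sum-to-zero space
    scale = math.exp(np.mean(np.log(np.diag(Q_inv))))  # geometric mean of marginal variances
    Q_inv_scaled = Q_inv / scale                # rescaled so that geometric mean is 1
    n = W.shape[0]
    return sigma ** 2 * ((1.0 - phi) * np.eye(n) + phi * Q_inv_scaled)

# PC prior on sigma with Prob(sigma > U) = a is exponential with rate -log(a) / U.
pc_rate = -math.log(0.01) / 1.0  # about 4.605

# Hypothetical example: four areas arranged along a line.
W = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
C = bym2_covariance(W, sigma=1.0, phi=0.5)
```

The scaling step is what makes $\sigma$ interpretable as a marginal standard deviation regardless of the shape of the adjacency graph.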

The app implements spatial Fay-Herriot models at different administrative levels using surveyPrev::fhModel() and internally SUMMER::smoothSurvey().

A Fay-Herriot model at a fine spatial level (such as Admin-2) can be fitted by treating areas without direct estimates as missing data, though this is usually not recommended due to data sparsity. In addition, numerical issues arise when the design-based variance of a direct estimate is close to zero, because the corresponding precision on the logit scale becomes too large. Our implementation handles this by identifying regions with very small design-based variance (< 1e-30) and deleting the clusters in these regions before fitting the model. However, this step introduces additional bias in the results if the number of deleted clusters is large.
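A hypothetical sketch of the variance screen described above follows. It flags areas whose design-based variance falls below the threshold, drops their clusters, and reports the fraction of clusters removed, a rough indicator of how much extra bias the step may introduce. The dictionaries used as inputs are an assumed data layout, not the package's internal representation.

```python
def drop_tiny_variance_areas(cluster_area, area_variance, tol=1e-30):
    """Drop clusters in areas whose design-based variance is below tol.

    cluster_area  : dict mapping cluster id -> area label (assumed layout)
    area_variance : dict mapping area label -> design-based variance
    Returns the retained clusters and the fraction of clusters removed.
    """
    flagged = {a for a, v in area_variance.items() if v < tol}
    kept = {c: a for c, a in cluster_area.items() if a not in flagged}
    frac_removed = 1.0 - len(kept) / len(cluster_area)
    return kept, frac_removed

# Hypothetical example: area "B" has zero design-based variance.
kept, frac = drop_tiny_variance_areas(
    {1: "A", 2: "A", 3: "B", 4: "C"},
    {"A": 0.01, "B": 0.0, "C": 0.02},
)
```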

Unit-level (Cluster-level) Model

Unit-level models fit smoothing models directly to the counts of events in each cluster (Wakefield, Okonek, and Pedersen 2020; Li et al. 2020). In the traditional small area estimation literature, cluster-level models are a type of unit-level model. Currently, the app implements an unstratified model, which does not account for the urban/rural stratification in the sampling design.

Let $Y_c$ be the number of events in cluster $c$, and $n_c$ be the number of individuals at risk, where $c= 1,\dots,C$. The unstratified model assumes the hierarchical structure:

$$Y_c \mid p_c,d\sim \textrm{BetaBinomial}(n_c,p_c,d),$$ $$p_c=\textrm{expit}(\alpha+e_{i[s_c]}+S_{i[s_c]}),$$

where $\alpha$ is the intercept and $i[s_c]$ indexes the area within which cluster $s_c$ resides. As in the area-level model, $e_i$ and $S_i$ are unstructured and structured spatial random effects with the same priors as before. The beta-binomial distribution arises from a hierarchical model in which the cluster-level probability follows a $\text{Beta}(a, b)$ distribution with mean $p_c = a/(a+b)$. The overdispersion parameter, $d=\frac{1}{a+b+1}$, lies between 0 and 1 and represents the intracluster correlation between Bernoulli draws within a cluster.
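The following Python sketch evaluates the beta-binomial likelihood for one cluster under the $(p_c, d)$ parameterization, using $a = p_c(1-d)/d$ and $b = (1-p_c)(1-d)/d$, which gives $p_c = a/(a+b)$ and $d = 1/(a+b+1)$. The linear-predictor values, $n_c$ and $d$ are made up for illustration. It checks that the mean is $n_c p_c$ and that the variance is inflated relative to the binomial by the factor $1 + (n_c - 1)d$.

```python
import math

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def log_beta(x, z):
    """Log of the beta function B(x, z)."""
    return math.lgamma(x) + math.lgamma(z) - math.lgamma(x + z)

def betabinom_pmf(y, n, p, d):
    """Beta-binomial P(Y = y) with mean n*p and intracluster correlation d."""
    a = p * (1.0 - d) / d
    b = (1.0 - p) * (1.0 - d) / d
    log_choose = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    return math.exp(log_choose + log_beta(y + a, n - y + b) - log_beta(a, b))

# Hypothetical cluster: p_c = expit(alpha + e_i + S_i) with made-up values.
p_c = expit(-1.0 + 0.2 + 0.1)
n_c, d = 20, 0.05
probs = [betabinom_pmf(y, n_c, p_c, d) for y in range(n_c + 1)]
mean = sum(y * q for y, q in enumerate(probs))
var = sum((y - mean) ** 2 * q for y, q in enumerate(probs))
# Expected: mean = n_c * p_c and var = n_c * p_c * (1 - p_c) * (1 + (n_c - 1) * d)
```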

The default prior for $d$ is $\text{logit}(d) \sim \text{Normal}(0,0.4)$.

The app implements the unit-level model via the surveyPrev::clusterModel() function, with a BYM2 model for the spatial random effects. The app currently supports only unstratified models; please refer to the surveyPrev package for stratified unit-level models.
