BPD and RDS on preterm infants: Bivariate dichotomous data
Source:vignettes/BPD_RDS.Rmd
BPD_RDS.Rmd
Abstract
The mglm4twin
package implements multivariate
generalized linear models for twin and family data as proposed by Bonat
and Hjelmborg (2020). The core fit function mglm4twin
is
employed for fitting a set of models. In this introductory vignette we
restrict ourselves to model bivariate binary outcomes. We present a set
of special specifications of the standard ACE for binary trait in the
context of twin data. This vignette reproduces the data analysis
presented in the Section 5.3 Dataset 3: Bivariate dichotomous data of
Bonat and Hjelmborg (2020).
Install and loading the mglm4twin package
To install the stable version of mglm4twin
,
use devtools::install_git()
. For more information, visit mglm4twin/README.
library(devtools)
install_git("bonatwagner/mglm4twin")
library(mglm4twin)
packageVersion("mglm4twin")
#> [1] '0.5.0'
Data set
In this example, we use the dataset analysed by Feng et al. (2009) regarding bronchopulmonary dysplasia (BPD) and respiratory distress syndrome (RDS) on preterm infants. Both diseases are lung related and expected to have a genetic component. The dataset consists of 200 twin-pairs being 137 DZ and 63 MZ. Additionally, we considered the covariates: birth weight (BW), gestation age (GA) and gender (male and female).
Model specification
The model specification is done in three steps:
- Specifying the linear predictor (mean model).
- Specifying the matrix linear predictor (twin dependence structure).
- Specifying link and variance function suitable to response variable type.
Specyfing the linear predictor
In the bpdrds data set we have three covariates available for composing the linear predictor. Thus, the linear predictor is given by
form_BPD <- BPD ~ BW + GA + gender + Group*Twin_pair
form_RDS <- RDS ~ BW + GA + gender + Group*Twin_pair
It is important to note that in this example we have two outcomes. Consequently, we have to specify a linear predictor for each of them.
Specyfing the matrix linear predictor
The matrix linear predictor describes the twin dependence structure.
The mglm4twin
package provides a set of pre-specified
covariance structures. However, the users can easily specify their own
covariance structure through a linear combination of known matrices. In
this example, we are going to specify a set of special cases of the
standard ACE models for twin data. Furthermore, for comparison purposes
we are going to fit univariate and bivariate models.
## Univariate models
uni_E <- mt_twin(N_DZ = 137, N_MZ = 63, n_resp = 1, model = "E")
uni_AE <- mt_twin(N_DZ = 137, N_MZ = 63, n_resp = 1, model = "AE")
uni_CE <- mt_twin(N_DZ = 137, N_MZ = 63, n_resp = 1, model = "CE")
uni_ACE <- mt_twin(N_DZ = 137, N_MZ = 63, n_resp = 1, model = "ACE")
## Bivariate models
biv_E <- mt_twin(N_DZ = 137, N_MZ = 63, n_resp = 2, model = "E")
biv_AE <- mt_twin(N_DZ = 137, N_MZ = 63, n_resp = 2, model = "AE")
biv_CE <- mt_twin(N_DZ = 137, N_MZ = 63, n_resp = 2, model = "CE")
biv_ACE <- mt_twin(N_DZ = 137, N_MZ = 63, n_resp = 2, model = "ACE")
Specyfing link and variance functions
The specificantion of the link and variance functions depends on the
response variable type. In the case of the dichotomous data the
mglm4twin
package offers the logit
,
probit
, cauchit
, cloglog
and
loglog
link functions. For the variance function the
binomialP
represents the standard binomial variance
function,
i.e.
which is suitable for binary outcomes. In this example, we follow Bonat
and Hjelmborg (2020) and use the logit
link function and
the binomialP
variance function for both outcomes.
It is interesting to note that the user should specify the link and variance functions for each outcome.
Model fitting
The main fitting function in the mglm4twin
package is
the mglm4twin()
function. It is the same function for
univariate and multivariate models. The main difference is that now we
should pass a vector of linear predictors, link and variance functions.
In this example, first we are going to fit the univariate models for
each response variable and then the bivariate counterparts.
## Univariate fit
# Univariate E model
fitE_BPD <- mglm4twin(linear_pred = c(form_BPD), matrix_pred = uni_E,
link = c("logit"), variance = c("binomialP"),
data = bpdrds)
#> Automatic initial values selected.
fitE_RDS <- mglm4twin(linear_pred = c(form_RDS), matrix_pred = uni_E,
link = c("logit"), variance = c("binomialP"),
data = bpdrds)
#> Automatic initial values selected.
# Univariate AE model
fitAE_BPD <- mglm4twin(linear_pred = c(form_BPD), matrix_pred = uni_AE,
link = c("logit"), variance = c("binomialP"),
data = bpdrds)
#> Automatic initial values selected.
fitAE_RDS <- mglm4twin(linear_pred = c(form_RDS), matrix_pred = uni_AE,
link = c("logit"), variance = c("binomialP"),
data = bpdrds)
#> Automatic initial values selected.
# Univariate CE model
fitCE_BPD <- mglm4twin(linear_pred = c(form_BPD), matrix_pred = uni_CE,
link = c("logit"), variance = c("binomialP"),
data = bpdrds)
#> Automatic initial values selected.
fitCE_RDS <- mglm4twin(linear_pred = c(form_RDS), matrix_pred = uni_CE,
link = c("logit"), variance = c("binomialP"),
data = bpdrds)
#> Automatic initial values selected.
# Univariate ACE model
fitACE_BPD <- mglm4twin(linear_pred = c(form_BPD), matrix_pred = uni_ACE,
link = c("logit"), variance = c("binomialP"),
data = bpdrds)
#> Automatic initial values selected.
fitACE_RDS <- mglm4twin(linear_pred = c(form_RDS), matrix_pred = uni_ACE,
link = c("logit"), variance = c("binomialP"),
data = bpdrds)
#> Automatic initial values selected.
## Bivariate fit
# E model
fitE_biv <- mglm4twin(linear_pred = c(form_BPD, form_RDS), matrix_pred = biv_E,
link = c("logit","logit"),
variance = c("binomialP","binomialP"),
data = bpdrds)
#> Automatic initial values selected.
# AE model
fitAE_biv <- mglm4twin(linear_pred = c(form_BPD, form_RDS), matrix_pred = biv_AE,
link = c("logit","logit"),
variance = c("binomialP","binomialP"),
data = bpdrds)
#> Automatic initial values selected.
# CE model
fitCE_biv <- mglm4twin(linear_pred = c(form_BPD, form_RDS), matrix_pred = biv_CE,
link = c("logit","logit"),
variance = c("binomialP","binomialP"),
data = bpdrds)
#> Automatic initial values selected.
# ACE model
fitACE_biv <- mglm4twin(linear_pred = c(form_BPD, form_RDS), matrix_pred = biv_ACE,
link = c("logit", "logit"),
variance = c("binomialP","binomialP"),
data = bpdrds)
#> Automatic initial values selected.
Model output
The mglm4twin
package offers a general
summary()
function that can be customized to show the
dispersion components (standard output) or measures of interest
(biometric = TRUE
) as genetic correlation and
heritability.
Model comparison
For model comparison the mglm4twin
package offers the
gof()
function for computing pseudo version of the
log-likelihood, AIC, BIC and pKLIC information criterio. It is
intereting to note that the gof()
function allows to combine
two univariate fits for jointly computing the goodness-of-fit measures
under the assumption of zero correlation between the outcomes.
## Univariate models
uni <- round(rbind("E" = gof(list(fitE_BPD, fitE_RDS)),
"AE" = gof(list(fitAE_BPD, fitAE_RDS)) ,
"CE" = gof(list(fitCE_BPD, fitCE_RDS)),
"ACE" = gof(list(fitACE_BPD, fitACE_RDS))), 2)
## Bivariate models
multi <- round(rbind("E" = gof(fitE_biv), "AE" = gof(fitAE_biv),
"CE" = gof(fitCE_biv), "ACE" = gof(fitACE_biv)), 2)
Reproducing Table 3 of Bonat and Hjelmborg (2020)
Table3
#> ACE CE AE E ACE CE AE E
#> plogLik -323.03 -334.97 -324.22 -359.99 -311.86 -323.82 -312.90 -347.59
#> pAIC 686.06 705.94 684.44 751.98 669.72 687.64 665.80 729.18
#> pBIC 779.75 790.26 768.76 826.93 777.47 781.33 759.49 808.82
#> Df 20.00 18.00 18.00 16.00 23.00 20.00 20.00 17.00
Reproducing Table 4 of Bonat and Hjelmborg (2020)
For reprocucing Table 4 of Bonat and Hjelmborg (2020), we have to
compute the genetic and environment correlation along with the
heritability, environmentability and bivariate heritability and
environmentability. The mglm4twin
offers some facilities for
such computation, see the code below.
## Genetic correlation
tab_rho_a <- mt_compute_rho(Estimates = coef(fitAE_biv, model = "AE"),
vcov = vcov(fitAE_biv, model = "AE"),
component = "A", n_resp = 2)
## Environment correlation
tab_rho_e <- mt_compute_rho(Estimates = coef(fitAE_biv, model = "AE"),
vcov = vcov(fitAE_biv, model = "AE"),
component = "E", n_resp = 2)
The other measures can be extracted from the
summary()
.
#> Call: BPD ~ BW + GA + gender + Group * Twin_pair
#>
#> Link function: logit
#> Variance function: binomialP
#> Regression:
#> Estimates std.error z value Pr(>|z|)
#> (Intercept) 7.367820116 2.5735435410 2.8629087 0.004197715
#> BW -0.001313375 0.0006317681 -2.0788877 0.037627670
#> GA -0.266207139 0.1064004168 -2.5019370 0.012351590
#> genderMale 0.255583845 0.2871737357 0.8899973 0.373467344
#> GroupMZ 0.729534353 0.6765774479 1.0782718 0.280912498
#> Twin_pair 0.282639987 0.2680857707 1.0542894 0.291750439
#> GroupMZ:Twin_pair -0.208591104 0.3727852429 -0.5595476 0.575788026
#>
#> Call: RDS ~ BW + GA + gender + Group * Twin_pair
#>
#> Link function: logit
#> Variance function: binomialP
#> Regression:
#> Estimates std.error z value Pr(>|z|)
#> (Intercept) 17.7898625766 2.9262172582 6.07947429 1.205772e-09
#> BW 0.0003041043 0.0004392348 0.69235019 4.887174e-01
#> GA -0.6161383846 0.1082421380 -5.69222297 1.253959e-08
#> genderMale 0.4320882307 0.2646558138 1.63264213 1.025443e-01
#> GroupMZ -0.2550136464 0.7176032820 -0.35536856 7.223135e-01
#> Twin_pair 0.3702896784 0.2717155215 1.36278442 1.729505e-01
#> GroupMZ:Twin_pair -0.0163479996 0.4352948436 -0.03755615 9.700416e-01
#>
#> Dispersion:
#> Genetic component:
#> Estimates std.error z value Pr(>|z|)
#> A1 0.7499721 0.17663106 4.245981 2.176393e-05
#> A2 0.4797312 0.11943718 4.016599 5.904417e-05
#> A12 0.2696486 0.07602441 3.546869 3.898384e-04
#>
#> Environment component:
#> Estimates std.error z value Pr(>|z|)
#> E1 0.315126079 0.16682685 1.8889410 5.889973e-02
#> E2 0.624749528 0.10451108 5.9778307 2.261286e-09
#> E12 0.005736647 0.05068788 0.1131759 9.098911e-01
#>
#> Measures associated with genetic structure:
#> Genetic correlation:
#> Estimates std.error z value Pr(>|z|)
#> rho12 0.4495486 0.1126027 3.992342 6.542384e-05
#>
#> Heritability:
#> Estimates std.error z value Pr(>|z|)
#> h1 0.7041342 0.14775291 4.765620 1.882738e-06
#> h2 0.4343500 0.07043692 6.166511 6.981330e-10
#>
#> Bivariate heritability:
#> Estimates std.error z value Pr(>|z|)
#> h12 0.9791687 0.1838009 5.327333 9.966523e-08
#>
#> Measures associated with environment structure:
#> Environment correlation:
#> Estimates std.error z value Pr(>|z|)
#> rho12 0.01292894 0.1137449 0.1136661 0.9095025
#>
#> Environmentality:
#> Estimates std.error z value Pr(>|z|)
#> e1 0.2958658 0.14775291 2.002436 4.523785e-02
#> e2 0.5656500 0.07043692 8.030590 9.700495e-16
#>
#> Bivariate environmentality:
#> Estimates std.error z value Pr(>|z|)
#> e12 0.02083135 0.1838009 0.1133365 0.9097638
#>
#> Algorithm: chaser
#> Correction: FALSE
#> Number iterations: 10
Table4
#> BPD BPD_std RDS RDS_std BPD_RDS BPD_RDS_std
#> Heritability 0.7 0.15 0.43 0.07 0.98 0.18
#> Environmentability 0.3 0.15 0.57 0.07 0.02 0.18
#> Genetic Cor NA NA NA NA 0.45 0.11
#> Environment Cor NA NA NA NA 0.01 0.11