Skip to contents

Abstract

The mglm4twin package implements multivariate generalized linear models for twin and family data as proposed by Bonat and Hjelmborg (2020). The core fit function mglm4twin is employed for fitting a set of models. In this introductory vignette we restrict ourselves to model bivariate binary outcomes. We present a set of special specifications of the standard ACE for binary trait in the context of twin data. This vignette reproduces the data analysis presented in the Section 5.3 Dataset 3: Bivariate dichotomous data of Bonat and Hjelmborg (2020).

Install and loading the mglm4twin package

To install the stable version of mglm4twin, use devtools::install_git(). For more information, visit mglm4twin/README.

library(devtools)
install_git("bonatwagner/mglm4twin")
library(mglm4twin)
packageVersion("mglm4twin")
#> [1] '0.5.0'

Data set

In this example, we use the dataset analysed by Feng et al. (2009) regarding bronchopulmonary dysplasia (BPD) and respiratory distress syndrome (RDS) on preterm infants. Both diseases are lung related and expected to have a genetic component. The dataset consists of 200 twin-pairs being 137 DZ and 63 MZ. Additionally, we considered the covariates: birth weight (BW), gestation age (GA) and gender (male and female).

data(bpdrds)
head(bpdrds)
#>   Twin gender GA   BW RDS BPD Group Twin_pair
#> 1    3   Male 29 1470   1   0    DZ         1
#> 2    3   Male 29 1495   1   0    DZ         2
#> 3    8 Female 28 1060   1   0    DZ         1
#> 4    8   Male 28 1010   1   1    DZ         2
#> 5    9   Male 31 1620   0   0    DZ         1
#> 6    9   Male 31 1685   1   0    DZ         2

Model specification

The model specification is done in three steps:

  1. Specifying the linear predictor (mean model).
  2. Specifying the matrix linear predictor (twin dependence structure).
  3. Specifying link and variance function suitable to response variable type.

Specyfing the linear predictor

In the bpdrds data set we have three covariates available for composing the linear predictor. Thus, the linear predictor is given by

form_BPD <- BPD ~ BW + GA + gender + Group*Twin_pair
form_RDS <- RDS ~ BW + GA + gender + Group*Twin_pair

It is important to note that in this example we have two outcomes. Consequently, we have to specify a linear predictor for each of them.

Specyfing the matrix linear predictor

The matrix linear predictor describes the twin dependence structure. The mglm4twin package provides a set of pre-specified covariance structures. However, the users can easily specify their own covariance structure through a linear combination of known matrices. In this example, we are going to specify a set of special cases of the standard ACE models for twin data. Furthermore, for comparison purposes we are going to fit univariate and bivariate models.

## Univariate models
uni_E <- mt_twin(N_DZ = 137, N_MZ = 63, n_resp = 1, model = "E")
uni_AE <- mt_twin(N_DZ = 137, N_MZ = 63, n_resp = 1, model = "AE")
uni_CE <- mt_twin(N_DZ = 137, N_MZ = 63, n_resp = 1, model = "CE")
uni_ACE <- mt_twin(N_DZ = 137, N_MZ = 63, n_resp = 1, model = "ACE")

## Bivariate models
biv_E <- mt_twin(N_DZ = 137, N_MZ = 63, n_resp = 2, model = "E")
biv_AE <- mt_twin(N_DZ = 137, N_MZ = 63, n_resp = 2, model = "AE")
biv_CE <- mt_twin(N_DZ = 137, N_MZ = 63, n_resp = 2, model = "CE")
biv_ACE <- mt_twin(N_DZ = 137, N_MZ = 63, n_resp = 2, model = "ACE")

The specificantion of the link and variance functions depends on the response variable type. In the case of the dichotomous data the mglm4twinpackage offers the logit, probit, cauchit, cloglog and loglog link functions. For the variance function the binomialP represents the standard binomial variance function, i.e. μ(1μ)\mu(1-\mu) which is suitable for binary outcomes. In this example, we follow Bonat and Hjelmborg (2020) and use the logit link function and the binomialP variance function for both outcomes.

link = c("logit", "logit")
variance = c("binomialP", "binomialP")

It is interesting to note that the user should specify the link and variance functions for each outcome.

Model fitting

The main fitting function in the mglm4twin package is the mglm4twin() function. It is the same function for univariate and multivariate models. The main difference is that now we should pass a vector of linear predictors, link and variance functions. In this example, first we are going to fit the univariate models for each response variable and then the bivariate counterparts.

## Univariate fit

# Univariate E model
fitE_BPD <- mglm4twin(linear_pred = c(form_BPD), matrix_pred = uni_E,
                      link = c("logit"), variance = c("binomialP"),
                      data = bpdrds)
#> Automatic initial values selected.
fitE_RDS <- mglm4twin(linear_pred = c(form_RDS), matrix_pred = uni_E,
                      link = c("logit"), variance = c("binomialP"),
                      data = bpdrds)
#> Automatic initial values selected.

# Univariate AE model
fitAE_BPD <- mglm4twin(linear_pred = c(form_BPD), matrix_pred = uni_AE,
                      link = c("logit"), variance = c("binomialP"),
                      data = bpdrds)
#> Automatic initial values selected.
fitAE_RDS <- mglm4twin(linear_pred = c(form_RDS), matrix_pred = uni_AE,
                       link = c("logit"), variance = c("binomialP"),
                       data = bpdrds)
#> Automatic initial values selected.

# Univariate CE model
fitCE_BPD <- mglm4twin(linear_pred = c(form_BPD), matrix_pred = uni_CE,
                       link = c("logit"), variance = c("binomialP"),
                       data = bpdrds)
#> Automatic initial values selected.
fitCE_RDS <- mglm4twin(linear_pred = c(form_RDS), matrix_pred = uni_CE,
                       link = c("logit"), variance = c("binomialP"),
                       data = bpdrds)
#> Automatic initial values selected.

# Univariate ACE model
fitACE_BPD <- mglm4twin(linear_pred = c(form_BPD), matrix_pred = uni_ACE,
                       link = c("logit"), variance = c("binomialP"),
                       data = bpdrds)
#> Automatic initial values selected.
fitACE_RDS <- mglm4twin(linear_pred = c(form_RDS), matrix_pred = uni_ACE,
                        link = c("logit"), variance = c("binomialP"),
                        data = bpdrds)
#> Automatic initial values selected.

## Bivariate fit

# E model
fitE_biv <- mglm4twin(linear_pred = c(form_BPD, form_RDS), matrix_pred = biv_E, 
                      link = c("logit","logit"), 
                      variance = c("binomialP","binomialP"), 
                      data = bpdrds)
#> Automatic initial values selected.
# AE model
fitAE_biv <- mglm4twin(linear_pred = c(form_BPD, form_RDS), matrix_pred = biv_AE, 
                       link = c("logit","logit"), 
                       variance = c("binomialP","binomialP"), 
                       data = bpdrds)
#> Automatic initial values selected.
# CE model
fitCE_biv <- mglm4twin(linear_pred = c(form_BPD, form_RDS), matrix_pred = biv_CE, 
                       link = c("logit","logit"), 
                       variance = c("binomialP","binomialP"), 
                       data = bpdrds)
#> Automatic initial values selected.
# ACE model
fitACE_biv <- mglm4twin(linear_pred = c(form_BPD, form_RDS), matrix_pred = biv_ACE, 
                        link = c("logit", "logit"),
                        variance = c("binomialP","binomialP"),
                        data = bpdrds)
#> Automatic initial values selected.

Model output

The mglm4twin package offers a general summary() function that can be customized to show the dispersion components (standard output) or measures of interest (biometric = TRUE) as genetic correlation and heritability.

Model comparison

For model comparison the mglm4twin package offers the gof()function for computing pseudo version of the log-likelihood, AIC, BIC and pKLIC information criterio. It is intereting to note that the gof()function allows to combine two univariate fits for jointly computing the goodness-of-fit measures under the assumption of zero correlation between the outcomes.

## Univariate models
uni <- round(rbind("E" = gof(list(fitE_BPD, fitE_RDS)), 
                   "AE" = gof(list(fitAE_BPD, fitAE_RDS)) ,
                   "CE" = gof(list(fitCE_BPD, fitCE_RDS)),
                   "ACE" = gof(list(fitACE_BPD, fitACE_RDS))), 2)

## Bivariate models
multi <- round(rbind("E" = gof(fitE_biv), "AE" = gof(fitAE_biv), 
                     "CE" = gof(fitCE_biv), "ACE" = gof(fitACE_biv)), 2)

Reproducing Table 3 of Bonat and Hjelmborg (2020)

Table3
#>             ACE      CE      AE       E     ACE      CE      AE       E
#> plogLik -323.03 -334.97 -324.22 -359.99 -311.86 -323.82 -312.90 -347.59
#> pAIC     686.06  705.94  684.44  751.98  669.72  687.64  665.80  729.18
#> pBIC     779.75  790.26  768.76  826.93  777.47  781.33  759.49  808.82
#> Df        20.00   18.00   18.00   16.00   23.00   20.00   20.00   17.00

Reproducing Table 4 of Bonat and Hjelmborg (2020)

For reprocucing Table 4 of Bonat and Hjelmborg (2020), we have to compute the genetic and environment correlation along with the heritability, environmentability and bivariate heritability and environmentability. The mglm4twinoffers some facilities for such computation, see the code below.

## Genetic correlation
tab_rho_a <- mt_compute_rho(Estimates = coef(fitAE_biv, model = "AE"), 
                            vcov = vcov(fitAE_biv, model = "AE"),
                            component = "A", n_resp = 2)

## Environment correlation
tab_rho_e <- mt_compute_rho(Estimates = coef(fitAE_biv, model = "AE"), 
                            vcov = vcov(fitAE_biv, model = "AE"),
                            component = "E", n_resp = 2)

The other measures can be extracted from the summary().

#> Call: BPD ~ BW + GA + gender + Group * Twin_pair
#> 
#> Link function: logit
#> Variance function: binomialP
#> Regression:
#>                      Estimates    std.error    z value    Pr(>|z|)
#> (Intercept)        7.367820116 2.5735435410  2.8629087 0.004197715
#> BW                -0.001313375 0.0006317681 -2.0788877 0.037627670
#> GA                -0.266207139 0.1064004168 -2.5019370 0.012351590
#> genderMale         0.255583845 0.2871737357  0.8899973 0.373467344
#> GroupMZ            0.729534353 0.6765774479  1.0782718 0.280912498
#> Twin_pair          0.282639987 0.2680857707  1.0542894 0.291750439
#> GroupMZ:Twin_pair -0.208591104 0.3727852429 -0.5595476 0.575788026
#> 
#> Call: RDS ~ BW + GA + gender + Group * Twin_pair
#> 
#> Link function: logit
#> Variance function: binomialP
#> Regression:
#>                       Estimates    std.error     z value     Pr(>|z|)
#> (Intercept)       17.7898625766 2.9262172582  6.07947429 1.205772e-09
#> BW                 0.0003041043 0.0004392348  0.69235019 4.887174e-01
#> GA                -0.6161383846 0.1082421380 -5.69222297 1.253959e-08
#> genderMale         0.4320882307 0.2646558138  1.63264213 1.025443e-01
#> GroupMZ           -0.2550136464 0.7176032820 -0.35536856 7.223135e-01
#> Twin_pair          0.3702896784 0.2717155215  1.36278442 1.729505e-01
#> GroupMZ:Twin_pair -0.0163479996 0.4352948436 -0.03755615 9.700416e-01
#> 
#> Dispersion:
#> Genetic component:
#>     Estimates  std.error  z value     Pr(>|z|)
#> A1  0.7499721 0.17663106 4.245981 2.176393e-05
#> A2  0.4797312 0.11943718 4.016599 5.904417e-05
#> A12 0.2696486 0.07602441 3.546869 3.898384e-04
#> 
#> Environment component:
#>       Estimates  std.error   z value     Pr(>|z|)
#> E1  0.315126079 0.16682685 1.8889410 5.889973e-02
#> E2  0.624749528 0.10451108 5.9778307 2.261286e-09
#> E12 0.005736647 0.05068788 0.1131759 9.098911e-01
#> 
#> Measures associated with genetic structure:
#> Genetic correlation:
#>       Estimates std.error  z value     Pr(>|z|)
#> rho12 0.4495486 0.1126027 3.992342 6.542384e-05
#> 
#> Heritability:
#>    Estimates  std.error  z value     Pr(>|z|)
#> h1 0.7041342 0.14775291 4.765620 1.882738e-06
#> h2 0.4343500 0.07043692 6.166511 6.981330e-10
#> 
#> Bivariate heritability:
#>     Estimates std.error  z value     Pr(>|z|)
#> h12 0.9791687 0.1838009 5.327333 9.966523e-08
#> 
#> Measures associated with environment structure:
#> Environment correlation:
#>        Estimates std.error   z value  Pr(>|z|)
#> rho12 0.01292894 0.1137449 0.1136661 0.9095025
#> 
#> Environmentality:
#>    Estimates  std.error  z value     Pr(>|z|)
#> e1 0.2958658 0.14775291 2.002436 4.523785e-02
#> e2 0.5656500 0.07043692 8.030590 9.700495e-16
#> 
#> Bivariate environmentality:
#>      Estimates std.error   z value  Pr(>|z|)
#> e12 0.02083135 0.1838009 0.1133365 0.9097638
#> 
#> Algorithm: chaser
#> Correction: FALSE
#> Number iterations: 10
Table4
#>                    BPD BPD_std  RDS RDS_std BPD_RDS BPD_RDS_std
#> Heritability       0.7    0.15 0.43    0.07    0.98        0.18
#> Environmentability 0.3    0.15 0.57    0.07    0.02        0.18
#> Genetic Cor         NA      NA   NA      NA    0.45        0.11
#> Environment Cor     NA      NA   NA      NA    0.01        0.11