Sleep’s quality: Mixed types of outcomes • mglm4twin

Abstract

The mglm4twin package implements multivariate generalized linear models for twin and family data as proposed by Bonat and Hjelmborg (2020). The core fit function mglm4twin is employed for fitting a set of models. In this introductory vignette we restrict ourselves to model mixed types of outcomes. We present a set of special specifications of the standard ACE for the mixed of continuous and bounded traits in the context of twin data. This vignette reproduces the data analysis presented in the Section 5.5 Dataset 5: Mixed types of traits data of Bonat and Hjelmborg (2020).

Install and loading the mglm4twin package

To install the stable version of mglm4twin, use devtools::install_git(). For more information, visit mglm4twin/README.

library(devtools)
install_git("bonatwagner/mglm4twin")

library(mglm4twin)
packageVersion("mglm4twin")
#> [1] '0.5.0'

Data set

The data set concerning sleep’s quality in a sample of 250 (135 DZ and 116 MZ) Danish twin pairs. The traits are cortisone levels when waking up (T0) and PSQI (Pittsburgh Sleep Quality Index). The first one is a continuous trait, while the second one is a scale varying from 0 to 21 (slower values better sleep’s quality). The goal of this example is to show how to deal with mixed types of traits in the context of twin data. It is a simulated data set based on the parameter estimates of the real data fit.

data(t0psqi)
head(t0psqi)
#>   Twin_pair Twin_id      Age Type Gender Group       T0       PSQI
#> 1         1       1 31.43789   DZ   Male    G1 2.111575 0.37432276
#> 2         2       1 33.94153   DZ   Male    G1 2.070715 0.25220080
#> 3         1       2 32.04488   DZ   Male    G1 3.160935 0.06494865
#> 4         2       2 34.41509   DZ   Male    G1 2.171129 0.04690910
#> 5         1       3 34.70234   DZ   Male    G1 2.923754 0.07152169
#> 6         2       3 30.22778   DZ   Male    G1 3.350422 0.06988779

Model specification

The model specification is done in three steps:

Specifying the linear predictor (mean model).
Specifying the matrix linear predictor (twin dependence struture).
Specifying link and variance function suitable to be response variable type.

Specyfing the linear predictor

In the t0psqi data set we have the covariates Age, Type, Gender and Group for composing the linear predictor for both outcomes. Thus, the linear predictors are given by

form_T0 <- T0 ~ Age + Gender + Group + Type*Twin_pair
form_PSQI <- PSQI ~ Age + Gender + Group + Type*Twin_pair

Specyfing the matrix linear predictor

The matrix linear predictor describes the twin dependence structure. The mglm4twin package provides a set of pre-specified covariance structures. However, the users can easily specify thein own covariance structure through a linear combination of known matrices. In this example, we are going to specify a set of special cases of the standard ACE models for twin data.

E <- mt_twin(N_DZ = 135, N_MZ = 116, n_resp = 2, model = "E")
AE <- mt_twin(N_DZ = 135, N_MZ = 116, n_resp = 2, model = "AE")
CE <- mt_twin(N_DZ = 135, N_MZ = 116, n_resp = 2, model = "CE")
ACE <- mt_twin(N_DZ = 135, N_MZ = 116, n_resp = 2, model = "ACE")

The function mt_twin() has four main arguments: N_DZ and N_MZ the number of dizygotic and monozygotic twin, respectively. n_resp number of response variables, in this case $1$ and the model a string given the model’s name. It is extreme important to note that the data set should be ordered in a way that all DZ twin appear first and then all MZ twin and the twin pair should appear together, i.e. twin 2 after twin 1.

Model fitting

The main fitting function in the mglm4twin package is the mglm4twin() function. it was designed to be similar to standard model fitting functions in R such as lm() and glm(). In this example, we are going to fit Gaussian bivariate models and a more felxible model where we use a Tweedie model for the T0 trait and a flexible quasi-beta model for the PSQI index. We opted to do this, because Gaussian models are a frequently choice for this types of data.

## Gaussian model (link = 'identity' and variance = 'constant')
Gauss_E <- mglm4twin(linear_pred = c(form_T0,  form_PSQI), 
                     matrix_pred = E, data = t0psqi)
#> Automatic initial values selected.
Gauss_AE <- mglm4twin(linear_pred = c(form_T0,  form_PSQI), 
                      matrix_pred = AE, data = t0psqi)
#> Automatic initial values selected.
Gauss_CE <- mglm4twin(linear_pred = c(form_T0,  form_PSQI), 
                      matrix_pred = CE, data = t0psqi)
#> Automatic initial values selected.
Gauss_ACE <- mglm4twin(linear_pred = c(form_T0,  form_PSQI), 
                       matrix_pred = ACE, data = t0psqi)
#> Automatic initial values selected.

## Mixed types
fit_E <- mglm4twin(linear_pred = c(form_T0,  form_PSQI), 
                   matrix_pred = E, 
                   link = c("log","logit"), 
                   variance = c("tweedie","binomialP"), 
                   control_algorithm = list(tuning = 0.25, max_iter = 100),
                   power_fixed = c(F,F), data = t0psqi)
#> Warning in eval(family$initialize): non-integer counts in a binomial glm!
#> Automatic initial values selected.
fit_AE <- mglm4twin(linear_pred = c(form_T0,  form_PSQI), 
                    matrix_pred = AE, 
                    link = c("log","logit"), 
                    variance = c("tweedie","binomialP"), 
                    control_algorithm = list(tuning = 0.25, max_iter = 100),
                    power_fixed = c(F,F), data = t0psqi)
#> Warning in eval(family$initialize): non-integer counts in a binomial glm!
#> Automatic initial values selected.
fit_CE <- mglm4twin(linear_pred = c(form_T0,  form_PSQI), 
                    matrix_pred = CE, 
                    link = c("log","logit"), 
                    variance = c("tweedie","binomialP"), 
                    control_algorithm = list(tuning = 0.25, max_iter = 100),
                    power_fixed = c(F,F), data = t0psqi)
#> Warning in eval(family$initialize): non-integer counts in a binomial glm!
#> Automatic initial values selected.
fit_ACE <- mglm4twin(linear_pred = c(form_T0,  form_PSQI), 
                     matrix_pred = ACE, 
                     link = c("log","logit"), 
                     variance = c("tweedie","binomialP"), 
                     control_algorithm = list(tuning = 0.25, max_iter = 100),
                     power_fixed = c(F,F), data = t0psqi)
#> Warning in eval(family$initialize): non-integer counts in a binomial glm!
#> Automatic initial values selected.

Reproducing Table 6 of Bonat and Hjelmborg (2020)

Table6
#>   plogLik Df   pAIC     pBIC plogLik Df   pAIC     pBIC
#> 1 -107.28 19 252.56 345.8832 -101.56 21 245.12 348.2667
#> 2  -95.85 22 235.70 343.7584  -90.74 24 229.48 347.3619
#> 3  -97.32 22 238.64 346.6984  -92.29 24 232.58 350.4619
#> 4  -95.85 25 241.70 364.4937  -90.69 27 235.38 367.9972

Reproducing Table 7 of Bonat and Hjelmborg (2020)

Table 7 is easily done by collecting the values from the model’s summary.

summary(fit_AE, model = "AE", biometric = TRUE)
#> Call: T0 ~ Age + Gender + Group + Type * Twin_pair
#> 
#> Link function: log
#> Variance function: tweedie
#> Regression:
#>                     Estimates   std.error    z value     Pr(>|z|)
#> (Intercept)       0.778897145 0.117586314  6.6240459 3.494981e-11
#> Age               0.003539927 0.002779662  1.2735102 2.028371e-01
#> GenderMale        0.102732171 0.040540507  2.5340623 1.127487e-02
#> GroupG2           0.061926288 0.026258589  2.3583250 1.835761e-02
#> GroupG3           0.036307594 0.048482219  0.7488847 4.539267e-01
#> TypeMZ           -0.086137485 0.063969873 -1.3465321 1.781310e-01
#> Twin_pair        -0.014585782 0.020222398 -0.7212687 4.707442e-01
#> TypeMZ:Twin_pair  0.026471909 0.028540661  0.9275156 3.536589e-01
#> 
#> Power:
#>        Estimates std.error    z value  Pr(>|z|)
#> power1 -1.367794  1.568881 -0.8718273 0.3833026
#> 
#> Call: PSQI ~ Age + Gender + Group + Type * Twin_pair
#> 
#> Link function: logit
#> Variance function: binomialP
#> Regression:
#>                      Estimates  std.error      z value    Pr(>|z|)
#> (Intercept)      -1.8400750531 0.68590502 -2.682696597 0.007303121
#> Age               0.0025620667 0.01619334  0.158217263 0.874285601
#> GenderMale       -0.3353952228 0.24425786 -1.373119470 0.169715209
#> GroupG2           0.0004782828 0.16506552  0.002897533 0.997688106
#> GroupG3          -0.2768288186 0.29398587 -0.941639885 0.346377042
#> TypeMZ            0.1054520387 0.37320544  0.282557615 0.777515975
#> Twin_pair         0.3371176870 0.11642008  2.895700548 0.003783132
#> TypeMZ:Twin_pair -0.0675952114 0.16264667 -0.415595431 0.677706071
#> 
#> Power:
#>        Estimates std.error  z value   Pr(>|z|)
#> power2  1.521228 0.8703424 1.747851 0.08048989
#> 
#> Dispersion:
#> Genetic component:
#>       Estimates  std.error    z value  Pr(>|z|)
#> A1   0.18487228 0.29178559  0.6335895 0.5263488
#> A2   0.12816576 0.21864946  0.5861700 0.5577613
#> A12 -0.07807775 0.09566742 -0.8161373 0.4144216
#> 
#> Environment component:
#>      Estimates  std.error   z value  Pr(>|z|)
#> E1  0.67329427 1.05819224 0.6362684 0.5246015
#> E2  0.30057930 0.49018002 0.6132019 0.5397429
#> E12 0.02076126 0.04234167 0.4903270 0.6239026
#> 
#> Measures associated with genetic structure:
#> Genetic correlation:
#>        Estimates std.error   z value   Pr(>|z|)
#> rho12 -0.5072305 0.2369296 -2.140849 0.03228623
#> 
#> Heritability:
#>    Estimates  std.error  z value     Pr(>|z|)
#> h1 0.2154270 0.07759258 2.776387 5.496671e-03
#> h2 0.2989323 0.07659475 3.902779 9.509469e-05
#> 
#> Bivariate heritability:
#>     Estimates std.error  z value   Pr(>|z|)
#> h12  1.362221 0.7007826 1.943857 0.05191265
#> 
#> Measures associated with environment structure:
#> Environment correlation:
#>        Estimates  std.error   z value  Pr(>|z|)
#> rho12 0.04615001 0.07793605 0.5921523 0.5537486
#> 
#> Environmentality:
#>    Estimates  std.error   z value     Pr(>|z|)
#> e1 0.7845730 0.07759258 10.111443 4.915481e-24
#> e2 0.7010677 0.07659475  9.152947 5.540116e-20
#> 
#> Bivariate environmentality:
#>      Estimates std.error    z value Pr(>|z|)
#> e12 -0.3622215 0.7007826 -0.5168814 0.605239
#> 
#> Algorithm: chaser
#> Correction: FALSE
#> Number iterations: 38