Abstract
The mglm4twin
package implements multivariate
generalized linear models for twin and family data as proposed by Bonat
and Hjelmborg (2020). The core fit function mglm4twin
is
employed for fitting a set of models. In this introductory vignette we
restrict ourselves to model mixed types of outcomes. We present a set of
special specifications of the standard ACE for the mixed of continuous
and bounded traits in the context of twin data. This vignette reproduces
the data analysis presented in the Section 5.5 Dataset 5: Mixed types of
traits data of Bonat and Hjelmborg (2020).
Install and loading the mglm4twin package
To install the stable version of mglm4twin
,
use devtools::install_git()
. For more information, visit mglm4twin/README.
library(devtools)
install_git("bonatwagner/mglm4twin")
library(mglm4twin)
packageVersion("mglm4twin")
#> [1] '0.5.0'
Data set
The data set concerning sleep’s quality in a sample of 250 (135 DZ and 116 MZ) Danish twin pairs. The traits are cortisone levels when waking up (T0) and PSQI (Pittsburgh Sleep Quality Index). The first one is a continuous trait, while the second one is a scale varying from 0 to 21 (slower values better sleep’s quality). The goal of this example is to show how to deal with mixed types of traits in the context of twin data. It is a simulated data set based on the parameter estimates of the real data fit.
data(t0psqi)
head(t0psqi)
#> Twin_pair Twin_id Age Type Gender Group T0 PSQI
#> 1 1 1 31.43789 DZ Male G1 2.111575 0.37432276
#> 2 2 1 33.94153 DZ Male G1 2.070715 0.25220080
#> 3 1 2 32.04488 DZ Male G1 3.160935 0.06494865
#> 4 2 2 34.41509 DZ Male G1 2.171129 0.04690910
#> 5 1 3 34.70234 DZ Male G1 2.923754 0.07152169
#> 6 2 3 30.22778 DZ Male G1 3.350422 0.06988779
Model specification
The model specification is done in three steps:
- Specifying the linear predictor (mean model).
- Specifying the matrix linear predictor (twin dependence struture).
- Specifying link and variance function suitable to be response variable type.
Specyfing the linear predictor
In the t0psqi data set we have the covariates
Age
, Type
, Gender
and
Group
for composing the linear predictor for both outcomes.
Thus, the linear predictors are given by
form_T0 <- T0 ~ Age + Gender + Group + Type*Twin_pair
form_PSQI <- PSQI ~ Age + Gender + Group + Type*Twin_pair
Specyfing the matrix linear predictor
The matrix linear predictor describes the twin dependence structure.
The mglm4twin
package provides a set of pre-specified
covariance structures. However, the users can easily specify thein own
covariance structure through a linear combination of known matrices. In
this example, we are going to specify a set of special cases of the
standard ACE models for twin data.
E <- mt_twin(N_DZ = 135, N_MZ = 116, n_resp = 2, model = "E")
AE <- mt_twin(N_DZ = 135, N_MZ = 116, n_resp = 2, model = "AE")
CE <- mt_twin(N_DZ = 135, N_MZ = 116, n_resp = 2, model = "CE")
ACE <- mt_twin(N_DZ = 135, N_MZ = 116, n_resp = 2, model = "ACE")
The function mt_twin()
has four main arguments:
N_DZ
and N_MZ
the number of dizygotic and
monozygotic twin, respectively. n_resp
number of response
variables, in this case
and the model
a string given the model’s name. It is
extreme important to note that the data set should be
ordered in a way that all DZ twin appear first and then all MZ twin and
the twin pair should appear together, i.e. twin 2 after twin 1.
Model fitting
The main fitting function in the mglm4twin
package is
the mglm4twin()
function. it was designed to be similar to
standard model fitting functions in R
such as
lm()
and glm()
. In this example, we are going
to fit Gaussian bivariate models and a more felxible model where we use
a Tweedie model for the T0 trait and a flexible quasi-beta model for the
PSQI index. We opted to do this, because Gaussian models are a
frequently choice for this types of data.
## Gaussian model (link = 'identity' and variance = 'constant')
Gauss_E <- mglm4twin(linear_pred = c(form_T0, form_PSQI),
matrix_pred = E, data = t0psqi)
#> Automatic initial values selected.
Gauss_AE <- mglm4twin(linear_pred = c(form_T0, form_PSQI),
matrix_pred = AE, data = t0psqi)
#> Automatic initial values selected.
Gauss_CE <- mglm4twin(linear_pred = c(form_T0, form_PSQI),
matrix_pred = CE, data = t0psqi)
#> Automatic initial values selected.
Gauss_ACE <- mglm4twin(linear_pred = c(form_T0, form_PSQI),
matrix_pred = ACE, data = t0psqi)
#> Automatic initial values selected.
## Mixed types
fit_E <- mglm4twin(linear_pred = c(form_T0, form_PSQI),
matrix_pred = E,
link = c("log","logit"),
variance = c("tweedie","binomialP"),
control_algorithm = list(tuning = 0.25, max_iter = 100),
power_fixed = c(F,F), data = t0psqi)
#> Warning in eval(family$initialize): non-integer counts in a binomial glm!
#> Automatic initial values selected.
fit_AE <- mglm4twin(linear_pred = c(form_T0, form_PSQI),
matrix_pred = AE,
link = c("log","logit"),
variance = c("tweedie","binomialP"),
control_algorithm = list(tuning = 0.25, max_iter = 100),
power_fixed = c(F,F), data = t0psqi)
#> Warning in eval(family$initialize): non-integer counts in a binomial glm!
#> Automatic initial values selected.
fit_CE <- mglm4twin(linear_pred = c(form_T0, form_PSQI),
matrix_pred = CE,
link = c("log","logit"),
variance = c("tweedie","binomialP"),
control_algorithm = list(tuning = 0.25, max_iter = 100),
power_fixed = c(F,F), data = t0psqi)
#> Warning in eval(family$initialize): non-integer counts in a binomial glm!
#> Automatic initial values selected.
fit_ACE <- mglm4twin(linear_pred = c(form_T0, form_PSQI),
matrix_pred = ACE,
link = c("log","logit"),
variance = c("tweedie","binomialP"),
control_algorithm = list(tuning = 0.25, max_iter = 100),
power_fixed = c(F,F), data = t0psqi)
#> Warning in eval(family$initialize): non-integer counts in a binomial glm!
#> Automatic initial values selected.
Reproducing Table 6 of Bonat and Hjelmborg (2020)
Table6
#> plogLik Df pAIC pBIC plogLik Df pAIC pBIC
#> 1 -107.28 19 252.56 345.8832 -101.56 21 245.12 348.2667
#> 2 -95.85 22 235.70 343.7584 -90.74 24 229.48 347.3619
#> 3 -97.32 22 238.64 346.6984 -92.29 24 232.58 350.4619
#> 4 -95.85 25 241.70 364.4937 -90.69 27 235.38 367.9972
Reproducing Table 7 of Bonat and Hjelmborg (2020)
Table 7 is easily done by collecting the values from the model’s summary.
summary(fit_AE, model = "AE", biometric = TRUE)
#> Call: T0 ~ Age + Gender + Group + Type * Twin_pair
#>
#> Link function: log
#> Variance function: tweedie
#> Regression:
#> Estimates std.error z value Pr(>|z|)
#> (Intercept) 0.778897145 0.117586314 6.6240459 3.494981e-11
#> Age 0.003539927 0.002779662 1.2735102 2.028371e-01
#> GenderMale 0.102732171 0.040540507 2.5340623 1.127487e-02
#> GroupG2 0.061926288 0.026258589 2.3583250 1.835761e-02
#> GroupG3 0.036307594 0.048482219 0.7488847 4.539267e-01
#> TypeMZ -0.086137485 0.063969873 -1.3465321 1.781310e-01
#> Twin_pair -0.014585782 0.020222398 -0.7212687 4.707442e-01
#> TypeMZ:Twin_pair 0.026471909 0.028540661 0.9275156 3.536589e-01
#>
#> Power:
#> Estimates std.error z value Pr(>|z|)
#> power1 -1.367794 1.568881 -0.8718273 0.3833026
#>
#> Call: PSQI ~ Age + Gender + Group + Type * Twin_pair
#>
#> Link function: logit
#> Variance function: binomialP
#> Regression:
#> Estimates std.error z value Pr(>|z|)
#> (Intercept) -1.8400750531 0.68590502 -2.682696597 0.007303121
#> Age 0.0025620667 0.01619334 0.158217263 0.874285601
#> GenderMale -0.3353952228 0.24425786 -1.373119470 0.169715209
#> GroupG2 0.0004782828 0.16506552 0.002897533 0.997688106
#> GroupG3 -0.2768288186 0.29398587 -0.941639885 0.346377042
#> TypeMZ 0.1054520387 0.37320544 0.282557615 0.777515975
#> Twin_pair 0.3371176870 0.11642008 2.895700548 0.003783132
#> TypeMZ:Twin_pair -0.0675952114 0.16264667 -0.415595431 0.677706071
#>
#> Power:
#> Estimates std.error z value Pr(>|z|)
#> power2 1.521228 0.8703424 1.747851 0.08048989
#>
#> Dispersion:
#> Genetic component:
#> Estimates std.error z value Pr(>|z|)
#> A1 0.18487228 0.29178559 0.6335895 0.5263488
#> A2 0.12816576 0.21864946 0.5861700 0.5577613
#> A12 -0.07807775 0.09566742 -0.8161373 0.4144216
#>
#> Environment component:
#> Estimates std.error z value Pr(>|z|)
#> E1 0.67329427 1.05819224 0.6362684 0.5246015
#> E2 0.30057930 0.49018002 0.6132019 0.5397429
#> E12 0.02076126 0.04234167 0.4903270 0.6239026
#>
#> Measures associated with genetic structure:
#> Genetic correlation:
#> Estimates std.error z value Pr(>|z|)
#> rho12 -0.5072305 0.2369296 -2.140849 0.03228623
#>
#> Heritability:
#> Estimates std.error z value Pr(>|z|)
#> h1 0.2154270 0.07759258 2.776387 5.496671e-03
#> h2 0.2989323 0.07659475 3.902779 9.509469e-05
#>
#> Bivariate heritability:
#> Estimates std.error z value Pr(>|z|)
#> h12 1.362221 0.7007826 1.943857 0.05191265
#>
#> Measures associated with environment structure:
#> Environment correlation:
#> Estimates std.error z value Pr(>|z|)
#> rho12 0.04615001 0.07793605 0.5921523 0.5537486
#>
#> Environmentality:
#> Estimates std.error z value Pr(>|z|)
#> e1 0.7845730 0.07759258 10.111443 4.915481e-24
#> e2 0.7010677 0.07659475 9.152947 5.540116e-20
#>
#> Bivariate environmentality:
#> Estimates std.error z value Pr(>|z|)
#> e12 -0.3622215 0.7007826 -0.5168814 0.605239
#>
#> Algorithm: chaser
#> Correction: FALSE
#> Number iterations: 38