Abstract
The mglm4twin
package implements multivariate
generalized linear models for twin and family data as proposed by Bonat
and Hjelmborg (2020). The core fit function mglm4twin
is
employed for fitting a set of models. In this introductory vignette we
restrict ourselves to model multivariate bounded outcomes. We present a
set of special specifications of the standard ACE for multivariate
bounded traits in the context of twin data. This vignette reproduces the
data analysis presented in the Section 5.6 Dataset 6: Multivariate
bounded data of Bonat and Hjelmborg (2020).
Install and loading the mglm4twin package
To install the stable version of mglm4twin
,
use devtools::install_git()
. For more information, visit mglm4twin/README.
library(devtools)
install_git("bonatwagner/mglm4twin")
library(mglm4twin)
packageVersion("mglm4twin")
#> [1] '0.5.0'
Data set
In this example, we simulate a dataset as similar as possible to a
real dataset concernings three bounded outcomes reflecting mental
conditions. The data set consists of 828 (524 DZ and 274 MZ) twin pairs.
The outcomes are obtained based on questionaries and includes the Mini
Mental State Examination (MMSE) taking values on the interval
.
The second outcome corresponds to depression symptomatology which was
evaluated based on an adaptation of the depression section of the
Cambridge Mental Disorders of the Elderly Examination (CAMDEX). The
depression scale used here is a composite of responses to 17 depression
items and taking integer values in the interval
and is highly left-skewed in distribution in any population-based
sample. The third outcome is a Social Activity scale based on six items
that assess the frequency with which the individual is engaged with
others and mental pursuits. We opted to simulate such a data set because
we can show how to simulate the data and make it available for the
users. For simulating the data set, we are going to use the
SimCorMultRes
package.
## Loading extra packages
require(Matrix)
#> Loading required package: Matrix
require(SimCorMultRes)
#> Loading required package: SimCorMultRes
require(mglm4twin)
## Setting model parameters
E <- c(0.75, 0.7, 0.65, -0.3, 0.25, -0.4)
A <- c(0.25,0.3, 0.35, -0.15, 0.20, -0.2)
C <- rep(0, 6)
tau = c(E, A, C)
## Twin structure
DZ = mt_twin(N_DZ = 1, N_MZ = 0, n_resp = 3, model = "ACE")
MZ = mt_twin(N_DZ = 0, N_MZ = 1, n_resp = 3, model = "ACE")
Omega_DZ <- as.matrix(mt_matrix_linear_predictor(tau = tau, Z = DZ))
Omega_MZ <- as.matrix(mt_matrix_linear_predictor(tau = tau, Z = MZ))
## Regression structures
sex <- c(rep("Male", 220/2), rep("Female", 334/2),
rep("Male", 116/2), rep("Female", 158/2))
zyg <- c(rep("DZ", 554/2), rep("MZ", 274/2))
set.seed(123)
Age <- rbeta(414, shape1 = 0.3*2, shape2 = 0.7*2)*20 + 70
age_std <- (Age - mean(Age))/var(Age)
X <- model.matrix(~ sex + age_std)
beta1 <- c(1.6956, 0.0584, -0.2576)
mu1 <- exp(X%*%beta1)/(1+exp(X%*%beta1))
beta2 <- c(-1.7930, 0.0875, 0.2382)
mu2 <- exp(X%*%beta2)/(1+exp(X%*%beta2))
beta3 <- c(-0.5363, -0.05138, -0.1528)
mu3 <- exp(X%*%beta3)/(1+exp(X%*%beta3))
## Marginal distributions
dist <- c("qbeta","qbeta","qbeta")
invcdfnames <- rep(dist, each = 2)
## Simulating data set
Y1_MZ <- list()
Y2_MZ <- list()
Y3_MZ <- list()
Y1_DZ <- list()
Y2_DZ <- list()
Y3_DZ <- list()
set.seed(181185)
phi = 5
for(i in 1:137) {
paramslists <- list(
m1 = list(shape1 = mu1[i]*phi, shape2 = (1-mu1[i])*phi),
m1 = list(shape1 = mu1[i]*phi, shape2 = (1-mu1[i])*phi),
m2 = list(shape1 = mu2[i]*phi, shape2 = (1-mu2[i])*phi),
m2 = list(shape1 = mu2[i]*phi, shape2 = (1-mu2[i])*phi),
m3 = list(shape1 = mu3[i]*phi, shape2 = (1-mu3[i])*phi),
m3 = list(shape1 = mu3[i]*phi, shape2 = (1-mu3[i])*phi))
Y_MZ <- rnorta(R = 1, cor.matrix = Omega_MZ,
distr = invcdfnames, qparameters = paramslists)
Y1_MZ[[i]] <- Y_MZ[1:2]
Y2_MZ[[i]] <- Y_MZ[3:4]
Y3_MZ[[i]] <- Y_MZ[5:6]
}
set.seed(181185)
for(i in 138:414) {
paramslists <- list(
m1 = list(shape1 = mu1[i]*phi, shape2 = (1-mu1[i])*phi),
m1 = list(shape1 = mu1[i]*phi, shape2 = (1-mu1[i])*phi),
m2 = list(shape1 = mu2[i]*phi, shape2 = (1-mu2[i])*phi),
m2 = list(shape1 = mu2[i]*phi, shape2 = (1-mu2[i])*phi),
m3 = list(shape1 = mu3[i]*phi, shape2 = (1-mu3[i])*phi),
m3 = list(shape1 = mu3[i]*phi, shape2 = (1-mu3[i])*phi))
Y_DZ <- rnorta(R = 1, cor.matrix = Omega_DZ,
distr = invcdfnames, qparameters = paramslists)
Y1_DZ[[i]] <- Y_DZ[1:2]
Y2_DZ[[i]] <- Y_DZ[3:4]
Y3_DZ[[i]] <- Y_DZ[5:6]
}
Y1 <- c(do.call(c, Y1_DZ), do.call(c, Y1_MZ))
Y2 <- c(do.call(c, Y2_DZ), do.call(c, Y2_MZ))
Y3 <- c(do.call(c, Y3_DZ), do.call(c, Y3_MZ))
data <- data.frame("Y1" = Y1, "Y2" = Y2, "Y3" = Y3, "twin_id" = rep(1:2, 414),
"zyg" = rep(zyg, each = 2), "sex" = rep(sex, each = 2),
"age_std" = rep(age_std, each = 2))
Model specification
The model specification is done in three steps:
- Specifying the linear predictor (mean model).
- Specifying the matrix linear predictor (twin dependence struture).
- Specifying link and variance function suitable to be response variable type.
Specyfing the linear predictor
In the example we have the covariates sex and age for composing the linear predictor. Thus, the linear predictor is given by
Specyfing the matrix linear predictor
The matrix linear predictor describes the twin dependence structure.
The mglm4twin
package provides a set of pre-specified
covariance structures. However, the users can easily specify thein own
covariance structure through a linear combination of known matrices. In
this example, we are going to specify a set of special cases of the
standard ACE models for twin data.
ACE = mt_twin(N_DZ = 554/2, N_MZ = 274/2, n_resp = 3, model = "ACE")
AE = mt_twin(N_DZ = 554/2, N_MZ = 274/2, n_resp = 3, model = "AE")
CE = mt_twin(N_DZ = 554/2, N_MZ = 274/2, n_resp = 3, model = "CE")
E = mt_twin(N_DZ = 554/2, N_MZ = 274/2, n_resp = 3, model = "E")
The function mt_twin()
has four main arguments:
N_DZ
and N_MZ
the number of dizygotic and
monozygotic twin, respectively. n_resp
number of response
variables, in this case just
and the model
a string given the model’s name. It is
extreme important to note that the data set should be
ordered in a way that all DZ twin appear first and then all MZ twin and
the twin pair should appear together, i.e. twin 2 after twin 1.
Specyfing link and variance functions
The specificantion of the link and variance functions depends on the
response variable type. In the case of the bounded data the
mglm4twin
package offers the logit
,
probit
, cauchit
, cloglog
and
loglog
link functions. For the variance function the
binomialP
represents the standard binomial variance
function,
i.e.
which is suitable for bounded outcomes. In this example, we follow Bonat
and Hjelmborg (2020) and use the logit
link function and
the binomialP
variance function.
Model fitting
The main fitting function in the mglm4twin
package is
the mglm4twin()
function. it was designed to be similar to
standard model fitting functions in R
such as
lm()
and glm()
.
fit_ACE <- mglm4twin(linear_pred = c(form_Y1, form_Y2, form_Y3),
matrix_pred = ACE, link = link,
variance = variance,
data = data)
#> Automatic initial values selected.
fit_AE <- mglm4twin(linear_pred = c(form_Y1, form_Y2, form_Y3),
matrix_pred = AE, link = link,
variance = variance,
data = data)
#> Automatic initial values selected.
fit_CE <- mglm4twin(linear_pred = c(form_Y1, form_Y2, form_Y3),
matrix_pred = CE, link = link,
variance = variance,
data = data)
#> Automatic initial values selected.
fit_E <- mglm4twin(linear_pred = c(form_Y1, form_Y2, form_Y3),
matrix_pred = E, link = link,
variance = variance,
data = data)
#> Automatic initial values selected.
Reproducing Table 9 of Bonat and Hjelmborg (2020)
Table 9 is done by collecting the results from the model’s summary.
summary(fit_AE, model = "AE", biometric = TRUE)
#> Call: Y1 ~ sex + age_std
#>
#> Link function: logit
#> Variance function: binomialP
#> Regression:
#> Estimates std.error z value Pr(>|z|)
#> (Intercept) 1.74420183 0.05291741 32.9608287 2.959761e-238
#> sexMale -0.06859125 0.08198530 -0.8366286 4.028013e-01
#> age_std -0.24957571 0.21050178 -1.1856228 2.357713e-01
#>
#> Call: Y2 ~ sex + age_std
#>
#> Link function: logit
#> Variance function: binomialP
#> Regression:
#> Estimates std.error z value Pr(>|z|)
#> (Intercept) -1.81642618 0.05590683 -32.4902391 1.464717e-231
#> sexMale 0.06108591 0.08667232 0.7047914 4.809401e-01
#> age_std 0.45342969 0.22054316 2.0559681 3.978559e-02
#>
#> Call: Y3 ~ sex + age_std
#>
#> Link function: logit
#> Variance function: binomialP
#> Regression:
#> Estimates std.error z value Pr(>|z|)
#> (Intercept) -0.54371923 0.03970218 -13.694945 1.088421e-42
#> sexMale -0.06694236 0.06272402 -1.067252 2.858579e-01
#> age_std -0.24058640 0.16243509 -1.481123 1.385737e-01
#>
#> Dispersion:
#> Genetic component:
#> Estimates std.error z value Pr(>|z|)
#> A1 0.027929924 0.011012562 2.5361876 0.011206667
#> A2 0.041948185 0.011088036 3.7831934 0.000154829
#> A3 0.029001292 0.011134358 2.6046668 0.009196365
#> A12 -0.015019172 0.008190355 -1.8337632 0.066689151
#> A13 0.007991468 0.008219853 0.9722154 0.330943424
#> A23 -0.013718879 0.008734612 -1.5706340 0.116267687
#>
#> Environment component:
#> Estimates std.error z value Pr(>|z|)
#> E1 0.12812751 0.014343170 8.932998 4.146175e-19
#> E2 0.11544285 0.012164565 9.490093 2.308273e-21
#> E3 0.13215173 0.011462188 11.529363 9.383491e-31
#> E12 -0.04475663 0.008457948 -5.291666 1.212074e-07
#> E13 0.04644685 0.008850462 5.247958 1.537946e-07
#> E23 -0.06735482 0.009130621 -7.376806 1.621321e-13
#>
#> Measures associated with genetic structure:
#> Genetic correlation:
#> Estimates std.error z value Pr(>|z|)
#> rho12 -0.4387878 0.1877430 -2.337172 0.01943023
#> rho13 0.2807909 0.2475721 1.134178 0.25671972
#> rho23 -0.3933266 0.1795943 -2.190084 0.02851811
#>
#> Heritability:
#> Estimates std.error z value Pr(>|z|)
#> h1 0.1789721 0.06973874 2.566322 1.027833e-02
#> h2 0.2665221 0.06729049 3.960769 7.470886e-05
#> h3 0.1799612 0.06729608 2.674171 7.491434e-03
#>
#> Bivariate heritability:
#> Estimates std.error z value Pr(>|z|)
#> h12 0.2512584 0.1313564 1.9127994 0.05577373
#> h13 0.1467986 0.1479034 0.9925303 0.32093892
#> h23 0.1692149 0.1046257 1.6173365 0.10580566
#>
#> Measures associated with environment structure:
#> Environment correlation:
#> Estimates std.error z value Pr(>|z|)
#> rho12 -0.3680041 0.05815352 -6.328149 2.481196e-10
#> rho13 0.3569428 0.05709776 6.251433 4.067047e-10
#> rho23 -0.5453166 0.04741133 -11.501822 1.291602e-30
#>
#> Environmentality:
#> Estimates std.error z value Pr(>|z|)
#> e1 0.8210279 0.06973874 11.77291 5.383495e-32
#> e2 0.7334779 0.06729049 10.90017 1.150392e-27
#> e3 0.8200388 0.06729608 12.18553 3.712093e-34
#>
#> Bivariate environmentality:
#> Estimates std.error z value Pr(>|z|)
#> e12 0.7487416 0.1313564 5.700079 1.197522e-08
#> e13 0.8532014 0.1479034 5.768640 7.991358e-09
#> e23 0.8307851 0.1046257 7.940548 2.012905e-15
#>
#> Algorithm: chaser
#> Correction: FALSE
#> Number iterations: 6