Psychiatric disorder: Dichotomous data
Source:vignettes/Psychiatric_disorder.Rmd
Psychiatric_disorder.Rmd
Abstract
The mglm4twin
package implements multivariate
generalized linear models for twin and family data as proposed by Bonat
and Hjelmborg (2020). The core fit function mglm4twin
is
employed for fitting a set of models. In this introductory vignette we
restrict ourselves to model a binary trait. We present a set of special
specifications of the standard ACE for binary traits in the context of
twin data. This vignette reproduces the data analysis presented in the
Section 5.1 Dataset 1: Dichotomous data of Bonat and Hjelmborg
(2020).
Install and loading the mglm4twin package
To install the stable version of mglm4twin
,
use devtools::install_git()
. For more information, visit mglm4twin/README.
library(devtools)
install_git("bonatwagner/mglm4twin")
library(mglm4twin)
packageVersion("mglm4twin")
#> [1] '0.5.0'
Data set
The dataset concerns psychiatric disorders in 1030 (440 DZ and 590 MZ) Caucasian female twin-pairs sampled from the Virginia Twin Registry. Lifetime psychiatric illness is a binary trait and was diagnosed using an adapted version of the Structured Clinical Interview for DSM-II-R Diagnosis. The dataset was analysed by Neale and Maes (2004) and Rabe-Hesketh et al. (2008) through generalized linear mixed models. The main goal of this study is to measure the genetic influence on the binary trait.
Model specification
The model specification is done in three steps:
- Specifying the linear predictor (mean model).
- Specifying the matrix linear predictor (twin dependence struture).
- Specifying link and variance function suitable to be response variable type.
Specyfing the linear predictor
In the psydis data set we have no covariate for composing the linear predictor. Thus, the linear predictors are given by
linear_pred <- y ~ 1
Specyfing the matrix linear predictor
The matrix linear predictor describes the twin dependence structure.
The mglm4twin
package provides a set of pre-specified
covariance structures. However, the users can easily specify thein own
covariance structure through a linear combination of known matrices. In
this example, we are going to specify a set of special cases of the
standard ACE models for twin data.
ex1_E <- mt_twin(N_DZ = 440, N_MZ = 590, n_resp = 1, model = "E")
ex1_AE <- mt_twin(N_DZ = 440, N_MZ = 590, n_resp = 1, model = "AE")
ex1_CE <- mt_twin(N_DZ = 440, N_MZ = 590, n_resp = 1, model = "CE")
ex1_ACE <- mt_twin(N_DZ = 440, N_MZ = 590, n_resp = 1, model = "ACE")
The function mt_twin()
has four main arguments:
N_DZ
and N_MZ
the number of dizygotic and
monozygotic twin, respectively. n_resp
number of response
variables, in this case just
and the model
a string given the model’s name. It is
extreme important to note that the data set should be
ordered in a way that all DZ twin appear first and then all MZ twin and
the twin pair should appear together, i.e. twin 2 after twin 1.
Specyfing link and variance functions
The specificantion of the link and variance functions depends on the
response variable type. In the case of the dichotomous data the
mglm4twin
package offers the logit
,
probit
, cauchit
, cloglog
and
loglog
link functions. For the variance function the
binomialP
represents the standard binomial variance
function,
i.e.
which is suitable for binary outcomes. In this example, we follow Bonat
and Hjelmborg (2020) and use the logit
link function and
the binomialP
variance function.
link = "logit"
variance = "binomialP"
Model fitting
The main fitting function in the mglm4twin
package is
the mglm4twin()
function. it was designed to be similar to
standard model fitting functions in R
such as
lm()
and glm()
.
fit_ex1_E <- mglm4twin(linear_pred = c(linear_pred), matrix_pred = ex1_E,
link = c(link), variance = c(variance), data = psydis)
#> Automatic initial values selected.
fit_ex1_AE <- mglm4twin(linear_pred = c(linear_pred), matrix_pred = ex1_AE,
link = c(link), variance = c(variance), data = psydis)
#> Automatic initial values selected.
fit_ex1_CE <- mglm4twin(linear_pred = c(linear_pred), matrix_pred = ex1_CE,
link = c(link), variance = c(variance), data = psydis)
#> Automatic initial values selected.
fit_ex1_ACE <- mglm4twin(linear_pred = c(linear_pred), matrix_pred = ex1_ACE,
link = c(link), variance = c(variance), data = psydis)
#> Automatic initial values selected.
Model output
The mglm4twin
package offers a general
summary()
function that can be customized to show the
dispersion components (standard output) or measures of interest
(biometric = TRUE
) as genetic correlation and
heritability.
Standard output
summary(fit_ex1_ACE, model = "ACE") ## Standard output
#> Call: y ~ 1
#>
#> Link function: logit
#> Variance function: binomialP
#> Regression:
#> Estimates std.error z value Pr(>|z|)
#> (Intercept) -0.7749189 0.05194836 -14.9171 2.551227e-50
#>
#> Dispersion:
#> Genetic component:
#> Estimates std.error z value Pr(>|z|)
#> A1 0.3409539 0.1144371 2.9794 0.002888135
#>
#> Common environment component:
#> Estimates std.error z value Pr(>|z|)
#> C1 -0.05999503 0.09731633 -0.616495 0.5375679
#>
#> Environment component:
#> Estimates std.error z value Pr(>|z|)
#> E1 0.7173134 0.02692294 26.6432 2.145572e-156
#>
#> Algorithm: chaser
#> Correction: FALSE
#> Number iterations: 6
Biometric output
summary(fit_ex1_ACE, model = "ACE", biometric = TRUE) ## Standard output
#> Call: y ~ 1
#>
#> Link function: logit
#> Variance function: binomialP
#> Regression:
#> Estimates std.error z value Pr(>|z|)
#> (Intercept) -0.7749189 0.05194836 -14.9171 2.551227e-50
#>
#> Dispersion:
#> Genetic component:
#> Estimates std.error z value Pr(>|z|)
#> A1 0.3409539 0.1144371 2.9794 0.002888135
#>
#> Common environment component:
#> Estimates std.error z value Pr(>|z|)
#> C1 -0.05999503 0.09731633 -0.616495 0.5375679
#>
#> Environment component:
#> Estimates std.error z value Pr(>|z|)
#> E1 0.7173134 0.02692294 26.6432 2.145572e-156
#>
#> Measures associated with genetic structure:
#> Genetic correlation:
#> [1] "Univariate model no correlation available"
#>
#> Heritability:
#> Estimates std.error z value Pr(>|z|)
#> h1 0.341544 0.1128125 3.027537 0.002465553
#>
#> Bivariate heritability:
#> [1] "Bivariate heritability no available."
#>
#> Measures associated with common environment structure:
#> Common environment correlation:
#> [1] "Univariate model no correlation available"
#>
#> Common environment:
#> Estimates std.error z value Pr(>|z|)
#> c1 -0.06009886 0.09741751 -0.6169205 0.5372872
#>
#> Bivariate common environment:
#> [1] "Bivariate common environmentality no available."
#>
#> Measures associated with environment structure:
#> Environment correlation:
#> [1] "Univariate model no correlation available"
#>
#> Environmentality:
#> Estimates std.error z value Pr(>|z|)
#> e1 0.7185549 0.03445171 20.85687 1.320389e-96
#>
#> Bivariate environmentality:
#> [1] "Bivariate environmetality no available."
#>
#> Algorithm: chaser
#> Correction: FALSE
#> Number iterations: 6
Reproducing Table 1 of Bonat and Hjelmborg (2020)
Table1
#> Estimate Std_Error Estimate Std_Error Estimate Std_Error Estimate Std_Error
#> E 1.00 0.02 0.73 0.03 0.80 0.02 0.72 0.03
#> A NA NA 0.27 0.04 0.00 0.00 0.34 0.11
#> C NA NA NA NA 0.20 0.03 -0.06 0.10
#> h2 NA NA 0.27 0.03 NA NA 0.34 0.11
#> ll -1340.99 2.00 -1315.58 3.00 -1319.61 3.00 -1315.39 4.00