Skip to contents

Abstract

The mglm4twin package implements multivariate generalized linear models for twin and family data as proposed by Bonat and Hjelmborg (2020). The core fit function mglm4twin is employed for fitting a set of models. In this introductory vignette we restrict ourselves to model a binary trait. We present a set of special specifications of the standard ACE for binary traits in the context of twin data. This vignette reproduces the data analysis presented in the Section 5.1 Dataset 1: Dichotomous data of Bonat and Hjelmborg (2020).

Install and loading the mglm4twin package

To install the stable version of mglm4twin, use devtools::install_git(). For more information, visit mglm4twin/README.

library(devtools)
install_git("bonatwagner/mglm4twin")
library(mglm4twin)
packageVersion("mglm4twin")
#> [1] '0.5.0'

Data set

The dataset concerns psychiatric disorders in 1030 (440 DZ and 590 MZ) Caucasian female twin-pairs sampled from the Virginia Twin Registry. Lifetime psychiatric illness is a binary trait and was diagnosed using an adapted version of the Structured Clinical Interview for DSM-II-R Diagnosis. The dataset was analysed by Neale and Maes (2004) and Rabe-Hesketh et al. (2008) through generalized linear mixed models. The main goal of this study is to measure the genetic influence on the binary trait.

data(psydis)
head(psydis)
#>   y Group Twin Twin_pair
#> 1 0    DZ  591         1
#> 2 0    DZ  591         2
#> 3 0    DZ  592         1
#> 4 0    DZ  592         2
#> 5 0    DZ  593         1
#> 6 0    DZ  593         2

Model specification

The model specification is done in three steps:

  1. Specifying the linear predictor (mean model).
  2. Specifying the matrix linear predictor (twin dependence struture).
  3. Specifying link and variance function suitable to be response variable type.

Specyfing the linear predictor

In the psydis data set we have no covariate for composing the linear predictor. Thus, the linear predictors are given by

linear_pred <- y ~ 1

Specyfing the matrix linear predictor

The matrix linear predictor describes the twin dependence structure. The mglm4twin package provides a set of pre-specified covariance structures. However, the users can easily specify thein own covariance structure through a linear combination of known matrices. In this example, we are going to specify a set of special cases of the standard ACE models for twin data.

ex1_E <- mt_twin(N_DZ = 440, N_MZ = 590, n_resp = 1, model = "E")
ex1_AE <- mt_twin(N_DZ = 440, N_MZ = 590, n_resp = 1, model = "AE")
ex1_CE <- mt_twin(N_DZ = 440, N_MZ = 590, n_resp = 1, model = "CE")
ex1_ACE <- mt_twin(N_DZ = 440, N_MZ = 590, n_resp = 1, model = "ACE")

The function mt_twin() has four main arguments: N_DZ and N_MZ the number of dizygotic and monozygotic twin, respectively. n_resp number of response variables, in this case just 11 and the model a string given the model’s name. It is extreme important to note that the data set should be ordered in a way that all DZ twin appear first and then all MZ twin and the twin pair should appear together, i.e. twin 2 after twin 1.

The specificantion of the link and variance functions depends on the response variable type. In the case of the dichotomous data the mglm4twinpackage offers the logit, probit, cauchit, cloglog and loglog link functions. For the variance function the binomialP represents the standard binomial variance function, i.e. μ(1μ)\mu(1-\mu) which is suitable for binary outcomes. In this example, we follow Bonat and Hjelmborg (2020) and use the logit link function and the binomialP variance function.

link = "logit"
variance = "binomialP"

Model fitting

The main fitting function in the mglm4twin package is the mglm4twin() function. it was designed to be similar to standard model fitting functions in R such as lm() and glm().

fit_ex1_E <- mglm4twin(linear_pred = c(linear_pred), matrix_pred = ex1_E, 
                       link = c(link), variance = c(variance), data = psydis)
#> Automatic initial values selected.
fit_ex1_AE <- mglm4twin(linear_pred = c(linear_pred), matrix_pred = ex1_AE, 
                        link = c(link), variance = c(variance), data = psydis)
#> Automatic initial values selected.
fit_ex1_CE <- mglm4twin(linear_pred = c(linear_pred), matrix_pred = ex1_CE, 
                        link = c(link), variance = c(variance), data = psydis)
#> Automatic initial values selected.
fit_ex1_ACE <- mglm4twin(linear_pred = c(linear_pred), matrix_pred = ex1_ACE, 
                         link = c(link), variance = c(variance), data = psydis)
#> Automatic initial values selected.

Model output

The mglm4twin package offers a general summary() function that can be customized to show the dispersion components (standard output) or measures of interest (biometric = TRUE) as genetic correlation and heritability.

Standard output

summary(fit_ex1_ACE, model = "ACE") ## Standard output
#> Call: y ~ 1
#> 
#> Link function: logit
#> Variance function: binomialP
#> Regression:
#>              Estimates  std.error  z value     Pr(>|z|)
#> (Intercept) -0.7749189 0.05194836 -14.9171 2.551227e-50
#> 
#> Dispersion:
#> Genetic component:
#>    Estimates std.error z value    Pr(>|z|)
#> A1 0.3409539 0.1144371  2.9794 0.002888135
#> 
#> Common environment component:
#>      Estimates  std.error   z value  Pr(>|z|)
#> C1 -0.05999503 0.09731633 -0.616495 0.5375679
#> 
#> Environment component:
#>    Estimates  std.error z value      Pr(>|z|)
#> E1 0.7173134 0.02692294 26.6432 2.145572e-156
#> 
#> Algorithm: chaser
#> Correction: FALSE
#> Number iterations: 6

Biometric output

summary(fit_ex1_ACE, model = "ACE", biometric = TRUE) ## Standard output
#> Call: y ~ 1
#> 
#> Link function: logit
#> Variance function: binomialP
#> Regression:
#>              Estimates  std.error  z value     Pr(>|z|)
#> (Intercept) -0.7749189 0.05194836 -14.9171 2.551227e-50
#> 
#> Dispersion:
#> Genetic component:
#>    Estimates std.error z value    Pr(>|z|)
#> A1 0.3409539 0.1144371  2.9794 0.002888135
#> 
#> Common environment component:
#>      Estimates  std.error   z value  Pr(>|z|)
#> C1 -0.05999503 0.09731633 -0.616495 0.5375679
#> 
#> Environment component:
#>    Estimates  std.error z value      Pr(>|z|)
#> E1 0.7173134 0.02692294 26.6432 2.145572e-156
#> 
#> Measures associated with genetic structure:
#> Genetic correlation:
#> [1] "Univariate model no correlation available"
#> 
#> Heritability:
#>    Estimates std.error  z value    Pr(>|z|)
#> h1  0.341544 0.1128125 3.027537 0.002465553
#> 
#> Bivariate heritability:
#> [1] "Bivariate heritability no available."
#> 
#> Measures associated with common environment structure:
#> Common environment correlation:
#> [1] "Univariate model no correlation available"
#> 
#> Common environment:
#>      Estimates  std.error    z value  Pr(>|z|)
#> c1 -0.06009886 0.09741751 -0.6169205 0.5372872
#> 
#> Bivariate common environment:
#> [1] "Bivariate common environmentality no available."
#> 
#> Measures associated with environment structure:
#> Environment correlation:
#> [1] "Univariate model no correlation available"
#> 
#> Environmentality:
#>    Estimates  std.error  z value     Pr(>|z|)
#> e1 0.7185549 0.03445171 20.85687 1.320389e-96
#> 
#> Bivariate environmentality:
#> [1] "Bivariate environmetality no available."
#> 
#> Algorithm: chaser
#> Correction: FALSE
#> Number iterations: 6

Reproducing Table 1 of Bonat and Hjelmborg (2020)

Table1
#>    Estimate Std_Error Estimate Std_Error Estimate Std_Error Estimate Std_Error
#> E      1.00      0.02     0.73      0.03     0.80      0.02     0.72      0.03
#> A        NA        NA     0.27      0.04     0.00      0.00     0.34      0.11
#> C        NA        NA       NA        NA     0.20      0.03    -0.06      0.10
#> h2       NA        NA     0.27      0.03       NA        NA     0.34      0.11
#> ll -1340.99      2.00 -1315.58      3.00 -1319.61      3.00 -1315.39      4.00