---
title: "Vignette for package *grmsem*"
author: Beate St Pourcain and Alexander Klassmann
date: "`r Sys.Date()`"
output:
bookdown::pdf_document2:
toc: yes
fig_caption: yes
number_sections: yes
bookdown::html_document2:
theme: default
highlight: textmate
number_sections: yes
toc: yes
toc_float: yes
collapsed: false
fontsize: 12 pt
bibliography: vignette.bib
csl: genetics.csl
urlcolor: blue
header-includes:
- \numberwithin{equation}{section}
vignette: |
\usepackage[utf8]{inputenc}
\usepackage{bm}
\usepackage{amsmath}
%\VignetteIndexEntry{Vignette for package *grmsem*}
%\VignetteEncoding{UTF-8}
%\VignetteEngine{knitr::rmarkdown}
---
\newcommand{\Cov}{\mathrm{Cov}}
***
# Synopsis
The `grmsem` package is an open-source quantitative genetics tool that supports the modelling of genetic and residual covariance structures in samples of unrelated individuals with genome-wide genotyping information. `grmsem` allows fitting different models describing the underlying multivariate genetic architecture of quantitative traits, as captured by a genetic-relationship-matrix (GRM), using structural equation modelling (SEM) techniques and a maximum likelihood approach (GRM-SEM). Analogous to twin models, the `grmsem` package includes multiple models, such as a `Cholesky decomposition` model, an `Independent Pathway` model and the `Direct Symmetric` model, but also novel models such as a hybrid `Independent Pathway / Cholesky` model. A general form of these models can be automatically fitted. The user can adapt each model by changing the pre-fitted parameters. All estimates can be obtained in standardised form. Follow-up analyses include estimations of genetic and residual covariance and correlations, bivariate heritabilities, and factorial co-heritabilities and co-environmentalities. `grmsem` replaces the package `gsem`, presented in [@StPourcain2018], because an unrelated package with the same name had been released simultaneously.
# Download and installation
## Package
The calculations performed by the package `grmsem` are computationally demanding. Hence it is highly recommended to optimise the R software prior to installing the package. This can be done within a Linux environment using OpenBLAS (an optimised Basic Linear Algebra Subsystem library), ATLAS (Automatically Tuned Linear Algebra Software) or the Intel MKL (Math Kernel Library) to improve the performance of basic vector and matrix operations (see [here](https://csantill.github.io/RPerformanceWBLAS/) for further information). The package itself can be installed by standard commands, using either of the two options:
* `install.packages("grmsem")` (latest CRAN release)
* `devtools::install_git('https://gitlab.gwdg.de/beate.stpourcain/grmsem')` (development version)
`grmsem` should be run in parallel by setting the `cores` option of `grmsem.fit()`. Empirically, good performance was achieved with `cores=4`, while sharing memory across as many cores as possible. For this, the entire node should be blocked so that memory across all cores is available for the job (depending on the system ranging usually between 8-24 cores).
Run times and memory requirements for different examples are detailed below.
## Data sets
The `small` data set is already included in the package. All data sets used in this vignette can be downloaded [here](https://gitlab.gwdg.de/beate.stpourcain/grmsem_external).
# Input files
## Genetic relationship matrix
A Genetic Relationship Matrix (GRM) is a symmetric matrix with entries representing the (standardized) number of mutually shared alleles among individuals of a sample. A GRM, consisting of pairs of unrelated individuals (relatedness cut-off $\leq$ 0.05) with genome-wide information, can be estimated using [PLINK](https://www.cog-genomics.org/plink2) or [GCTA](https://cnsgenomics.com/software/gcta/#Overview) software, of which the lower triangle elements are saved in two different forms:
* **grm.gz files:**
The lower triangle GRM elements are generated and saved with the [GCTA](https://cnsgenomics.com/software/gcta/#Overview) `--make-grm-gz` command. `grm.gz` files have no header and contain four columns: indices of pairs of individuals (column 1,2; corresponding to row numbers of grm.id files), number of non-missing SNPs (column 3) and the estimate of genetic relatedness (column 4). The compressed GRM file (`grm.gz`) can be imported using the `grm.input()` function, specifying the file name in the working directory, and will be returned as symmetric matrix. An example is shown below:
```{r eval = FALSE}
> G <- grm.input("large.gcta.grm.gz")
> G[1:3,1:3] #Relationships among the first three individuals
[,1] [,2] [,3]
[1,] 0.99354762 0.02328514 0.01644197
[2,] 0.02328514 0.99406837 0.01021175
[3,] 0.01644197 0.01021175 1.02751472
```
* **grm-bin files:**
The lower triangle GRM elements are saved in binary form using the [GCTA](https://cnsgenomics.com/software/gcta/#Overview) `--make-grm-bin` command. The binary `grm.bin` file can be imported using the `grm.bin.input()` function, specifying the file name in the working directory, and will be returned as symmetric matrix.
## Phenotype file
**Z-standardised** scores are required in form of tables (data frames) with each column representing a different phenotype. The number of columns determines the number of phenotypes **k** in the model. The observations must be **in the same order** as the individuals in the columns/rows of the GRM matrix. This order is shown, for example, in the [GCTA](https://cnsgenomics.com/software/gcta/#Overview) .id file (2-column file: family ID, individual ID). An example of a quad-variate phenotype is shown below:
```{r eval = FALSE}
> load("ph.large.RData")
> ph.large[1:2,]
Y1 Y2 Y3 Y4
[1,] -0.7640819 -0.6016908 -0.3981901 -0.3169821
[2,] -0.5099606 0.6671311 -1.3119328 -0.5601261
```
# GREML AE models
Genome-based restricted maximum likelihood (GREML) approaches introduced by @Yang2010a partition the observed phenotypic variation between unrelated individuals into an additive genetic component (A), as captured by genetic markers on a genotyping array, and a remainder (E), analogous to AE twin research methodologies. GREML [@Yang2010a] assumes an underlying "infinitesimal model", a quantitative genetics construct, where each causal locus $i$ from a large number of $m$ causal loci contributes a small additive effect $u_i$ to a phenotype y, reflecting polygenicity. These effect sizes are assumed to follow a normal distribution with variance $\sigma_u^2$. A phenotypic value $y_j$ of an individual $j$ can be predicted by the sum over the (standardised) number of alleles $z_{ij}$ that the individual $j$ carries at locus $i$ multiplied by the (additive) genetic effect $u_i$ at locus $i$
\begin{equation}
y_j=\mu+\sum_{j=1}^m z_{ij}u_i+e_j\;,
(\#eq:lm)
\end{equation}
where, $\mu$ equals the population average that is commonly set to zero, $e_j$ is the residual for individual $j$ and $u_i$ the scaled (additive) effect of alleles at the causal locus $i$. Additive genetic effects $u_i$ in equation \ref{eq:lm} are treated as random effects, following a normal distribution $\mathbf{u}\sim N(0,\sigma_u^2)$, where $\sigma_u^2$ is the variance of genetic effects. The residual $e_j$ subsumes other (non-additive genetic, environmental and error) influences on the phenotype of individual $j$, and the residual effects in equation \ref{eq:lm}, too, are assumed to follow a normal distribution with $\mathbf{e}\sim N(0,\sigma^2_e)$. For $n$ individuals, equation \ref{eq:lm} can be re-written in vector notation as
\begin{equation}
\mathbf{y}=\mu+\mathbf{Z}\cdot \mathbf{u}+\mathbf{e}\;
(\#eq:lmmatrix)
\end{equation}
with expected variance
\begin{equation}
Var(\mathbf{y})=\mathbf{ZZ^\top}\sigma_u^2+\mathbf{I}\sigma_e^2\;
(\#eq:var)
\end{equation}
that is equivalent to $\sigma_p^2=\sigma_g^2+\sigma_e^2$, and reflects, thus, the assumed independence of additive genetic and residual effects. A change of scale by defining $\mathbf{G_c}=\frac{\mathbf{ZZ}^{\top}}{m}$ and $\sigma^2_g=m\sigma_u^2$ yields
\begin{equation}
Var(\mathbf{y})=\mathbf{G_c}\sigma_g^2+\mathbf{I}\sigma_e^2\;,
(\#eq:vai)
\end{equation}
where $\mathbf{G_c}$ is the genetic relationship matrix (GRM) between pairs of individuals at causal loci [@Yang2010a]. As $m$ is not known, $\mathbf{G_c}$ is approximated using $N$ genome-wide SNPs [@Yang2010a]. Thus, $z_{ij}$ is approximated by $w_{ij}$, the standardised number of alleles that an individual $j$ carries at locus $i$, assuming Hardy-Weinberg equilibrium:
\begin{equation}
w_{ij}=\frac{x_{ij}-2p_j}{\sqrt{2p_j(1-p_j)}}
(\#eq:wij)
\end{equation}
Here, $p_i$ is the sample frequency of the (arbitrary) reference allele at a locus $i$ and $x_{ij}$ denotes the number of alleles that individual $j$ carries at locus $i$. Hence, $\mathbf{G_c}$ is approximated by $\mathbf{G}$, a GRM constructed from all variants presented on a genome-wide genotyping chip [@Yang2010a], with $\mathbf{G}=\frac{\mathbf{WW}^{\top}}{N}$. Consequently, $\sigma_g^2$ is an estimate of the genetic variance captured by these variants.
@Yang2010a applied REML, a standard technique of linear mixed model regression, to estimate the parameter $\sigma_g^2$, using genomic-relatedness-based restricted maximum-likelihood (GREML). @Lee2012a extends the GREML approach by @Yang2010a to the bivariate case.
# Multivariate AE models
The package `grmsem`, presented in [@StPourcain2018] (referred there by its initial name `gsem`), allows
- fitting more than two phenotypic measures
- identifying genetic and residual covariance structures
- evaluating the model fit against saturated models (e.g. `Cholesky decomposition` or `Direct Symmetric` models)
`grmsem` uses a maximum likelihood approach and extends the GREML approach by @Yang2010a to a multivariate case for $k$ traits, setting fixed effect means to zero by use of **z-standardised** phenotypes. We define $Y=\begin{pmatrix}\mathbf{y_1},\mathbf{y_2}, ...,\mathbf{y_k}\end{pmatrix}$ as the $k\cdot n$ observed phenotypic vector of k z-standardised traits of $n$ individuals and denote by $\Sigma_V$ their expected variance $\Sigma_V=Var(Y)$.
For example, for a quad-variate **z-standardised** trait with $k=4$, a single individual $i$ has information on four phenotypes $y_{i1}$, $y_{i2}$, $y_{i3}$ and $y_{i4}$. Using vector notation, we have for a set of individuals
\begin{align*}
\mathbf{y_1}=\mathbf{W}\mathbf{u_1}+\mathbf{e_1}\\
\mathbf{y_2}=\mathbf{W}\mathbf{u_2}+\mathbf{e_2}\\
\mathbf{y_3}=\mathbf{W}\mathbf{u_3}+\mathbf{e_3}\\
\mathbf{y_4}=\mathbf{W}\mathbf{u_4}+\mathbf{e_4}
\end{align*}
which can be written as a single equation
\begin{equation}
\begin{pmatrix}\mathbf{y_1}\\\mathbf{y_2}\\\mathbf{y_3}\\\mathbf{y_4}\end{pmatrix}=\begin{pmatrix}\mathbf{W} & \mathbf{0} & \mathbf{0} & \mathbf{0}\\ \mathbf{0} & \mathbf{W} & \mathbf{0} & \mathbf{0}
\\ \mathbf{0} & \mathbf{0} & \mathbf{W} & \mathbf{0}\\ \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{W} \end{pmatrix}\begin{pmatrix}\mathbf{u_1} \\ \mathbf{u_2}\\ \mathbf{u_3} \\ \mathbf{u_4} \\\end{pmatrix}+\begin{pmatrix}\mathbf{e_1}\\\mathbf{e_2}\\\mathbf{e_3}\\\mathbf{e_4}\end{pmatrix}
(\#eq:ydef1)
\end{equation}
with
\begin{equation}
\begin{split}
\begin{pmatrix}\mathbf{u_1}\\\mathbf{u_2}\\\mathbf{u_3}\\\mathbf{u_4}\end{pmatrix}\sim N\left(\begin{pmatrix}\mathbf{0}\\\mathbf{0}\\\mathbf{0}\\\mathbf{0}\end{pmatrix},\begin{pmatrix}\sigma_{g1}^2&\sigma_{g1g2} &\sigma_{g1g3} &\sigma_{g1g4}\\\sigma_{g1g2}&\sigma_{g2}^2&\sigma_{g2g3} &\sigma_{g2g4} \\\sigma_{g1g3}&\sigma_{g2g3} &\sigma_{g3}^2&\sigma_{g3g4} \\\sigma_{g1g4}&\sigma_{g2g4} &\sigma_{g3g4} &\sigma_{g4}^2 \end{pmatrix}\otimes \mathbf{I}\right)\\
\begin{pmatrix}\mathbf{e_1}\\\mathbf{e_2} \\\mathbf{e_3} \\\mathbf{e_4}\end{pmatrix}\sim N\left(\begin{pmatrix}\mathbf{0}\\\mathbf{0}\\\mathbf{0}\\\mathbf{0}\end{pmatrix},\begin{pmatrix}\sigma_{e1}^2&\sigma_{e1e2} &\sigma_{e1e3} &\sigma_{e1e4}\\\sigma_{e1e2}&\sigma_{e2}^2&\sigma_{e2e3} &\sigma_{e2e4} \\\sigma_{e1e3}&\sigma_{e2e3} &\sigma_{e3}^2&\sigma_{e3e4} \\\sigma_{e1e4}&\sigma_{e2e4} &\sigma_{e3e4} &\sigma_{e4}^2 \end{pmatrix}\otimes \mathbf{I}\right)
\end{split}
(\#eq:norme)
\end{equation}
\noindent Assuming $\Cov(\mathbf{\sigma_{g}},\mathbf{\sigma_{e}})$= $\mathbf{0}$, we can calculate the model implied covariance of
\begin{equation}
\begin{aligned}
\Sigma_V &= \Sigma_A + \Sigma_E \\
&= \begin{pmatrix}\sigma^2_{g1} & \sigma_{g1g2} & \sigma_{g1g3} & \sigma_{g1g4} \\\sigma_{g1g2} & \sigma^2_{g2} & \sigma_{g2g3} & \sigma_{g2g4}
\\\sigma_{g1g3} & \sigma_{g2g3} & \sigma^2_{g3} & \sigma_{g3g4}
\\\sigma_{g1g4} & \sigma_{g2g4} & \sigma_{g3g4} & \sigma^2_{g4}
\end{pmatrix}\otimes \mathbf{G} +
\begin{pmatrix}\sigma^2_{e1} & \sigma_{e1e2} & \sigma_{e1e3} & \sigma_{e1e4} \\ \sigma_{e1e2} & \sigma^2_{e2} & \sigma_{e2e3} & \sigma_{e2e4}
\\\sigma_{e1e3} & \sigma_{e2e3} & \sigma^2_{e3} & \sigma_{e3e4}
\\\sigma_{e1e4} & \sigma_{e2e4} & \sigma_{e3e4} & \sigma^2_{e4}
\end{pmatrix}\otimes \mathbf{I}\;.
\end{aligned}
(\#eq:4-var)
\end{equation}
The symbol $\otimes$ denotes the *Kronecker product* of two matrices: $\begin{pmatrix}a & b\\c&d\end{pmatrix}\otimes M=\begin{pmatrix}aM & bM\\cM&dM\end{pmatrix}$.
Note that contrary to @Lee2012a, `grmsem` allows for the correlation of residual effects.
We describe the multivariate structure of $\Sigma_A$ and $\Sigma_E$ by adapting multivariate structural equation modelling methodologies for twin research, originally proposed by @Martin1977, to genomic approaches. Fitting structural models to GRMs, `grmsem` models genetic covariance, $\Sigma_A$, and residual covariance, $\Sigma_E$, by defining factor structures as specified by genetic and residual factor loadings in $\Lambda_A$ and $\Lambda_E$ matrices respectively.
\begin{equation}
\Sigma_V=\Lambda_A\mathbf{G}\Lambda_A^\top+\Lambda_E\mathbf{I}\Lambda_E^\top\;.
(\#eq:gsemvar)
\end{equation}
The user can select $\Lambda_A$ and $\Lambda_E$ according to pre-specified model structures, including the models
- Cholesky decomposition
- Independent Pathway (IP)
- Hybrid Independent Pathway / Cholesky (IPC)
by setting the `model` option of `grmsem.fit()` to `Cholesky`, `IP` or `IPC` respectively. In addition, the Cholesky model can be re-parametrised as
- Direct Symmetric (DS)
model, estimating genetic and residual covariances directly, using the `model` option `DS`. `grmsem` fits, like GREML, all available data to the model. Each model can be adapted by the user by setting free parameters and starting values. Although $k$, the number of different phenotypes is not restricted, in principle, computational demands will typically set a limit based on $k\cdot n\approx30,000$ for Cholesky decomposition models; models using fewer parameters can handle larger $k\cdot n$.
# Model descriptions
## Cholesky model
The Cholesky decomposition model is a fully parametrised descriptive model without any explicit boundaries on the structure of latent genetic and residual influences that can provide higly informative insight into the developmental architecture of quantitative traits. The Cholesky model can be fit to the data through decomposition of both the genetic variance and residual variance into as many factors A and E as there are observed phenotypes $k$ (\ref{eq:choleskycov}). However, model coefficients can only be interpreted, if there is an inherent longitudinal order of measured phenotypes that is reflected in the order of phenotypes entered in the model. The first phenotype should represent the earliest measured trait. In general, a Cholesky model has $k(k+1)/2$ genetic parameters $a_{11},a_{21},...,a_{kk}$ and $k(k+1)/2$ residual parameters $e_{11},e_{21},...,e_{kk}$.
An example of a quad-variate ($k=4$) Cholesky decomposition model is shown in Figure \@ref(fig:path-cholesky), with $\Lambda$ parameters
\begin{equation}
\Lambda_A=\begin{pmatrix}a_{11} & 0 & 0 & 0\\a_{21} &a_{22} & 0& 0\\ a_{31} &a_{32} &a_{33} & 0\\ a_{41} &a_{42} &a_{43} &a_{44}\end{pmatrix}\;,\quad \Lambda_E=\begin{pmatrix}e_{11} & 0 & 0 & 0\\e_{21} &e_{22} & 0& 0\\ e_{31} &e_{32} &e_{33} & 0\\ e_{41} &e_{42} &e_{43} &e_{44}\end{pmatrix}
(\#eq:choleskylambda)
\end{equation}
and expected covariance matrices
\begin{equation}
\begin{split}
\Sigma_A&=\begin{pmatrix}
a_{11}^2 & a_{11}a_{21} & a_{11}a_{31} & a_{11}a_{41}\\
a_{11}a_{21} & a_{21}^2+a_{22}^2 & a_{21}a_{31}+a_{22}a_{32} & a_{21}a_{41}+a_{22}a_{42}\\
a_{11}a_{31} & a_{21}a_{31}+a_{22}a_{32} & a_{31}^2+a_{32}^2 +a_{33}^2& a_{31}a_{41}+a_{32}a_{42}+a_{33}a_{43}\\
a_{11}a_{41} & a_{21}a_{41}+a_{22}a_{42} & a_{31}a_{41}+a_{32}a_{42}+a_{33}a_{43}& a_{41}^2+a_{42}^2+a_{43}^2 +a_{44}^2 \end{pmatrix}\otimes \mathbf{G}\\
\Sigma_E&=\begin{pmatrix}
e_{11}^2 & e_{11}e_{21} & e_{11}e_{31} & e_{11}e_{41}\\
e_{11}e_{21} & e_{21}^2+e_{22}^2 & e_{21}e_{31}+e_{22}e_{32} & e_{21}e_{41}+e_{22}e_{42}\\
e_{11}e_{31} & e_{21}e_{31}+e_{22}e_{32} & e_{31}^2+e_{32}^2 +e_{33}^2& e_{31}e_{41}+e_{32}e_{42}+e_{33}e_{43}\\
e_{11}e_{41} & e_{21}e_{41}+e_{22}e_{42} & e_{31}e_{41}+e_{32}e_{42}+e_{33}e_{43}& e_{41}^2+e_{42}^2+e_{43}^2 +e_{44}^2\\ \end{pmatrix}\otimes \mathbf{I}
\end{split}
(\#eq:choleskycov)
\end{equation}
```{r path-cholesky, echo=FALSE, out.width="65%", fig.align='center', fig.cap="Quad-variate Cholesky decomposition model. Observed phenotypes (P) are represented by squares and factors by circles. Single-headed arrows (paths) define relationships between variables. The variance of latent variables is constrained to unit variance; this is omitted from the path diagram to improve clarity."}
knitr::include_graphics("model_quadvariate_path_diagram_Cholesky.png")
```
## Independent Pathway model
The Independent Pathway model specifies a common latent genetic factor (A) and a common latent residual factor (E), in addition to trait-specific genetic influences (AS) and trait-specific residual influences (ES) (\ref{eq:ipcov}). In general, this model has $2k$ genetic parameters $a_{c1},...,a_{ck},a_{s1},...,a_{sk}$ and $2k$ residual parameters $e_{c1},...,e_{ck},e_{s1},...,e_{sk}$. An example of a quad-variate ($k=4$) Independent Pathway model is shown in Figure \@ref(fig:path-indep), with $\Lambda$ parameters
\begin{equation}
\Lambda_A=
\begin{pmatrix}a_{c1} & a_{s1} & 0& 0& 0\\a_{c2} & 0 & a_{s2}& 0& 0\\a_{c3} & 0 & 0 &a_{s3}& 0 \\ a_{c4} & 0 & 0 & 0 &a_{s4}\end{pmatrix}\;,\quad \Lambda_E=\begin{pmatrix}e_{c1} & e_{s1} & 0& 0& 0\\e_{c2} & 0 & e_{s2}& 0& 0\\e_{c3} & 0 & 0 &e_{s3}& 0 \\ e_{c4} & 0 & 0 & 0 &e_{s4}\end{pmatrix}
(\#eq:iplambda)
\end{equation}
and expected covariance matrices
\begin{equation}
\begin{split}
\Sigma_A&=\begin{pmatrix}
a_{c1}^2+a_{s1}^2 & a_{c1}a_{c2} & a_{c1}a_{c3} & a_{c1}a_{c4}\\
a_{c1}a_{c2} & a_{c2}^2+a_{s2}^2 & a_{c2}a_{c3} & a_{c2}a_{c4}\\
a_{c1}a_{c3} & a_{c2}a_{c3} & a_{c3}^2+a_{s3}^2 & a_{c3}a_{c4}\\
a_{c1}a_{c4} & a_{c2}a_{c4} & a_{c3}a_{c4} & a_{c4}^2+a_{s4}^2
\end{pmatrix}\otimes \mathbf{G}\\
\Sigma_E&=\begin{pmatrix}
e_{c1}^2+e_{s1}^2 & e_{c1}e_{c2} & e_{c1}e_{c3} & e_{c1}e_{c4}\\
e_{c1}e_{c2} & e_{c2}^2+e_{s2}^2 & e_{c2}e_{c3} & e_{c2}e_{c4}\\
e_{c1}e_{c3} & e_{c2}e_{c3} & e_{c3}^2+e_{s3}^2 & e_{c3}e_{c4}\\
e_{c1}e_{c4} & e_{c2}e_{c4} & e_{c3}e_{c4} & e_{c4}^2+e_{s4}^2
\end{pmatrix}\otimes \mathbf{I}
\end{split}
(\#eq:ipcov)
\end{equation}
```{r path-indep, echo = FALSE, out.width="65%", fig.align='center', fig.cap="Quad-variate Independent Pathway model. Observed phenotypes (P) are represented by squares and factors by circles. Single-headed arrows (paths) define relationships between variables. The variance of latent variables is constrained to unit variance; this is omitted from the path diagram to improve clarity."}
knitr::include_graphics("model_quadvariate_path_diagram_IP.png")
```
## IPC model
The hybrid Independent Pathway / Cholesky decomposition (IPC) model (\ref{eq:ipccov}) structures the genetic variance as an Independent Pathway model, consisting of a common factor (A) and measurement-specific influences (AS), while the residual variance is structured as a Cholesky decomposition model, where the number of factors (E) is the same as the number of observed variables ($k$). In general, this model has $2k$ genetic parameters $a_{c1},...,a_{ck},a_{s1},...,a_{sk}$ and $k(k+1)/2$ residual parameters $e_{11},e_{21},...,e_{kk}$. An example of a quad-variate ($k=4$) IPC model is shown in Figure \@ref(fig:path-ipc)), with $\Lambda$ parameters
\begin{equation}
\Lambda_A=
\begin{pmatrix}a_{c1} & a_{s1} & 0& 0& 0\\a_{c2} & 0 & a_{s2}& 0& 0\\a_{c3} & 0 & 0 &a_{s3}& 0 \\ a_{c4} & 0 & 0 & 0 &a_{s4}\end{pmatrix}\;,\quad \Lambda_E=\begin{pmatrix}e_{11} & 0 & 0 & 0\\e_{21} &e_{22} & 0& 0\\ e_{31} &e_{32} &e_{33} & 0\\ e_{41} &e_{42} &e_{43} &e_{44}\end{pmatrix}
(\#eq:ipclambda)
\end{equation}
and expected covariance matrices
\begin{equation}
\begin{split}
\Sigma_A&=\begin{pmatrix}
a_{c1}^2+a_{s1}^2 & a_{c1}a_{c2} & a_{c1}a_{c3} & a_{c1}a_{c4}\\
a_{c1}a_{c2} & a_{c2}^2+a_{s2}^2 & a_{c2}a_{c3} & a_{c2}a_{c4}\\
a_{c1}a_{c3} & a_{c2}a_{c3} & a_{c3}^2+a_{s3}^2 & a_{c3}a_{c4}\\
a_{c1}a_{c4} & a_{c2}a_{c4} & a_{c3}a_{c4} & a_{c4}^2+a_{s4}^2
\end{pmatrix}\otimes \mathbf{G}\\
\Sigma_E&=\begin{pmatrix}
e_{11}^2 & e_{11}e_{21} & e_{11}e_{31} & e_{11}e_{41}\\
e_{11}e_{21} & e_{21}^2+e_{22}^2 & e_{21}e_{31}+e_{22}e_{32} & e_{21}e_{41}+e_{22}e_{42}\\
e_{11}e_{31} & e_{21}e_{31}+e_{22}e_{32} & e_{31}^2+e_{32}^2 +e_{33}^2& e_{31}e_{41}+e_{32}e_{42}+e_{33}e_{43}\\
e_{11}e_{41} & e_{21}e_{41}+e_{22}e_{42} & e_{31}e_{41}+e_{32}e_{42}+e_{33}e_{43}& e_{41}^2+e_{42}^2+e_{43}^2 +e_{44}^2 \end{pmatrix}\otimes\mathbf{I}
\end{split}
(\#eq:ipccov)
\end{equation}
```{r path-ipc, echo = FALSE, out.width="65%", fig.align='center', fig.cap="Quad-variate IPC model. Observed phenotypes (P) are represented by squares and factors by circles. Single-headed arrows (paths) define relationships between variables. The variance of latent variables is constrained to unit variance; this is omitted from the path diagram to improve clarity."}
knitr::include_graphics("model_quadvariate_path_diagram_IPC.png")
```
## Direct Symmetric model
The Direct Symmetric model is a saturated variance component-based model, mathematically equivalent to the Cholesky decomposition model. It is a fully parametrised descriptive model and without any explicit or implicit boundaries, allowing for negative variance estimations [@Verhulst2019]. The model estimates genetic and residual variances and covariances, as outlined in equation \ref{eq:4-dsvar}, irrespective of the order of the phenotypes, but without providing insight into developmental genetic architectures, as captured by the Cholesky decomposition model. `grmsem` model comparisons between the Cholesky decomposition and the Direct Symmetric model have, so far, revealed no discernable differences in loglikelihood or estimated parameters (see below), when studying heritable traits. However, we advise the user to always fit a Direct Symmetric model to confirm unbiased likelihood estimations. In general, the Direct Symmetric model has $k(k+1)/2$ genetic parameters $A_{11},A_{21},...,A_{kk}$ and $k(k+1)/2$ residual parameters $E_{11},E_{21},...,E_{kk}$, where the indices of each variance component are interchangeable. For identification, the factor loadings are fixed to unity.
Example of a quad-variate Direct Symmetric model ($k=4$) with a model implied covariance of
\begin{equation}
\begin{aligned}
\Sigma_V &= \begin{pmatrix} A_{11} & A_{21} & A_{31} & A_{41} \\A_{21} & A_{22} & A_{32} & A_{42}
\\ A_{31} & A_{32} & A_{33} & A_{43}
\\ A_{41} & A_{42} & A_{43} & A_{44}
\end{pmatrix}\otimes \mathbf{G} +
\begin{pmatrix} E_{11} & E_{21} & E_{31} & E_{41} \\ E_{21} & E_{22} & E_{32} & E_{42}
\\ E_{31} & E_{32} & E_{33} & E_{43}
\\ E_{41} & E_{42} & E_{43} & E_{44}
\end{pmatrix}\otimes \mathbf{I}\;.
\end{aligned}
(\#eq:4-dsvar)
\end{equation}
# Estimation
From standard SEM theory it follows that the model parameter vector $\mathbf{\theta}$ of either $\Lambda_A$ and $\Lambda_E$ (Cholesky, IP, IPC models) or $\Sigma_A$ and $\Sigma_E$ (DS model), can be estimated by maximising the following log-likelihood function (omitting a constant term $- \frac{nk}{2} \log{2\pi}$) across n Gaussians for all n individuals:
\begin{equation}
l(\theta)=-\frac{n}{2}log|\Sigma_V|-\sum_{j=1}^{n} \frac{1}{2}Y^\top \Sigma_V^{-1}Y\;.
(\#eq:ll)
\end{equation}
From general ML theory it follows that the variance of the parameters can be estimated using the information matrix. For any parameter $\theta$ it holds
\begin{equation}
Var(\theta) =\left[-\left(\frac{\partial^2 l(\theta)}{\partial \theta^2}\right)\Bigg\rvert_{\theta=\hat\theta}\right]^{-1}\;.
(\#eq:llvar)
\end{equation}
The estimation of parameters and their SEs is performed by the function `grmsem.fit()`. Specifically, the loglikelihood is estimated with `stats::optim()` and the BFGS approach and $Var(\theta)$ with `numDeriv::hessian()`. Using the `grmsem.fit()` output, the function `grmsem.var()` returns the matrices $\Lambda_A\Lambda_A^\top$ and $\Lambda_E\Lambda_E^\top$ (Cholesky, IP, IPC models) which yield the phenotopic variance explained by the various factors. Bivariate simulations comparing OpenMx, GRM-SEM and GCTA GREML can be found in [@StPourcain2018], Supplementary Table S3. Otherwise, the `grmsem.fit()` output returns the directly estimated $\Sigma_A$ and $\Sigma_E$ matrices (DS model) and their SEs.
Different to twin-models, `grmsem` models allow for an estimation of genetic and residual variances and covariances (AE), as E in GRM-related methods for unrelated individuals may capture both shared and unique residual influences. `grmsem` estimates the statistical significance of estimated parameters directly, using a Wald test, assuming multivariate normality.
The returned `grmsem.fit` object can be used to estimate genetic and residual covariance and correlations (`grmsem.var` function), bivariate heritabilities (`grmsem.biher` function), and factorial co-heritabilities and co-environmentalities (`grmsem.fcoher` function). Listwise complete observations can be selected with the option `compl.ph` `TRUE`. Otherwise, `grmsem.fit` fits, like GREML, all available data to the model with the default option `compl.ph` `FALSE`.
# Examples
## Outline
Multivariate models with `grmsem` are time-consuming, especially with large numbers of observations per trait. To illustrate the functionality of `grmsem` follow-up analyses, we fitted two saturated models (`Cholesky decomposition` and `DS`) to two different data sets:
* A `small` simulated tri-variate toy data set that allows fitting the model on the fly.
* A `large` multivariate quad-variate data set with defined genetic architecture.
Simulated data sets used in the vignette can be downloaded [here](https://gitlab.gwdg.de/beate.stpourcain/grmsem_external).
Recent multivariate analysis of real data can be found in [@Verhoef2020] and [@Shapland2020].
## Tri-variate example
### Data simulation
For quick demonstration purposes, a small tri-variate toy data set was simulated, assuming an underlying Cholesky model with 100 observations per trait and low polygenicity (150 SNPs per genetic factor), and the following genetic (`lambda_A`) and residual (`lambda_E`) factor loadings:
\begin{equation}
\Lambda_A=\begin{pmatrix} \sqrt{0.3} & 0 & 0\\ \sqrt{0.2} & \sqrt{0.4} & 0\\ \sqrt{0.2} & \sqrt{0.2} & \sqrt{0.3}\end{pmatrix}\;,\quad \Lambda_E=\begin{pmatrix} \sqrt{0.7} & 0 & 0\\ \sqrt{0.1} & \sqrt{0.3} & 0\\ \sqrt{0.2} & \sqrt{0.1} & \sqrt{0.1}\end{pmatrix}
(\#eq:simsmall)
\end{equation}
\begin{equation}
\Sigma_A=\begin{pmatrix} 0.300 & 0.245 & 0.245\\ 0.245 & 0.600 & 0.483\\ 0.245 & 0.483 & 0.600\end{pmatrix}\;,\quad \Sigma_E=\begin{pmatrix} 0.700 & 0.265 & 0.374\\ 0.265 & 0.400 & 0.315\\ 0.374 & 0.315 & 0.400\end{pmatrix}
(\#eq:simsmallvar)
\end{equation}
The small GRM data file (`G.small.RData`) and the phenotype file (`ph.small.RData`) are included in the grmsem package, but can also be downloaded from [here](https://gitlab.gwdg.de/beate.stpourcain/grmsem_external).
Performance: Fitting a Cholesky decomposition model (6 genetic and 6 residual parameters) to the data using 4 cores, sharing memory across 24 cores, required ~11s using R MKL 3.6.3. Fitting the models under Windows10 with 1 core required ~15s using 3.6.3. A pre-fitted Cholesky output has been also saved to (`fit.small.RData`). For comparison, also the Direct Symmetric model has been fitted, which had a similar performance. Note that some output has been omitted for clarity.
### Cholesky model
Here, we fit a Cholesky decomposition model (6 genetic and 6 residual parameters) to the `small` simulated data set, on the fly, with the following model fit and estimated parameters:
```{r eval = FALSE}
#If files were downloaded externally, please load here:
> load("G.small.RData")
> load("ph.small.RData")
> fit <- grmsem.fit(ph.small,
G.small,
model="Cholesky",
LogL=TRUE,
estSE=TRUE)
> fit
$model.in
part label value freepar
1 a a11 0.6971323 1
2 a a21 -0.2357364 1
3 a a31 0.3048627 1
4 a a22 0.2067668 1
5 a a32 -0.3823634 1
6 a a33 -0.1557505 1
7 e e11 -0.4958634 1
8 e e21 -0.5924593 1
9 e e31 0.1700410 1
10 e e22 0.3570643 1
11 e e32 -0.3220280 1
12 e e33 0.5024512 1
$model.fit$LL
[1] -54.39143
$model.out
label estimates gradient se Z p
1 a11 -0.5618594 -4.490283e-05 0.18730897 -2.999640 2.702993e-03
2 a21 -0.5172780 6.160157e-05 0.25150203 -2.056755 3.970982e-02
3 a31 -0.6022167 -4.596719e-05 0.20040106 -3.005057 2.655308e-03
4 a22 -0.7791023 -1.154262e-05 0.13696258 -5.688432 1.282113e-08
5 a32 -0.3313097 6.118543e-05 0.14755775 -2.245289 2.474961e-02
6 a33 -0.4201447 7.389654e-05 0.07000431 -6.001697 1.952654e-09
7 e11 -0.8266200 2.460492e-04 0.11621420 -7.112900 1.136299e-12
8 e21 -0.2344285 -1.605268e-04 0.12464049 -1.880838 5.999398e-02
9 e31 -0.4124524 6.209974e-05 0.12534761 -3.290469 1.000207e-03
10 e22 0.3862229 -2.636815e-05 0.12634091 3.056990 2.235717e-03
11 e32 0.3410509 8.582604e-05 0.10625220 3.209824 1.328162e-03
12 e33 -0.2511355 1.911040e-04 0.05125420 -4.899803 9.593276e-07
$k
[1] 3
$n
[1] 300
$n.obs
[1] 100 100 100
$n.ind
[1] 100
$model
[1] "Cholesky"
```
Here, we derive genetic variance and correlations:
```{r eval = FALSE}
> var.fit <- grmsem.var(fit)
> print(var.fit)
$VA
1 2 3
Y1 0.3156860 0.2906375 0.3383611
Y2 0.2906375 0.8745770 0.5696376
Y3 0.3383611 0.5696376 0.6489526
$VA.se
1 2 3
Y1 0.2104826 0.1832943 0.1850112
Y2 0.1832943 0.2583423 0.2220882
Y3 0.1850112 0.2220882 0.2227517
$VE
1 2 3
Y1 0.6833006 0.1937833 0.3409414
Y2 0.1937833 0.2041249 0.2284123
Y3 0.3409414 0.2284123 0.3495017
$VE.se
1 2 3
Y1 0.1921300 0.1180125 0.1387263
Y2 0.1180125 0.1269780 0.1222845
Y3 0.1387263 0.1222845 0.1373590
$VP
1 2 3
Y1 0.9989866 0.4844208 0.6793025
Y2 0.4844208 1.0787019 0.7980499
Y3 0.6793025 0.7980499 0.9984543
$RG
1 2 3
Y1 1.0000000 0.5531272 0.7475604
Y2 0.5531272 1.0000000 0.7561242
Y3 0.7475604 0.7561242 1.0000000
$RG.se
1 2 3
Y1 0.0000000 0.22348027 1.575650e-01
Y2 0.2234803 0.00000000 8.697161e-02
Y3 0.1575650 0.08697161 4.449797e-17
$RE
1 2 3
Y1 1.0000000 0.5188747 0.6976686
Y2 0.5188747 1.0000000 0.8551589
Y3 0.6976686 0.8551589 1.0000000
$RE.se
1 2 3
Y1 5.160947e-17 2.051003e-01 1.173883e-01
Y2 2.051003e-01 5.610664e-17 8.730948e-02
Y3 1.173883e-01 8.730948e-02 2.359273e-17
```
Here, we derive the standardised parameters:
```{r eval = FALSE}
> print(st.fit)
$stand.model.out
label stand.estimates stand.se Z p
1 a11 -0.5621443 0.17186763 -3.270798 1.072444e-03
2 a21 -0.4980504 0.22959865 -2.169222 3.006584e-02
3 a31 -0.6026826 0.17316262 -3.480443 5.005852e-04
4 a22 -0.7501425 0.12682263 -5.914895 3.320880e-09
5 a32 -0.3315661 0.14661954 -2.261404 2.373422e-02
6 a33 -0.4204698 0.07358378 -5.714164 1.102447e-08
7 e11 -0.8270392 0.11681963 -7.079625 1.445454e-12
8 e21 -0.2257147 0.12967922 -1.740562 8.176046e-02
9 e31 -0.4127715 0.13122602 -3.145500 1.658030e-03
10 e22 0.3718667 0.14164900 2.625269 8.658054e-03
11 e32 0.3413148 0.11390419 2.996507 2.730917e-03
12 e33 -0.2513298 0.05457774 -4.604987 4.124919e-06
```
and the standardised variances (heritability and environmentality):
```{r eval=FALSE}
> st.var.fit <- grmsem.var(st.fit)
> print(st.var.fit)
$VA
1 2 3
Y1 0.3160062 0.2799762 0.3387946
Y2 0.2799762 0.8107680 0.5488882
Y3 0.3387946 0.5488882 0.6499573
$VA.se
1 2 3
Y1 0.1932288 0.1619891 0.1613852
Y2 0.1619891 0.1379212 0.1565957
Y3 0.1613852 0.1565957 0.1544661
$VE
1 2 3
Y1 0.6839938 0.1866749 0.3413782
Y2 0.1866749 0.1892320 0.2200922
Y3 0.3413782 0.2200922 0.3500427
$VE.se
1 2 3
Y1 0.1932288 0.1218645 0.1436571
Y2 0.1218645 0.1379212 0.1334911
Y3 0.1436571 0.1334911 0.1544661
$VP
1 2 3
Y1 1.0000000 0.4666511 0.6801728
Y2 0.4666511 1.0000000 0.7689803
Y3 0.6801728 0.7689803 1.0000000
$RG
1 2 3
Y1 1.0000000 0.5531272 0.7475604
Y2 0.5531272 1.0000000 0.7561242
Y3 0.7475604 0.7561242 1.0000000
$RG.se
1 2 3
Y1 0.0000000 0.23210789 1.574432e-01
Y2 0.2321079 0.00000000 8.945803e-02
Y3 0.1574432 0.08945803 1.633888e-17
$RE
1 2 3
Y1 1.0000000 0.5188747 0.6976686
Y2 0.5188747 1.0000000 0.8551589
Y3 0.6976686 0.8551589 1.0000000
$RE.se
1 2 3
Y1 0.0000000 2.130183e-01 1.172975e-01
Y2 0.2130183 6.290479e-17 8.880531e-02
Y3 0.1172975 8.880531e-02 5.827606e-17
```
### DS model
For comparison, we fit a Direct Symmetric model and obtain a near-identical fit (loglikelihood).
```{r eval=FALSE}
> fit.DS <- grmsem.fit(ph.small,
G.small,
model="DS",
LogL=TRUE,
estSE=TRUE)
[1] 300
[1] 100 100 100
[1] 1 101 201
[1] 100 200 300
[1] "Parameters for a direct symmetric model (variance components):"
[1] "Vector of genetic variances:"
[1] "A11" "A21" "A31" "A22" "A32" "A33"
[1] "Vector of residual factor loadings:"
[1] "E11" "E21" "E31" "E22" "E32" "E33"
> fit.DS
$model.in
part label value freepar
1 a A11 1.0000000 1
2 a A21 0.4602739 1
3 a A31 0.6738565 1
4 a A22 1.0000000 1
5 a A32 0.7527887 1
6 a A33 1.0000000 1
7 e E11 1.0000000 1
8 e E21 0.4602739 1
9 e E31 0.6738565 1
10 e E22 1.0000000 1
11 e E32 0.7527887 1
12 e E33 1.0000000 1
$model.fit$LL
[1] -54.39145
$model.fit$calls
function gradient
122 26
$model.fit$convergence
[1] 0
$model.fit$message
NULL
$model.out
label estimates gradient se Z p
1 A11 0.3161061 0.0009860302 0.2106008 1.500972 0.1333627052
2 A21 0.2912542 0.0039066047 0.1834675 1.587497 0.1124001236
3 A31 0.3389444 -0.0036802953 0.1852176 1.829979 0.0672530362
4 A22 0.8754990 0.0025629175 0.2584953 3.386905 0.0007068577
5 A32 0.5705239 -0.0051792201 0.2222947 2.566520 0.0102724710
6 A33 0.6497819 0.0048300247 0.2229978 2.913849 0.0035700307
7 E11 0.6828573 0.0009099445 0.1921358 3.554035 0.0003793684
8 E21 0.1932362 0.0081933028 0.1180255 1.637241 0.1015800561
9 E31 0.3403784 -0.0046169601 0.1387849 2.452561 0.0141843201
10 E22 0.2034536 0.0015010666 0.1268743 1.603584 0.1088057273
11 E32 0.2277079 0.0414495710 0.1222574 1.862529 0.0625285683
12 E33 0.3487737 0.0049090361 0.1373909 2.538551 0.0111312504
$k
[1] 3
$n
[1] 300
$n.obs
[1] 100 100 100
$n.ind
[1] 100
$model
[1] "DS"
```
Similarly, the variance parameter estimates and SEs for the Direct Symmetric model are near-identical to those derived for the Cholesky decomposition model.
```{r eval=FALSE}
> var.fit.ds <- grmsem.var(fit.DS)
> var.fit.ds
$VA
1 2 3
Y1 0.3161061 0.2912542 0.3389444
Y2 0.2912542 0.8754990 0.5705239
Y3 0.3389444 0.5705239 0.6497819
$VA.se
1 2 3
Y1 0.2106008 0.1834675 0.1852176
Y2 0.1834675 0.2584953 0.2222947
Y3 0.1852176 0.2222947 0.2229978
$VE
1 2 3
Y1 0.6828573 0.1932362 0.3403784
Y2 0.1932362 0.2034536 0.2277079
Y3 0.3403784 0.2277079 0.3487737
$VE.se
1 2 3
Y1 0.1921358 0.1180255 0.1387849
Y2 0.1180255 0.1268743 0.1222574
Y3 0.1387849 0.1222574 0.1373909
$VP
1 2 3
Y1 0.9989633 0.4844903 0.6793228
Y2 0.4844903 1.0789526 0.7982318
Y3 0.6793228 0.7982318 0.9985556
$RG
1 2 3
Y1 1.0000000 0.5536406 0.7478738
Y2 0.5536406 1.0000000 0.7564186
Y3 0.7478738 0.7564186 1.0000000
$RG.se
1 2 3
Y1 0.0000000 0.22323218 0.15738360
Y2 0.2232322 0.00000000 0.08684325
Y3 0.1573836 0.08684325 0.00000000
$RE
1 2 3
Y1 1.0000000 0.5184307 0.6974693
Y2 0.5184307 1.0000000 0.8548176
Y3 0.6974693 0.8548176 1.0000000
$RE.se
1 2 3
Y1 0.0000000 0.20548054 0.11753303
Y2 0.2054805 0.00000000 0.08761989
Y3 0.1175330 0.08761989 0.00000000
```
## Quad-variate example
### Data simulation
A large quad-variate data set was simulated, assuming an underlying Cholesky model, with 5000 observations per trait and high polygenicity (5,000 SNPs per genetic factor), and the following genetic (`lambda_A`) and residual (`lambda_E`) factor loadings:
\begin{equation}
\Lambda_A=\begin{pmatrix}\sqrt{0.3} & 0 & 0 & 0\\ \sqrt{0.2} & \sqrt{0.4} & 0& 0\\ \sqrt{0.2} & \sqrt{0.2} & \sqrt{0.2} & 0\\ \sqrt{0.1} & \sqrt{0.2} &0 & \sqrt{0.4}\end{pmatrix}\;,\quad \Lambda_E=\begin{pmatrix}\sqrt{0.7} & 0 & 0 & 0\\ \sqrt{0.1} & \sqrt{0.3} & 0& 0\\ \sqrt{0.2} & 0 & \sqrt{0.2} & 0\\ 0 & 0 & 0 & \sqrt{0.3}\end{pmatrix}
(\#eq:simlarge)
\end{equation}
\begin{equation}
\Sigma_A=\begin{pmatrix} 0.300 & 0.245 & 0.245 & 0.173\\ 0.245 & 0.600 & 0.483 & 0.424\\ 0.245 & 0.483 & 0.600 & 0.341\\ 0.173 & 0.424 &0.341 & 0.700 \end{pmatrix}\;,\quad \Sigma_E=\begin{pmatrix}0.700 & 0.265 & 0.374 & 0.000\\ 0.265 & 0.400 & 0.141& 0.000\\ 0.374 & 0.141 & 0.400 & 0.000\\ 0.000 & 0.000 & 0.000 & 0.300 \end{pmatrix}
(\#eq:simlargevar)
\end{equation}
The Cholesky model was pre-fitted and saved (`fit.large.RData`) to illustrate `grmsem` follow-up functions. The large GRM data file (`G.large.RData`), the phenotype file (`ph.large.RData`) and the pre-fitted output file (`fit.large.RData`) can be downloaded from [here](https://gitlab.gwdg.de/beate.stpourcain/grmsem_external). The phenotype file and the pre-fitted Cholesky output file have also been included in the package. For completeness, also the fit of a Direct Symmetric model is shown. Note that some output has been omitted for clarity.
Performance: Fitting both a Cholesky decomposition model (10 genetic and 10 residual parameters) and a Direct Symmetric model (10 genetic and 10 residual variances and covariances) to the data including a VCOV estimation for variances and covariances among all parameters, using 4 cores and sharing memory across 24 cores, required ~21h using R MKL 3.6.3 (Cholesky model: 21h2m, Direct Symmetric model: 21h24m, vmmax of 17.3 Gb RAM each). For comparison, using otherwise similar conditions, fitting an IPC model required 11h48m, and fitting an Independent Pathway model 10h14m (data not shown). However, note that this performance is based on simulated data without missing values (listwise complete data option `compl.ph` `TRUE`). RAM requirements (but not time) will increase for data with missingness patterns (listwise complete data option `compl.ph` `FALSE`) and models will require, under similar conditions, ~ 70 Gb RAM.
### Cholesky model
We fit a quad-variate Cholesky decomposition model (10 genetic and 10 residual variances) with the following model fit and estimated parameters:
```{r eval = FALSE}
#Do not run!
#Please downloaded externally
#load("G.large.RData")
#load("ph.large.RData")
#fit <- grmsem.fit(ph.large,
# G.large,model="Cholesky",
# LogL=TRUE,
# estSE=TRUE)
> load("fit.large.RData")
> fit.large
$model.in
part label value freepar
1 a a11 0.01686961 1
2 a a21 0.75022252 1
3 a a31 -0.10352849 1
4 a a41 -0.81445295 1
5 a a22 -0.34964589 1
6 a a32 0.70013464 1
7 a a42 0.30439698 1
8 a a33 0.87503509 1
9 a a43 -0.37434617 1
10 a a44 0.86376688 1
11 e e11 -0.28008894 1
12 e e21 -0.92449828 1
13 e e31 -0.72312602 1
14 e e41 -0.70924504 1
15 e e22 0.51178943 1
16 e e32 0.66207502 1
17 e e42 -0.64331456 1
18 e e33 0.16870994 1
19 e e43 0.30222504 1
20 e e44 0.21833386 1
$model.fit$LL
[1] -4556.647
$model.fit$calls
function gradient
488 118
$model.fit$convergence
[1] 0
$model.fit$message
NULL
$model.out
label estimates gradient se Z p
1 a11 -0.590283919 7.196924e-03 0.018936941 -31.1710286 2.632163e-213
2 a21 -0.406228554 4.760323e-04 0.022331488 -18.1908415 6.099640e-74
3 a31 -0.412282704 -1.306064e-03 0.021811928 -18.9017085 1.104185e-79
4 a41 -0.251181508 4.515862e-03 0.025355870 -9.9062469 3.910610e-23
5 a22 -0.611871764 7.519419e-03 0.013081543 -46.7736703 0.000000e+00
6 a32 -0.412494347 5.020765e-05 0.015153798 -27.2205248 3.712770e-163
7 a42 -0.406496327 2.858898e-03 0.020248170 -20.0757069 1.203583e-89
8 a33 0.443344860 -2.806537e-04 0.013549030 32.7215213 7.719589e-235
9 a43 -0.015668913 1.437494e-03 0.024698450 -0.6344088 5.258141e-01
10 a44 0.655953940 -5.747793e-03 0.014789225 44.3535029 0.000000e+00
11 e11 -0.794244501 2.325624e-02 0.012513836 -63.4693048 0.000000e+00
12 e21 -0.290249172 -8.449568e-04 0.013042715 -22.2537381 1.037729e-109
13 e31 -0.445243952 -5.908758e-03 0.012154714 -36.6313812 9.056960e-294
14 e41 0.006719942 7.527450e-03 0.011697368 0.5744833 5.656408e-01
15 e22 0.521077401 -1.251593e-02 0.008841668 58.9342896 0.000000e+00
16 e32 -0.054613913 1.163049e-03 0.010020443 -5.4502495 5.029922e-08
17 e42 -0.018229268 -8.354153e-03 0.011967918 -1.5231779 1.277142e-01
18 e33 0.405590017 -2.845990e-02 0.007471397 54.2856995 0.000000e+00
19 e43 -0.022769655 7.889148e-04 0.012392180 -1.8374213 6.614772e-02
20 e44 -0.467128074 1.039490e-02 0.009007167 -51.8618176 0.000000e+00
$k
[1] 4
$n
[1] 20000
$n.obs
[1] 5000 5000 5000 5000
$n.ind
[1] 5000
$model
[1] "Cholesky"
```
Here, we derive genetic variance and correlations:
```{r eval = FALSE}
> var.fit <- grmsem.var(fit.large)
> print(var.fit)
$VA
1 2 3 4
Y1 0.3484351 0.2397902 0.2433639 0.1482684
Y2 0.2397902 0.5394087 0.4198747 0.3507607
Y3 0.2433639 0.4198747 0.5366833 0.2642885
Y4 0.1482684 0.3507607 0.2642885 0.6588525
$VA.se
1 2 3 4
Y1 0.02235634 0.01660915 0.01767332 0.01514868
Y2 0.01660915 0.02017283 0.01672469 0.01562045
Y3 0.01767332 0.01672469 0.02042099 0.01515375
Y4 0.01514868 0.01562045 0.01515375 0.02078249
$VE
1 2 3 4
Y1 0.630824327 0.23052881 0.35363256 -0.005337277
Y2 0.230528809 0.35576624 0.10077361 -0.011449317
Y3 0.353632560 0.10077361 0.36572812 -0.011231587
Y4 -0.005337277 -0.01144932 -0.01123159 0.219104559
$VE.se
1 2 3 4
Y1 0.019878092 0.012210347 0.013679555 0.009287867
Y2 0.012210347 0.012054860 0.009085111 0.007093293
Y3 0.013679555 0.009085111 0.012413525 0.007220553
Y4 0.009287867 0.007093293 0.007220553 0.008358356
$VP
1 2 3 4
Y1 0.9792594 0.4703190 0.5969964 0.1429311
Y2 0.4703190 0.8951749 0.5206483 0.3393114
Y3 0.5969964 0.5206483 0.9024114 0.2530569
Y4 0.1429311 0.3393114 0.2530569 0.8779571
$RG
1 2 3 4
Y1 1.0000000 0.553110 0.5627767 0.3094522
Y2 0.5531100 1.000000 0.7803720 0.5883800
Y3 0.5627767 0.780372 1.0000000 0.4444523
Y4 0.3094522 0.588380 0.4444523 1.0000000
$RG.se
1 2 3 4
Y1 0.00000000 0.02488356 2.233207e-02 3.056075e-02
Y2 0.02488356 0.00000000 1.557784e-02 1.976462e-02
Y3 0.02233207 0.01557784 3.008489e-18 2.206251e-02
Y4 0.03056075 0.01976462 2.206251e-02 4.557191e-18
$RE
1 2 3 4
Y1 1.00000000 0.48661851 0.73623906 -0.01435621
Y2 0.48661851 1.00000000 0.27937355 -0.04100828
Y3 0.73623906 0.27937355 1.00000000 -0.03967676
Y4 -0.01435621 -0.04100828 -0.03967676 1.00000000
$RE.se
1 2 3 4
Y1 1.111452e-17 0.01770789 1.099568e-02 2.499190e-02
Y2 1.770789e-02 0.00000000 2.164591e-02 2.549970e-02
Y3 1.099568e-02 0.02164591 5.397777e-18 2.555311e-02
Y4 2.499190e-02 0.02549970 2.555311e-02 7.971451e-18
```
Here,we derive the standardised parameters:
```{r eval = FALSE}
> st.fit <- grmsem.stpar(fit.large)
> print(st.fit)
$stand.model.out
label stand.estimates stand.se Z p
1 a11 -0.596502229 0.016233733 -36.7446127 1.417362e-295
2 a21 -0.429354966 0.020456160 -20.9890304 8.261383e-98
3 a31 -0.434003098 0.019542251 -22.2084494 2.845907e-109
4 a41 -0.268071735 0.024887619 -10.7712891 4.703688e-27
5 a22 -0.646705353 0.011739313 -55.0888607 0.000000e+00
6 a32 -0.434225891 0.014761620 -29.4158701 3.441759e-190
7 a42 -0.433830407 0.018983259 -22.8533157 1.354554e-115
8 a33 0.466701710 0.013216780 35.3113009 3.939089e-273
9 a43 -0.016722540 0.024694677 -0.6771718 4.982969e-01
10 a44 0.700062328 0.012710221 55.0786896 0.000000e+00
11 e11 -0.802611420 0.012064939 -66.5242831 0.000000e+00
12 e21 -0.306772929 0.012940717 -23.7060220 3.125266e-124
13 e31 -0.468700852 0.012081674 -38.7943628 0.000000e+00
14 e41 0.007171812 0.011697423 0.6131104 5.398033e-01
15 e22 0.550742107 0.009617573 57.2641473 0.000000e+00
16 e32 -0.057491151 0.010016353 -5.7397289 9.482821e-09
17 e42 -0.019455060 0.011964987 -1.6259993 1.039498e-01
18 e33 0.426957819 0.008330528 51.2521942 0.000000e+00
19 e43 -0.024300758 0.012391052 -1.9611537 4.986110e-02
20 e44 -0.498539222 0.010093805 -49.3906111 0.000000e+00
$k
[1] 4
$n
[1] 20000
$model
[1] "Cholesky"
```
and the standardised variances (heritability and environmentality):
```{r eval=FALSE}
> st.var.fit <- grmsem.var(st.fit)
> print(st.var.fit)
$VA
1 2 3 4
Y1 0.3558149 0.2561112 0.2588838 0.1599054
Y2 0.2561112 0.6025735 0.4671576 0.3956584
Y3 0.2588838 0.4671576 0.5947213 0.2969199
Y4 0.1599054 0.3956584 0.2969199 0.7504382
$VA.se
1 2 3 4
Y1 0.01936692 0.01501072 0.01589623 0.01481065
Y2 0.01501072 0.01344962 0.01252746 0.01320059
Y3 0.01589623 0.01252746 0.01377554 0.01417036
Y4 0.01481065 0.01320059 0.01417036 0.01000981
$VE
1 2 3 4
Y1 0.644185091 0.24621946 0.37618466 -0.005756178
Y2 0.246219456 0.39742650 0.11212194 -0.012914839
Y3 0.376184656 0.11212194 0.40527870 -0.012618339
Y4 -0.005756178 -0.01291484 -0.01261834 0.249561817
$VE.se
1 2 3 4
Y1 0.019366916 0.012115870 0.013465601 0.009385665
Y2 0.012115870 0.013449618 0.009581551 0.007494475
Y3 0.013465601 0.009581551 0.013775535 0.007600587
Y4 0.009385665 0.007494475 0.007600587 0.010009812
$VP
1 2 3 4
Y1 1.0000000 0.5023307 0.6350685 0.1541492
Y2 0.5023307 1.0000000 0.5792795 0.3827435
Y3 0.6350685 0.5792795 1.0000000 0.2843016
Y4 0.1541492 0.3827435 0.2843016 1.0000000
$RG
1 2 3 4
Y1 1.0000000 0.553110 0.5627767 0.3094522
Y2 0.5531100 1.000000 0.7803720 0.5883800
Y3 0.5627767 0.780372 1.0000000 0.4444523
Y4 0.3094522 0.588380 0.4444523 1.0000000
$RG.se
1 2 3 4
Y1 7.209226e-18 2.354325e-02 2.121442e-02 2.863522e-02
Y2 2.354325e-02 4.542180e-18 1.479096e-02 1.851946e-02
Y3 2.121442e-02 1.479096e-02 5.454486e-18 2.069598e-02
Y4 2.863522e-02 1.851946e-02 2.069598e-02 4.092585e-18
$RE
1 2 3 4
Y1 1.00000000 0.48661851 0.73623906 -0.01435621
Y2 0.48661851 1.00000000 0.27937355 -0.04100828
Y3 0.73623906 0.27937355 1.00000000 -0.03967676
Y4 -0.01435621 -0.04100828 -0.03967676 1.00000000
$RE.se
1 2 3 4
Y1 0.00000000 1.675409e-02 1.044538e-02 2.341725e-02
Y2 0.01675409 4.271060e-18 2.052641e-02 2.389309e-02
Y3 0.01044538 2.052641e-02 6.732149e-18 2.394314e-02
Y4 0.02341725 2.389309e-02 2.394314e-02 4.512840e-18
```
The factorial co-heritabilities can be estimated as shown below. Note that for the first genetic factor in a Cholesky model, the factorial co-heritability is always **one**, as per definition, this factor explains the entire genetic variation of the trait. Note that `Vi` is the estimated explained trait variance by a factor and corresponds to the squared factor loading.
```{r eval=FALSE}
> grmsem.fcoher(fit.large)
> grmsem.fcoher(fit.large)$fcoher.model.out[,c(1,7:10)]
label Vi Vi.se FCOHER FCOHER.se
1 a11 3.484351e-01 0.0223563431 1.0000000000 0.000000000
2 a21 1.650216e-01 0.0181433759 0.3059306238 0.027526691
3 a31 1.699770e-01 0.0179853613 0.3167175772 0.025135935
4 a41 6.309215e-02 0.0127378513 0.0957606594 0.018914180
5 a22 3.743871e-01 0.0160084532 0.6940693762 0.027526691
6 a32 1.701516e-01 0.0125017124 0.3170428312 0.024459178
7 a42 1.652393e-01 0.0164616136 0.2507985687 0.023183443
8 a33 1.965547e-01 0.0120137852 0.3662395917 0.021786386
9 a43 2.455148e-04 0.0007739957 0.0003726401 0.001174465
10 a44 4.302756e-01 0.0194021010 0.6530681318 0.023536859
11 e11 6.308243e-01 0.0198780916 NA NA
12 e21 8.424458e-02 0.0075712747 NA NA
13 e31 1.982422e-01 0.0108236256 NA NA
14 e41 4.515762e-05 0.0001572113 NA NA
15 e22 2.715217e-01 0.0092143864 NA NA
16 e32 2.982679e-03 0.0010945112 NA NA
17 e42 3.323062e-04 0.0004363328 NA NA
18 e33 1.645033e-01 0.0060606482 NA NA
19 e43 5.184572e-04 0.0005643313 NA NA
20 e44 2.182086e-01 0.0084150015 NA NA
```
Similarly, the factorial co-environmentalities is shown below (shortened output). Again, note that for the first residual factor in a Cholesky model, the factorial co-environmentality is always **one**, as per definition this factor explains the entire residual variation of the trait.
```{r eval=FALSE}
> grmsem.fcoher(fit.large)
> grmsem.fcoher(fit.large)$fcoher.model.out[,c(1,7:8,13:14)]
label Vi Vi.se FCOENV FCOENV.se
1 a11 3.484351e-01 0.0223563431 NA NA
2 a21 1.650216e-01 0.0181433759 NA NA
3 a31 1.699770e-01 0.0179853613 NA NA
4 a41 6.309215e-02 0.0127378513 NA NA
5 a22 3.743871e-01 0.0160084532 NA NA
6 a32 1.701516e-01 0.0125017124 NA NA
7 a42 1.652393e-01 0.0164616136 NA NA
8 a33 1.965547e-01 0.0120137852 NA NA
9 a43 2.455148e-04 0.0007739957 NA NA
10 a44 4.302756e-01 0.0194021010 NA NA
11 e11 6.308243e-01 0.0198780916 1.0000000000 0.0000000000
12 e21 8.424458e-02 0.0075712747 0.2367975722 0.0172339750
13 e31 1.982422e-01 0.0108236256 0.5420479497 0.0161908960
14 e41 4.515762e-05 0.0001572113 0.0002061008 0.0007175779
15 e22 2.715217e-01 0.0092143864 0.7632024278 0.0172339750
16 e32 2.982679e-03 0.0010945112 0.0081554556 0.0029995986
17 e42 3.323062e-04 0.0004363328 0.0015166558 0.0019944184
18 e33 1.645033e-01 0.0060606482 0.4497965946 0.0162294819
19 e43 5.184572e-04 0.0005643313 0.0023662548 0.0025782116
20 e44 2.182086e-01 0.0084150015 0.9959109886 0.0034506100
```
Comparing observed phenotypic variance/covariance patterns with estimated genetic variance/covariance patterns, we can estimate bivariate heritabilities. Note that the diagonal elements of the estimated `$BIHER` matrix detail the heritabilities, while the off-diagonal elements detail the co-heritabilities.
```{r eval = FALSE}
> fit.var <- grmsem.var(fit)
> load("ph.large.RData")
> grmsem.biher(ph.large, fit.var)
$VPO
1 2 3 4
Y1 1.0000000 0.5061277 0.6278417 0.1792348
Y2 0.5061277 1.0000000 0.6167281 0.4352479
Y3 0.6278417 0.6167281 1.0000000 0.3331679
Y4 0.1792348 0.4352479 0.3331679 1.0000000
$VA
1 2 3 4
Y1 0.3484351 0.2397902 0.2433639 0.1482684
Y2 0.2397902 0.5394087 0.4198747 0.3507607
Y3 0.2433639 0.4198747 0.5366833 0.2642885
Y4 0.1482684 0.3507607 0.2642885 0.6588525
$BIHER
1 2 3 4
Y1 0.3484351 0.4737741 0.3876197 0.8272299
Y2 0.4737741 0.5394087 0.6808100 0.8058872
Y3 0.3876197 0.6808100 0.5366833 0.7932591
Y4 0.8272299 0.8058872 0.7932591 0.6588525
$BIHER.se
1 2 3 4
Y1 0.02235634 0.03281612 0.02814933 0.08451862
Y2 0.03281612 0.02017283 0.02711842 0.03588862
Y3 0.02814933 0.02711842 0.02042099 0.04548383
Y4 0.08451862 0.03588862 0.04548383 0.02078249
$BIHER.Z
1 2 3 4
Y1 15.585514 14.43723 13.77012 9.787546
Y2 14.437234 26.73937 25.10508 22.455231
Y3 13.770125 25.10508 26.28097 17.440466
Y4 9.787546 22.45523 17.44047 31.702288
$BIHER.p
1 2 3 4
Y1 9.132538e-55 3.017077e-47 3.855470e-43 1.273488e-22
Y2 3.017077e-47 1.641347e-157 4.377379e-139 1.137653e-111
Y3 3.855470e-43 4.377379e-139 3.165351e-152 4.067453e-68
Y4 1.273488e-22 1.137653e-111 4.067453e-68 1.444882e-220
```
### DS model
For comparison, we fit a quad-variate Direct Symmetric model (10 genetic and 10 residual variances) and obtain a near-identical model fit (loglikelihood).
```{r eval = FALSE}
#Do not run!
#Please downloaded externally
#load("G.large.RData")
#load("ph.large.RData")
#> out<-grmsem.fit(ph.large,
# G.large,
# model="DS",
# LogL=TRUE,
# estSE=TRUE)
> out
$model.in
part label value freepar
1 a A11 1.0000000 1
2 a A21 0.5061277 1
3 a A31 0.6278417 1
4 a A41 0.1792348 1
5 a A22 1.0000000 1
6 a A32 0.6167281 1
7 a A42 0.4352479 1
8 a A33 1.0000000 1
9 a A43 0.3331679 1
10 a A44 1.0000000 1
11 e E11 1.0000000 1
12 e E21 0.5061277 1
13 e E31 0.6278417 1
14 e E41 0.1792348 1
15 e E22 1.0000000 1
16 e E32 0.6167281 1
17 e E42 0.4352479 1
18 e E33 1.0000000 1
19 e E43 0.3331679 1
20 e E44 1.0000000 1
$model.fit$LL
[1] -4556.647
$model.fit$calls
function gradient
173 38
$model.fit$convergence
[1] 0
$model.fit$message
NULL
$model.out
label estimates gradient se Z p
1 A11 0.348458001 -0.052226894 0.022356069 15.5867299 8.960447e-55
2 A21 0.239787813 0.073256044 0.016608680 14.4374998 3.005481e-47
3 A31 0.243373776 0.046398011 0.017672850 13.7710544 3.806174e-43
4 A41 0.148252661 0.045240115 0.015148427 9.7866704 1.284564e-22
5 A22 0.539403719 -0.038929159 0.020172359 26.7397447 1.625068e-157
6 A32 0.419867512 0.053261468 0.016724145 25.1054698 4.334317e-139
7 A42 0.350754104 -0.026684946 0.015620099 22.4553060 1.135724e-111
8 A33 0.536686987 -0.032016186 0.020420456 26.2818311 3.094103e-152
9 A43 0.264276207 0.014920668 0.015153366 17.4400997 4.093589e-68
10 A44 0.658845787 0.011119193 0.020782130 31.7025151 1.434490e-220
11 E11 0.630774354 -0.070278866 0.019875702 31.7359534 4.961474e-221
12 E21 0.230506722 0.166057579 0.012209173 18.8797985 1.672229e-79
13 E31 0.353594799 0.308389144 0.013677755 25.8518155 2.321384e-147
14 E41 -0.005335735 -0.015130111 0.009287206 -0.5745253 5.656124e-01
15 E22 0.355757271 -0.040844105 0.012054375 29.5127094 1.977804e-191
16 E32 0.100760965 -0.097429983 0.009084347 11.0917126 1.376273e-28
17 E42 -0.011450535 0.016775106 0.007093098 -1.6143207 1.064579e-01
18 E33 0.365701992 -0.075051659 0.012412202 29.4631041 8.554205e-191
19 E43 -0.011230439 0.004879057 0.007220099 -1.5554412 1.198410e-01
20 E44 0.219102869 0.026457431 0.008358236 26.2140086 1.839860e-151
$k
[1] 4
$n
[1] 20000
$n.obs
[1] 5000 5000 5000 5000
$n.ind
[1] 5000
$model
[1] "DS"
```
Similarly, the variance parameter estimates and SEs for the Direct Symmetric model are near-identical to those derived for the Cholesky decomposition model.
```{r eval = FALSE}
> var.out<-grmsem.var(out)
> print(var.out)
$VA
1 2 3 4
Y1 0.3484580 0.2397878 0.2433738 0.4198675
Y2 0.2397878 0.5394037 0.1482527 0.3507541
Y3 0.2433738 0.4198675 0.5366870 0.2642762
Y4 0.1482527 0.3507541 0.2642762 0.6588458
$VA.se
1 2 3 4
Y1 0.02235607 0.01660868 0.01767285 0.01672414
Y2 0.01660868 0.02017236 0.01514843 0.01562010
Y3 0.01767285 0.01672414 0.02042046 0.01515337
Y4 0.01514843 0.01562010 0.01515337 0.02078213
$VE
1 2 3 4
Y1 0.630774354 0.23050672 0.353594799 0.10076097
Y2 0.230506722 0.35575727 -0.005335735 -0.01145053
Y3 0.353594799 0.10076097 0.365701992 -0.01123044
Y4 -0.005335735 -0.01145053 -0.011230439 0.21910287
$VE.se
1 2 3 4
Y1 0.019875702 0.012209173 0.013677755 0.009084347
Y2 0.012209173 0.012054375 0.009287206 0.007093098
Y3 0.013677755 0.009084347 0.012412202 0.007220099
Y4 0.009287206 0.007093098 0.007220099 0.008358236
$VP
1 2 3 4
Y1 0.9792324 0.4702945 0.5969686 0.5206285
Y2 0.4702945 0.8951610 0.1429169 0.3393036
Y3 0.5969686 0.5206285 0.9023890 0.2530458
Y4 0.1429169 0.3393036 0.2530458 0.8779487
$RG
1 2 3 4
Y1 1.0000000 0.5530889 0.5627792 0.8762846
Y2 0.5530889 1.0000000 0.2755402 0.5883746
Y3 0.5627792 0.7803596 1.0000000 0.4444323
Y4 0.3094107 0.5883746 0.4444323 1.0000000
$RG.se
1 2 3 4
Y1 0.00000000 0.02488339 0.02233109 0.03667066
Y2 0.02488339 0.00000000 0.02690300 0.01976462
Y3 0.02233109 0.01557817 0.00000000 0.02206248
Y4 0.03055988 0.01976462 0.02206248 0.00000000
$RE
1 2 3 4
Y1 1.00000000 0.48659729 0.73621590 0.27103868
Y2 0.48659729 1.00000000 -0.01479291 -0.04101331
Y3 0.73621590 0.27935199 1.00000000 -0.03967428
Y4 -0.01435269 -0.04101331 -0.03967428 1.00000000
$RE.se
1 2 3 4
Y1 4.413292e-18 0.01770803 1.099634e-02 0.02340894
Y2 1.770803e-02 0.00000000 2.576743e-02 0.02549942
Y3 1.099634e-02 0.02164575 5.512125e-18 0.02555251
Y4 2.499120e-02 0.02549942 2.555251e-02 0.00000000
```
# GREML versus GRM-SEM
The GREML method implemented in the software [GCTA](https://cnsgenomics.com/software/gcta/) [@Yang2011] can be used to estimate bivariate genetic correlations based on variance/covariance estimates for two traits. Transforming the quad-variate simulated data `large` grmsem data above into GCTA format (`large.gcta.grm.gz`, `large.gcta.phe`, `large.gcta.grm.id`), the bivariate model estimates from `GREML` and `grmsem` analyses can be compared, here shown for traits Y1 and Y2 (the first two of the 4 simulated traits). We carried out a comparison of model estimates from bi-variate GCTA GREML and GRM-SEM models for two selected traits of the simulated quad-variate `large` data set.
GCTA `GREML` estimates were fitted as follows:
```{r eval = FALSE}
gcta64 --reml-bivar 1 2 --grm-gz large.gcta --pheno large.gcta.phe
--reml-bivar-lrt-rg 0 --thread-num 4 --out Y1_Y2 --reml-maxit 1000 >
large.gcta.Y1Y2.log`
```
which yielded the estimates:
```{r eval = FALSE}
Summary result of REML analysis (shortened output):
Source Variance SE
V(G)/Vp_tr1 0.361387 0.020470
V(G)/Vp_tr2 0.635426 0.015053
rG 0.557367 0.026039
```
Similar estimates can be obtained with `grmsem`, fitting a Cholesky model:
```{r eval = FALSE}
load("ph.large.RData")
load("G.large.RData")
ph.biv<-ph.large[,c(1,2)]
fit <- grmsem.fit(ph.biv, G.large, LogL = TRUE, estSE = TRUE)
print(fit)
st.fit <- grmsem.stpar(fit)
print(st.fit)
st.var.fit <- grmsem.var(st.fit)
print(st.var.fit)
```
The standardised variances are (shortened output):
```{r eval = FALSE}
$VA
1 2
Y1 0.3613708 0.2669526
Y2 0.2669526 0.6349136
$VA.se
1 2
Y1 0.01943807 0.01555813
Y2 0.01555813 0.01355490
$RG
1 2
Y1 1.0000000 0.5573145
Y2 0.5573145 1.0000000
$RG.se
1 2
Y1 7.179876e-18 0.0239077
Y2 2.390770e-02 0.0000000
```
Both, GREML and GRM-SEM estimates from bivariate models, correspond to the GRM-SEM estimates from the quad-variate Cholesky model (see above).
# References