Jonas Haslbeck, PhD Student, Psych Methods
http://jmbh.github.io/
<h1 id="deconstructing-measurement-error">Deconstructing 'Measurement error and the replication crisis'</h1>
<p>Yesterday, I read <a href="http://science.sciencemag.org/content/355/6325/584/tab-pdf">‘Measurement error and the replication crisis’</a> by <a href="http://hhd.psu.edu/dsg/eric-loken-phd-assistant-director">Eric Loken</a> and <a href="http://andrewgelman.com">Andrew Gelman</a>, which left me puzzled. The first part of the paper consists of general statements about measurement error. The second part claims that in the presence of measurement error, we overestimate the true effect when the sample size is small. This sounded wrong enough to ask the authors for their <a href="https://raw.githubusercontent.com/jmbh/jmbh.github.io/master/figs/measurementerror/graph%20codes%20to%20share%20for%20science%20paper%20final-2.txt">simulation code</a> and spend a couple of hours figuring out what they did in their paper. I offer a short and a long version.</p>
<p><strong>Edit Feb 17th:</strong> After a nice email conversation with the authors, I now know that they <em>do</em> make their general argument only under the condition of selecting on significance. Their result then follows trivially from the increased variance of the sampling distribution due to adding ‘measurement error’ (see section (3) below). My source of confusion was that they talk about selection on significance in the paper, but then do not select on significance in the two scatter plots, and incorrectly state in the figure title that they do. The conclusions of this blog post are still valid under the assumptions in (1), so I leave it online in case somebody finds (parts of) it interesting.</p>
<h2 id="the-short-version">The Short Version</h2>
<p>My conclusion is that the authors show the following: If an estimator is biased (here by the presence of measurement error), then the proportion of estimates that overestimate the true effect depends on the variance of the sampling distribution (which depends on $N$). While this is an interesting insight, the authors do not say so clearly anywhere in the paper. Instead, they use formulations suggesting that they refer to the expected value of the estimator, which does not depend on the sample size. To make things worse, they plot the estimates in a way that suggests that the variance of the estimators is equal for N = 50 and N = 3000 and that the effect is driven by a difference in expected value, while the reverse is true.</p>
<h2 id="the-long-version">The Long Version</h2>
<p>I try to make the argument for my claims in the ‘short version’ above in 6 steps: (1) we make clear what claim the authors make, (2) we define our terminology, (3) we investigate what adding measurement error does on the population level, (4) we see how this influences the characteristics of estimators based on different sample sizes, (5) we summarize our results and (6) get back to the paper.</p>
<p><strong>(1) The exact claim</strong></p>
<p>The authors write <em>‘In a low-noise setting, the theoretical results of Hausman and others correctly show that measurement error will attenuate coefficient estimates. But we can demonstrate with a simple exercise that the opposite occurs in the presence of high noise and selection on statistical significance.’ (p. 584/585)</em>. From this we can deduce that the authors claim that ‘In a high noise setting, the presence of measurement error and selection on statistical significance leads to an increase in coefficient estimates’. However, the authors do not select on statistical significance in their simulation, hence we also drop this condition and arrive at the claim ‘In a high noise setting, the presence of measurement error leads to an increase in coefficient estimates’.</p>
<p>What this statement means is unclear to me. Under the reasonable assumption that the authors did not make a fundamental mistake, the rest of this blogpost is about finding out what the authors could have meant.</p>
<p><strong>(2) Terminology (for reference)</strong></p>
<p>In the paper, ‘measurement error’, ‘noise’ and ‘variance’ are used interchangeably. Here, variances refer to the variances of the dimensions of the bivariate Gaussian distribution, if not stated otherwise. By measurement error we mean another bivariate Gaussian distribution with zero covariance. By a noisy setting, we refer to a situation with a low signal-to-noise ratio, defined relative to another setting which is less noisy. The signal-to-noise ratio is a function of $N$ and is related to the variance of the sampling distribution of the estimator. All these things will become clear in sections (3) and (4).</p>
<p><strong>(3) What does ‘adding measurement error’ mean on the population level?</strong></p>
<p>In order to evaluate the above claim with respect to the simulation setup of the authors, we need to know the simulation setup. Fortunately, the authors provided the code in a quick and friendly email.</p>
<p>The authors consider the problem of estimating the covariance of a bivariate Gaussian distribution from a finite number of observations. The bivariate Gaussian distribution has the density</p>
<script type="math/tex; mode=display">f(x_1, x_2) = \frac{1}{\sqrt{(2 \pi)^2 | \mathbf{ \Sigma } | }} \exp \bigl \{ - \frac{1}{2} (x - \mu)^{\top} \mathbf{ \Sigma }^{-1} (x - \mu) \bigr \},</script>
<p>where in our case the covariance $cov(x_1, x_2) = r > 0$ is some positive value, so the covariance matrix $\Sigma$ has entries:</p>
<script type="math/tex; mode=display">% <![CDATA[
\Sigma = \begin{bmatrix}
1 & r \\[0.3em]
r & 1
\end{bmatrix} %]]></script>
<p>Note that if we scale both dimensions of the Gaussian to $\mu_1 = \mu_2 = 0$ and $\sigma_1 = \sigma_2 = 1$, the correlation coefficient is equal to the coefficient of the regression of $x_1$ on $x_2$ or vice versa. Thus all results obtained here also extend to the regression coefficient that is referred to in the paper.</p>
<p>Now the authors ‘add measurement error’ to the two variables, consisting of independent Gaussian noise with variance $k > 0$, where $k$ is a constant. Notice that these two noise variables can also be described by a bivariate Gaussian with covariance matrix $\Sigma^{ME}$:</p>
<script type="math/tex; mode=display">% <![CDATA[
\Sigma^{ME} = \begin{bmatrix}
k & 0 \\[0.3em]
0 & k
\end{bmatrix} %]]></script>
<p>Notice that adding ‘measurement error’ as done by the authors is the same as adding these two Gaussians. Addition is a linear transformation and hence the resulting distribution is again a bivariate Gaussian distribution. Indeed, it turns out that the covariance matrix $\Sigma^A$ of the resulting bivariate Gaussian is the sum of the covariance matrices $\Sigma$ and $\Sigma^{ME}$ of the two bivariate Gaussians:</p>
<script type="math/tex; mode=display">% <![CDATA[
\Sigma^A = \begin{bmatrix}
1 & r \\[0.3em]
r & 1
\end{bmatrix}
+
\begin{bmatrix}
k & 0 \\[0.3em]
0 & k
\end{bmatrix}
=
\begin{bmatrix}
k + 1 & r \\[0.3em]
r & k + 1
\end{bmatrix} %]]></script>
<p>Now, if we renormalize the variances to get back to a correlation matrix it becomes obvious that adding ‘measurement error’ has to decrease the absolute value of the covariance:</p>
<script type="math/tex; mode=display">% <![CDATA[
\Sigma^{A_{norm}} = \begin{bmatrix}
1 & \frac{r}{k + 1} \\[0.3em]
\frac{r}{k + 1} & 1
\end{bmatrix} %]]></script>
<p>Since $k > 0$, we have $\frac{r}{k + 1} < r$, and hence the absolute value of the covariance is smaller in $\Sigma^{A_{norm}}$ than in $\Sigma$ in the population.</p>
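<p>A quick way to convince ourselves of this is a small simulation. The following sketch (in Python rather than the R used elsewhere in this post; the values of $r$ and $k$ are illustrative, not the authors' settings) adds independent noise with variance $k$ to both dimensions and checks that the empirical correlation shrinks from $r$ to $\frac{r}{k + 1}$:</p>

```python
import numpy as np

# Illustrative sketch (Python; the post's figures use R). The values of r and k
# are picked for illustration and are not the authors' exact settings.
rng = np.random.default_rng(1)
r, k, n = 0.15, 0.5, 10**6  # true covariance, noise variance, sample size

# draw from the 'true' bivariate Gaussian with correlation r
Sigma = np.array([[1.0, r], [r, 1.0]])
x = rng.multivariate_normal([0.0, 0.0], Sigma, size=n)

# add independent measurement error with variance k to both dimensions
x_noisy = x + rng.normal(0.0, np.sqrt(k), size=(n, 2))

# the empirical correlation of the corrupted variables matches r / (k + 1)
emp = np.corrcoef(x_noisy[:, 0], x_noisy[:, 1])[0, 1]
print(emp, r / (k + 1))
```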
<p><strong>(4) Properties of the Estimator</strong></p>
<p>We now consider the estimate $\hat \sigma_{1,2}$ for the covariance between $x_1$ and $x_2$ in the bivariate Gaussian with covariance matrix $\Sigma^{A_{norm}}$ which is ‘corrupted’ by measurement error. We obtain $\hat \sigma_{1,2}$ via the least squares estimator, <a href="http://math.stackexchange.com/questions/787939/show-that-the-least-squares-estimator-of-the-slope-is-an-unbiased-estimator-of-t">which is an unbiased estimator</a> for $\frac{r}{k + 1}$.</p>
<p>What does this mean? By the <a href="https://en.wikipedia.org/wiki/Central_limit_theorem">Central limit theorem</a>, the sampling distribution will be approximately Gaussian and, because the estimator is unbiased, centered on the true coefficient, which is $\frac{r}{k + 1}$. Thus, if we take many samples of size $N$ and compute a coefficient estimate on each of them, the mean coefficient will be equal to $\frac{r}{k + 1}$:</p>
<script type="math/tex; mode=display">\mathbb{E} [\hat \sigma_{1,2}] = \lim_{S \rightarrow \infty} \frac{1}{S} \sum_{i=1}^{S} \hat \sigma_{1,2}^i = \frac{r}{k + 1}</script>
<p>From the fact that the Gaussian density is symmetric and centered on the true effect, it follows that $\hat \sigma_{1,2}$ will <em>equally often</em> under- and overestimate the true effect $\frac{r}{k + 1}$. It is important to stress that this is true, irrespective of the variance of the sampling distribution (which depends on $N$). We illustrate this in the following Figure which shows the empirical sampling distributions from the simulation of the authors:</p>
<p><img src="https://raw.githubusercontent.com/jmbh/jmbh.github.io/master/figs/measurementerror/SamplingDistri_new.png" alt="center" /></p>
<p>The solid black line indicates the density estimate of the empirical sampling distribution of the coefficient estimates in the low noise (N = 3000) case. The solid red line indicates the density of the empirical sampling distribution of the coefficient estimates in the high noise (N = 50) case. The dashed black and red lines indicate the arithmetic means of the corresponding sampling distributions. The green dashed line indicates the true coefficient of the bivariate Gaussian with added measurement error. Now, as predicted from the fact that $\hat \sigma_{1,2}$ is an unbiased estimator independent of $N$, we see that the mean parameter estimates in both the low and high noise settings (black/red dashed lines) are close to the true coefficient $\frac{r}{k + 1}$ (dashed green line).</p>
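<p>The claim that the mean estimate does not depend on $N$ is easy to check by simulation. The following sketch (Python, with illustrative values for $r$ and $k$, not the authors' settings) computes the mean coefficient estimate over many samples for both $N = 50$ and $N = 3000$; both means land close to $\frac{r}{k + 1}$:</p>

```python
import numpy as np

# Simulation sketch (Python; illustrative r and k, not the authors' values):
# the mean coefficient estimate is close to r / (k + 1) for both sample sizes.
rng = np.random.default_rng(2)
r, k, S = 0.15, 0.5, 2000                   # S = simulated samples per setting
Sigma = np.array([[1 + k, r], [r, 1 + k]])  # population after adding error
target = r / (k + 1)                        # true coefficient of the corrupted Gaussian

means = {}
for n in (50, 3000):  # high noise (small N) and low noise (large N) settings
    est = np.empty(S)
    for i in range(S):
        x = rng.multivariate_normal([0.0, 0.0], Sigma, size=n)
        est[i] = np.corrcoef(x[:, 0], x[:, 1])[0, 1]
    means[n] = est.mean()

print(means)  # both means land near target = 0.1
```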
<p>Before moving on, we define $\mathcal{P}^\uparrow \in [0,1]$ as the proportion of coefficient estimates that are larger than the true effect under consideration and hence overestimate it. $\mathcal{P}^\uparrow_H$ refers to that proportion in the high noise (small $N$) setting, $\mathcal{P}^\uparrow_L$ refers to that proportion in the low noise (large $N$) setting.</p>
<p>Now, the second important observation is that for both noise settings we have $\mathcal{P}^\uparrow_H = \mathcal{P}^\uparrow_L = \frac{1}{2}$ with respect to $\frac{r}{k + 1}$, which implies that we equally often under- and overestimate this effect. Another way of saying this is that for both sampling distributions, the area under the curve left of the green line is equal to the area under the curve right of the green line.</p>
<p>We now make the crucial step by considering $\hat \sigma_{1,2}$ not as an estimate for the covariance $\frac{r}{k + 1}$ in $\Sigma^{A_{norm}}$, but for the covariance $r$ of the ‘true’ bivariate Gaussian without added measurement error with covariance matrix $\Sigma$. We <em>know</em> that $\hat \sigma_{1,2}$ is an unbiased estimator for $\frac{r}{k + 1}$ and we know $\frac{r}{k + 1} < r$. From this follows that $\hat{\sigma}_{1,2}$ is a <em>biased</em> estimator for $r$. Specifically, the estimator is biased downwards.</p>
<p>We again look at the proportions of coefficient estimates that under- and overestimate the true effect $r$ (the dashed blue line in the figure). We first consider the low noise case: we overestimate $r$ <em>less often</em> than we overestimated $\frac{r}{k + 1}$, which implies $\mathcal{P}^\uparrow_L < \frac{1}{2}$. Again, this is the same as saying that the area under the curve right of the blue line is smaller than the area under the curve left of the blue line.</p>
<p>For the high noise case the exact same is true, i.e. $\mathcal{P}^\uparrow_H < \frac{1}{2}$. Let’s define $q := \frac{\mathcal{P}^\uparrow_H}{\mathcal{P}^\uparrow_L}$. What we <em>do</em> have is that $\mathcal{P}^\uparrow_H > \mathcal{P}^\uparrow_L$ and hence $q > 1$. This means that in the presence of measurement error, we overestimate <em>absolutely less</em> often than we underestimate in all settings; however, we overestimate <em>relatively more</em> often in a high noise (small $N$) setting compared to a low noise (large $N$) setting. Let’s let this sink in for a moment and then move on to the summary:</p>
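<p>We can also compute $\mathcal{P}^\uparrow_H$, $\mathcal{P}^\uparrow_L$ and $q$ directly by simulation. In the following sketch (Python, with illustrative parameter values, not the authors' exact settings), both proportions come out below $\frac{1}{2}$, but their ratio $q$ is far above 1:</p>

```python
import numpy as np

# Sketch (Python, illustrative values): estimate P_H, P_L and q by simulation.
rng = np.random.default_rng(3)
r, k, S = 0.15, 0.5, 5000
Sigma = np.array([[1 + k, r], [r, 1 + k]])  # population after adding error

def prop_overestimating(n):
    """Proportion of estimates larger than the error-free true effect r."""
    est = np.empty(S)
    for i in range(S):
        x = rng.multivariate_normal([0.0, 0.0], Sigma, size=n)
        est[i] = np.corrcoef(x[:, 0], x[:, 1])[0, 1]
    return (est > r).mean()

p_high = prop_overestimating(50)   # high noise (small N) setting
p_low = prop_overestimating(3000)  # low noise (large N) setting
q = p_high / p_low
print(p_high, p_low, q)  # both proportions below 0.5, q well above 1
```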
<p><strong>(5) Summary</strong></p>
<p>What have we found? We found that if our estimator is biased downwards (here by measurement error), then different sample sizes (and hence different variances of the sampling distribution) lead to different proportions of coefficient estimates that overestimate the true effect.</p>
<p>However, it is important to stress: when keeping $N$ constant and introducing measurement error, the proportion of overestimating estimates <em>decreases</em> compared to the situation without measurement error. This is because the whole sampling distribution is shifted towards zero in the presence of measurement error (the blue line is shifted to the position of the green line in the Figure).</p>
<p>The only thing that is increasing is $q$, which means that in the presence of measurement error in a high noise setting (small $N$) we <em>relatively</em> overestimate more than in a low noise setting (high $N$). What determines $q$? The larger the difference between the variances of two sampling distributions, the larger $q$. The more we shift the sampling distribution towards zero (by adding measurement error), the larger $q$.</p>
<p><strong>(6) Back to the Paper</strong></p>
<p>I think the results stated in (5) are pretty far away from the claim in the paper, which was ‘In a high noise setting, the presence of measurement error leads to an increase in coefficient estimates’. This statement rather suggests that introducing measurement error increases the expected value of the sampling distribution (moving the blue line to the right instead of to the left), which is - as we have seen - incorrect. This false suggestion is strengthened by the scaling of the figures. We illustrate this here by plotting the figure as shown in the paper (top row) and with equal coordinate systems (bottom row).</p>
<p><img src="https://raw.githubusercontent.com/jmbh/jmbh.github.io/master/figs/measurementerror/ScalingIssue.png" alt="center" /></p>
<p>The top row suggests that the difference between the low/high noise setting arises because the whole cloud is ‘shifted’ downwards in the low noise setting. This would mean that the sampling distributions are shifted differently depending on the noise setting (sample size) when adding measurement error. On the other hand, when plotting the data in the same coordinate system, it is clear that the expected values do not change and that the effect is driven by the differing variances of the estimator.</p>
<p>And one more thing: in the right panel in the figure of the paper the authors plot $\mathcal{P}^\uparrow$ as a function of $N$. Note that from the discussion in (4) it follows that this value can <em>never</em> be larger than $\frac{1}{2}$ as long as the estimator is unbiased or biased downwards. So there must have been some mistake.</p>
<h2 id="conclusion">Conclusion</h2>
<p>This was a fun opportunity to do some statistics detective work. However, the paper's lack of clarity can also do quite some harm by confusing readers about important concepts. There is of course also the possibility that I fully misunderstood their paper. In that case I hope the reader will point out my mistakes.</p>
<p>The code to exactly reproduce the above figures can be found <a href="https://raw.githubusercontent.com/jmbh/jmbh.github.io/master/figs/measurementerror/RCode_ME_comment.R">here</a>.</p>
<p>I would like to thank <a href="https://twitter.com/fdabl">Fabian Dablander</a> and <a href="https://www.gess.ethz.ch/en/the-department/people/person-detail.html?persid=191462">Peter Edelsbrunner</a> for helpful comments on this blogpost. In addition, I would like to thank <a href="https://www.uu.nl/staff/ORyan/0">Oisín Ryan</a> and <a href="https://www.uu.nl/medewerkers/JJBroere/0">Joris Broere</a> for an interesting discussion on a train ride from Eindhoven to Utrecht yesterday, and I apologize to the roughly 15 anonymous Dutch travelers who had to endure a heated statistical debate.</p>
<p>I am looking forward to comments, complaints and corrections.</p>
Thu, 16 Feb 2017 00:00:00 +0000
http://jmbh.github.io//Deconstructing-ME/
<h1 id="predictability-in-network-models">Predictability in Network Models</h1>
<p>Network models have become a popular way to abstract complex systems and gain insights into relational patterns among observed variables in <a href="http://www.sachaepskamp.com/files/NA/NetworkTakeover.pdf">almost any area of science</a>. The majority of these applications focuses on analyzing the structure of the network. However, if the network is not directly observed (Alice and Bob are friends) but <em>estimated</em> from data (there is a relation between smoking and cancer), we can analyze - in addition to the network structure - the predictability of the nodes in the network. That is, we would like to know: how well can an arbitrarily picked node in the network be predicted by all remaining nodes in the network?</p>
<p>Predictability is interesting for several reasons:</p>
<ol>
<li>It gives us an idea of how <em>practically relevant</em> edges are: if node A is connected to many other nodes but these explain, let’s say, only 1% of its variance, how interesting are the edges connected to A?</li>
<li>We get an indication of how to design an <em>intervention</em> in order to achieve a change in a certain set of nodes and we can estimate how efficient the intervention will be</li>
<li>It tells us to which extent different parts of the network are <em>self-determined or determined by other factors</em> that are not included in the network</li>
</ol>
<p>In this blogpost, we use the R-package <a href="https://cran.r-project.org/web/packages/mgm/index.html">mgm</a> to estimate a network model and compute node wise predictability measures for a <a href="http://cpx.sagepub.com/content/3/6/836.short">dataset</a> on <a href="https://en.wikipedia.org/wiki/Posttraumatic_stress_disorder">Post Traumatic Stress Disorder (PTSD)</a> symptoms of <a href="https://en.wikipedia.org/wiki/2008_Sichuan_earthquake">Chinese earthquake victims</a>. We visualize the network model and predictability using <a href="https://cran.r-project.org/web/packages/qgraph/index.html">the qgraph package</a> and discuss how the combination of network model and node wise predictability can be used to design effective interventions on the symptom network.</p>
<h2 id="load-data">Load Data</h2>
<p>We load the data which the authors made freely available:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">data</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">read.csv</span><span class="p">(</span><span class="s1">'http://psychosystems.org/wp-content/uploads/2014/10/Wenchuan.csv'</span><span class="p">)</span><span class="w">
</span><span class="n">data</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">na.omit</span><span class="p">(</span><span class="n">data</span><span class="p">)</span><span class="w">
</span><span class="n">p</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">ncol</span><span class="p">(</span><span class="n">data</span><span class="p">)</span><span class="w">
</span><span class="nf">dim</span><span class="p">(</span><span class="n">data</span><span class="p">)</span><span class="w">
</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="m">344</span><span class="w"> </span><span class="m">17</span></code></pre></figure>
<p>The dataset contains complete responses to 17 PTSD symptoms of 344 individuals. The answer categories for the intensity of symptoms range from 1 ‘not at all’ to 5 ‘extremely’. The exact wording of all symptoms is in the <a href="http://cpx.sagepub.com/content/3/6/836.short">paper of McNally and colleagues</a>.</p>
<h2 id="estimate-network-model">Estimate Network Model</h2>
<p>We estimate a <a href="http://www.jmlr.org/proceedings/papers/v33/yang14a.pdf">Mixed Graphical Model (MGM)</a>, where we treat all variables as continuous-Gaussian variables. Hence we set the type of all variables to <code class="highlighter-rouge">type = 'g'</code> and the number of categories for each variable to 1, which is the default for continuous variables <code class="highlighter-rouge">lev = 1</code>:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">install.packages</span><span class="p">(</span><span class="s1">'mgm'</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">mgm</span><span class="p">)</span><span class="w">
</span><span class="n">fit_obj</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">mgmfit</span><span class="p">(</span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="p">,</span><span class="w">
</span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="s1">'g'</span><span class="p">,</span><span class="w"> </span><span class="n">p</span><span class="p">),</span><span class="w">
</span><span class="n">lev</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">p</span><span class="p">),</span><span class="w">
</span><span class="n">rule.reg</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'OR'</span><span class="p">)</span></code></pre></figure>
<p>For more info on how to estimate Mixed Graphical Models using the mgm package see <a href="http://jmbh.github.io/Estimation-of-mixed-graphical-models/">this previous post</a> or the <a href="https://arxiv.org/pdf/1510.06871v2.pdf">mgm paper</a>.</p>
<h2 id="compute-predictability-of-nodes">Compute Predictability of Nodes</h2>
<p>After estimating the network model, we are ready to compute the predictability of each node. Node wise predictability (or error) can be computed easily, because the graph is estimated by taking each node in turn and regressing all other nodes on it. As a measure of predictability we pick the proportion of explained variance, as it is straightforward to interpret: 0 means the node at hand is not explained at all by other nodes in the network, 1 means perfect prediction. We centered all variables before estimation in order to remove any influence of the intercepts. For a detailed description of how to compute predictions and how to choose predictability measures, <a href="https://arxiv.org/abs/1610.09108">check out this preprint</a>. In case there are additional variable types (e.g. categorical) in the network, we can choose an appropriate measure for those variables (e.g. % correct classification, see <code class="highlighter-rouge">?predict.mgm</code>).</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">pred_obj</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">predict</span><span class="p">(</span><span class="n">fit_obj</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">,</span><span class="w">
</span><span class="n">error.continuous</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'VarExpl'</span><span class="p">)</span><span class="w">
</span><span class="o">></span><span class="w"> </span><span class="n">pred_obj</span><span class="o">$</span><span class="n">error</span><span class="w">
</span><span class="n">Variable</span><span class="w"> </span><span class="n">Error</span><span class="w"> </span><span class="n">ErrorType</span><span class="w">
</span><span class="m">1</span><span class="w"> </span><span class="n">intrusion</span><span class="w"> </span><span class="m">0.583</span><span class="w"> </span><span class="n">VarExpl</span><span class="w">
</span><span class="m">2</span><span class="w"> </span><span class="n">dreams</span><span class="w"> </span><span class="m">0.590</span><span class="w"> </span><span class="n">VarExpl</span><span class="w">
</span><span class="m">3</span><span class="w"> </span><span class="n">flash</span><span class="w"> </span><span class="m">0.513</span><span class="w"> </span><span class="n">VarExpl</span><span class="w">
</span><span class="m">4</span><span class="w"> </span><span class="n">upset</span><span class="w"> </span><span class="m">0.615</span><span class="w"> </span><span class="n">VarExpl</span><span class="w">
</span><span class="m">5</span><span class="w"> </span><span class="n">physior</span><span class="w"> </span><span class="m">0.601</span><span class="w"> </span><span class="n">VarExpl</span><span class="w">
</span><span class="m">6</span><span class="w"> </span><span class="n">avoidth</span><span class="w"> </span><span class="m">0.648</span><span class="w"> </span><span class="n">VarExpl</span><span class="w">
</span><span class="m">7</span><span class="w"> </span><span class="n">avoidact</span><span class="w"> </span><span class="m">0.626</span><span class="w"> </span><span class="n">VarExpl</span><span class="w">
</span><span class="m">8</span><span class="w"> </span><span class="n">amnesia</span><span class="w"> </span><span class="m">0.327</span><span class="w"> </span><span class="n">VarExpl</span><span class="w">
</span><span class="m">9</span><span class="w"> </span><span class="n">lossint</span><span class="w"> </span><span class="m">0.419</span><span class="w"> </span><span class="n">VarExpl</span><span class="w">
</span><span class="m">10</span><span class="w"> </span><span class="n">distant</span><span class="w"> </span><span class="m">0.450</span><span class="w"> </span><span class="n">VarExpl</span><span class="w">
</span><span class="m">11</span><span class="w"> </span><span class="n">numb</span><span class="w"> </span><span class="m">0.333</span><span class="w"> </span><span class="n">VarExpl</span><span class="w">
</span><span class="m">12</span><span class="w"> </span><span class="n">future</span><span class="w"> </span><span class="m">0.450</span><span class="w"> </span><span class="n">VarExpl</span><span class="w">
</span><span class="m">13</span><span class="w"> </span><span class="n">sleep</span><span class="w"> </span><span class="m">0.531</span><span class="w"> </span><span class="n">VarExpl</span><span class="w">
</span><span class="m">14</span><span class="w"> </span><span class="n">anger</span><span class="w"> </span><span class="m">0.483</span><span class="w"> </span><span class="n">VarExpl</span><span class="w">
</span><span class="m">15</span><span class="w"> </span><span class="n">concen</span><span class="w"> </span><span class="m">0.604</span><span class="w"> </span><span class="n">VarExpl</span><span class="w">
</span><span class="m">16</span><span class="w"> </span><span class="n">hyper</span><span class="w"> </span><span class="m">0.602</span><span class="w"> </span><span class="n">VarExpl</span><span class="w">
</span><span class="m">17</span><span class="w"> </span><span class="n">startle</span><span class="w"> </span><span class="m">0.605</span><span class="w"> </span><span class="n">VarExpl</span></code></pre></figure>
<p>We calculated the percentage of variance explained in each of the nodes in the network. Next, we visualize the estimated network and discuss its structure in relation to explained variance.</p>
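<p>The nodewise idea behind these numbers can be sketched in a few lines. The following toy example (Python with synthetic data; note that mgm uses regularized nodewise regressions, while the sketch below uses plain least squares purely to illustrate the principle) regresses each variable on all others and records the proportion of explained variance:</p>

```python
import numpy as np

# Toy sketch of the nodewise idea (Python, synthetic data). Note: mgm uses
# regularized (lasso) nodewise regressions; plain least squares is used here
# only to illustrate the principle.
rng = np.random.default_rng(4)

n, p = 344, 5  # synthetic stand-in for the symptom data
A = rng.normal(size=(p, p))
X = rng.multivariate_normal(np.zeros(p), A @ A.T + np.eye(p), size=n)
X = X - X.mean(axis=0)  # center the variables, as in the post

# regress each node on all others and record the proportion of explained variance
r2s = []
for j in range(p):
    y, Z = X[:, j], np.delete(X, j, axis=1)
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    r2s.append(1 - resid.var() / y.var())

print(np.round(r2s, 3))  # one predictability value per node, between 0 and 1
```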
<h2 id="visualize-network--predictability">Visualize Network & Predictability</h2>
<p>We provide the estimated weighted adjacency matrix and the node wise predictability measures as arguments to <code class="highlighter-rouge">qgraph()</code> …</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">install.packages</span><span class="p">(</span><span class="s1">'qgraph'</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">qgraph</span><span class="p">)</span><span class="w">
</span><span class="n">jpeg</span><span class="p">(</span><span class="n">paste0</span><span class="p">(</span><span class="n">figDir</span><span class="p">,</span><span class="w"> </span><span class="s1">'McNallyNetwork.jpg'</span><span class="p">),</span><span class="w"> </span><span class="n">width</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1500</span><span class="p">,</span><span class="w"> </span><span class="n">height</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1500</span><span class="p">)</span><span class="w">
</span><span class="n">qgraph</span><span class="p">(</span><span class="n">fit_obj</span><span class="o">$</span><span class="n">wadj</span><span class="p">,</span><span class="w"> </span><span class="c1"># weighted adjacency matrix as input
</span><span class="w"> </span><span class="n">layout</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'spring'</span><span class="p">,</span><span class="w">
</span><span class="n">pie</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pred_obj</span><span class="o">$</span><span class="n">error</span><span class="o">$</span><span class="n">Error</span><span class="p">,</span><span class="w"> </span><span class="c1"># provide errors as input
</span><span class="w"> </span><span class="n">pieColor</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="s1">'#377EB8'</span><span class="p">,</span><span class="n">p</span><span class="p">),</span><span class="w">
</span><span class="n">node.color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fit_obj</span><span class="o">$</span><span class="n">edgecolor</span><span class="p">,</span><span class="w">
</span><span class="n">labels</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">colnames</span><span class="p">(</span><span class="n">data</span><span class="p">))</span><span class="w">
</span><span class="n">dev.off</span><span class="p">()</span></code></pre></figure>
<p>… and get the following network visualization:</p>
<p><img src="http://jmbh.github.io/figs/2016-11-01-Predictability-in-network-models/McNellyNetwork.jpg" alt="center" /></p>
<p>Each variable is represented by a node and the edges correspond to partial correlations, because in this dataset the MGM consists only of conditional Gaussian variables. The green color of the edges indicates that all partial correlations in this graph are positive, and the edge-width is proportional to the absolute value of the partial correlation. The blue pie chart behind the node indicates the predictability measure for each node (more blue = higher predictability).</p>
<p>We see that intrusive memories, traumatic dreams and flashbacks cluster together. Also, we observe that avoidance of thoughts about the trauma (avoidth) interacts with avoidance of activities reminiscent of the trauma (avoidact), and that hypervigilant behavior (hyper) is related to feeling easily startled (startle). But there are also less obvious interactions, for instance between anger and concentration problems.</p>
<p>Now, if we would like to reduce sleep problems, the network model suggests intervening on the variables anger and startle. But what the network structure does not tell us is <em>how much</em> we could possibly change sleep through the variables anger and startle. The predictability measure gives us an answer to this question: 53.1%. If the goal were to intervene on amnesia, we see that all adjacent nodes in the network explain only 32.7% of its variance. In addition, there are many small edges connected to amnesia, suggesting that it is hard to intervene on amnesia via other nodes in the symptom network. Thus, one would possibly try to find additional variables not included in the network that interact with amnesia, or try to intervene on amnesia directly.</p>
<h2 id="limitations">Limitations!</h2>
<p>Of course, there are limitations to interpreting explained variance as predicted treatment outcome: first, we cannot know the causal direction of the edges, so any edge could point in one or both directions. However, if there is no edge, there is also no causal effect in either direction. Also, it is often reasonable to combine the network model with general knowledge: for instance, it seems more likely that amnesia causes being upset than the other way around. Second, we estimated the model on cross-sectional data (each row is one person) and hence assume that all people are the same, an assumption that is always violated to some extent. To solve this problem we would need (many) repeated measurements of a single person, in order to estimate a model specific to that person. This also solves the first problem to some degree, as we can use the direction of time as the direction of causality. One would then use models that predict all symptoms at time point t from all symptoms at an earlier time point, say t-1. An example of such a model is the <a href="https://en.wikipedia.org/wiki/Vector_autoregression">Vector Autoregressive (VAR) model</a>.</p>
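<p>To make the VAR idea concrete, here is a minimal sketch on simulated data with two hypothetical symptoms (made up for this example, not the dataset analyzed above): each symptom at time point t is regressed on all symptoms at t-1.</p>

```r
set.seed(1)
n <- 500
Phi <- matrix(c(.5, .1, .2, .4), 2, 2)  # true lagged effects; Phi[1, 2] = effect of symptom 2 at t-1 on symptom 1 at t
X <- matrix(0, n, 2)
for (t in 2:n) X[t, ] <- Phi %*% X[t - 1, ] + rnorm(2, sd = .5)  # generate the VAR(1) process

# lagged regression for symptom 1; the estimates should be close to .5 and .2
coef(lm(X[2:n, 1] ~ X[1:(n - 1), 1] + X[1:(n - 1), 2]))
```

<p>In practice one would use a dedicated VAR estimation routine rather than plain <code class="highlighter-rouge">lm()</code>, but the sketch shows the core idea: the lagged regressions recover relations that are directed in time.</p>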
<h2 id="compare-within-vs-out-of-sample-predictability">Compare Within vs. Out of Sample Predictability</h2>
<p>So far we have looked at how well we can predict each node from all other nodes within our sample. But in most situations we are interested in the predictability of nodes in new, unseen data. In what follows, we compare within-sample predictability with out-of-sample predictability.</p>
<p>We first split the data into two parts: a training part (60% of the data), which we use to estimate the network model, and a test part, on which we compute the predictability measures:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">set.seed</span><span class="p">(</span><span class="m">1</span><span class="p">)</span><span class="w">
</span><span class="n">ind</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">sample</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="kc">TRUE</span><span class="p">,</span><span class="kc">FALSE</span><span class="p">),</span><span class="w"> </span><span class="n">prob</span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="m">.6</span><span class="p">,</span><span class="w"> </span><span class="m">.4</span><span class="p">),</span><span class="w"> </span><span class="n">size</span><span class="o">=</span><span class="n">nrow</span><span class="p">(</span><span class="n">data</span><span class="p">),</span><span class="w"> </span><span class="n">replace</span><span class="o">=</span><span class="nb">T</span><span class="p">)</span><span class="w"> </span><span class="err">#</span><span class="w"> </span><span class="n">divide</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="n">parts</span><span class="w"> </span><span class="p">(</span><span class="m">60</span><span class="o">% training set, 40%</span><span class="w"> </span><span class="n">test</span><span class="w"> </span><span class="n">set</span><span class="p">)</span></code></pre></figure>
<p>Next, we estimate the network only on the training data and compute the predictability measure both on the training data and the test data:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">fit_obj_cv</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">mgmfit</span><span class="p">(</span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="p">[</span><span class="n">ind</span><span class="p">,],</span><span class="w">
</span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="s1">'g'</span><span class="p">,</span><span class="w"> </span><span class="n">p</span><span class="p">),</span><span class="w">
</span><span class="n">lev</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">p</span><span class="p">),</span><span class="w">
</span><span class="n">rule.reg</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'OR'</span><span class="p">)</span><span class="w">
</span><span class="n">pred_obj_train</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">predict</span><span class="p">(</span><span class="n">fit_obj_cv</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">[</span><span class="n">ind</span><span class="p">,],</span><span class="w"> </span><span class="n">error.continuous</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'VarExpl'</span><span class="p">)</span><span class="w"> </span><span class="c1"># Compute Preditions on training data 60%
</span><span class="n">pred_obj_test</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">predict</span><span class="p">(</span><span class="n">fit_obj_cv</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">[</span><span class="o">!</span><span class="n">ind</span><span class="p">,],</span><span class="w"> </span><span class="n">error.continuous</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'VarExpl'</span><span class="p">)</span><span class="err">#</span><span class="w"> </span><span class="n">Compute</span><span class="w"> </span><span class="n">Predictions</span><span class="w"> </span><span class="n">on</span><span class="w"> </span><span class="n">test</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="m">40</span><span class="o">%</span></code></pre></figure>
<p>We now look at the mean predictability over nodes for the training and test datasets:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">mean</span><span class="p">(</span><span class="n">pred_obj_train</span><span class="o">$</span><span class="n">error</span><span class="o">$</span><span class="n">Error</span><span class="p">)</span><span class="w"> </span><span class="c1"># mean explained variance training data
</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="m">0.5384118</span><span class="w">
</span><span class="n">mean</span><span class="p">(</span><span class="n">pred_obj_test</span><span class="o">$</span><span class="n">error</span><span class="o">$</span><span class="n">Error</span><span class="p">)</span><span class="w"> </span><span class="c1"># mean explained variance test data
</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="m">0.4494118</span></code></pre></figure>
<p>As expected, the explained variance is higher in the training dataset. This is because we fit the model to structure that is specific to the training data and not present in the population (noise). Note that both means are lower than the mean of the explained variances reported above, because we used fewer observations to estimate the model and hence had less power to detect edges.</p>
<p>While the explained variance values are lower in the test set, there is a strong correlation between the explained variance of a node in the training set and in the test set</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">cor</span><span class="p">(</span><span class="n">pred_obj_train</span><span class="o">$</span><span class="n">error</span><span class="o">$</span><span class="n">Error</span><span class="p">,</span><span class="w"> </span><span class="n">pred_obj_test</span><span class="o">$</span><span class="n">error</span><span class="o">$</span><span class="n">Error</span><span class="p">)</span><span class="w">
</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="m">0.8018155</span></code></pre></figure>
<p>which means that if a node has high explained variance in the training set, it tends to also have a high explained variance in the test set.</p>
<h2 id="edit-nov-3rd-the-and--or-or-rule-and-predictability">Edit Nov 3rd: The AND- or OR-rule and Predictability</h2>
<p>In the above example I used the OR-rule to combine estimates in the <a href="http://www.jstor.org/stable/25463463">neighborhood regression approach</a>, without justifying why (thanks to <a href="https://scholar.google.com.br/citations?user=fH6qCDoAAAAJ&hl=en">Wagner de Lara Machado</a> for pointing this out). Here is the explanation:</p>
<p>In the neighborhood regression approach to graph estimation we pick each node in the graph and regress all other nodes on this node. If we have three nodes <script type="math/tex">x_1</script>, <script type="math/tex">x_2</script>, <script type="math/tex">x_3</script>, this procedure leads to three regression models:</p>
<ol>
<li>
<script type="math/tex; mode=display">x_1 = \beta_{10} + \beta_{12} x_2 + \beta_{13} x_3</script>
</li>
<li>
<script type="math/tex; mode=display">x_2 = \beta_{20} + \beta_{21} x_1 + \beta_{23} x_3</script>
</li>
<li>
<script type="math/tex; mode=display">x_3 = \beta_{30} + \beta_{31} x_1 + \beta_{32} x_2</script>
</li>
</ol>
<p>This procedure leads to two estimates for the edge between <script type="math/tex">x_1</script> and <script type="math/tex">x_2</script>: <script type="math/tex">\beta_{12}</script> from regression (1) and <script type="math/tex">\beta_{21}</script> from regression (2). If both parameters are nonzero, we clearly set the edge between <script type="math/tex">x_1</script> and <script type="math/tex">x_2</script> to be present, and if both parameters are zero, we clearly set it to be absent. However, in some cases the two estimates disagree, and we need a rule for this situation: the OR-rule sets an edge to be present if <em>at least one</em> of the estimates is nonzero; the AND-rule sets an edge to be present only if <em>both</em> estimates are nonzero.</p>
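<p>A toy illustration of the two rules, with hypothetical estimates for the edge between <script type="math/tex">x_1</script> and <script type="math/tex">x_2</script> (these values are made up for the example):</p>

```r
beta_12 <- 0.4  # estimate from regression (1): x1 regressed on x2 and x3
beta_21 <- 0.0  # estimate from regression (2): x2 regressed on x1 and x3

(beta_12 != 0) | (beta_21 != 0)  # OR-rule:  TRUE,  edge is set to present
(beta_12 != 0) & (beta_21 != 0)  # AND-rule: FALSE, edge is set to absent
```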
<p>Now, to compute predictions and hence a measure of predictability, we use the regression models (1)-(3). Take regression model (3), where we predict <script type="math/tex">x_3</script> from <script type="math/tex">x_1</script> and <script type="math/tex">x_2</script>. If the estimates agree (<script type="math/tex">\beta_{31}</script> with <script type="math/tex">\beta_{13}</script>, and <script type="math/tex">\beta_{32}</script> with <script type="math/tex">\beta_{23}</script>), everything is fine. But if there is disagreement, we have the following problem:</p>
<ul>
<li>
<p>When using the AND-rule: if, say, the parameter <script type="math/tex">\beta_{32}</script> is nonzero but <script type="math/tex">\beta_{23}</script> is zero, the AND-rule sets the edge <script type="math/tex">x_3</script>-<script type="math/tex">x_2</script> in the graph to absent; however, the parameter <script type="math/tex">\beta_{32}</script> is still used in regression (3) to predict <script type="math/tex">x_3</script>. This leads to a predictability that is too high. Hence we could have a situation in which a node has no connection in the graph (obtained using the AND-rule) but has a nonzero predictability measure.</p>
</li>
<li>
<p>When using the OR-rule: if the parameter <script type="math/tex">\beta_{23}</script> is nonzero but <script type="math/tex">\beta_{32}</script> is zero, the OR-rule sets the edge <script type="math/tex">x_3</script>-<script type="math/tex">x_2</script> in the graph to present; however, we use the (zero) parameter <script type="math/tex">\beta_{32}</script> in regression (3) for prediction. This leads to a predictability that is too small. Hence we could have a situation in which a node has a connection in the graph but a zero predictability measure.</p>
</li>
</ul>
<p>Hence, when using the OR-rule we <em>underestimate</em> the true predictability given the graph, and therefore get a <em>conservative</em> estimate of predictability. This is why I chose the OR-rule above.</p>
<p>Okay, but why don’t we adjust the parameters of the regression models (1)-(3), setting parameters to zero (AND-rule) or filling in parameters (OR-rule)? The following example shows that this cannot be done easily, because tinkering with the parameters can destroy the prediction model. We show this for the AND-rule, where we set the parameter <script type="math/tex">\beta_{32}</script> to zero (because <script type="math/tex">\beta_{23}</script> is zero):</p>
<p>We generate a network of three variables, <script type="math/tex">x_1</script>, <script type="math/tex">x_2</script>, <script type="math/tex">x_3</script>, with edges between <script type="math/tex">x_1</script>-<script type="math/tex">x_3</script> and <script type="math/tex">x_2</script>-<script type="math/tex">x_3</script>, where <script type="math/tex">x_1</script> is continuous and <script type="math/tex">x_2</script>, <script type="math/tex">x_3</script> are binary:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">n</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">60</span><span class="w"> </span><span class="c1"># number of observations
</span><span class="n">set.seed</span><span class="p">(</span><span class="m">7</span><span class="p">)</span><span class="w"> </span><span class="c1"># selected to get the pathological case
</span><span class="n">x</span><span class="m">1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">rnorm</span><span class="p">(</span><span class="n">n</span><span class="p">)</span><span class="w">
</span><span class="n">x</span><span class="m">2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">rnorm</span><span class="p">(</span><span class="n">n</span><span class="p">)</span><span class="w">
</span><span class="n">x</span><span class="m">3</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">.7</span><span class="o">*</span><span class="n">x</span><span class="m">1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="m">0.3</span><span class="o">*</span><span class="n">x</span><span class="m">2</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="m">0.7</span><span class="o">*</span><span class="n">rnorm</span><span class="p">(</span><span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="c1"># linear combination of x1, x2 plus some noise
</span><span class="w">
</span><span class="c1"># Binarize variable x2, x3
</span><span class="n">x</span><span class="m">2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">scale</span><span class="p">(</span><span class="n">x</span><span class="m">2</span><span class="p">)</span><span class="w">
</span><span class="n">x</span><span class="m">3</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">scale</span><span class="p">(</span><span class="n">x</span><span class="m">3</span><span class="p">)</span><span class="w">
</span><span class="n">x</span><span class="m">2</span><span class="n">b</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="kc">NA</span><span class="p">,</span><span class="w"> </span><span class="n">n</span><span class="p">)</span><span class="w">
</span><span class="n">x</span><span class="m">2</span><span class="n">b</span><span class="p">[</span><span class="n">x</span><span class="m">2</span><span class="o"><</span><span class="w"> </span><span class="m">-.8</span><span class="p">]</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">0</span><span class="w">
</span><span class="n">x</span><span class="m">2</span><span class="n">b</span><span class="p">[</span><span class="n">x</span><span class="m">2</span><span class="o">></span><span class="w"> </span><span class="m">-.8</span><span class="p">]</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">1</span><span class="w">
</span><span class="n">x</span><span class="m">3</span><span class="n">b</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="kc">NA</span><span class="p">,</span><span class="w"> </span><span class="n">n</span><span class="p">)</span><span class="w">
</span><span class="n">x</span><span class="m">3</span><span class="n">b</span><span class="p">[</span><span class="n">x</span><span class="m">3</span><span class="o"><</span><span class="w"> </span><span class="m">-.8</span><span class="p">]</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">0</span><span class="w">
</span><span class="n">x</span><span class="m">3</span><span class="n">b</span><span class="p">[</span><span class="n">x</span><span class="m">3</span><span class="o">></span><span class="w"> </span><span class="m">-.8</span><span class="p">]</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">1</span><span class="w">
</span><span class="n">data</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">cbind</span><span class="p">(</span><span class="n">x</span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="m">2</span><span class="n">b</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="m">3</span><span class="n">b</span><span class="p">)</span><span class="w"> </span><span class="c1"># Combine in one matrix
</span><span class="w">
</span><span class="c1"># Check marginal probability of 1
</span><span class="n">mean</span><span class="p">(</span><span class="n">x</span><span class="m">2</span><span class="n">b</span><span class="p">)</span><span class="w">
</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="m">0.7833333</span><span class="w">
</span><span class="n">mean</span><span class="p">(</span><span class="n">x</span><span class="m">3</span><span class="n">b</span><span class="p">)</span><span class="w">
</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="m">0.7833333</span></code></pre></figure>
<p>We now fit a mixed graphical model:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">mgm</span><span class="p">)</span><span class="w">
</span><span class="n">fit_obj</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">mgmfit</span><span class="p">(</span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="p">,</span><span class="w">
</span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s1">'g'</span><span class="p">,</span><span class="w"> </span><span class="s1">'c'</span><span class="p">,</span><span class="w"> </span><span class="s1">'c'</span><span class="p">),</span><span class="w">
</span><span class="n">lev</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="m">2</span><span class="p">),</span><span class="w">
</span><span class="n">rule.reg</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'AND'</span><span class="p">)</span><span class="w">
</span><span class="n">fit_obj</span><span class="o">$</span><span class="n">wadj</span><span class="w">
</span><span class="p">[,</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="p">[,</span><span class="m">2</span><span class="p">]</span><span class="w"> </span><span class="p">[,</span><span class="m">3</span><span class="p">]</span><span class="w">
</span><span class="p">[</span><span class="m">1</span><span class="p">,]</span><span class="w"> </span><span class="m">0.0000000</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0.7017516</span><span class="w">
</span><span class="p">[</span><span class="m">2</span><span class="p">,]</span><span class="w"> </span><span class="m">0.0000000</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0.0000000</span><span class="w">
</span><span class="p">[</span><span class="m">3</span><span class="p">,]</span><span class="w"> </span><span class="m">0.7017516</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0.0000000</span><span class="w">
</span><span class="n">fit_obj</span><span class="o">$</span><span class="n">mpar.matrix</span><span class="w">
</span><span class="p">[,</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="p">[,</span><span class="m">2</span><span class="p">]</span><span class="w"> </span><span class="p">[,</span><span class="m">3</span><span class="p">]</span><span class="w"> </span><span class="p">[,</span><span class="m">4</span><span class="p">]</span><span class="w"> </span><span class="p">[,</span><span class="m">5</span><span class="p">]</span><span class="w">
</span><span class="p">[</span><span class="m">1</span><span class="p">,]</span><span class="w"> </span><span class="kc">NA</span><span class="w"> </span><span class="kc">NA</span><span class="w"> </span><span class="m">0.0000000</span><span class="w"> </span><span class="kc">NA</span><span class="w"> </span><span class="m">0.7943015</span><span class="w">
</span><span class="p">[</span><span class="m">2</span><span class="p">,]</span><span class="w"> </span><span class="m">0.0000000</span><span class="w"> </span><span class="kc">NA</span><span class="w"> </span><span class="kc">NA</span><span class="w"> </span><span class="kc">NA</span><span class="w"> </span><span class="m">0.0000000</span><span class="w">
</span><span class="p">[</span><span class="m">3</span><span class="p">,]</span><span class="w"> </span><span class="m">0.0000000</span><span class="w"> </span><span class="kc">NA</span><span class="w"> </span><span class="kc">NA</span><span class="w"> </span><span class="kc">NA</span><span class="w"> </span><span class="m">0.0000000</span><span class="w">
</span><span class="p">[</span><span class="m">4</span><span class="p">,]</span><span class="w"> </span><span class="m">-0.6092017</span><span class="w"> </span><span class="kc">NA</span><span class="w"> </span><span class="m">-0.7593007</span><span class="w"> </span><span class="kc">NA</span><span class="w"> </span><span class="kc">NA</span><span class="w">
</span><span class="p">[</span><span class="m">5</span><span class="p">,]</span><span class="w"> </span><span class="m">0.6092017</span><span class="w"> </span><span class="kc">NA</span><span class="w"> </span><span class="m">0.7593007</span><span class="w"> </span><span class="kc">NA</span><span class="w"> </span><span class="kc">NA</span></code></pre></figure>
<p>From the weighted adjacency matrix <code class="highlighter-rouge">fit_obj$wadj</code> we see that there is only one edge, between <script type="math/tex">x_1</script> and <script type="math/tex">x_3</script>. However, looking at the model parameter matrix <code class="highlighter-rouge">fit_obj$mpar.matrix</code>, we see that the parameter <script type="math/tex">\beta_{32}</script> of regression (3) was actually nonzero, but the edge was set to absent by the AND-rule because the parameter <script type="math/tex">\beta_{23}</script> in regression (2) was zero (for an explanation of the model parameter matrix, see <a href="http://jmbh.github.io/Interactions-between-categorical-Variables-in-mixed-graphical-models/">here</a>).</p>
<p>We now do the following: first, we go through all steps of using the parameters of regression model (3) to compute predictions for <script type="math/tex">x_3</script>. We will see that these steps lead to exactly the same predictions as the function <code class="highlighter-rouge">predict.mgm()</code>. Then we modify the regression model according to the graph obtained with the AND-rule and set the regression parameter <script type="math/tex">\beta_{32}</script> to zero: we will see that this ‘destroys’ the parameter scaling and leads to a predictability that is <em>worse than the intercept model</em>.</p>
<p>We first show how to compute predictions for <script type="math/tex">x_3</script> using the <em>unmodified</em> model:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Getting Parameters out of MGM fit object:
</span><span class="n">threshold_0</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">fit_obj</span><span class="o">$</span><span class="n">node.models</span><span class="p">[[</span><span class="m">3</span><span class="p">]]</span><span class="o">$</span><span class="n">coefs</span><span class="p">[[</span><span class="m">1</span><span class="p">]][</span><span class="m">1</span><span class="p">]</span><span class="w">
</span><span class="n">threshold_1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">fit_obj</span><span class="o">$</span><span class="n">node.models</span><span class="p">[[</span><span class="m">3</span><span class="p">]]</span><span class="o">$</span><span class="n">coefs</span><span class="p">[[</span><span class="m">2</span><span class="p">]][</span><span class="m">1</span><span class="p">]</span><span class="w">
</span><span class="n">beta_c01</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">fit_obj</span><span class="o">$</span><span class="n">node.models</span><span class="p">[[</span><span class="m">3</span><span class="p">]]</span><span class="o">$</span><span class="n">coefs</span><span class="p">[[</span><span class="m">1</span><span class="p">]][</span><span class="m">2</span><span class="p">]</span><span class="w">
</span><span class="n">beta_c02</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">fit_obj</span><span class="o">$</span><span class="n">node.models</span><span class="p">[[</span><span class="m">3</span><span class="p">]]</span><span class="o">$</span><span class="n">coefs</span><span class="p">[[</span><span class="m">1</span><span class="p">]][</span><span class="m">3</span><span class="p">]</span><span class="w">
</span><span class="n">beta_c11</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">fit_obj</span><span class="o">$</span><span class="n">node.models</span><span class="p">[[</span><span class="m">3</span><span class="p">]]</span><span class="o">$</span><span class="n">coefs</span><span class="p">[[</span><span class="m">2</span><span class="p">]][</span><span class="m">2</span><span class="p">]</span><span class="w">
</span><span class="n">beta_c12</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">fit_obj</span><span class="o">$</span><span class="n">node.models</span><span class="p">[[</span><span class="m">3</span><span class="p">]]</span><span class="o">$</span><span class="n">coefs</span><span class="p">[[</span><span class="m">2</span><span class="p">]][</span><span class="m">3</span><span class="p">]</span><span class="w">
</span><span class="c1"># Computing Potentials for each Category
</span><span class="n">potentials_0</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">exp</span><span class="p">(</span><span class="n">threshold_0</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">beta_c01</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">x</span><span class="m">1</span><span class="n">n</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">beta_c02</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">x</span><span class="m">2</span><span class="n">b</span><span class="p">)</span><span class="w">
</span><span class="n">potentials_1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">exp</span><span class="p">(</span><span class="n">threshold_1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">beta_c11</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">x</span><span class="m">1</span><span class="n">n</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">beta_c12</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">x</span><span class="m">2</span><span class="n">b</span><span class="p">)</span><span class="w">
</span><span class="c1"># Normalize and get Probabilities
</span><span class="n">probability_0</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">potentials_0</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="p">(</span><span class="n">potentials_0</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">potentials_1</span><span class="p">)</span><span class="w">
</span><span class="n">probability_1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">potentials_1</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="p">(</span><span class="n">potentials_0</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">potentials_1</span><span class="p">)</span><span class="w">
</span><span class="n">probabilities</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">cbind</span><span class="p">(</span><span class="n">probability_0</span><span class="p">,</span><span class="w"> </span><span class="n">probability_1</span><span class="p">)</span><span class="w">
</span><span class="c1"># Predict class
</span><span class="n">x</span><span class="m">3</span><span class="err">_</span><span class="n">predicted</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">apply</span><span class="p">(</span><span class="n">probabilities</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">which.max</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="c1"># minus one to get to original labels 0/1
</span><span class="w">
</span><span class="c1"># just for checking: do same with predict.mgm() function
</span><span class="n">x</span><span class="m">3</span><span class="err">_</span><span class="n">predicted_mgm</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">as.numeric</span><span class="p">(</span><span class="n">predict</span><span class="p">(</span><span class="n">fit_obj</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">)</span><span class="o">$</span><span class="n">pred</span><span class="p">[,</span><span class="m">3</span><span class="p">])</span><span class="w">
</span><span class="n">mean</span><span class="p">(</span><span class="n">x</span><span class="m">3</span><span class="err">_</span><span class="n">predicted</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">x</span><span class="m">3</span><span class="err">_</span><span class="n">predicted_mgm</span><span class="p">)</span><span class="w"> </span><span class="c1"># exactly the same
</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="m">1</span><span class="w">
</span><span class="c1"># compute % correct classification (accuracy):
</span><span class="n">mean</span><span class="p">(</span><span class="n">x</span><span class="m">3</span><span class="err">_</span><span class="n">predicted</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">x</span><span class="m">3</span><span class="n">b</span><span class="p">)</span><span class="w">
</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="m">0.85</span></code></pre></figure>
<p>We get an accuracy of 0.85. Note that the intercept model alone would already give us an accuracy of 0.78 (see above). Note that here we dropped the subscript for the betas indicating that we predict <script type="math/tex">x_3</script>, and instead added subscripts <script type="math/tex">c0</script>, <script type="math/tex">c1</script> to indicate the predicted category. Also note that <script type="math/tex">\beta_{c02}</script> and <script type="math/tex">\beta_{c12}</script> correspond to <script type="math/tex">\beta_{32}</script> in the above notation; we have two parameters because the predicted variable is binary (for details about this symmetric approach to multinomial regression, see the <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2929880/">glmnet paper</a>). We now set the parameters between <script type="math/tex">x_2</script> and <script type="math/tex">x_3</script> (<script type="math/tex">\beta_{c02}</script> and <script type="math/tex">\beta_{c12}</script>) to zero and compute predictions in exactly the same way as before:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Getting Parameters out of MGM:
</span><span class="n">threshold_0</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">fit_obj</span><span class="o">$</span><span class="n">node.models</span><span class="p">[[</span><span class="m">3</span><span class="p">]]</span><span class="o">$</span><span class="n">coefs</span><span class="p">[[</span><span class="m">1</span><span class="p">]][</span><span class="m">1</span><span class="p">]</span><span class="w">
</span><span class="n">threshold_1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">fit_obj</span><span class="o">$</span><span class="n">node.models</span><span class="p">[[</span><span class="m">3</span><span class="p">]]</span><span class="o">$</span><span class="n">coefs</span><span class="p">[[</span><span class="m">2</span><span class="p">]][</span><span class="m">1</span><span class="p">]</span><span class="w">
</span><span class="n">beta_c01</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">fit_obj</span><span class="o">$</span><span class="n">node.models</span><span class="p">[[</span><span class="m">3</span><span class="p">]]</span><span class="o">$</span><span class="n">coefs</span><span class="p">[[</span><span class="m">1</span><span class="p">]][</span><span class="m">2</span><span class="p">]</span><span class="w">
</span><span class="n">beta_c11</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">fit_obj</span><span class="o">$</span><span class="n">node.models</span><span class="p">[[</span><span class="m">3</span><span class="p">]]</span><span class="o">$</span><span class="n">coefs</span><span class="p">[[</span><span class="m">2</span><span class="p">]][</span><span class="m">2</span><span class="p">]</span><span class="w">
</span><span class="c1"># Computing Potentials for each Category
</span><span class="n">potentials_0</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">exp</span><span class="p">(</span><span class="n">threshold_0</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">beta_c01</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">x</span><span class="m">1</span><span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="c1"># predictor x2 deleted
</span><span class="n">potentials_1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">exp</span><span class="p">(</span><span class="n">threshold_1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">beta_c11</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">x</span><span class="m">1</span><span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="c1"># predictor x2 deleted
</span><span class="w">
</span><span class="c1"># Normalize and get Probabilities
</span><span class="n">probability_0</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">potentials_0</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="p">(</span><span class="n">potentials_0</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">potentials_1</span><span class="p">)</span><span class="w">
</span><span class="n">probability_1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">potentials_1</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="p">(</span><span class="n">potentials_0</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">potentials_1</span><span class="p">)</span><span class="w">
</span><span class="n">probabilities</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">cbind</span><span class="p">(</span><span class="n">probability_0</span><span class="p">,</span><span class="w"> </span><span class="n">probability_1</span><span class="p">)</span><span class="w">
</span><span class="c1"># Predict class
</span><span class="n">x</span><span class="m">3</span><span class="err">_</span><span class="n">predicted</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">apply</span><span class="p">(</span><span class="n">probabilities</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">which.max</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="c1"># minus one to get to original labels 0/1
</span><span class="w">
</span><span class="c1"># compute % correct classification:
</span><span class="n">mean</span><span class="p">(</span><span class="n">x</span><span class="m">3</span><span class="err">_</span><span class="n">predicted</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">x</span><span class="m">3</span><span class="n">b</span><span class="p">)</span><span class="w">
</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="m">0.75</span></code></pre></figure>
<p>We see that we get an accuracy of 0.75, which is <em>lower</em> than the accuracy we would expect from the intercept model (0.78). However, we <em>should</em> get a higher accuracy than 0.78, because we know that <script type="math/tex">x_1</script> <em>is</em> a predictor of <script type="math/tex">x_3</script>. This shows that we cannot simply delete parameters from a regression model. We could construct a similar example by adding nonzero predictors.</p>
<p>A possible way around this would be to take the estimated graph and then re-estimate it (by performing p regressions), using as predictors only those variables that were connected to the predicted node in the initial graph. However, this two-stage procedure would lead to a (possibly) completely different scaling for the estimation of each node's neighborhood. This is likely to lead to an algorithm that does not consistently recover the true graph/network.</p>
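To make the two-stage idea concrete, here is a minimal sketch in R. It is not a recommended estimator, just an illustration of the procedure described above; the data matrix <code class="highlighter-rouge">X</code> and the stage-1 adjacency matrix <code class="highlighter-rouge">adj</code> are hypothetical placeholders:

```r
# Hedged sketch of the two-stage procedure (illustration only):
# Stage 1: estimate the graph; Stage 2: re-fit each node's regression
# using only its stage-1 neighbors as predictors.
set.seed(1)
n <- 200; p <- 4
X <- matrix(rbinom(n * p, 1, 0.5), n, p)   # hypothetical binary data
adj <- matrix(1, p, p); diag(adj) <- 0     # placeholder stage-1 graph

refit <- lapply(seq_len(p), function(j) {
  nbs <- which(adj[j, ] == 1)                       # stage-1 neighbors of node j
  df <- data.frame(y = X[, j], X[, nbs, drop = FALSE])
  glm(y ~ ., data = df, family = binomial)          # unregularized stage-2 re-fit
})
length(refit)  # one re-fitted model per node
```

Because the stage-2 regressions are unregularized and restricted to different neighbor sets, the coefficients live on different scales across nodes, which is exactly the consistency concern raised above.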
Tue, 01 Nov 2016 00:00:00 +0000
http://jmbh.github.io//Predictability-in-network-models/
http://jmbh.github.io//Predictability-in-network-models/Graphical Analysis of German Parliament Voting Pattern<p>We use network visualizations to look into the voting patterns in the current German parliament. I downloaded the data <a href="https://www.bundestag.de/abstimmung">here</a> and all figures can be reproduced using the R code available on <a href="https://github.com/jmbh/bundestag">Github</a>.</p>
<p>Missing values, invalid votes, abstention from voting and not showing up for the vote were coded as (-1), such that all remaining responses are a yes (1) or no (2) vote. We use the Pearson correlation as a measure of voting similarity and regard voting behavior coded as (-1) as noise in the dataset. 36 of the 659 members of parliament were removed from the data because more than 50% of their votes were coded as (-1); these members either joined or left the parliament during the analyzed time period.</p>
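The filtering step can be sketched in a few lines of R. The vote matrix <code class="highlighter-rouge">votes</code> below is simulated, since it only stands in for the real members-by-bills matrix:

```r
# Hedged sketch of the preprocessing described above; `votes` is a
# hypothetical members x bills matrix with entries in {-1, 1, 2}.
set.seed(1)
votes <- matrix(sample(c(-1, 1, 2), 659 * 136, replace = TRUE,
                       prob = c(.1, .5, .4)), nrow = 659)

# drop members with more than 50% of votes coded as -1
keep  <- rowMeans(votes == -1) <= 0.5
votes <- votes[keep, ]
nrow(votes)  # number of members retained
```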
<p><em>Disclaimer: note that only for a fraction of the bills passed in the German parliament votes are recorded (and used here) and that relations between single members of parliaments might be artifacts of the noise-coding. Moreover, the data is quite scarce (136 bills). Therefore we should not draw any strong conclusions from this coarse-grained analysis.</em></p>
<h2 id="voting-pattern-amongst-members-of-parliament">Voting Pattern Amongst Members of Parliament</h2>
<p>We first compute the correlations between the voting behavior of all pairs of members of parliament, which gives us a 623 x 623 correlation matrix. We then visualize this correlation matrix using the force-directed <a href="https://en.wikipedia.org/wiki/Force-directed_graph_drawing">Fruchterman Reingold algorithm</a> as implemented in the <a href="https://cran.r-project.org/web/packages/qgraph/index.html">qgraph package</a>. This algorithm places nodes (politicians) on the plane such that edges (connections) have comparable length and cross each other as little as possible.</p>
<p><img src="http://jmbh.github.io/figs/bundestag/bundestag_cor_full.jpg" alt="center" /></p>
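The computation behind this figure can be sketched as follows; the simulated <code class="highlighter-rouge">votes</code> matrix stands in for the real data, and the qgraph call is shown as a comment so the sketch stays self-contained:

```r
# Minimal sketch: members x bills matrix -> member-by-member Pearson
# correlation matrix, then a force-directed layout via qgraph.
# `votes` is a hypothetical stand-in for the real 623 x 136 matrix.
set.seed(1)
votes <- matrix(sample(c(-1, 1, 2), 30 * 50, replace = TRUE), nrow = 30)

C <- cor(t(votes))   # 30 x 30 correlation matrix (members are rows of `votes`)
dim(C)

# library(qgraph)
# qgraph(C, layout = "spring")  # Fruchterman-Reingold layout
```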
<p>(For readers on R-Bloggers.com: <a href="http://jmbh.github.io/Analyzing-voting-pattern-of-German-parliament/">click here for the original post with larger figures.</a>)</p>
<p>Green edges indicate positive correlations (voter agreement) and red edges indicate negative correlations (voter disagreement). The width of the edges is proportional to the strength (absolute value) of the correlation. We see that the green party (B90/GRUENE) clusters together, as well as the left party (DIE LINKE). The third and biggest cluster consists of members of the two largest parties, the social democrats (SPD) and the conservatives (CDU/CSU). This is the structure we would expect intuitively, as social democrats and conservatives currently form the government in a grand coalition.</p>
<p>With some imagination, one could also identify a couple of subclusters in this large cluster. A detailed analysis of smaller clusters would be especially interesting if we had additional information about politicians. We could then see whether the cluster assignment computed from the voting behavior relates to these additional variables. For instance, politicians with close ties to the economy might vote together, irrespective of their party.</p>
<p>So far we assumed that we can adequately describe the voting pattern of the whole period from 26.11.2013 - 14.04.2016 with one graph. This implies that we assume that the relative voting behavior does not change over time. For example, this means that if members of parliament A and B agree on votes at the beginning of the period, they also agree throughout the rest of the period and do not start to disagree at some point. In the next section we check whether the voting behavior changes over time.</p>
<h2 id="voting-pattern-amongst-members-of-parliament-across-time">Voting Pattern Amongst Members of Parliament across Time</h2>
<p>To make graphs comparable over different time points and to be able to see growing (dis-) agreement between parties, we arrange individual members of parliament in circles that correspond to their parties. We compute a time-varying graph by visualizing a Gaussian kernel smoothed (bandwidth = .1, time interval [0,1]) correlation matrix at 20 equally spaced time points. Details can be found in the code used to create all figures, which is available <a href="https://github.com/jmbh/bundestag">here</a>. We then combine these 20 graphs into the following video:</p>
<p><img src="http://jmbh.github.io/figs/bundestag/bundestag_cor.gif" alt="center" /></p>
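A single frame of this video can be sketched as a kernel-weighted correlation matrix. The snippet below follows the description above (Gaussian kernel, bandwidth 0.1, time rescaled to [0, 1]) but uses a simulated <code class="highlighter-rouge">votes</code> matrix; the exact weighting in the original analysis may differ in detail:

```r
# Hedged sketch: Gaussian-kernel-weighted correlation matrix at one
# time point t0 in [0, 1]. `votes` (members x bills) is hypothetical.
set.seed(1)
n_bills <- 136
votes <- matrix(sample(c(1, 2), 20 * n_bills, replace = TRUE), nrow = 20)
tt <- seq(0, 1, length.out = n_bills)  # bill time stamps rescaled to [0, 1]

t0 <- 0.5; bw <- 0.1
w  <- dnorm((tt - t0) / bw)            # Gaussian kernel weights around t0
w  <- w / sum(w)

# weighted correlation via cov.wt(), with bills as observations (rows)
Cw <- cov.wt(t(votes), wt = w, cor = TRUE)$cor
dim(Cw)  # 20 x 20 time-localized correlation matrix
```

Repeating this for 20 equally spaced values of <code class="highlighter-rouge">t0</code> and plotting each matrix yields the frames of the video.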
<p>We see that right after the time the parliament was elected and the big coalition was formed in November 2013, there is relatively high agreement between members of CDU/CSU and SPD. Within the next three years, however, the agreement decreases. With regard to the parties in the opposition, at the beginning of the period the green and the left party disagree with the grand coalition to a similar degree. Over time, however, it appears that the green party increasingly agrees with the grand coalition, while the left party agrees less and less with the CDU/CSU- and SPD-led government.</p>
<p>As the number of seats the parties hold in the parliament differs widely, it is hard to read agreement <em>within</em> parties from the above graph. For instance, the circle of CDU/CSU seems to be filled with more and thicker green edges than the one of SPD; however, this could well be because there are simply more politicians (307 vs. 191) and hence more edges displayed. We therefore take a closer look at within-party agreement in the following graph:</p>
<center><img src="http://jmbh.github.io/figs/bundestag/bundestag_agreement_time.jpg" width="400" height="350" /></center>
<p>Collapsing over time, we see that members of the left party agree most with each other and members of the social democratic party agree least with each other. The largest changes in agreement appear in the green and the left party: from late 2014 to mid 2015, members of the green party seem to agree less with each other than usual, while members of the left party seem to agree more with each other than usual.</p>
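One plausible way to compute such a within-party agreement score is the mean pairwise correlation among a party's members; this is a sketch of that idea under stated assumptions (the <code class="highlighter-rouge">votes</code> matrix and the <code class="highlighter-rouge">party</code> vector are hypothetical, and the original analysis may aggregate differently):

```r
# Hedged sketch: within-party agreement as the mean off-diagonal
# correlation among a party's members. `votes` and `party` are
# hypothetical stand-ins for the real data.
set.seed(1)
votes <- matrix(sample(c(1, 2), 40 * 60, replace = TRUE), nrow = 40)
party <- rep(c("A", "B"), each = 20)   # hypothetical party labels

C <- cor(t(votes))
within <- sapply(split(seq_len(nrow(votes)), party), function(idx) {
  Cp <- C[idx, idx]
  mean(Cp[upper.tri(Cp)])  # average pairwise correlation within the party
})
within  # one agreement score per party
```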
<h2 id="zoom-in-on-small-group-of-members-of-parliament">Zoom in on small Group of Members of Parliament</h2>
<p>While the analyses so far gave a comprehensive <em>overview</em> of the voting behavior amongst members of parliament, the graph is too large to see which node in the graph corresponds to which politician. In the following graph we zoom in on a random subset of 30 politicians and match the nodes to their names:</p>
<p><img src="http://jmbh.github.io/figs/bundestag/bundestag_cor_ss_names.jpg" alt="center" /></p>
<p>Note that correlations are bivariate measures and therefore the correlations in this smaller graph are the same as the ones in the larger graph above. We see the same overall structure as above, but now with names assigned to nodes. Again the members of the green party cluster together, but for instance Nicole Maisch votes more often together with Steffi Lempke than with the other displayed colleagues. We also see that for instance Steffen Kampeter and Christian Schmidt are both members of the conservative party, but are placed at quite distant locations in the graph (and indeed the correlation between their voting behavior is almost zero: -0.04).</p>
<p>Analogous to above, we now look into how voting agreement between the politicians in our subset changes over time by computing a time-varying graph as before:</p>
<p><img src="http://jmbh.github.io/figs/bundestag/bundestag_cor_ss.gif" alt="center" /></p>
<p>We see that voting agreement changes substantially: for instance, members of the opposition parties seem to agree less and less with the grand coalition until mid-2015, and then agree more and more again until the end of the period in early 2016. Some politicians seem to change their voting pattern quite dramatically: for example, the voting behavior of conservative party member Heike Bremer strongly correlates with the voting behavior of most of her party colleagues in 2014, but in late 2015 and early 2016 the correlations are close to zero. Also, interestingly, conservative Steffen Kampeter tends to vote in the opposite direction to his party colleagues in early 2014, but then agrees with them more and more until the last recorded votes.</p>
<h2 id="unique-agreement-between-members-of-parliament">‘Unique’ Agreement between Members of Parliament</h2>
<p>So far we looked into how the voting patterns of any pair of members of parliament correlate with each other. While this is an informative measure and gives a first overview of how politicians vote relative to each other, it is also tricky to interpret. For instance, two politicians of a party might always vote together because they always align their votes with their common mentor in the party. Or because there is pressure from the whole party to vote for a bill together. Or because they are both members of a specific think tank within the parliament, …</p>
<p>An interesting alternative measure is the conditional correlation, which is the correlation between any two members of parliament <em>after controlling for all other members of parliament</em>. In the case of a conditional correlation between two members of parliament there are still many possible explanations (e.g. both might be influenced by some person <em>outside</em> the parliament); however, we can be sure that this correlation cannot be explained by the voting pattern of any other member of parliament. We compute this conditional correlation graph and visualize it using the same layout as in the corresponding correlation graph:</p>
<p><img src="http://jmbh.github.io/figs/bundestag/bundestag_cond_ss_names.jpg" alt="center" /></p>
<p>It is apparent that there are fewer edges and that the remaining edges are weaker. Note that this is what we would expect in this dataset: in a parliament there is a general level of agreement within parties and also between parties, otherwise it would be difficult to pass bills. Therefore, we would expect that a substantial part of the correlation between the voting patterns of any two politicians can be explained by the voting patterns of other politicians. The strongest conditional correlation is the one between Nicole Gohlke and Norbert Mueller of the left party. For some reason these two politicians align their votes in a way that cannot be explained by the voting patterns of other politicians within or outside their party.</p>
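A standard way to obtain such conditional (partial) correlations is via the inverse of the correlation matrix. The sketch below assumes simulated data and adds a small ridge term, since with more members than bills the real correlation matrix is singular; how the original analysis handled this is not specified here:

```r
# Hedged sketch: partial correlations between members after controlling
# for all others, from the (regularized) inverse correlation matrix.
# `votes` is hypothetical; the ridge term 1e-3 is an assumption.
set.seed(1)
votes <- matrix(sample(c(1, 2), 15 * 100, replace = TRUE), nrow = 15)

C <- cor(t(votes))
P <- solve(C + diag(1e-3, nrow(C)))          # regularized precision matrix
pcor <- -P / sqrt(outer(diag(P), diag(P)))   # standard partial-correlation formula
diag(pcor) <- 1
dim(pcor)  # 15 x 15 conditional correlation matrix
```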
<h2 id="concluding-comments">Concluding comments</h2>
<p>It came as quite a surprise to me that the large majority of votes on bills in the German parliament are not recorded and hence not available to the public (please correct me if I missed something). While this is a major reason to interpret these data with caution, on the other hand the votes on bills that <em>are</em> recorded are the more controversial and therefore probably more interesting ones.</p>
<p>The graphs in this post were the first few obvious things I wanted to look into, but of course many more analyses are possible. I put the preprocessed data (no information lost, just everything in 3 linked files instead of hundreds) on <a href="https://github.com/jmbh/bundestag">Github</a> along with the code that produces the above figures. In case you have any comments, complaints or questions, please comment below!</p>
Wed, 18 May 2016 00:00:00 +0000
http://jmbh.github.io//Analyzing-voting-pattern-of-German-parliament/
http://jmbh.github.io//Analyzing-voting-pattern-of-German-parliament/Interactions between Categorical Variables in Mixed Graphical Models<p>In a <a href="http://jmbh.github.io/Estimation-of-mixed-graphical-models/">previous post</a> we recovered the conditional independence structure in a dataset of <em>mixed variables</em> describing different aspects of the life of individuals diagnosed with Autism Spectrum Disorder, using the <a href="https://cran.r-project.org/web/packages/mgm/index.html">mgm package</a>. While depicting the independence structure of a multivariate dataset gives a first overview of the relations between variables, in most applications we are interested in the exact parameter estimates. For instance, for interactions between continuous variables, we would like to know the sign and the size of the parameters - i.e., whether the nodes in the graph are positively or negatively related, and how strong these associations are. In the case of interactions between categorical variables, we are interested in the signs and sizes of the set of parameters that describes the exact non-linear relationship between variables.</p>
<p>In this post, we take the analysis a step further and show how to use the output of the <a href="https://cran.r-project.org/web/packages/mgm/index.html">mgm package</a> to take a closer look at the recovered dependencies. Specifically, we will recover the sign and weight of the interaction parameter between continuous variables, and zoom into interactions between categorical and continuous variables and between two categorical variables. Both the dataset and the code are available on <a href="https://github.com/jmbh/AutismData">Github</a>.</p>
<p>We start out with the conditional dependence graph estimated in the previous post, however, now with <em>variables grouped by their type</em>:</p>
<p><img src="http://jmbh.github.io/figs/2017-11-30-Closer-Look/Autism_VarTypes.jpg" alt="center" /></p>
<p>We obtained this graph by fitting a mixed graphical model using the mgmfit() function as in the <a href="http://jmbh.github.io/Estimation-of-mixed-graphical-models/">previous post</a>:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># load data; available on Github
</span><span class="n">datalist</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">readRDS</span><span class="p">(</span><span class="s1">'autism_datalist.RDS'</span><span class="p">)</span><span class="w">
</span><span class="n">data</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">datalist</span><span class="o">$</span><span class="n">data</span><span class="w">
</span><span class="n">type</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">datalist</span><span class="o">$</span><span class="n">type</span><span class="w">
</span><span class="n">lev</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">datalist</span><span class="o">$</span><span class="n">lev</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">devtools</span><span class="p">)</span><span class="w">
</span><span class="n">install_github</span><span class="p">(</span><span class="s1">'jmbh/mgm'</span><span class="p">)</span><span class="w"> </span><span class="c1"># we need version 1.1-6
</span><span class="n">library</span><span class="p">(</span><span class="n">mgm</span><span class="p">)</span><span class="w">
</span><span class="n">fit</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">mgmfit</span><span class="p">(</span><span class="n">data</span><span class="p">,</span><span class="w"> </span><span class="n">type</span><span class="p">,</span><span class="w"> </span><span class="n">lev</span><span class="p">,</span><span class="w"> </span><span class="n">lambda.sel</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"EBIC"</span><span class="p">,</span><span class="w"> </span><span class="n">d</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2</span><span class="p">)</span></code></pre></figure>
<h2 id="display-edge-weights-and-signs">Display Edge Weights and Signs</h2>
<p>We now also display the weights of the dependencies. In addition, for interactions between continuous (Gaussian, Poisson) variables, we are able to determine the sign of the dependency, as it only depends on one parameter. The signs are saved in <code class="highlighter-rouge">fit$signs</code>. To make plotting easier, there is also a matrix <code class="highlighter-rouge">fit$edgecolor</code>, which assigns colors to positive (green), negative (red) and undefined (grey) edge signs.</p>
<p>Now, to plot the weighted adjacency matrix with signs (where defined), we give <code class="highlighter-rouge">fit$edgecolor</code> as input to the argument edge.color in <a href="https://cran.r-project.org/web/packages/qgraph/index.html">qgraph</a> and plot the weighted adjacency matrix <code class="highlighter-rouge">fit$wadj</code> instead of the unweighted adjacency matrix <code class="highlighter-rouge">fit$adj</code>:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">devtools</span><span class="p">)</span><span class="w">
</span><span class="n">install_github</span><span class="p">(</span><span class="s1">'SachaEpskamp/qgraph'</span><span class="p">)</span><span class="w"> </span><span class="c1"># we need version 1.3.3
</span><span class="n">library</span><span class="p">(</span><span class="n">qgraph</span><span class="p">)</span><span class="w">
</span><span class="c1"># define variable types
</span><span class="n">groups_typeV</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="s2">"Gaussian"</span><span class="o">=</span><span class="n">which</span><span class="p">(</span><span class="n">datalist</span><span class="o">$</span><span class="n">type</span><span class="o">==</span><span class="s1">'g'</span><span class="p">),</span><span class="w">
</span><span class="s2">"Poisson"</span><span class="o">=</span><span class="n">which</span><span class="p">(</span><span class="n">datalist</span><span class="o">$</span><span class="n">type</span><span class="o">==</span><span class="s1">'p'</span><span class="p">),</span><span class="w">
</span><span class="s2">"Categorical"</span><span class="o">=</span><span class="n">which</span><span class="p">(</span><span class="n">datalist</span><span class="o">$</span><span class="n">type</span><span class="o">==</span><span class="s1">'c'</span><span class="p">))</span><span class="w">
</span><span class="c1"># pick some nice colors
</span><span class="n">group_col</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"#72CF53"</span><span class="p">,</span><span class="w"> </span><span class="s2">"#53B0CF"</span><span class="p">,</span><span class="w"> </span><span class="s2">"#ED3939"</span><span class="p">)</span><span class="w">
</span><span class="n">jpeg</span><span class="p">(</span><span class="s2">"Autism_VarTypes.jpg"</span><span class="p">,</span><span class="w"> </span><span class="n">height</span><span class="o">=</span><span class="m">2</span><span class="o">*</span><span class="m">900</span><span class="p">,</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="m">2</span><span class="o">*</span><span class="m">1300</span><span class="p">,</span><span class="w"> </span><span class="n">unit</span><span class="o">=</span><span class="s1">'px'</span><span class="p">)</span><span class="w">
</span><span class="n">qgraph</span><span class="p">(</span><span class="n">fit</span><span class="o">$</span><span class="n">wadj</span><span class="p">,</span><span class="w"> </span><span class="c1"># weighted adjacency matrix in model fit object
</span><span class="w"> </span><span class="n">vsize</span><span class="o">=</span><span class="m">3.5</span><span class="p">,</span><span class="w">
</span><span class="n">esize</span><span class="o">=</span><span class="m">5</span><span class="p">,</span><span class="w">
</span><span class="n">layout</span><span class="o">=</span><span class="w"> </span><span class="s1">'spring'</span><span class="p">,</span><span class="w"> </span><span class="c1"># to get the exact same layout as above take it from qgraph object of the earlier post
</span><span class="w"> </span><span class="n">edge.color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fit</span><span class="o">$</span><span class="n">edgecolor</span><span class="p">,</span><span class="w">
</span><span class="n">color</span><span class="o">=</span><span class="n">group_col</span><span class="p">,</span><span class="w">
</span><span class="n">border.width</span><span class="o">=</span><span class="m">1.5</span><span class="p">,</span><span class="w">
</span><span class="n">border.color</span><span class="o">=</span><span class="s2">"black"</span><span class="p">,</span><span class="w">
</span><span class="n">groups</span><span class="o">=</span><span class="n">groups_typeV</span><span class="p">,</span><span class="w">
</span><span class="n">nodeNames</span><span class="o">=</span><span class="n">datalist</span><span class="o">$</span><span class="n">colnames</span><span class="p">,</span><span class="w">
</span><span class="n">legend</span><span class="o">=</span><span class="kc">TRUE</span><span class="p">,</span><span class="w">
</span><span class="n">legend.mode</span><span class="o">=</span><span class="s2">"style2"</span><span class="p">,</span><span class="w">
</span><span class="n">legend.cex</span><span class="o">=</span><span class="m">1.5</span><span class="p">)</span><span class="w">
</span><span class="n">dev.off</span><span class="p">()</span></code></pre></figure>
<p>This gives us the following figure:</p>
<p><img src="http://jmbh.github.io/figs/2017-11-30-Closer-Look/Autism_VarTypes_WeightAndSign.jpg" alt="center" /></p>
<p>Red edges correspond to negative edge weights and green edges to positive edge weights. The width of an edge is proportional to the absolute value of its weight. Grey edges connect categorical variables to continuous variables or to other categorical variables; these edge weights are computed from more than one parameter, and thus we cannot assign a sign to them.</p>
<p>While an interaction between continuous variables can be interpreted as a conditional covariance, similar to the well-known multivariate Gaussian case, the interpretation of edge weights involving categorical variables is more intricate, as they are composed of several parameters.</p>
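To see why such edges carry no sign, consider how a set of parameters can be collapsed into a single weight. One plausible aggregation (the mgm paper describes the exact rule used by the package) is the mean of the absolute parameter values, sketched here with hypothetical parameters:

```r
# Hypothetical parameters behind one edge involving a categorical variable.
# Collapsing them via the mean of absolute values yields a single
# non-negative weight, so the sign information is lost (grey edge).
pars <- c(-3.71, 0, 0, 0.51)
edge_weight <- mean(abs(pars))
edge_weight  # 1.055
```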
<h2 id="interpretation-of-interaction-continuous---categorical">Interpretation of Interaction: Continuous - Categorical</h2>
<p>We first consider the edge weight between the continuous Gaussian variable ‘Working hours’ and the categorical variable ‘Type of Work’, which has the categories (1) No work, (2) Supervised work, (3) Unpaid work and (4) Paid work. We get the estimated parameters behind this edge weight from the matrix of all estimated parameters in the mixed graphical model <code class="highlighter-rouge">fit$mpar.matrix</code>:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">matrix</span><span class="p">(</span><span class="n">fit</span><span class="o">$</span><span class="n">mpar.matrix</span><span class="p">[</span><span class="n">fit</span><span class="o">$</span><span class="n">par.labels</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">16</span><span class="p">,</span><span class="w"> </span><span class="n">fit</span><span class="o">$</span><span class="n">par.labels</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">17</span><span class="p">],</span><span class="w"> </span><span class="n">ncol</span><span class="o">=</span><span class="m">1</span><span class="p">)</span><span class="w">
</span><span class="p">[,</span><span class="m">1</span><span class="p">]</span><span class="w">
</span><span class="p">[</span><span class="m">1</span><span class="p">,]</span><span class="w"> </span><span class="m">-3.7051782</span><span class="w">
</span><span class="p">[</span><span class="m">2</span><span class="p">,]</span><span class="w"> </span><span class="m">0.0000000</span><span class="w">
</span><span class="p">[</span><span class="m">3</span><span class="p">,]</span><span class="w"> </span><span class="m">0.0000000</span><span class="w">
</span><span class="p">[</span><span class="m">4</span><span class="p">,]</span><span class="w"> </span><span class="m">0.5059143</span></code></pre></figure>
<p><code class="highlighter-rouge">fit$par.labels</code> indicates which parameters in <code class="highlighter-rouge">fit$mpar.matrix</code> belong to the interaction between which two variables. Note that in the case of jointly Gaussian data, <code class="highlighter-rouge">fit$mpar.matrix</code> is equivalent to the inverse covariance matrix, and each interaction is represented by a single value.</p>
<p>The four values we got from the model parameter matrix represent the interactions of the continuous variable ‘Working hours’ with each of the categories of ‘Type of work’. They can be interpreted in a straightforward way as increasing or decreasing the probability of a category depending on ‘Working hours’. We see that the probability of category (1) ‘No work’ is greatly decreased by an increase in ‘Working hours’. This makes sense, as somebody who does not work works 0 hours. Working hours do not seem to predict the probability of categories (2) ‘Supervised work’ and (3) ‘Unpaid work’. However, increasing working hours does increase the probability of category (4) ‘Paid work’, which indicates that individuals who get paid for their work work longer hours. Note that these interactions are unique in the sense that the influence of all other variables is partialed out!</p>
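To make this concrete, here is a small sketch of how these four slopes shift the category probabilities under a standard multinomial (softmax) link. The intercepts are hypothetical placeholders (set to 0), so the absolute probabilities are illustrative only; the direction of the shifts is what matters:

```r
# Per-category slopes of 'Working hours' taken from the output above;
# intercepts are hypothetical (0) for illustration.
beta  <- c(-3.7051782, 0, 0, 0.5059143)
alpha <- c(0, 0, 0, 0)

softmax <- function(x) exp(x) / sum(exp(x))

p_0h <- softmax(alpha + beta * 0)  # category probabilities at 0 working hours
p_1h <- softmax(alpha + beta * 1)  # ... with one additional working hour

# 'No work' (category 1) becomes less likely and 'Paid work' (category 4)
# more likely as working hours increase:
round(rbind(p_0h, p_1h), 3)
```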
<h2 id="interpretation-of-interaction-categorical---categorical">Interpretation of Interaction: Categorical - Categorical</h2>
<p>Next we consider the edge weight between the categorical variables (14) ‘Type of Housing’ and (16) ‘Type of Work’ from above. ‘Type of Housing’ has two categories, (a) ‘Not independent’ and (b) ‘Independent’. As in the previous example, we take the relevant parameters from the model parameter matrix:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">fit</span><span class="o">$</span><span class="n">mpar.matrix</span><span class="p">[</span><span class="n">fit</span><span class="o">$</span><span class="n">par.labels</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">14</span><span class="p">,</span><span class="w"> </span><span class="n">fit</span><span class="o">$</span><span class="n">par.labels</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">16</span><span class="p">]</span><span class="w">
</span><span class="p">[,</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="p">[,</span><span class="m">2</span><span class="p">]</span><span class="w"> </span><span class="p">[,</span><span class="m">3</span><span class="p">]</span><span class="w"> </span><span class="p">[,</span><span class="m">4</span><span class="p">]</span><span class="w">
</span><span class="p">[</span><span class="m">1</span><span class="p">,]</span><span class="w"> </span><span class="kc">NA</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">-0.5418989</span><span class="w">
</span><span class="p">[</span><span class="m">2</span><span class="p">,]</span><span class="w"> </span><span class="kc">NA</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0.5418989</span></code></pre></figure>
<p>Again, the rows represent the categories of variable (14) ‘Type of Housing’. The columns indicate how the different categories of variable (16) ‘Type of Work’ predict the probability of these categories. The first column is the dummy (reference) category ‘No work’. The parameters can therefore be interpreted as follows:</p>
<p>Having supervised or unpaid work does not predict a probability of living independently that differs from that of individuals with no work. Having paid work, however, decreases the probability of not living independently and increases the probability of living independently, compared to the reference category ‘No work’.</p>
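Because the parameterization is symmetric across the two housing categories, the two printed values combine into a single log-odds effect. A short sketch of this computation, using the parameters shown above:

```r
# Contributions of 'Paid work' (relative to the reference 'No work')
# to the linear predictors of the two housing categories:
b_not_indep <- -0.5418989
b_indep     <-  0.5418989

# Change in the log-odds of living independently when comparing
# 'Paid work' to 'No work', all other variables partialed out:
delta_logodds <- b_indep - b_not_indep
odds_ratio    <- exp(delta_logodds)  # roughly 2.96
```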
<p>The interpretations above correspond to the typical interpretation of parameters in a multinomial regression model, which is indeed what is used in the nodewise regression approach implemented in the mgm package to estimate mixed graphical models. For details about the exact parameterization of the multinomial regression model, see chapter 4 of the <a href="http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2929880/pdf/nihms201118.pdf">glmnet paper</a>. Note that because we use the nodewise regression approach, we can also look at how the categories of (16) ‘Type of Work’ predict (17) ‘Working hours’, or how the categories of (14) ‘Type of Housing’ predict the probabilities of (16) ‘Type of Work’. These parameters can be obtained by exchanging the row indices with the column indices when subsetting <code class="highlighter-rouge">fit$mpar.matrix</code>. For an elaborate explanation of the nodewise regression approach and the exact structure of the model parameter matrix, please see the <a href="http://arxiv.org/pdf/1510.06871v2.pdf">mgm paper</a>.</p>
Fri, 29 Apr 2016 00:00:00 +0000
http://jmbh.github.io//Interactions-between-categorical-Variables-in-mixed-graphical-models/Estimating Mixed Graphical Models<p>Determining conditional independence relationships through undirected graphical models is a key component in the statistical analysis of complex observational data in a wide variety of disciplines. In many situations one seeks to estimate the underlying graphical model of a dataset that includes <em>variables of different domains</em>.</p>
<p>As an example, take a typical dataset in the social, behavioral and medical sciences, where one is interested in interactions, for example between gender or country (categorical), frequencies of behaviors or experiences (count) and the dose of a drug (continuous). Other examples are Internet-scale marketing data or high-throughput sequencing data.</p>
<p>There are methods available to estimate graphical models from mixed continuous data; however, these usually have two drawbacks: first, there is possible information loss due to necessary transformations, and second, they cannot incorporate (nominal) categorical variables (for an overview see <a href="http://arxiv.org/abs/1510.05677">here</a>). A <a href="http://arxiv.org/abs/1510.06871">new method</a> implemented in the R-package <a href="https://cran.r-project.org/web/packages/mgm/index.html">mgm</a> addresses these limitations.</p>
<p>In the following, we use the mgm-package to estimate the conditional independence network in a dataset of questionnaire responses of individuals diagnosed with Autism Spectrum Disorder. This dataset includes variables of different domains, such as age (continuous), type of housing (categorical) and number of treatments (count).</p>
<p>The dataset consists of responses of 3521 individuals to a questionnaire comprising 28 variables of the domains continuous, count, and categorical, and is available <a href="https://github.com/jmbh/AutismData">here</a>.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">datalist</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">readRDS</span><span class="p">(</span><span class="s1">'autism_datalist.RDS'</span><span class="p">)</span><span class="w">
</span><span class="n">data</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">datalist</span><span class="o">$</span><span class="n">data</span><span class="w">
</span><span class="n">type</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">datalist</span><span class="o">$</span><span class="n">type</span><span class="w">
</span><span class="n">lev</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">datalist</span><span class="o">$</span><span class="n">lev</span><span class="w">
</span><span class="o">></span><span class="w"> </span><span class="nf">dim</span><span class="p">(</span><span class="n">data</span><span class="p">)</span><span class="w">
</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="m">3521</span><span class="w"> </span><span class="m">28</span><span class="w">
</span><span class="o">></span><span class="w"> </span><span class="nf">round</span><span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="m">1</span><span class="o">:</span><span class="m">4</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="m">5</span><span class="p">],</span><span class="m">2</span><span class="p">)</span><span class="w">
</span><span class="n">sex</span><span class="w"> </span><span class="n">IQ</span><span class="w"> </span><span class="n">agediagnosis</span><span class="w"> </span><span class="n">opennessdiagwp</span><span class="w"> </span><span class="n">successself</span><span class="w">
</span><span class="p">[</span><span class="m">1</span><span class="p">,]</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="m">6</span><span class="w"> </span><span class="m">-0.96</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="m">2.21</span><span class="w">
</span><span class="p">[</span><span class="m">2</span><span class="p">,]</span><span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="m">6</span><span class="w"> </span><span class="m">-0.52</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="m">6.11</span><span class="w">
</span><span class="p">[</span><span class="m">3</span><span class="p">,]</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="m">5</span><span class="w"> </span><span class="m">-0.71</span><span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="m">5.62</span><span class="w">
</span><span class="p">[</span><span class="m">4</span><span class="p">,]</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="m">6</span><span class="w"> </span><span class="m">-0.45</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="m">8.00</span></code></pre></figure>
<p>We used our knowledge about the variables to specify the domain (type) of each variable and the number of categories for categorical variables (for non-categorical variables we chose 1). “c”, “g”, and “p” stand for categorical, Gaussian, and Poisson (count), respectively:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="o">></span><span class="w"> </span><span class="n">type</span><span class="w">
</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="s2">"c"</span><span class="w"> </span><span class="s2">"g"</span><span class="w"> </span><span class="s2">"g"</span><span class="w"> </span><span class="s2">"c"</span><span class="w"> </span><span class="s2">"g"</span><span class="w"> </span><span class="s2">"c"</span><span class="w"> </span><span class="s2">"c"</span><span class="w"> </span><span class="s2">"p"</span><span class="w"> </span><span class="s2">"p"</span><span class="w"> </span><span class="s2">"p"</span><span class="w"> </span><span class="s2">"p"</span><span class="w"> </span><span class="s2">"p"</span><span class="w"> </span><span class="s2">"p"</span><span class="w">
</span><span class="p">[</span><span class="m">14</span><span class="p">]</span><span class="w"> </span><span class="s2">"c"</span><span class="w"> </span><span class="s2">"p"</span><span class="w"> </span><span class="s2">"c"</span><span class="w"> </span><span class="s2">"g"</span><span class="w"> </span><span class="s2">"p"</span><span class="w"> </span><span class="s2">"p"</span><span class="w"> </span><span class="s2">"p"</span><span class="w"> </span><span class="s2">"p"</span><span class="w"> </span><span class="s2">"g"</span><span class="w"> </span><span class="s2">"g"</span><span class="w"> </span><span class="s2">"g"</span><span class="w"> </span><span class="s2">"g"</span><span class="w"> </span><span class="s2">"g"</span><span class="w">
</span><span class="p">[</span><span class="m">27</span><span class="p">]</span><span class="w"> </span><span class="s2">"c"</span><span class="w"> </span><span class="s2">"g"</span><span class="w">
</span><span class="o">></span><span class="w"> </span><span class="n">lev</span><span class="w">
</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="m">5</span><span class="w"> </span><span class="m">3</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="m">4</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="m">3</span><span class="w">
</span><span class="p">[</span><span class="m">28</span><span class="p">]</span><span class="w"> </span><span class="m">1</span></code></pre></figure>
<p>The estimation algorithm requires us to make an assumption about the highest order interaction in the true graph. Here we assume that there are at most pairwise interactions in the true graph and set d = 2. The algorithm includes an L1-penalty to obtain a sparse estimate. We can select the regularization parameter lambda using cross validation (CV) or the Extended Bayesian Information Criterion (EBIC). Here, we choose the EBIC, which is known to be a bit more conservative than CV but is computationally faster.</p>
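For reference, the EBIC augments the familiar BIC with an extra penalty that grows with the number of variables; a minimal sketch of the criterion (the exact degrees-of-freedom accounting inside mgm may differ):

```r
# EBIC_gamma = -2*logLik + df*log(n) + 2*gamma*df*log(p)
# df: number of nonzero parameters, n: sample size, p: number of variables.
ebic <- function(loglik, df, n, p, gamma = 0.25) {
  -2 * loglik + df * log(n) + 2 * gamma * df * log(p)
}

# Larger gamma (or larger p) penalizes model complexity more heavily,
# which is one reason EBIC tends to select sparser graphs than CV.
ebic(loglik = -5000, df = 40, n = 3521, p = 28)
```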
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">mgm</span><span class="p">)</span><span class="w">
</span><span class="n">fit</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">mgmfit</span><span class="p">(</span><span class="n">data</span><span class="p">,</span><span class="w"> </span><span class="n">type</span><span class="p">,</span><span class="w"> </span><span class="n">lev</span><span class="p">,</span><span class="w"> </span><span class="n">lambda.sel</span><span class="o">=</span><span class="s2">"EBIC"</span><span class="p">,</span><span class="w"> </span><span class="n">d</span><span class="o">=</span><span class="m">2</span><span class="p">)</span></code></pre></figure>
<p>The fit function returns all estimated parameters and a weighted and unweighted (binarized) adjacency matrix. Here we use the <a href="http://www.jstatsoft.org/article/view/v048i04/v48i04.pdf">qgraph</a> package to visualize the graph:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># define group labels
</span><span class="n">groups_type</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="s2">"Demographics"</span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="m">14</span><span class="p">,</span><span class="m">15</span><span class="p">,</span><span class="m">28</span><span class="p">),</span><span class="w">
</span><span class="s2">"Psychological"</span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="m">2</span><span class="p">,</span><span class="m">4</span><span class="p">,</span><span class="m">5</span><span class="p">,</span><span class="m">6</span><span class="p">,</span><span class="m">18</span><span class="p">,</span><span class="m">20</span><span class="p">,</span><span class="m">21</span><span class="p">),</span><span class="w">
</span><span class="s2">"Social environment"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">7</span><span class="p">,</span><span class="m">16</span><span class="p">,</span><span class="m">17</span><span class="p">,</span><span class="m">19</span><span class="p">,</span><span class="m">26</span><span class="p">,</span><span class="m">27</span><span class="p">),</span><span class="w">
</span><span class="s2">"Medical"</span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="m">3</span><span class="p">,</span><span class="m">8</span><span class="p">,</span><span class="m">9</span><span class="p">,</span><span class="m">10</span><span class="p">,</span><span class="m">11</span><span class="p">,</span><span class="m">12</span><span class="p">,</span><span class="m">13</span><span class="p">,</span><span class="m">22</span><span class="p">,</span><span class="m">23</span><span class="p">,</span><span class="m">24</span><span class="p">,</span><span class="m">25</span><span class="p">))</span><span class="w">
</span><span class="c1"># pick some nice colors
</span><span class="n">group_col</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"#72CF53"</span><span class="p">,</span><span class="w"> </span><span class="s2">"#53B0CF"</span><span class="p">,</span><span class="w"> </span><span class="s2">"#FFB026"</span><span class="p">,</span><span class="w"> </span><span class="s2">"#ED3939"</span><span class="p">)</span><span class="w">
</span><span class="c1"># plot
</span><span class="n">library</span><span class="p">(</span><span class="n">qgraph</span><span class="p">)</span><span class="w">
</span><span class="n">qgraph</span><span class="p">(</span><span class="n">fit</span><span class="o">$</span><span class="n">adj</span><span class="p">,</span><span class="w">
</span><span class="n">vsize</span><span class="o">=</span><span class="m">3.5</span><span class="p">,</span><span class="w">
</span><span class="n">esize</span><span class="o">=</span><span class="m">4</span><span class="p">,</span><span class="w">
</span><span class="n">layout</span><span class="o">=</span><span class="s2">"spring"</span><span class="p">,</span><span class="w">
</span><span class="n">edge.color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rgb</span><span class="p">(</span><span class="m">33</span><span class="p">,</span><span class="m">33</span><span class="p">,</span><span class="m">33</span><span class="p">,</span><span class="m">100</span><span class="p">,</span><span class="w"> </span><span class="n">maxColorValue</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">255</span><span class="p">),</span><span class="w">
</span><span class="n">color</span><span class="o">=</span><span class="n">group_col</span><span class="p">,</span><span class="w">
</span><span class="n">border.width</span><span class="o">=</span><span class="m">1.5</span><span class="p">,</span><span class="w">
</span><span class="n">border.color</span><span class="o">=</span><span class="s2">"black"</span><span class="p">,</span><span class="w">
</span><span class="n">groups</span><span class="o">=</span><span class="n">groups_type</span><span class="p">,</span><span class="w">
</span><span class="n">nodeNames</span><span class="o">=</span><span class="n">datalist</span><span class="o">$</span><span class="n">colnames</span><span class="p">,</span><span class="w">
</span><span class="n">legend</span><span class="o">=</span><span class="kc">TRUE</span><span class="p">,</span><span class="w">
</span><span class="n">legend.mode</span><span class="o">=</span><span class="s2">"style2"</span><span class="p">,</span><span class="w">
</span><span class="n">legend.cex</span><span class="o">=</span><span class="m">.5</span><span class="p">)</span><span class="w">
</span></code></pre></figure>
<p><img src="http://jmbh.github.io/figs/2015-10-31-Estimation-of-mixed-graphical-models/JSS_autism_figure.jpg" alt="center" /></p>
<p>The data to reproduce this analysis can be found <a href="https://github.com/jmbh/AutismData">here</a>. More information about estimating mixed graphical models and the <a href="https://cran.r-project.org/web/packages/mgm/index.html">mgm package</a> can be found <a href="http://arxiv.org/abs/1510.06871">in this paper</a>. <a href="http://arxiv.org/abs/1510.05677">Here</a> is a paper explaining the theory behind the implemented algorithm.</p>
<p>Computationally efficient methods for Gaussian data are implemented in the <a href="https://cran.r-project.org/web/packages/huge/index.html">huge</a> package and the <a href="https://cran.r-project.org/web/packages/glasso/index.html">glasso</a> package. For binary data, there is the <a href="https://cran.fhcrc.org/web/packages/IsingFit/index.html">IsingFit</a> package.</p>
<p>Great free resources about graphical models are Chapter 17 in the freely available book <a href="https://web.stanford.edu/~hastie/local.ftp/Springer/OLD/ESLII_print4.pdf">The Elements of Statistical Learning</a> and the Coursera course <a href="https://www.coursera.org/course/pgm">Probabilistic Graphical Models</a>.</p>
Mon, 30 Nov 2015 00:00:00 +0000
http://jmbh.github.io//Estimation-of-mixed-graphical-models/