Predictability in Network Models
Network models have become a popular way to abstract complex systems and gain insights into relational patterns among observed variables in many areas of science. The majority of these applications focuses on analyzing the structure of the network. However, if the network is not directly observed (Alice and Bob are friends) but estimated from data (there is a relation between smoking and cancer), we can analyze - in addition to the network structure - the predictability of the nodes in the network. That is, we would like to know: how well can a given node in the network predicted by all remaining nodes in the network?
Predictability is interesting for several reasons:
- It gives us an idea of how practically relevant edges are: if node A is connected to many other nodes but these only explain, let’s say, only 1% of its variance, how interesting are the edges connected to A?
- We get an indication of how to design an intervention in order to achieve a change in a certain set of nodes and we can estimate how efficient the intervention will be
- It tells us to which extent different parts of the network are self-determined or determined by other factors that are not included in the network
In this blogpost, we use the R-package mgm to estimate a network model and compute node wise predictability measures for a dataset on Post Traumatic Stress Disorder (PTSD) symptoms of Chinese earthquake victims. We visualize the network model and predictability using the qgraph package and discuss how the combination of network model and node wise predictability can be used to design effective interventions on the symptom network.
We load the data which the authors made freely available:
The datasets contains complete responses to 17 PTSD symptoms of 344 individuals. The answer categories for the intensity of symptoms ranges from 1 ‘not at all’ to 5 ‘extremely’. The exact wording of all symptoms is in the paper of McNally and colleagues.
Estimate Network Model
We estimate a Mixed Graphical Model (MGM), where we treat all variables as continuous-Gaussian variables. Hence we set the type of all variables to
type = 'g' and the number of categories for each variable to 1, which is the default for continuous variables
lev = 1:
Compute Predictability of Nodes
After estimating the network model we are ready to compute the predictability for each node. Node wise predictability (or error) can be easily computed, because the graph is estimated by taking each node in turn and regressing all other nodes on it. As a measure for predictability we pick the propotion of explained variance, as it is straight forward to interpret: 0 means the node at hand is not explained at all by other nodes in the nentwork, 1 means perfect prediction. We centered all variables before estimation in order to remove any influence of the intercepts. For a detailed description of how to compute predictions and to choose predictability measures, check out this preprint. In case there are additional variable types (e.g. categorical) in the network, we can choose an appropriate measure for these variables (e.g. % correct classification, see
We calculated the percentage of variance explained in each of the nodes in the network. Next, we visualize the estimated network and discuss its structure in relation to explained variance.
Visualize Network & Predictability
We provide the estimated weighted adjacency matrix and the node wise predictability measures as arguments to
… and get the following network visualization:
Each variable is represented by a node and the edges correspond to partial correlations, because in this dataset the MGM consists only of conditional Gaussian variables. The green color of the edges indicates that all partial correlations in this graph are positive, and the edge-width is proportional to the absolute value of the partial correlation. The blue pie chart behind the node indicates the predictability measure for each node (more blue = higher predictability).
We see that intrusive memories, traumatic dreams and flashbacks cluster together. Also, we observe that avoidance of thoughts (avoidth) about trauma interacts with avoidance of acitivies reminiscent of the trauma (avoidact) and that hypervigilant (hyper) behavior is related to feeling easily startled (startle). But there are also less obvious interactions, for instance between anger and concentration problems.
Now, if we would like to reduce sleep problems, the network model suggests to intervene on the variables anger and startle. But what the network structure does not tell us is how much we could possibly change sleep through the variables anger and startle. The predictability measure gives us an answer to this question: 53.1%. If the goal was to intervene on amnesia, we see that all adjacent nodes in the network explain only 32.7% of its variance. In addition, we see that there are many small edges connected to amnesia, suggesting that it is hard to intervene on amnesia via other nodes in the symptom network. Thus, one would possibly try to find additional variables that are not included in the network that interact with amnesia or try to intervene on amnesia directly.
Of course, there are limitations to interpreting explained variance as predicted treatment outcome: first, we cannot know the causal direction of the edges, so any edge could point in one or both directions. However, if there is no edge, there is also no causal effect in any direction. Also, it is often reasonable to combine the network model with general knoweldge: for instance, it seems more likely that amnesia causes being upset than the other way around. Second, we estimated the model on cross-sectional data (each row is one person) and hence assume that all people are the same, which is an assumption that is always violated to some extent. To solve this problem we would need (many) repeated measurements of a single person, in order to estimate a model specific to that person. This also solves the first problem to some degree as we can use the direction of time as the direction of causality. One would then use models that predict all symptoms at time point t by all symptoms at an earlier time point, let’s say t-1. An example of such a model is the Vector Autoregressive (VAR) model.
Compare Within vs. Out of Sample Predictability
So far we looked into how well we can predict nodes by all other nodes within our sample. But in most situations we are interested in the predictability of nodes in new, unseen data. In what follows, we compare the within sample predictability with the out of sample predictability.
We first split the data in two parts: a training part (60% of the data), which we use to estimate the network model and a test part, which we will use to compute predictability measures on:
Next, we estimate the network only on the training data and compute the predictability measure both on the training data and the test data:
We now look at the mean predictability over nodes for the training- and test dataset:
As expected, the explained variance is higher in the training dataset. This is because we fit the model to structure that is specific to the training data and is not present in the population (noise). Note that both means are lower than the mean we would get by taking the mean of the explained variances above, because we used less observation to estimate the model and hence have less power to detect edges.
While the explained variance values are lower in the test set, there is a positive relationship between the explained variance of a node in the training- and the test set
which means that if a node has high explained variance in the training set, it tends to also have a high explained variance in the test set.
Edit Nov 3rd: The AND- or OR-rule and Predictability
In the above example I used the OR-rule to combine estimates in the neighborhood regression approach, without justifying why (thanks to Wagner de Lara Machado for pointing this out). Here comes the explanation:
In the neighborhood regression approach to graph estimation we pick each node in the graph and regress all other nodes on this node. If we have three nodes , , , this procedure leads to three regression models:
This procedure leads to two estimates for the edge between and : from regression (1) and from regression (2). If both parameters are nonzero, we clearly set the edge between x1 and x2 to present, and if both parameters are zero, we clearly set the edge between and to not present. However, in some cases the two estimates disagree and we need a rule for this situation: The OR-rule sets an edge to be present if at least one of the estimates is nonzero. The AND-rule sets an edge to be present only if both estimates are nonzero.
Now, to compute predictions and hence a measure of predictability we use the regression models 1-3. Let’s take regression model (3), where we predict by and . Now, if the betas agree ( and agree and and agree), everything is fine. But if there is disagreement, we have the following problem:
When using the AND-rule: if let’s say the parameter is nonzero but is zero, the AND rule sets the edge-parameter - in the graph to zero; however the parameter will still be used for estimation of . This leads to a predictability that is too high. Hence we could have a situation in which a node has no connection in the graph (obtained using the AND-rule) but has a nonzero predictability measure.
When using the OR-rule: if the parameter is nonzero but is zero, the OR-rule sets the edge-parameter - in the graph to be present; however we use the (zero) parameter in regression (3) for prediction. This leads to a predictability that is too small. Hence we could hae the situation that a node has a connetion in the graph but has a zero predictability measure.
Hence, when using the OR-rule, we underestimate the true predictability given the graph and hence get a conservative estimate of predictability in the graph. This is why I chose the OR-rule above.
Okay, but why don’t we adjust the parameters of the regression models 1-3 by setting parameters to zero (AND-rule) or filling in parameters (OR-rule)? This is not possible, because tinkering with the parameters will destroy the prediction model in most situations.
A possible way around this would be to take the estimated graph and then re-estimate the graph (by performing p regressions) but only use those variables as predictors that were connected to the predicted node in the initial graph. However, this 2-stage procedure would lead to a (possibly) completely different scaling for the estimation of each of the neighborhoods of the different nodes. This is likely to lead to an algorithm that does not consistently recover the true graph/network.