Regression with Interaction Terms - How Centering Predictors influences Main Effects
Centering predictors in a regression model with only main effects has no influence on the main effects. In contrast, in a regression model that includes interaction terms, centering predictors does influence the main effects. After getting confused by this, I read this nice paper by Afshartous & Preston (2011) on the topic and played around with the examples in R. I summarize the resulting notes and code snippets in this blogpost.
We give an explanation on two levels:
- By illustrating the issue with the simplest possible example
- By showing in general how main effects are a function of the constants (e.g. means) that are subtracted from predictor variables
Explanation 1: Simplest example
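The simplest possible example to illustrate the issue is a regression model in which the variable $Y$ is a linear function of the variables $X_1$, $X_2$, and their product $X_1 X_2$:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \epsilon,$$
where we set the parameters $\beta_0, \beta_1, \beta_2, \beta_3$ to fixed values, and $\epsilon$ is Gaussian noise with mean zero and variance $\sigma^2$. We define the predictors $X_1$ and $X_2$ as Gaussians with means $\mu_{X_1}$ and $\mu_{X_2}$. This code samples observations from this model: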
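A minimal sketch of such a sampling step in R. The coefficient values, sample size, noise standard deviation, and seed are placeholders chosen for illustration and are not taken from the post. The predictor means are set to 1 so that centering corresponds to adding $c = -1$, in line with the remark in Explanation 2.

```r
set.seed(1)     # assumed seed, only for reproducibility
n  <- 10000     # assumed sample size

# Assumed (hypothetical) parameter values of the data-generating model
b0 <- 1; b1 <- 0.2; b2 <- 0.3; b3 <- 0.4

# Gaussian predictors with mean 1 and unit variance (assumption)
x1 <- rnorm(n, mean = 1, sd = 1)
x2 <- rnorm(n, mean = 1, sd = 1)

# Outcome: main effects + interaction + Gaussian noise (sd = 0.5 is an assumption)
y <- b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2 + rnorm(n, mean = 0, sd = 0.5)

head(cbind(y, x1, x2))
```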
Regression models with main effects
We first verify that centering variables indeed does not affect the main effects. To do so, we fit the linear regression with only main effects, first with uncentered predictors and then with mean-centered predictors:
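The two fits could be sketched as follows. The simulated data use illustrative parameter values that are assumptions, and the object names (`fit_main`, `fit_main_c`) are hypothetical.

```r
set.seed(1)
n  <- 10000
x1 <- rnorm(n, mean = 1); x2 <- rnorm(n, mean = 1)
y  <- 1 + 0.2 * x1 + 0.3 * x2 + 0.4 * x1 * x2 + rnorm(n, sd = 0.5)  # assumed values

# Main effects only, uncentered predictors
fit_main <- lm(y ~ x1 + x2)

# Main effects only, mean-centered predictors
x1_c <- x1 - mean(x1)
x2_c <- x2 - mean(x2)
fit_main_c <- lm(y ~ x1_c + x2_c)

coef(fit_main)    # intercept and the two slopes
coef(fit_main_c)  # same slopes; only the intercept differs
```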
The parameter estimates of the regression with uncentered predictors are $\hat\beta_1$ and $\hat\beta_2$. The estimates of the regression with centered predictors are $\hat\beta_1^*$ and $\hat\beta_2^*$ (we denote estimates from regressions with centered predictors with an asterisk). And indeed, $\hat\beta_1 = \hat\beta_1^*$ and $\hat\beta_2 = \hat\beta_2^*$.
Regression models with main effects + interaction
We include the interaction term and show that centering the predictors now does affect the main effects. We fit the regression model first without centering and then with centering:
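A sketch of both fits with the interaction term, again on simulated data with assumed parameter values. In an R formula, `x1 * x2` expands to both main effects plus their product.

```r
set.seed(1)
n  <- 10000
x1 <- rnorm(n, mean = 1); x2 <- rnorm(n, mean = 1)
y  <- 1 + 0.2 * x1 + 0.3 * x2 + 0.4 * x1 * x2 + rnorm(n, sd = 0.5)  # assumed values

# Interaction model, uncentered predictors
fit_int <- lm(y ~ x1 * x2)

# Interaction model, mean-centered predictors
x1_c <- x1 - mean(x1)
x2_c <- x2 - mean(x2)
fit_int_c <- lm(y ~ x1_c * x2_c)

coef(fit_int)    # main effects evaluated at x1 = 0 and x2 = 0
coef(fit_int_c)  # main effects evaluated at the predictor means

# Both parameterizations describe the same fit
r2   <- summary(fit_int)$r.squared
r2_c <- summary(fit_int_c)$r.squared
c(uncentered = r2, centered = r2_c)
```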
We see that $\hat\beta_1 \neq \hat\beta_1^*$ and $\hat\beta_2 \neq \hat\beta_2^*$. While the two models have different parameters, they are statistically equivalent. Here this means that the expected values of $Y$ under both models are the same. In empirical terms this means that their coefficient of determination is the same. The reader will be able to verify this in Explanation 2 below.
We make two observations:
- In the model with interaction terms, the main effects differ between the regressions with/without centering of predictors
- When centering predictors, the main effects are the same in the model with/without the interaction term (up to some numerical inaccuracy)
Why does centering influence main effects in the presence of an interaction term?
The reason is that in the model with the interaction term, the parameter $\beta_1$ (uncentered predictors) is the main effect of $X_1$ on $Y$ if $X_2 = 0$, and the parameter $\beta_1^*$ (centered predictors) is the main effect of $X_1$ on $Y$ if $X_2 = \mu_{X_2}$. This means that $\beta_1$ and $\beta_1^*$ are modeling different effects in the data. Here is a more detailed explanation:
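Rewriting the model equation in the following way
$$\mathbb{E}[Y] = \beta_0 + (\beta_1 + \beta_3 X_2) X_1 + \beta_2 X_2$$
shows that in the model with interaction term, the effect of $X_1$ on $Y$ is equal to $\beta_1 + \beta_3 X_2$ and therefore a function of $X_2$. What does the parameter $\beta_1$ model here? It models the effect of $X_1$ on $Y$ when $X_2 = 0$. Similarly, we could rewrite the effect of $X_2$ on $Y$ as a function of $X_1$.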
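Now let $X_1^c = X_1 - \mu_{X_1}$ and $X_2^c = X_2 - \mu_{X_2}$ be the centered predictors. We get the same model equation, now with the parameters estimated using the centered predictors $X_1^c$ and $X_2^c$:
$$\mathbb{E}[Y] = \beta_0^* + (\beta_1^* + \beta_3^* X_2^c) X_1^c + \beta_2^* X_2^c$$
Again we focus on the effect of $X_1^c$ on $Y$. What does the parameter $\beta_1^*$ model here? It models the main effect of $X_1^c$ on $Y$ when $X_2^c = 0$. What remained the same is that $\beta_1^*$ is the main effect of $X_1$ on $Y$ when $X_2^c = 0$. But what is new is that $X_2^c = 0$ corresponds to $X_2 = \mu_{X_2}$.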
To summarize, in the uncentered case $\beta_1$ is the main effect of $X_1$ when the predictor variable $X_2$ is equal to zero; and in the centered case, $\beta_1^*$ is the main effect of $X_1$ when the predictor variable $X_2$ is equal to its mean. Clearly, $\beta_1$ and $\beta_1^*$ model different effects in the data and it is therefore not surprising that the two regressions give us very different estimates.
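This interpretation can be checked numerically: the main effect from the centered fit should equal the uncentered main effect evaluated at the mean of the other predictor, $\hat\beta_1 + \hat\beta_3 \bar{X}_2$. A small sketch on simulated data with assumed parameter values and hypothetical object names:

```r
set.seed(1)
n  <- 10000
x1 <- rnorm(n, mean = 1); x2 <- rnorm(n, mean = 1)
y  <- 1 + 0.2 * x1 + 0.3 * x2 + 0.4 * x1 * x2 + rnorm(n, sd = 0.5)  # assumed values

fit_int <- lm(y ~ x1 * x2)

x1_c <- x1 - mean(x1)
x2_c <- x2 - mean(x2)
fit_int_c <- lm(y ~ x1_c * x2_c)

# Effect of x1 in the uncentered model, evaluated at x2 = mean(x2)
b1_at_mean <- coef(fit_int)["x1"] + coef(fit_int)["x1:x2"] * mean(x2)

# Main effect of x1 in the centered model
b1_star <- coef(fit_int_c)["x1_c"]

c(b1_at_mean = unname(b1_at_mean), b1_star = unname(b1_star))
```

Since centering here uses the sample means, the two numbers agree up to floating-point error and not just approximately.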
Centering: interpretation of $\beta_1$ remains the same when adding interaction
Our second observation above was that the estimates of main effects are the same with/without interaction term when centering the predictor variables. This is because in the models without interaction term (centered or uncentered predictors) the interpretation of $\beta_1$ is the same as in the model with interaction term and centered predictors.
More precisely, in the regression model with only main effects, $\beta_1$ is the main effect of $X_1$ on $Y$ averaged over all values of $X_2$, which is the same as the main effect of $X_1$ on $Y$ for $X_2 = \mu_{X_2}$. This means that if we center predictors, $\beta_1$ models the same effect in the data in a model with/without interaction term. This is an attractive property to have when one is interested in comparing models with/without interaction term.
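A quick sketch of this comparison for the simulated setup used here (independent Gaussian predictors, assumed parameter values). The agreement is approximate in a finite sample, which is why the check below uses a tolerance.

```r
set.seed(1)
n  <- 10000
x1 <- rnorm(n, mean = 1); x2 <- rnorm(n, mean = 1)
y  <- 1 + 0.2 * x1 + 0.3 * x2 + 0.4 * x1 * x2 + rnorm(n, sd = 0.5)  # assumed values

x1_c <- x1 - mean(x1)
x2_c <- x2 - mean(x2)

# Centered predictors, without and with the interaction term
fit_main_c <- lm(y ~ x1_c + x2_c)
fit_int_c  <- lm(y ~ x1_c * x2_c)

# Main effects side by side; they should be close but not identical
main_effects <- rbind(
  without_interaction = coef(fit_main_c)[c("x1_c", "x2_c")],
  with_interaction    = coef(fit_int_c)[c("x1_c", "x2_c")]
)
round(main_effects, 3)
```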
Explanation 2: Main effects as functions of added constants
Subtracting the mean from predictors is a special case of adding constants to predictors. Here we first show numerically what happens to each regression parameter when adding constants to predictors. Then we show analytically how each parameter is a function of its value in the original regression model (no constant added) and the added constants.
Why are we doing this? We are doing this to develop a more general understanding of what happens when adding constants to predictors. It also puts the above example in a more general context, since we can consider it as a special case of the following analysis.
Numerical experiment I: Only main effects
We first fit a series of regression models with only main effects. In each of them we add a different constant to the predictors. We do this to verify that our claim that centering predictors does not change main effects extends to the more general situation of adding constants to predictors.
We first define a sequence of constant values that we add to the predictors and create storage for the parameter estimates. We then fit 25 regression models, and in each of them we add a constant $c$, taken from the sequence, to both predictors:
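A sketch of this experiment. The range of the sequence is an assumption, chosen so that the 25 values include both $c = -1$ and $c = 0$. The simulation values and the names `c_seq` and `est_main` are likewise hypothetical.

```r
set.seed(1)
n  <- 10000
x1 <- rnorm(n, mean = 1); x2 <- rnorm(n, mean = 1)
y  <- 1 + 0.2 * x1 + 0.3 * x2 + 0.4 * x1 * x2 + rnorm(n, sd = 0.5)  # assumed values

# 25 constants; assumed range, step 0.125, includes -1 and 0
c_seq <- seq(-1.5, 1.5, length.out = 25)

# Storage: one row per constant, one column per parameter
est_main <- matrix(NA, nrow = length(c_seq), ncol = 3,
                   dimnames = list(NULL, c("b0", "b1", "b2")))

for (i in seq_along(c_seq)) {
  x1_s <- x1 + c_seq[i]   # shifted predictors
  x2_s <- x2 + c_seq[i]
  est_main[i, ] <- coef(lm(y ~ x1_s + x2_s))
}

round(head(est_main), 3)  # b0 changes with the constant, b1 and b2 do not
```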
Remark: in Explanation 1 we said that the coefficient of determination does not change when adding constants to the predictors. We invite the reader to verify this by inspecting the $R^2$ of the fitted models.
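One possible way to do this check is to collect `summary(...)$r.squared` for every shifted model and look at the range of the values. The data and the sequence of constants are the same assumed, illustrative ones as before.

```r
set.seed(1)
n  <- 10000
x1 <- rnorm(n, mean = 1); x2 <- rnorm(n, mean = 1)
y  <- 1 + 0.2 * x1 + 0.3 * x2 + 0.4 * x1 * x2 + rnorm(n, sd = 0.5)  # assumed values

c_seq <- seq(-1.5, 1.5, length.out = 25)  # assumed range

# R^2 of the main-effects model for every added constant
r2 <- sapply(c_seq, function(cc) {
  summary(lm(y ~ I(x1 + cc) + I(x2 + cc)))$r.squared
})

range(r2)  # minimum and maximum coincide up to floating-point error
```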
We plot all parameters as a function of $c$.
We see that the intercept changes as a function of $c$. The model at $c = 0$ corresponds to the very first model we fitted above. And the model at $c = -1$ corresponds to the model fitted with centered predictors. But the key observation is that the main effects do not change. A proof of this and an exact expression for the intercept will fall out of our analysis of the model with interaction term in the last section of this blogpost.
Numerical experiment II: main effects + interaction term
Next we show that this is different when adding the interaction term. We use the same sequence of $c$ as above and fit regression models with interaction term:
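A sketch of the same loop with the interaction term included. As before, the simulation values, the range of the sequence, and the names `c_seq` and `est_int` are assumptions made for illustration.

```r
set.seed(1)
n  <- 10000
x1 <- rnorm(n, mean = 1); x2 <- rnorm(n, mean = 1)
y  <- 1 + 0.2 * x1 + 0.3 * x2 + 0.4 * x1 * x2 + rnorm(n, sd = 0.5)  # assumed values

c_seq <- seq(-1.5, 1.5, length.out = 25)  # assumed range

# Storage: intercept, two main effects, interaction
est_int <- matrix(NA, nrow = length(c_seq), ncol = 4,
                  dimnames = list(NULL, c("b0", "b1", "b2", "b3")))

for (i in seq_along(c_seq)) {
  x1_s <- x1 + c_seq[i]   # shifted predictors
  x2_s <- x2 + c_seq[i]
  est_int[i, ] <- coef(lm(y ~ x1_s * x2_s))
}

round(head(est_int), 3)  # b0, b1, b2 change with the constant; b3 does not
```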
And again we plot all parameters as a function of $c$.
This time both the intercept and the main effects are a function of $c$, while the interaction effect is constant. At this point the best explanation is simply to go through the algebra, which explains these results exactly. We do this in the next section.
Deriving function for all effects
We plug in the definition of centering in the population regression model we introduced at the very beginning of this blogpost. This gives us every parameter as a function of two things: (1) the parameters in the original model and (2) the added constants. Above we added the same constant to both predictors. Here we consider the general case where the constants can differ.
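Our original (unaltered) model is given by:
$$\mathbb{E}[Y] = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2$$
Now we plug in the predictors with added constants, $X_1 + c_1$ and $X_2 + c_2$, multiply out, and rearrange:
$$\begin{aligned} \mathbb{E}[Y] &= \beta_0^* + \beta_1^* (X_1 + c_1) + \beta_2^* (X_2 + c_2) + \beta_3^* (X_1 + c_1)(X_2 + c_2) \\ &= (\beta_0^* + \beta_1^* c_1 + \beta_2^* c_2 + \beta_3^* c_1 c_2) + (\beta_1^* + \beta_3^* c_2) X_1 + (\beta_2^* + \beta_3^* c_1) X_2 + \beta_3^* X_1 X_2 \end{aligned}$$
Now if we equate the respective intercept and slope terms we get:
$$\begin{aligned} \beta_0 &= \beta_0^* + \beta_1^* c_1 + \beta_2^* c_2 + \beta_3^* c_1 c_2 \\ \beta_1 &= \beta_1^* + \beta_3^* c_2 \\ \beta_2 &= \beta_2^* + \beta_3^* c_1 \\ \beta_3 &= \beta_3^* \end{aligned}$$
Now we solve for the parameters from the models with constants added to the predictors.
Because we know $\beta_3^* = \beta_3$ we can write $\beta_1 = \beta_1^* + \beta_3 c_2$ and can solve
$$\beta_1^* = \beta_1 - \beta_3 c_2$$
The same goes for $\beta_2^*$ so we have
$$\beta_2^* = \beta_2 - \beta_3 c_1$$
Finally, to obtain a formula for $\beta_0^*$ we plug the just obtained expressions for $\beta_1^*$, $\beta_2^*$ and $\beta_3^*$ into
$$\beta_0 = \beta_0^* + (\beta_1 - \beta_3 c_2) c_1 + (\beta_2 - \beta_3 c_1) c_2 + \beta_3 c_1 c_2 = \beta_0^* + \beta_1 c_1 + \beta_2 c_2 - \beta_3 c_1 c_2$$
and can solve for $\beta_0^*$:
$$\beta_0^* = \beta_0 - \beta_1 c_1 - \beta_2 c_2 + \beta_3 c_1 c_2$$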
Let’s check whether those formulas predict the parameter changes as a function of $c$ in the numerical experiment above.
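A sketch of such a check, assuming the expressions that result from the derivation when the same constant $c$ is added to both predictors: $\beta_0^* = \beta_0 - \beta_1 c - \beta_2 c + \beta_3 c^2$, $\beta_1^* = \beta_1 - \beta_3 c$, $\beta_2^* = \beta_2 - \beta_3 c$, and $\beta_3^* = \beta_3$. The reference parameters are the estimates from the model without added constants. The simulation values and the sequence of constants are illustrative assumptions.

```r
set.seed(1)
n  <- 10000
x1 <- rnorm(n, mean = 1); x2 <- rnorm(n, mean = 1)
y  <- 1 + 0.2 * x1 + 0.3 * x2 + 0.4 * x1 * x2 + rnorm(n, sd = 0.5)  # assumed values

c_seq <- seq(-1.5, 1.5, length.out = 25)  # assumed range

# Numerical experiment: refit the interaction model for every constant
est_int <- t(sapply(c_seq, function(cc) {
  x1_s <- x1 + cc
  x2_s <- x2 + cc
  unname(coef(lm(y ~ x1_s * x2_s)))
}))

# Reference model (no constant added)
ref <- unname(coef(lm(y ~ x1 * x2)))
b0 <- ref[1]; b1 <- ref[2]; b2 <- ref[3]; b3 <- ref[4]

# Parameters predicted by the derived expressions (with c1 = c2 = c)
pred <- cbind(
  b0 - b1 * c_seq - b2 * c_seq + b3 * c_seq^2,
  b1 - b3 * c_seq,
  b2 - b3 * c_seq,
  rep(b3, length(c_seq))
)

max_abs_diff <- max(abs(pred - est_int))
max_abs_diff  # largest discrepancy between formulas and refitted models
```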
We plot the parameters computed from the derived expressions as points on top of the empirical results from the numerical experiments above, and they match the numerical results exactly.
We see that the derived expressions explain exactly how parameters change as a function of the parameters of the reference model and the added constants.
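If we set $\beta_3 = 0$, we get the same derivation for the regression model without interaction term. We find that $\beta_0^* = \beta_0 - \beta_1 c_1 - \beta_2 c_2$, $\beta_1^* = \beta_1$, and $\beta_2^* = \beta_2$.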