Intro
In this tutorial I will try to explain the logic behind marginal
effects.
I will provide you with practical examples. And give you some
exercises.
To copy the code click the button in the upper right corner of the
code-chunks.
Questions can be addressed to Jochem Tolsma .
Custom functions
package.check
: Check if packages are installed (and
install if not) in R (source ).
fpackage.check <- function(packages) {
lapply(packages, FUN = function(x) {
if (!require(x, character.only = TRUE)) {
install.packages(x, dependencies = TRUE)
library(x, character.only = TRUE)
}
})
}
Packages
tidyverse
: if you can’t base them, join them
miniGUI
:
rgl
: plotting stuf
Deriv
: to let R calculate derivatives
margin
: an R package that will calculate margins for
you
mvtnorm
: to simulate data (with correlated
covariates)
boot
: to calculate SE/CI of the marginal effects
packages = c("tidyverse", "miniGUI", "rgl", "Deriv", "margins", "mvtnorm", "boot", "DAMisc")
fpackage.check(packages)
Session info
This is my setup.
sessionInfo()
#> R version 4.2.0 (2022-04-22 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19042)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=Dutch_Netherlands.utf8 LC_CTYPE=Dutch_Netherlands.utf8
#> [3] LC_MONETARY=Dutch_Netherlands.utf8 LC_NUMERIC=C
#> [5] LC_TIME=Dutch_Netherlands.utf8
#>
#> attached base packages:
#> [1] tcltk stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] DAMisc_1.7.2 boot_1.3-28 mvtnorm_1.1-3 margins_0.3.26 Deriv_4.1.3 miniGUI_0.8-1
#> [7] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.9 purrr_0.3.4 readr_2.1.2 tidyr_1.2.0
#> [13] tibble_3.1.7 ggplot2_3.4.0 tidyverse_1.3.1 rgl_0.109.2 knitr_1.39
#>
#> loaded via a namespace (and not attached):
#> [1] VGAM_1.1-6 minqa_1.2.4 colorspace_2.0-3 ellipsis_0.3.2
#> [5] snakecase_0.11.0 base64enc_0.1-3 fs_1.5.2 optiscale_1.2.2
#> [9] rstudioapi_0.13 DT_0.23 fansi_1.0.3 lubridate_1.8.0
#> [13] xml2_1.3.3 codetools_0.2-18 splines_4.2.0 extrafont_0.18
#> [17] effects_4.2-1 jsonlite_1.8.0 nloptr_2.0.3 broom_0.8.0
#> [21] Rttf2pt1_1.3.10 dbplyr_2.2.0 png_0.1-7 compiler_4.2.0
#> [25] httr_1.4.3 backports_1.4.1 assertthat_0.2.1 Matrix_1.4-1
#> [29] fastmap_1.1.0 survey_4.1-1 cli_3.3.0 formatR_1.12
#> [33] htmltools_0.5.2 tools_4.2.0 coda_0.19-4 gtable_0.3.0
#> [37] glue_1.6.2 Rcpp_1.0.8.3 carData_3.0-5 cellranger_1.1.0
#> [41] jquerylib_0.1.4 vctrs_0.5.1 nlme_3.1-157 extrafontdb_1.0
#> [45] insight_0.17.1 xfun_0.31 lme4_1.1-29 rvest_1.0.2
#> [49] lifecycle_1.0.3 klippy_0.0.0.9500 srvyr_1.1.2 MASS_7.3-56
#> [53] scales_1.2.0 hms_1.1.1 parallel_4.2.0 RColorBrewer_1.1-3
#> [57] yaml_2.3.5 pbapply_1.5-0 pander_0.6.5 sass_0.4.1
#> [61] latticeExtra_0.6-29 stringi_1.7.6 rlang_1.0.6 pkgconfig_2.0.3
#> [65] evaluate_0.15 lattice_0.20-45 prediction_0.3.14 htmlwidgets_1.5.4
#> [69] tidyselect_1.1.2 plyr_1.8.7 magrittr_2.0.3 R6_2.5.1
#> [73] generics_0.1.2 clarkeTest_0.1.0 DBI_1.1.2 pillar_1.7.0
#> [77] haven_2.5.0 withr_2.5.0 jtools_2.2.1 survival_3.3-1
#> [81] abind_1.4-5 nnet_7.3-17 janitor_2.1.0 modelr_0.1.8
#> [85] crayon_1.5.1 car_3.0-13 utf8_1.2.2 tzdb_0.3.0
#> [89] rmarkdown_2.14 jpeg_0.1-9 grid_4.2.0 readxl_1.4.0
#> [93] data.table_1.14.2 AICcmodavg_2.3-1 reprex_2.0.1 digest_0.6.29
#> [97] xtable_1.8-4 stats4_4.2.0 munsell_0.5.0 unmarked_1.2.5
#> [101] bslib_0.3.1 mitools_2.4
Background reading
and theory
Ai and Norton, 2003. Interaction terms in logit and probit models :
https://doi.org/10.1016/S0165-1765(03)00032-6
Marginal effect
definition
What is a marginal effect? If you google, you will find this
definition:
Marginal effects measure the impact that an instantaneous change in
one variable has on the outcome variable while all other variables are
held constant.
But I would say the definition depends on the measurement level of
your independent variable.
For a dichotomous independent variable:
The change in your dependent variable if
(ceteris paribus) your independent variable x1 changes from 0 to
1 .
More formally:
\[ ME = f(X|x1=1) - f(X|x1=0),
\]
where \(Y = f(X)\) and \(X\) is your set of covariates.
For a metric independent variable:
The change in your dependent variable, Y,
at some value, x, of your independent variable x1, per (ceteris paribus)
small change of x1 at x of s.
More formally:
\[ ME = \frac{f(X|x1=x + s) - f(X|x1=x -
s)}{2s}, \] where \(s\) is an
infinitesimal small number.
This is thus the slope (or more precisely the partial derivative) of
\(f(X)\) at the point where \(x1\) is \(x\) :
\[ ME = f^{'}(X|x1=x), \] and
\[ f^{'}(X) = \displaystyle
\frac{\partial f(X)}{\partial x1} \]
Where \(X\) is your set of
covariates, \(x1\) is one of your
covariates, \(x\) is a value of your
covariate \(x1\) .
Average Marginal
Effect / Marginal Effect at Mean
Because you can calculate the ME at different values of your other
covariates, you can calculate many MEs.
If you calculate the ME for all respondents in your dataset (and
their observed values of covariates) and take the mean, you will end up
with the AME , the Average Marginal Effect.
If you calculate the ME for covariate values set at their mean, you will
end up with the MEM , the Marginal Effect at Mean (or
MME, Mean Marginal Effect).
Marginal
interaction effect
Sometimes this is incorrectly defined as:
\[ \displaystyle \frac{\partial
f(X)}{\partial (x1x2)} \] Correctly would be:
\[ \displaystyle \frac{\partial
f^2(X)}{\partial(x1)\partial(x2)} \]
Interpretation: How does the effect of
\(x1\) on \(Y\) change with a change in \(x2\) , conditional on \(X\) .
Naturally, you have analogous definitions if \(x1\) and/or \(x2\) are dichotomous.
Simulated data
set.seed(227863) # so we all have the same data
# we start with simulating some correlated covariates
sigma <- matrix(c(4, 2, 3, 2, 3, 1.5, 3, 1.5, 5), ncol = 3)
# sigma
df <- data.frame(rmvnorm(n = 500, mean = c(6, 8, 7), sigma = sigma))
colnames(df) <- c("age", "sex", "educ")
df %>%
mutate(sex = ifelse(sex > mean(sex) + 0.3 * sd(sex), 0, 1), sex = as.factor(sex), sex = recode_factor(sex,
`0` = "men", `1` = "women"), age = round(age, 2), educ = round(educ, 2)) -> df
# Let us add a linear dependent variable with the following formula
df %>%
mutate(y_lin = 10 + 5 * as.numeric(sex) + 2.4 * age + 2 * educ + 0.5 * as.numeric(sex) * educ + -0.2 *
age * educ + rnorm(500, 0, 0.5)) -> df
head(df)
#> age sex educ y_lin
#> 1 6.87 women 9.34 51.13692
#> 2 6.34 women 4.64 42.90037
#> 3 5.87 women 7.96 48.96823
#> 4 1.29 women 4.01 33.57377
#> 5 6.66 men 5.66 36.26559
#> 6 9.88 men 9.96 43.77554
This was hopefully, all quite straightforward. But let us also
simulate a dichotomous dependent variable.
Thus we would like to simulate a dichotomous dependent variable
(0/1), from a Bernoulli distribution, with a logit link function. So we
can illustrate marginal effects with logistic regression model.
The log-odds of the probability of an event is a linear combination
of the independent variables
Please remember, the logistic function is:
\[ f(X) = P(Y=1|X) = n(X) = \pi_x =
\frac{e^X}{(1 + e^X)} = \frac{1} {(1+e^{-X})} \]
We can easily calculate \(P(Y=1|X)\)
in R if we know \(X\) via the function
plogis(X)
.
And if we have \(P(Y=1|X)\) we can
simulate an observed variable according to a binomial distribution:
\[Y_{di} \sim {\sf Binom}(1,
P(Y=1|X))\]
df %>%
mutate(y_di = rbinom(n = 500, size = 1, prob = plogis(-2.4 + 0.3 * as.numeric(sex) + 0.4 * age +
0.3 * educ + 0.1 * as.numeric(sex) * educ + -0.1 * age * educ))) -> df
table(df$sex, df$y_di)
#>
#> 0 1
#> men 152 27
#> women 141 180
head(df)
#> age sex educ y_lin y_di
#> 1 6.87 women 9.34 51.13692 0
#> 2 6.34 women 4.64 42.90037 1
#> 3 5.87 women 7.96 48.96823 1
#> 4 1.29 women 4.01 33.57377 1
#> 5 6.66 men 5.66 36.26559 1
#> 6 9.88 men 9.96 43.77554 0
Linear model
Estimate the model
m1 <- lm(y_lin ~ sex + age + educ, data = df)
summary(m1)
#>
#> Call:
#> lm(formula = y_lin ~ sex + age + educ, data = df)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -7.5318 -0.5558 0.2797 0.9000 3.3035
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 19.22576 0.30004 64.08 <2e-16 ***
#> sexwomen 8.83186 0.15553 56.78 <2e-16 ***
#> age 1.05363 0.04703 22.40 <2e-16 ***
#> educ 1.71399 0.03890 44.07 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 1.455 on 496 degrees of freedom
#> Multiple R-squared: 0.9355, Adjusted R-squared: 0.9351
#> F-statistic: 2399 on 3 and 496 DF, p-value: < 2.2e-16
And plot the model.
colors <- ifelse(df$sex == "men", "red", "blue")
with(df, plot3d(educ, age, predict(m1), col = colors))
Marginal effects of
age.
We know that \(Y = intercept + b_sex*sex +
b_age*age + b_educ*educ\) . We know that age is a continuous
variable. Thus we calculate the derivative (no worries we will use the
function D
) of Y with respect to age.
ME_age <- D(expression(intercept + b_sex * sex + b_age * age + b_educ * educ), "age")
ME_age
#> b_age
Find the value of this coefficient:
# coef(m1)
b_intercept <- coef(m1)[1]
b_sex <- coef(m1)[2]
b_age <- coef(m1)[3]
b_educ <- coef(m1)[4]
b_age
#> age
#> 1.053627
Exercise: Calculate the average marginal effect for age by using the
numerical approach to calculate the derivative
Show Answer
s <- 0.01 #define a small step
dfmin <- dfplus <- df #let us copy the datasets
dfplus$age <- df$age + s #we add a small step to age in one of the datasets
dfmin$age <- df$age - s #we substract a small step to age in one of the datasets
ysplus <- predict(m1, newdata = dfplus) #we calculate the predicted values based on our model parameters and new values of covariates
ysmin <- predict(m1, newdata = dfmin) #we calculate the predicted values based on our model parameters and new values of covariates
ME_age <- (ysplus - ysmin)/(2 * s) #calculate the MEs
AME_age <- mean(ME_age) #calculate the AME
AME_age
#> [1] 1.053627
Marginal effects of
sex.
Remember the definition?
ME_sex <- with(df, (b_intercept + b_sex * 1 + b_age * age + b_educ * educ) - (b_intercept + b_sex * 0 +
b_age * age + b_educ * educ))
ME_sex
#> [1] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [11] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [21] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [31] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [41] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [51] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [61] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [71] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [81] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [91] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [101] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [111] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [121] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [131] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [141] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [151] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [161] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [171] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [181] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [191] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [201] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [211] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [221] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [231] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [241] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [251] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [261] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [271] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [281] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [291] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [301] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [311] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [321] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [331] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [341] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [351] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [361] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [371] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [381] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [391] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [401] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [411] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [421] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [431] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [441] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [451] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [461] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [471] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [481] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [491] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
A bit shorter would be:
dfnews0 <- dfnews1 <- df
# construct two datasets
dfnews0$sex <- "men"
dfnews1$sex <- "women"
# let R calculate the predicted scores
mean(predict(m1, dfnews1) - predict(m1, dfnews0))
#> [1] 8.831865
So, lets calculate the AME: mean(ME_sex)
=
8.8318647.
And what is the interpretation? Well, let us plot the values of the
dependent variable, if we hold education constant and vary age, for
men and women .
plot(c(-10:10), b_intercept + b_sex * 1 + b_age * c(-10:10) + b_educ * mean(df$educ), col = "red", type = "b",
xlab = "age", ylab = "y_lin", xlim = c(-10, 10), ylim = c(0, 60))
par(new = TRUE)
plot(c(-10:10), b_intercept + b_sex * 0 + b_age * c(-10:10) + b_educ * mean(df$educ), col = "blue", type = "b",
xlab = "", ylab = "", xlim = c(-10, 10), ylim = c(0, 60))
You see, that at all values of age, the difference between men and women is AME.
check with R
summary(margins(m1)) #SE via deltamethod
#> factor AME SE z p lower upper
#> age 1.0536 0.0470 22.4029 0.0000 0.9614 1.1458
#> educ 1.7140 0.0389 44.0662 0.0000 1.6378 1.7902
#> sexwomen 8.8319 0.1555 56.7852 0.0000 8.5270 9.1367
summary(margins(m1, vce = "bootstrap", iterations = 999)) #SE via bootstrapping
#> factor AME SE z p lower upper
#> age 1.0536 0.0538 19.5965 0.0000 0.9482 1.1590
#> educ 1.7140 0.0471 36.4029 0.0000 1.6217 1.8063
#> sexwomen 8.8319 0.1396 63.2743 0.0000 8.5583 9.1054
Logistic model
m1_d <- glm(y_di ~ sex + age + educ, data = df, family = binomial)
summary(m1_d)
#>
#> Call:
#> glm(formula = y_di ~ sex + age + educ, family = binomial, data = df)
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -2.0237 -0.9296 -0.4609 1.0013 2.1158
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 0.63124 0.48127 1.312 0.189656
#> sexwomen 1.43639 0.25730 5.583 2.37e-08 ***
#> age -0.28712 0.07645 -3.756 0.000173 ***
#> educ -0.04639 0.06069 -0.764 0.444632
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 678.28 on 499 degrees of freedom
#> Residual deviance: 560.37 on 496 degrees of freedom
#> AIC: 568.37
#>
#> Number of Fisher Scoring iterations: 4
Let us have a look at the model:
# with(df, plot3d(sex, age, fitted.values(m1_d), col=colors))
with(df, plot3d(educ, age, fitted.values(m1_d), col = colors))
Marginal effects of
age
Please remember, the logistic function is:
\[ f(X) = \frac{1} {(1+e^{-X})}
\]
And the definition of the ME for a continuous variable \(x1\) :
\[ ME = f^{'}(X|x1=x), \]
Thus we need to find \(f^{'}(X)\) with respect to \(x1\) (i.e., age).
You could do that yourself by using the chain-rule:
\[\displaystyle \frac{\partial
f(X)}{\partial (x1)} = \displaystyle \frac{\partial f(X)}{\partial
(X)}*\displaystyle \frac{\partial (X)}{\partial (x1)}\]
… or let R help you.
ME_age_f2 <- D(expression(1/(1 + exp(-1 * (b_intercept + b_sex * sex + b_age * age + b_educ * educ)))),
"age")
ME_age_f2
#> exp(-1 * (b_intercept + b_sex * sex + b_age * age + b_educ *
#> educ)) * b_age/(1 + exp(-1 * (b_intercept + b_sex * sex +
#> b_age * age + b_educ * educ)))^2
Luckily for us, the derivative of plogis(X)
is
dlogis(X)
.
You wanna check? simply run: ?dlogis
.
Retrieve the coefficients:
coef(m1_d)
#> (Intercept) sexwomen age educ
#> 0.6312358 1.4363873 -0.2871232 -0.0463920
b_intercept <- coef(m1_d)[1]
b_sex <- coef(m1_d)[2]
b_age <- coef(m1_d)[3]
b_educ <- coef(m1_d)[4]
And fill in the formula:
ME_age <- with(df, dlogis(b_intercept + b_sex * (sex == "women") + b_age * age + b_educ * educ) * b_age)
# shorter would be:
ME_age <- dlogis(predict(m1_d)) * b_age
head(ME_age)
#> 1 2 3 4 5 6
#> -0.06976658 -0.07176243 -0.07177781 -0.04251901 -0.04164254 -0.01742796
So, you notice - hopefully - that that the ME_age is different for
each respondent. Let us take the mean.
AME_age = -0.0546062.:
AME_age <- mean(ME_age)
AME_age
#> [1] -0.05460625
marginal effect
sex.
Remember \(ME_{sex} = f(X|sex=1) -
f(X|sex=0)\) .
And once again simply fill in the formula.
ME_sex <- with(df, plogis(b_intercept + b_sex * 1 + b_age * age + b_educ * educ) - plogis(b_intercept +
b_sex * 0 + b_age * age + b_educ * educ))
AME_sex <- mean(ME_sex)
AME_sex
#> [1] 0.290718
or,…
# new datasets
dfs0 <- dfs1 <- df
dfs0$sex <- "men"
dfs1$sex <- "women"
ME_sex <- plogis(predict(m1_d, dfs1)) - plogis(predict(m1_d, dfs0))
AME_sex <- mean(ME_sex)
AME_sex
#> [1] 0.290718
Let us plot the marginal effect at different values of the other
covariates.
with(df, plot3d(educ, age, ME_sex, col = colors))
AME_sex = 0.290718.
Check with R.
# okay, now let's use R to calculate the marginal effects
summary(margins(m1_d)) #make sure that variable sex is defined as a factor not as a numeric variable.
#> factor AME SE z p lower upper
#> age -0.0546 0.0138 -3.9608 0.0001 -0.0816 -0.0276
#> educ -0.0088 0.0115 -0.7661 0.4436 -0.0314 0.0138
#> sexwomen 0.2907 0.0485 5.9983 0.0000 0.1957 0.3857
Everything seems to be okay!
Linear model with
interactions
Let us estimate a better fitting model.
m2 <- lm(y_lin ~ sex + age + educ + sex:educ + age:educ, data = df)
# note that we define the interaction within the formula. This is crucial!!
summary(m2)
#>
#> Call:
#> lm(formula = y_lin ~ sex + age + educ + sex:educ + age:educ,
#> data = df)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -1.30718 -0.33026 0.00721 0.32965 1.94917
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 14.866935 0.284848 52.19 <2e-16 ***
#> sexwomen 5.022406 0.197319 25.45 <2e-16 ***
#> age 2.396873 0.036563 65.56 <2e-16 ***
#> educ 2.527558 0.040563 62.31 <2e-16 ***
#> sexwomen:educ 0.497224 0.025560 19.45 <2e-16 ***
#> age:educ -0.200642 0.004748 -42.26 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.5058 on 494 degrees of freedom
#> Multiple R-squared: 0.9922, Adjusted R-squared: 0.9922
#> F-statistic: 1.263e+04 on 5 and 494 DF, p-value: < 2.2e-16
Retrieve coefficients.
coef(m2)
#> (Intercept) sexwomen age educ sexwomen:educ age:educ
#> 14.8669349 5.0224058 2.3968728 2.5275580 0.4972235 -0.2006417
b_intercept <- coef(m2)[1]
b_sex <- coef(m2)[2]
b_age <- coef(m2)[3]
b_educ <- coef(m2)[4]
b_sex_educ <- coef(m2)[5]
b_age_educ <- coef(m2)[6]
marginal effect
sex
Let’s use power of R.
ME_sex <- predict(m2, newdata = dfs1) - predict(m2, newdata = dfs0)
AME_sex <- mean(ME_sex)
AME_sex
#> [1] 8.484782
marginal effect
age
ME_age_f <- expression(b_age + b_age_educ * educ) #this is the derivative with respect to age
ME_age <- with(df, eval(ME_age_f)) #Do you see I do not fit all variables myself but use this convenient function eval()
AME_age <- mean(ME_age)
AME_age
#> [1] 0.9997207
Marginal effect
education
ME_educ_f <- expression(b_educ + b_age_educ * age + b_sex_educ * (sex == "women"))
ME_educ <- with(df, eval(ME_educ_f))
AME_educ <- mean(ME_educ)
AME_educ
#> [1] 1.65055
Let us check with
R
summary(margins(m2))
#> factor AME SE z p lower upper
#> age 0.9997 0.0164 61.0272 0.0000 0.9676 1.0318
#> educ 1.6505 0.0136 121.6687 0.0000 1.6240 1.6771
#> sexwomen 8.4848 0.0554 153.1992 0.0000 8.3762 8.5933
Marginal interaction
effect educ*age
First calculate derivative to age and then to educ.
fx <- expression(intercept + b_sex * sex + b_age * age + b_educ * educ + b_sex_educ * sex * educ + b_age_educ *
age * educ)
dfxdage <- D(fx, "age")
d2fxdagededuc <- D(dfxdage, "educ")
# check if the variable sex is in our expression we then need to be careful because it is factor.
# but it is not.
AME_age_educ <- mean(with(df, eval(d2fxdagededuc)))
AME_age_educ
#> [1] -0.2006417
And this is (of course) our regression coefficient for the
interaction.
Marginal interaction
effect sex*educ
First calculate derivative to sex and then to educ.
fx <- expression(intercept + b_sex * sex + b_age * age + b_educ * educ + b_sex_educ * sex * educ + b_age_educ *
age * educ)
dfxdsex <- expression(intercept + b_sex * 1 + b_age * age + b_educ * educ + b_sex_educ * 1 * educ + b_age_educ *
age * educ - (intercept + b_sex * 0 + b_age * age + b_educ * educ + b_sex_educ * 0 * educ + b_age_educ *
age * educ))
dfxdsex
#> expression(intercept + b_sex * 1 + b_age * age + b_educ * educ +
#> b_sex_educ * 1 * educ + b_age_educ * age * educ - (intercept +
#> b_sex * 0 + b_age * age + b_educ * educ + b_sex_educ * 0 *
#> educ + b_age_educ * age * educ))
dfxdsex <- Simplify(dfxdsex)
dfxdsex
#> expression(b_sex + b_sex_educ * educ)
d2fxdsexdeduc <- D(dfxdsex, "educ")
d2fxdsexdeduc
#> b_sex_educ
# this is just a constant, namely our interaction coefficient
eval(d2fxdsexdeduc)
#> sexwomen:educ
#> 0.4972235
Logistic with model
interactions
Let’s estimate a better model
m2_d <- glm(y_di ~ sex + age + educ + sex:educ + age:educ, family = binomial, data = df)
summary(m2_d)
#>
#> Call:
#> glm(formula = y_di ~ sex + age + educ + sex:educ + age:educ,
#> family = binomial, data = df)
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -1.8203 -0.8882 -0.3042 0.9333 2.1433
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -2.73893 1.38148 -1.983 0.04741 *
#> sexwomen 0.95791 0.98451 0.973 0.33056
#> age 0.46274 0.18541 2.496 0.01257 *
#> educ 0.56442 0.21555 2.619 0.00883 **
#> sexwomen:educ 0.06869 0.13933 0.493 0.62202
#> age:educ -0.12350 0.02823 -4.375 1.21e-05 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 678.28 on 499 degrees of freedom
#> Residual deviance: 533.69 on 494 degrees of freedom
#> AIC: 545.69
#>
#> Number of Fisher Scoring iterations: 5
Plotting the estimated model is very insightful.
with(df, plot3d(sex, age, fitted.values(m2_d), col = colors))
# with(df, plot3d(educ, age, fitted.values(m2_d), col=colors)) with(df, plot3d(educ[sex=='men'],
# age[sex=='men'], fitted.values(m2_d)[sex=='men'], col=colors[sex=='men'])) with(df,
# plot3d(educ[sex=='women'], age[sex=='women'], fitted.values(m2_d)[sex=='women'],
# col=colors[sex=='women']))
Retrieve coefficients.
coef(m2_d)
#> (Intercept) sexwomen age educ sexwomen:educ age:educ
#> -2.73893262 0.95790989 0.46274191 0.56442210 0.06868779 -0.12350206
b_intercept <- coef(m2_d)[1]
b_sex <- coef(m2_d)[2]
b_age <- coef(m2_d)[3]
b_educ <- coef(m2_d)[4]
b_sex_educ <- coef(m2_d)[5]
b_age_educ <- coef(m2_d)[6]
marginal effect
sex
Let’s use power of R
ME_sex <- plogis(predict(m2_d, newdata = dfs1)) - plogis(predict(m2_d, newdata = dfs0))
AME_sex <- mean(ME_sex)
AME_sex
#> [1] 0.2700221
marginal effect
age
ME_age <- with(df, dlogis(predict(m2_d)) * (b_age + b_age_educ * educ))
AME_age <- mean(ME_age)
AME_age
#> [1] -0.06071402
marginal effect
educ
ME_educ <- with(df, dlogis(predict(m2_d)) * (b_educ + b_sex_educ * (sex == "women") + b_age_educ * age))
AME_educ <- mean(ME_educ)
AME_educ
#> [1] -0.009992943
check with R
summary(margins(m2_d))
#> factor AME SE z p lower upper
#> age -0.0607 0.0134 -4.5156 0.0000 -0.0871 -0.0344
#> educ -0.0100 0.0111 -0.9018 0.3672 -0.0317 0.0117
#> sexwomen 0.2700 0.0483 5.5892 0.0000 0.1753 0.3647
Marginal interaction
effects.
Warning, calculating the derivatives may be
difficult depending on f(x), and your mathematical background
;-)
We need the double derivative of \(f(x)\) . We will call this function
ddlogis
.
# f(X) = P(Y=1|X) = n(X) = exp(X) / (1 + exp(X)) = 1 / (1+exp(-X))
nX <- expression(1/(1 + exp(-X)))
# similar as dlogis
dnXdx <- D(nX, "X")
# higher order derivative
d2nxdxdx <- D(dnXdx, "X")
# d2nxdxdx
ddlogis <- function(X) {
-(exp(-X)/(1 + exp(-X))^2 - exp(-X) * (2 * (exp(-X) * (1 + exp(-X))))/((1 + exp(-X))^2)^2)
}
marginal interaction
effect educ*age
MEeducage <- with(df, ddlogis(predict(m2_d)) * (b_educ + b_age_educ * age) * (b_age + b_age_educ * educ) +
dlogis(predict(m2_d)) * (b_age_educ))
AMEeducage <- mean(MEeducage)
AMEeducage
#> [1] -0.01726894
Numerical
approach
s <- 0.001 #define a small step
# define datasets
dfplusplus <- dfplusmin <- dfminplus <- dfminmin <- df
# add the small step to the variables
dfplusmin$age <- dfplusplus$age <- df$age + s
dfminmin$age <- dfminplus$age <- df$age - s
dfplusplus$educ <- dfminplus$educ <- df$educ + s
dfplusmin$educ <- dfminmin$educ <- df$educ - s
# calculate the predicted probabilities
p11 <- plogis(predict(m2_d, dfplusplus))
p10 <- plogis(predict(m2_d, dfplusmin))
p01 <- plogis(predict(m2_d, dfminplus))
p00 <- plogis(predict(m2_d, dfminmin))
# and the marginal effects. be aware of all the brackets. :-(
am <- (((p11 - p01)/(2 * s)) - ((p10 - p00)/(2 * s)))/(2 * s)
mean(am)
#> [1] -0.01711588
Exercise: Suppose that all estimated interaction terms are 0 in model
m2_d what would be the marginal effect of the interaction effect of
education with age according to the above? Thus what would be:
MEeducage.
marginal interaction
effect sex*educ
First calculate derivative to sex and then to educ
dfxdsex <- with(df, plogis(predict(m2_d, dfs1)) - plogis(predict(m2_d, dfs0)))
MEsexeduc <- with(df, dlogis(predict(m2_d, dfs1)) * (b_educ + b_sex_educ * 1 + b_age_educ * age) - dlogis(predict(m2_d,
dfs0)) * (b_educ + b_sex_educ * 0 + b_age_educ * age))
AMEsexeduc <- mean(MEsexeduc)
AMEsexeduc
#> [1] -0.00386123
Numerical
approach
s <- 0.001 #define a small step
# define datasets
dfplusplus <- dfplusmin <- dfminplus <- dfminmin <- df
# add the small step to the variables
dfplusmin$sex <- dfplusplus$sex <- "women"
dfminmin$sex <- dfminplus$sex <- "men"
dfplusplus$educ <- dfminplus$educ <- df$educ + s
dfplusmin$educ <- dfminmin$educ <- df$educ - s
# calculate the predicted probabilities
p11 <- plogis(predict(m2_d, dfplusplus))
p10 <- plogis(predict(m2_d, dfplusmin))
p01 <- plogis(predict(m2_d, dfminplus))
p00 <- plogis(predict(m2_d, dfminmin))
# and the marginal effects. be aware of all the brackets. :-(
am <- ((p11 - p01) - (p10 - p00))/(2 * s)
mean(am)
#> [1] -0.00386123
Let us plot these interaction effects against the fitted values.
plot(fitted.values(m2_d), MEsexeduc, col = colors)
You see that at some points the interaction is negative and at some
it is positive.
Check with R2
results <- intEff(m2_d, vars = c("age", "educ"), data = df)
mean(results$byobs$int$int_eff)
#> [1] -0.01726894
quite close.
results <- intEff(m2_d, vars = c("sex", "educ"), data = df)
mean(results$byobs$int$int_eff)
#> [1] 0.04301771
This is not the same as above, probably because function cannot
handle double interactions??
Take home message
When you use a non-linear model, the interpretation of marginal
effects is difficult.
Only when you have a non-linear model, the use of marginal effects
makes sense.
MEM: How useful are they?
The absence or presence of an estimated interaction term does not
say much about the presence or absence of a (marginal) interaction
effect when your link function is not linear.
R does not provide packages/functions that calculate marginal
effects for interaction effects for non-linear models.
SE of marginal
effects.
Well, you made it this far in the tutorial. The goal was to show you
the underlying idea and interpretation of marginal effects. Now, that
you understand, you can apply the standard packages in R to calculate
the marginal effects for you, like margin
. There is,
however, one caveat. These packages are not able to calculate (correct)
marginal effects of interaction effects and not all non-linear models
are supported. This means you sometimes have to calculate the marginal
effects yourself. Which can be a challenge but as you see, is doable.
But if you calculate the ME yourself you might also want to know the
standard error (SE) or confidence-interval (CI) of the marginal
effect.
There are two common approaches:
Delta method
Bootstrapping
The Delta method is default in most packages but it is quite crude.
Bootstrapping is slow but quite easy to implement yourself. For more
background reading on bootstrapping you could start here:
Davison, A.C. and Hinkley, D.V. (1997) Bootstrap Methods and
Their Application. Cambridge University Press.
Example 1: Linear
without interactions
# we need to define a function in which we calculate our marginal effects. This is quite easy
# because it is simply copy paste from the above.
bootFunc <- function(data, i) {
df <- data[i, ] #these are the df we will calculate our statistics of interest
m1 <- lm(y_lin ~ sex + age + educ, data = df) #we first estimate the model on the generated datasets
c(coef(m1)[2], coef(m1)[3], coef(m1)[4]) #we save our statistics of interest in a vector
}
b <- boot(df, bootFunc, R = 999) #we feet the boot function our original dataset and our function
b #names are not very nice. Well, you will manage.
# boot.ci(b, index=1) #you could ask for the CI summary(margins(m1, vce='bootstrap',
# iterations=999)) #SE via bootstrapping. Please notice the order of the variabels is messed up.
# #you could check with if margins gives the same results. Spoiler alert: YES!
#>
#> ORDINARY NONPARAMETRIC BOOTSTRAP
#>
#>
#> Call:
#> boot(data = df, statistic = bootFunc, R = 999)
#>
#>
#> Bootstrap Statistics :
#> original bias std. error
#> t1* 8.831865 -0.0045442056 0.13365235
#> t2* 1.053627 -0.0014882377 0.05228616
#> t3* 1.713989 -0.0006844387 0.04880639
Okay, our estimates of the AME and the SE are very close. Our boot
function is a lot faster than the function implemented in
margin
. Don’t know why.
Example 2: Logistic
model with interactions
ddlogis <- function(X) {
-(exp(-X)/(1 + exp(-X))^2 - exp(-X) * (2 * (exp(-X) * (1 + exp(-X))))/((1 + exp(-X))^2)^2)
} #dont forget we need this one as well.
bootFunc <- function(data, i) {
df <- data[i, ] #bootrap datasets
m2_d <- glm(y_di ~ sex + age + educ + sex:educ + age:educ, family = binomial, data = df) #estimate model
# retrieve coefficients
b_intercept <- coef(m2_d)[1]
b_sex <- coef(m2_d)[2]
b_age <- coef(m2_d)[3]
b_educ <- coef(m2_d)[4]
b_sex_educ <- coef(m2_d)[5]
b_age_educ <- coef(m2_d)[6]
# AME sex
newdata_sex0 <- df
newdata_sex0$sex <- "men" #use the correct factor levels!!
newdata_sex1 <- df
newdata_sex1$sex <- "women" #use the correct factor levels!!
ME_sex <- plogis(predict(m2_d, newdata = newdata_sex1)) - plogis(predict(m2_d, newdata = newdata_sex0))
AME_sex <- mean(ME_sex)
# AME age
ME_age <- with(df, dlogis(predict(m2_d)) * (b_age + b_age_educ * educ))
AME_age <- mean(ME_age)
# AME educ
ME_educ <- with(df, dlogis(predict(m2_d)) * (b_educ + b_sex_educ * (sex == "women") + b_age_educ *
age))
AME_educ <- mean(ME_educ)
# AME interaction effect sex*educ
MEsexeduc <- with(df, dlogis(predict(m2_d, dfs1)) * (b_educ + b_sex_educ * 1 + b_age_educ * age) -
dlogis(predict(m2_d, dfs0)) * (b_educ + b_sex_educ * 0 + b_age_educ * age))
AMEsexeduc <- mean(MEsexeduc)
# AME interaction effect age*educ
MEeducage <- with(df, ddlogis(predict(m2_d)) * (b_educ + b_age_educ * age) * (b_age + b_age_educ *
educ) + dlogis(predict(m2_d)) * (b_age_educ))
AMEeducage <- mean(MEeducage)
c(AME_sex, AME_age, AME_educ, AMEsexeduc, AMEeducage) #save results
}
b <- boot(df, bootFunc, R = 999)
b <- summary(b)
rownames(b) <- c("AME_sex", "AME_age", "AME_educ", "AME_sex*educ", "AME_educ*age")
b
#>
#> Number of bootstrap replications R = 999
#> original bootBias bootSE bootMed
#> AME_sex 0.2700221 -3.4979e-03 0.0484563 0.2666374
#> AME_age -0.0607140 -4.4300e-04 0.0137299 -0.0611598
#> AME_educ -0.0099929 -7.8897e-05 0.0115793 -0.0103263
#> AME_sex*educ -0.0038612 1.0225e-02 0.0233875 0.0051893
#> AME_educ*age -0.0172689 -4.7421e-04 0.0041145 -0.0176206
Copyright © 2022 Jochem Tolsma
---
title: "Tutorial marginal effects"
author: '[Jochem Tolsma](https://www.jochemtolsma.nl) - Radboud University / University of Groningen, the Netherlands'
date: "Last compiled on `r format(Sys.time(), '%B, %Y')`"
output:
  html_document:
    toc:  true
    toc_float: true
    number_sections: true
    code_folding: show
    code_download: yes
    theme: flatly
    highlight: default
    
---


```{r globalsettings, echo=FALSE, warning=FALSE, message=FALSE}
library(knitr)
opts_chunk$set(tidy.opts=list(width.cutoff=100),tidy=TRUE, warning = FALSE, message = FALSE,comment = "#>", cache=TRUE, echo=TRUE, class.source=c("test"), class.output=c("test2"))
options(width = 100)
library(rgl)
rgl::setupKnitr()
knitr::knit_hooks$set(webgl = hook_webgl)
```

```{r colorize, echo=FALSE}
colorize <- function(x, color) {
  if (knitr::is_latex_output()) {
    sprintf("\\textcolor{%s}{%s}", color, x)
  } else if (knitr::is_html_output()) {
    sprintf("<span style='color: %s;'>%s</span>", color, 
            x)
  } else x
}

```

```{r klippy, echo=FALSE, include=TRUE}
klippy::klippy(position = c('top', 'right'))
#klippy::klippy(color = 'darkred')
#klippy::klippy(tooltip_message = 'Click to copy', tooltip_success = 'Done')
```

```{css style settings, echo = FALSE}
.test {
  max-height: 300px;
  overflow-y: auto;
  overflow-x: auto;
  margin: 0px;
}

.test2 {
  max-height: 300px;
  overflow-y: auto;
  overflow-x: auto;
  margin: 0px;
  #background-color: white;
  color: rgb(201, 76, 76);
}


h1, .h1, h2, .h2, h3, .h3 {
  margin-top: 24px;
}



blockquote {
    padding: 10px 20px;
    margin: 0 0 20px;
    font-size: 14px;
    border-left: 5px solid #eee;
    background-color: rgb(255,255,224,1);
}


.button1 {
  background-color: grey; /* Red */ 
  border: 2px solid black;
  color: white;
  padding: 15px 32px;
  text-align: center;
  text-decoration: none;
  display: inline-block;
  font-size: 16px;
  margin: 4px 2px;
  cursor: pointer;
  /* border-radius: 12px; */
  width: 100%;
}

.button1:hover {
  box-shadow: 0 12px 16px 0 rgba(0,0,0,0.24), 0 17px 50px 0 rgba(0,0,0,0.19);
}

.button1:active {
  border: 2px solid red;
}



```




---


#  Intro


In this tutorial I will try to explain the logic behind marginal effects.  

I will provide you with practical examples. And give you some exercises. 

To copy the code click the button in the upper right corner of the code-chunks.

Questions can be addressed to [Jochem Tolsma](mailto:jochem.tolsma@ru.nl).

---  



## Custom functions

- `package.check`: Check if packages are installed (and install if not) in R ([source](https://vbaliga.github.io/verify-that-r-packages-are-installed-and-loaded/)). 



```{r, results='hide', echo=TRUE}
fpackage.check <- function(packages) {
  lapply(packages, FUN = function(x) {
    if (!require(x, character.only = TRUE)) {
      install.packages(x, dependencies = TRUE)
      library(x, character.only = TRUE)
    }
  })
}


```

---  

## Packages


- `tidyverse`: if you can't base them, join them  
- `miniGUI`: 
- `rgl`: plotting stuf   
- `Deriv`: to let R calculate derivatives  
- `margin`: an R package that will calculate margins for you  
- `mvtnorm`: to simulate data (with correlated covariates)  
- `boot`: to calculate SE/CI of the marginal effects

```{r, results='hide', echo=TRUE}
packages = c("tidyverse", "miniGUI", "rgl", "Deriv", "margins", "mvtnorm", "boot", "DAMisc")

fpackage.check(packages)

```


## Session info

This is my setup. 

```{r}
sessionInfo()
```

--- 

## Background reading and theory

Ai and Norton, 2003. Interaction terms in logit and probit models : https://doi.org/10.1016/S0165-1765(03)00032-6


### Marginal effect definition  {#defs}

What is a marginal effect? If you google, you will find this definition: 

> Marginal effects measure the impact that an instantaneous change in one variable has on the outcome variable while all other variables are held constant. 

But I would say the definition depends on the measurement level of your independent variable. 

For a dichotomous independent variable: 

* `r colorize("The change in your dependent variable if (ceteris paribus) your independent variable x1 changes from 0 to 1", "red")`. 

More formally: 

$$ ME = f(X|x1=1) - f(X|x1=0), $$ 

where $Y = f(X)$ and $X$ is your set of covariates. 

For a metric independent variable:  

* `r colorize("The change in your dependent variable, Y, at some value, x,  of your independent variable x1, per (ceteris paribus) small change of x1 at x of s.", "red")`

More formally: 

$$ ME = \frac{f(X|x1=x + s) - f(X|x1=x - s)}{2s}, $$
where $s$ is an infinitesimal small number. 

This is thus the slope (or more precisely the partial derivative) of $f(X)$ at the point where $x1$ is $x$: 

$$ ME = f^{'}(X|x1=x), $$
and $$ f^{'}(X)  = \displaystyle \frac{\partial f(X)}{\partial x1} $$


Where $X$ is your set of covariates, $x1$ is one of your covariates, $x$ is a value of your covariate $x1$. 


### Average Marginal Effect / Marginal Effect at Mean

Because you can calculate the ME at different values of your other covariates, you can calculate many MEs. 

If you calculate the ME for all respondents in your dataset (and their observed values of covariates) and take the mean, you will end up with the **AME**, the Average Marginal Effect.  
If you calculate the ME for covariate values set at their mean, you will end up with the **MEM**, the Marginal Effect at Mean (or MME, Mean Marginal Effect). 

### Marginal interaction effect

Sometimes this is incorrectly defined as:

$$ \displaystyle \frac{\partial f(X)}{\partial (x1x2)} $$
Correctly would be:

$$ \displaystyle \frac{\partial f^2(X)}{\partial(x1)\partial(x2)} $$

`r colorize ("Interpretation: How does the effect of $x1$ on $Y$ change with a change in $x2$, conditional on $X$.", "red")` 

Naturally, you have analogous definitions if $x1$ and/or $x2$ are dichotomous. 

---  



## Simulated data  

```{r}
set.seed(227863) # so we all have the same data

# we start with simulating some correlated covariates
sigma <- matrix(c(4,2,3, 2,3,1.5, 3,1.5,5), ncol=3)
#sigma
df <- data.frame(rmvnorm(n=500, mean=c(6,8,7), sigma=sigma))
colnames(df) <- c("age", "sex", "educ")

df %>% 
  mutate(sex = ifelse(sex > mean(sex) + 0.3*sd(sex), 0,1), 
         sex = as.factor(sex),
         sex = recode_factor(sex, '0'="men", '1'="women"), 
         age = round(age, 2), 
         educ = round(educ, 2)) -> df

# Let us add a linear dependent variable with the following formula
df %>% 
  mutate(y_lin = 10 + 5*as.numeric(sex) + 2.4*age + 2*educ + 0.5*as.numeric(sex)*educ + -0.2*age*educ + rnorm(500, 0, .5)) -> df

head(df)

```
<br>

This was hopefully, all quite straightforward. But let us also simulate a dichotomous dependent variable. 


Thus we would like to simulate a dichotomous dependent variable (0/1), from a Bernoulli distribution, with a logit link function. So we can illustrate marginal effects with logistic regression model. 

The log-odds of the probability of an event is a linear combination of the independent variables

Please remember, the logistic function is:  

$$ f(X) = P(Y=1|X) = n(X) = \pi_x = \frac{e^X}{(1 + e^X)}  = \frac{1} {(1+e^{-X})} $$


We can easily calculate $P(Y=1|X)$ in R if we know $X$ via the function `plogis(X)`.  

<!--- 
do i need this:  apply the link function (inverse logistic function / logit / in R: qlogis(pi_x)): ln(pi_x / 1 - pi_x) = X
---> 

And if we have $P(Y=1|X)$ we can simulate an observed variable according to a binomial distribution: 

$$Y_{di} \sim {\sf Binom}(1, P(Y=1|X))$$

```{r}

df %>%
  mutate(y_di = rbinom(n=500, size=1, prob=plogis(-2.4 + .3*as.numeric(sex) + .4*age + .3*educ + .1*as.numeric(sex)*educ + -0.1*age*educ))) -> df

table(df$sex, df$y_di)
head(df)
```

---  


# Linear model

Estimate the model  

```{r }

m1 <- lm(y_lin ~ sex + age + educ, data=df)
summary(m1)

```

<br>
And plot the model.
```{r , webgl=TRUE}

colors <- ifelse(df$sex=="men", "red", "blue")

with(df, plot3d(educ, age, predict(m1), col=colors))
```


## Marginal effects of age. 

We know that $Y = intercept + b_sex*sex + b_age*age + b_educ*educ$. 
We know that age is a continuous variable. 
Thus we calculate the derivative (no worries we will use the function `D`) of Y with respect to age. 

```{r}
ME_age <- D(expression(intercept + b_sex*sex + b_age*age + b_educ*educ), "age") 
ME_age
```
Find the value of this coefficient: 

```{r}

#coef(m1)
b_intercept <- coef(m1)[1]
b_sex <- coef(m1)[2]
b_age <- coef(m1)[3]
b_educ <- coef(m1)[4]
b_age
```
<br> 

> Exercise: Calculate the average marginal effect for age by using the numerical approach to calculate the derivative


<!--- include a button here ---> 

<script>
function myFunction() {

            var btn = document.getElementById("myButton");
            //to make it fancier
            if (btn.value == "Click to Hide Answer") {
                btn.value = "Show Answer";
                btn.innerHTML = "Show Answer";
            }
            else {
                btn.value = "Click to Hide Answer";
                btn.innerHTML = "Hide Answer";
            }
            //this is what you're looking for
            var x = document.getElementById("myDIV");
            if (x.style.display === "none") {
                x.style.display = "block";
            } else {
                x.style.display = "none";
            }
        }
          
</script>



<button class=button1 onclick="myFunction()" id="myButton" value="Click to Show Answer">Show Answer</button>

<div style="display:none;" id="myDIV">
<br>



```{r}
s <- .01 #define a small step

dfmin <- dfplus <- df #let us copy the datasets
dfplus$age <- df$age + s #we add a small step to age in one of the datasets
dfmin$age <- df$age - s #we substract a small step to age in one of the datasets

ysplus <- predict(m1, newdata=dfplus) #we calculate the predicted values based on our model parameters and new values of covariates
ysmin <- predict(m1, newdata=dfmin) #we calculate the predicted values based on our model parameters and new values of covariates

ME_age <- (ysplus - ysmin) / (2*s) #calculate the MEs

AME_age <- mean(ME_age) #calculate the AME

AME_age
```
</div>

## Marginal effects of sex.

Remember the definition? 

```{r}
ME_sex <- with(df, (b_intercept + b_sex*1 + b_age*age + b_educ*educ) - (b_intercept + b_sex*0 + b_age*age + b_educ*educ))
ME_sex
```

A bit shorter would be: 
```{r}
dfnews0 <- dfnews1 <- df 
#construct two datasets
dfnews0$sex <- "men"
dfnews1$sex <- "women"
#let R calculate the predicted scores
mean(predict(m1, dfnews1) - predict(m1, dfnews0))
```
<br> 
So, lets calculate the AME: `mean(ME_sex)` = `r mean(ME_sex)`. 

And what is the interpretation? Well, let us plot the values of the dependent variable, if we hold education constant and vary age, for `r colorize("men", "red")` and `r colorize("women", "blue")`. 

```{r}
plot(c(-10:10), b_intercept + b_sex*1 + b_age*c(-10:10) + b_educ*mean(df$educ), col="red", type= "b", xlab="age", ylab="y_lin", xlim=c(-10,10), ylim=c(0, 60))
par(new=TRUE)
plot(c(-10:10), b_intercept + b_sex*0 + b_age*c(-10:10) + b_educ*mean(df$educ), col="blue", type= "b", xlab="", ylab="", xlim=c(-10,10), ylim=c(0, 60))

```
<br> 

You see, that at all values of age, the difference between `r colorize("men", "red")` and `r colorize("women", "blue")` is AME.  

## check with R

```{r}
summary(margins(m1)) #SE via deltamethod
summary(margins(m1, vce="bootstrap", iterations=999)) #SE via bootstrapping
```

---  

# Logistic model

```{r}
m1_d <- glm(y_di ~ sex + age + educ, data=df, family=binomial)
summary(m1_d)
```
<br>

Let us have a look at the model: 

```{r, webgl=TRUE}

#with(df, plot3d(sex, age, fitted.values(m1_d), col=colors))
with(df, plot3d(educ, age, fitted.values(m1_d), col=colors))
```


## Marginal effects of age  

Please remember, the logistic function is:  

$$ f(X) =  \frac{1} {(1+e^{-X})} $$


And the definition of the ME for a continuous variable $x1$: 


$$ ME = f^{'}(X|x1=x), $$

Thus we need to find $f^{'}(X)$ with respect to $x1$ (i.e., age). 

You could do that yourself by using the chain-rule: 

$$\displaystyle \frac{\partial f(X)}{\partial (x1)} = \displaystyle \frac{\partial f(X)}{\partial (X)}*\displaystyle \frac{\partial (X)}{\partial (x1)}$$

... or let R help you. 

```{r}
ME_age_f2 <- D(expression(1 / (1 + exp(-1*(b_intercept + b_sex*sex + b_age*age + b_educ*educ)))), "age") 
ME_age_f2

```
<br>

Luckily for us, the derivative of `plogis(X)` is `dlogis(X)`. 

You wanna check? simply run: `?dlogis`. 


Retrieve the coefficients: 

```{r}
coef(m1_d)
b_intercept <- coef(m1_d)[1]
b_sex <- coef(m1_d)[2]
b_age <- coef(m1_d)[3]
b_educ <- coef(m1_d)[4]
```
And fill in the formula: 

```{r}
ME_age <- with(df, dlogis(b_intercept + b_sex*(sex=="women")+ b_age*age + b_educ*educ)*b_age)
#shorter would be: 
ME_age <- dlogis(predict(m1_d))*b_age
head(ME_age)
```
So, you notice - hopefully - that that the ME_age is different for each respondent. Let us take the mean. 


**AME_age** = `r mean(ME_age)`.:

```{r}
AME_age <- mean(ME_age)
AME_age
```


## marginal effect sex. 

Remember $ME_{sex} = f(X|sex=1) - f(X|sex=0)$. 

And once again simply fill in the formula. 

```{r}
ME_sex <- with(df, plogis(b_intercept + b_sex*1+ b_age*age + b_educ*educ) - plogis(b_intercept + b_sex*0+ b_age*age + b_educ*educ))
AME_sex <- mean(ME_sex)
AME_sex
```

or,...
```{r}
#new datasets
dfs0 <- dfs1 <- df
dfs0$sex <- "men"
dfs1$sex <- "women"
ME_sex <- plogis(predict(m1_d, dfs1)) - plogis(predict(m1_d, dfs0))
AME_sex <- mean(ME_sex)
AME_sex
```

Let us plot the marginal effect at different values of the other covariates. 

```{r , webgl=TRUE}
with(df, plot3d(educ, age, ME_sex, col=colors))
```

AME_sex = `r AME_sex`. 

## Check with R. 

```{r}
#okay, now let's use R to calculate the marginal effects
summary(margins(m1_d)) #make sure that variable sex is defined as a factor not as a numeric variable. 
```
Everything seems to be okay! 

---   

# Linear model with interactions

Let us estimate a better fitting model. 

```{r}
m2 <- lm(y_lin ~ sex + age + educ + sex:educ + age:educ, data=df)
#note that we define the interaction within the formula. This is crucial!!
summary(m2)
```

Retrieve coefficients. 

```{r}
coef(m2)
b_intercept <- coef(m2)[1]
b_sex <- coef(m2)[2]
b_age <- coef(m2)[3]
b_educ <- coef(m2)[4]
b_sex_educ <- coef(m2)[5]
b_age_educ <- coef(m2)[6]
``` 

## marginal effect sex

Let's use power of R. 

```{r}
ME_sex <- predict(m2, newdata=dfs1) - predict(m2, newdata=dfs0)
AME_sex <- mean(ME_sex)
AME_sex
```


## marginal effect age  

```{r}
ME_age_f <- expression(b_age + b_age_educ*educ) #this is the derivative with respect to age
ME_age <- with(df, eval(ME_age_f)) #Do you see I do not fit all variables myself but use this convenient function eval()
AME_age <- mean(ME_age)
AME_age
```

## Marginal effect education  

```{r}
ME_educ_f <- expression(b_educ + b_age_educ*age + b_sex_educ*(sex=="women"))
ME_educ <- with(df, eval(ME_educ_f))
AME_educ <- mean(ME_educ)
AME_educ
```

## Let us check with R

```{r}
summary(margins(m2))
```



## Marginal interaction effect educ*age

First calculate derivative to age and then to educ. 

```{r}

fx <- expression(intercept + b_sex*sex + b_age*age + b_educ*educ + b_sex_educ*sex*educ + b_age_educ*age*educ)
dfxdage <- D(fx, "age")
d2fxdagededuc <- D(dfxdage, "educ")
#check if the variable sex is in our expression we then need to be careful because it is factor. but it is not. 
AME_age_educ <- mean(with(df, eval(d2fxdagededuc) ))
AME_age_educ
```
And this is (of course) our regression coefficient for the interaction. 

## Marginal interaction effect sex*educ

First calculate derivative to sex and then to educ. 

```{r}
fx <- expression(intercept + b_sex*sex + b_age*age + b_educ*educ + b_sex_educ*sex*educ + b_age_educ*age*educ)
dfxdsex <- expression(intercept + b_sex*1 + b_age*age + b_educ*educ + b_sex_educ*1*educ + b_age_educ*age*educ -(intercept + b_sex*0 + b_age*age + b_educ*educ + b_sex_educ*0*educ + b_age_educ*age*educ))
dfxdsex
dfxdsex <- Simplify(dfxdsex)
dfxdsex
d2fxdsexdeduc <- D(dfxdsex, "educ")
d2fxdsexdeduc
#this is just a constant, namely our interaction coefficient
eval(d2fxdsexdeduc)
```

---  


# Logistic with model interactions

Let's estimate a better model

```{r}
m2_d <- glm(y_di ~ sex + age + educ + sex:educ + age:educ, family=binomial, data=df)
summary(m2_d)
```
Plotting the estimated model is very insightful. 

```{r, results="hold", webgl=TRUE }
with(df, plot3d(sex, age, fitted.values(m2_d), col=colors))
#with(df, plot3d(educ, age, fitted.values(m2_d), col=colors))
#with(df, plot3d(educ[sex=="men"], age[sex=="men"], fitted.values(m2_d)[sex=="men"], col=colors[sex=="men"]))
#with(df, plot3d(educ[sex=="women"], age[sex=="women"], fitted.values(m2_d)[sex=="women"], col=colors[sex=="women"]))
```

Retrieve coefficients. 

```{r}
coef(m2_d)
b_intercept <- coef(m2_d)[1]
b_sex <- coef(m2_d)[2]
b_age <- coef(m2_d)[3]
b_educ <- coef(m2_d)[4]
b_sex_educ <- coef(m2_d)[5]
b_age_educ <- coef(m2_d)[6]
```

## marginal effect sex

Let's use power of R

```{r}
ME_sex <- plogis(predict(m2_d, newdata=dfs1)) - plogis(predict(m2_d, newdata=dfs0))
AME_sex <- mean(ME_sex)
AME_sex
```

## marginal effect age  

```{r}
ME_age <- with(df, dlogis(predict(m2_d))*(b_age + b_age_educ*educ))
AME_age <- mean(ME_age)
AME_age
```

## marginal effect educ 

```{r}
ME_educ <- with(df, dlogis(predict(m2_d))*(b_educ + b_sex_educ*(sex=="women") + b_age_educ*age)) 
AME_educ <- mean(ME_educ)
AME_educ
```

## check with R

```{r}
summary(margins(m2_d))

```


## Marginal interaction effects.

`r colorize("Warning, calculating the derivatives may be difficult depending on f(x), and your mathematical background ;-)", "red")`

We need the double derivative of $f(x)$. 
We will call this function `ddlogis`.

```{r}
# f(X) = P(Y=1|X) = n(X) = exp(X) / (1 + exp(X))  = 1 / (1+exp(-X))
nX <- expression(1 / (1+exp(-X))) 
#similar as dlogis
dnXdx <- D(nX, "X")
#higher order derivative 
d2nxdxdx <- D(dnXdx, "X")
#d2nxdxdx
ddlogis <- function(X){-(exp(-X)/(1 + exp(-X))^2 - exp(-X) * (2 * (exp(-X) * (1 + exp(-X))))/((1 +     exp(-X))^2)^2)}
```

## marginal interaction effect educ*age

```{r}
MEeducage <- with(df, ddlogis(predict(m2_d))*(b_educ + b_age_educ*age)*(b_age + b_age_educ * educ) + dlogis(predict(m2_d))*(b_age_educ))

AMEeducage <- mean(MEeducage)
AMEeducage
```
### Numerical approach


```{r}
s <- .001 #define a small step

#define datasets
dfplusplus <- dfplusmin <- dfminplus <- dfminmin <- df 
#add the small step to the variables
dfplusmin$age <- dfplusplus$age <- df$age + s 
dfminmin$age <- dfminplus$age <- df$age - s
dfplusplus$educ <- dfminplus$educ <- df$educ + s
dfplusmin$educ <- dfminmin$educ <- df$educ - s

#calculate the predicted probabilities
p11 <- plogis(predict(m2_d, dfplusplus))
p10 <- plogis(predict(m2_d, dfplusmin))
p01 <- plogis(predict(m2_d, dfminplus))
p00 <- plogis(predict(m2_d, dfminmin))

#and the marginal effects. be aware of all the brackets. :-(
am <- (((p11 - p01)/(2*s)) - ((p10 - p00)/(2*s))) / (2*s)
mean(am)
```



<br>

> Exercise: Suppose that all estimated interaction terms are 0 in model m2_d what would be the marginal effect of the interaction effect of education with age according to the above? Thus what would be: MEeducage. 


## marginal interaction effect sex*educ

First calculate derivative to sex and then to educ

```{r}
dfxdsex <- with(df, plogis(predict(m2_d, dfs1)) - plogis(predict(m2_d, dfs0)))

MEsexeduc  <- with(df, dlogis(predict(m2_d, dfs1))*(b_educ + b_sex_educ*1 + b_age_educ*age) - dlogis(predict(m2_d, dfs0))*(b_educ + b_sex_educ*0 + b_age_educ*age))  

AMEsexeduc <- mean(MEsexeduc)
AMEsexeduc

``` 
### Numerical approach


```{r}
s <- .001 #define a small step

#define datasets
dfplusplus <- dfplusmin <- dfminplus <- dfminmin <- df 
#add the small step to the variables
dfplusmin$sex <- dfplusplus$sex <- "women"
dfminmin$sex <- dfminplus$sex <- "men"
dfplusplus$educ <- dfminplus$educ <- df$educ + s
dfplusmin$educ <- dfminmin$educ <- df$educ - s

#calculate the predicted probabilities
p11 <- plogis(predict(m2_d, dfplusplus))
p10 <- plogis(predict(m2_d, dfplusmin))
p01 <- plogis(predict(m2_d, dfminplus))
p00 <- plogis(predict(m2_d, dfminmin))

#and the marginal effects. be aware of all the brackets. :-(
am <- ((p11 - p01) - (p10 - p00)) / (2*s)
mean(am)

```
Let us plot these interaction effects against the fitted values. 

```{r}
plot(fitted.values(m2_d), MEsexeduc, col=colors)
```
<br>
You see that at some points the interaction is negative and at some it is positive.


## Check with R2 


```{r}
results<- intEff(m2_d, vars=c("age", "educ"), data=df)
mean(results$byobs$int$int_eff)
```
quite close. 

```{r}
results<- intEff(m2_d, vars=c("sex", "educ"), data=df)
mean(results$byobs$int$int_eff)

```
This is not the same as above, probably because function cannot handle double interactions??  


---  

# Take home message

* When you use a non-linear model, the interpretation of marginal effects is difficult.  
* Only when you have a non-linear model, the use of marginal effects makes sense.  
* MEM: How useful are they?  
* The absence or presence of an estimated interaction term does not say much about the presence or absence of a (marginal) interaction effect when your link function is not linear. 
* R does not provide packages/functions that calculate marginal effects for interaction effects for non-linear models. 

---  

# SE of marginal effects. 

Well, you made it this far in the tutorial. The goal was to show you the underlying idea and interpretation of marginal effects. Now, that you understand, you can apply the standard packages in R to calculate the marginal effects for you, like `margin`. 
There is, however, one caveat. These packages are not able to calculate (correct) marginal effects of interaction effects and not all non-linear models are supported. This means you sometimes have to calculate the marginal effects yourself. Which can be a challenge but as you see, is doable. But if you calculate the ME yourself you might also want to know the standard error (SE) or confidence-interval (CI) of the marginal effect. 

There are two common approaches:  

- Delta method  
- Bootstrapping  

The Delta method is default in most packages but it is quite crude. 
Bootstrapping is slow but quite easy to implement yourself. For more background reading on bootstrapping you could start here: 

Davison, A.C. and Hinkley, D.V. (1997) *Bootstrap Methods and Their Application.* Cambridge University Press.



## Example 1: Linear without interactions

```{r, results='hold'}
#we need to define a function in which we calculate our marginal effects. This is quite easy because it is simply copy paste from the above. 

bootFunc <- function(data, i){
  df <- data[i,] #these are the df we will calculate our statistics of interest
  m1 <- lm(y_lin ~ sex + age + educ, data=df) #we first estimate the model on the generated datasets
  c(coef(m1)[2], coef(m1)[3], coef(m1)[4]) #we save our statistics of interest in a vector
}

b <- boot(df, bootFunc, R = 999) #we feet the boot function our original dataset and our function
b #names are not very nice. Well, you will manage. 

#boot.ci(b, index=1) #you could ask for the CI
#summary(margins(m1, vce="bootstrap", iterations=999)) #SE via bootstrapping. Please notice the order of the variabels is messed up. #you could check with if margins gives the same results. Spoiler alert: YES!
```
<br> 

Okay, our estimates of the AME and the SE are very close. Our boot function is a lot faster than the function implemented in `margin`. Don't know why. 

## Example 2: Logistic model with interactions
```{r}

ddlogis <- function(X){-(exp(-X)/(1 + exp(-X))^2 - exp(-X) * (2 * (exp(-X) * (1 + exp(-X))))/((1 +     exp(-X))^2)^2)} #dont forget we need this one as well. 

bootFunc <- function(data, i){
  df <- data[i,] #bootrap datasets
  
  m2_d <- glm(y_di ~ sex + age + educ + sex:educ + age:educ, family=binomial, data=df) #estimate model
  
  #retrieve coefficients
  b_intercept <- coef(m2_d)[1]
  b_sex <- coef(m2_d)[2]
  b_age <- coef(m2_d)[3]
  b_educ <- coef(m2_d)[4]
  b_sex_educ <- coef(m2_d)[5]
  b_age_educ <- coef(m2_d)[6]
  
  #AME sex
  newdata_sex0 <- df
  newdata_sex0$sex <- "men" #use the correct factor levels!!
  newdata_sex1 <- df
  newdata_sex1$sex <- "women" #use the correct factor levels!!
  ME_sex <- plogis(predict(m2_d, newdata=newdata_sex1)) - plogis(predict(m2_d, newdata=newdata_sex0))
  AME_sex <- mean(ME_sex)
  
  #AME age
  ME_age <- with(df, dlogis(predict(m2_d))*(b_age + b_age_educ*educ))
  AME_age <- mean(ME_age)
  
  #AME educ
  ME_educ <- with(df, dlogis(predict(m2_d))*(b_educ + b_sex_educ*(sex=="women") + b_age_educ*age)) 
  AME_educ <- mean(ME_educ)

  #AME interaction effect sex*educ
  MEsexeduc  <- with(df, dlogis(predict(m2_d, dfs1))*(b_educ + b_sex_educ*1 + b_age_educ*age) - dlogis(predict(m2_d, dfs0))*(b_educ + b_sex_educ*0 + b_age_educ*age))  
  AMEsexeduc <- mean(MEsexeduc)
  
  #AME interaction effect age*educ
  MEeducage <- with(df, ddlogis(predict(m2_d))*(b_educ + b_age_educ*age)*(b_age + b_age_educ * educ) + dlogis(predict(m2_d))*(b_age_educ))
  AMEeducage <- mean(MEeducage)

  
  c(AME_sex, AME_age, AME_educ, AMEsexeduc, AMEeducage) #save results
}

b <- boot(df, bootFunc, R = 999)


b<- summary(b)
rownames(b) <- c("AME_sex", "AME_age", "AME_educ", "AME_sex*educ", "AME_educ*age")
b
```




<style>
.center {
  text-align: center;
  color: red;
}
</style>

<hr>
<br>
<p class="center">Copyright &copy; 2022 Jochem Tolsma</p>

