1 Intro

In this tutorial I will try to explain the logic behind marginal effects.

I will provide you with practical examples. And give you some exercises.

To copy the code click the button in the upper right corner of the code-chunks.

Questions can be addressed to Jochem Tolsma.


1.1 Custom functions

  • package.check: Check if packages are installed (and install if not) in R (source).
fpackage.check <- function(packages) {
    lapply(packages, FUN = function(x) {
        if (!require(x, character.only = TRUE)) {
            install.packages(x, dependencies = TRUE)
            library(x, character.only = TRUE)
        }
    })
}

1.2 Packages

  • tidyverse: if you can’t base them, join them
  • miniGUI:
  • rgl: plotting stuf
  • Deriv: to let R calculate derivatives
  • margin: an R package that will calculate margins for you
  • mvtnorm: to simulate data (with correlated covariates)
  • boot: to calculate SE/CI of the marginal effects
packages = c("tidyverse", "miniGUI", "rgl", "Deriv", "margins", "mvtnorm", "boot", "DAMisc")

fpackage.check(packages)

1.3 Session info

This is my setup.

sessionInfo()
#> R version 4.2.0 (2022-04-22 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19042)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=Dutch_Netherlands.utf8  LC_CTYPE=Dutch_Netherlands.utf8   
#> [3] LC_MONETARY=Dutch_Netherlands.utf8 LC_NUMERIC=C                      
#> [5] LC_TIME=Dutch_Netherlands.utf8    
#> 
#> attached base packages:
#> [1] tcltk     stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] DAMisc_1.7.2    boot_1.3-28     mvtnorm_1.1-3   margins_0.3.26  Deriv_4.1.3     miniGUI_0.8-1  
#>  [7] forcats_0.5.1   stringr_1.4.0   dplyr_1.0.9     purrr_0.3.4     readr_2.1.2     tidyr_1.2.0    
#> [13] tibble_3.1.7    ggplot2_3.4.0   tidyverse_1.3.1 rgl_0.109.2     knitr_1.39     
#> 
#> loaded via a namespace (and not attached):
#>   [1] VGAM_1.1-6          minqa_1.2.4         colorspace_2.0-3    ellipsis_0.3.2     
#>   [5] snakecase_0.11.0    base64enc_0.1-3     fs_1.5.2            optiscale_1.2.2    
#>   [9] rstudioapi_0.13     DT_0.23             fansi_1.0.3         lubridate_1.8.0    
#>  [13] xml2_1.3.3          codetools_0.2-18    splines_4.2.0       extrafont_0.18     
#>  [17] effects_4.2-1       jsonlite_1.8.0      nloptr_2.0.3        broom_0.8.0        
#>  [21] Rttf2pt1_1.3.10     dbplyr_2.2.0        png_0.1-7           compiler_4.2.0     
#>  [25] httr_1.4.3          backports_1.4.1     assertthat_0.2.1    Matrix_1.4-1       
#>  [29] fastmap_1.1.0       survey_4.1-1        cli_3.3.0           formatR_1.12       
#>  [33] htmltools_0.5.2     tools_4.2.0         coda_0.19-4         gtable_0.3.0       
#>  [37] glue_1.6.2          Rcpp_1.0.8.3        carData_3.0-5       cellranger_1.1.0   
#>  [41] jquerylib_0.1.4     vctrs_0.5.1         nlme_3.1-157        extrafontdb_1.0    
#>  [45] insight_0.17.1      xfun_0.31           lme4_1.1-29         rvest_1.0.2        
#>  [49] lifecycle_1.0.3     klippy_0.0.0.9500   srvyr_1.1.2         MASS_7.3-56        
#>  [53] scales_1.2.0        hms_1.1.1           parallel_4.2.0      RColorBrewer_1.1-3 
#>  [57] yaml_2.3.5          pbapply_1.5-0       pander_0.6.5        sass_0.4.1         
#>  [61] latticeExtra_0.6-29 stringi_1.7.6       rlang_1.0.6         pkgconfig_2.0.3    
#>  [65] evaluate_0.15       lattice_0.20-45     prediction_0.3.14   htmlwidgets_1.5.4  
#>  [69] tidyselect_1.1.2    plyr_1.8.7          magrittr_2.0.3      R6_2.5.1           
#>  [73] generics_0.1.2      clarkeTest_0.1.0    DBI_1.1.2           pillar_1.7.0       
#>  [77] haven_2.5.0         withr_2.5.0         jtools_2.2.1        survival_3.3-1     
#>  [81] abind_1.4-5         nnet_7.3-17         janitor_2.1.0       modelr_0.1.8       
#>  [85] crayon_1.5.1        car_3.0-13          utf8_1.2.2          tzdb_0.3.0         
#>  [89] rmarkdown_2.14      jpeg_0.1-9          grid_4.2.0          readxl_1.4.0       
#>  [93] data.table_1.14.2   AICcmodavg_2.3-1    reprex_2.0.1        digest_0.6.29      
#>  [97] xtable_1.8-4        stats4_4.2.0        munsell_0.5.0       unmarked_1.2.5     
#> [101] bslib_0.3.1         mitools_2.4

1.4 Background reading and theory

Ai and Norton, 2003. Interaction terms in logit and probit models : https://doi.org/10.1016/S0165-1765(03)00032-6

1.4.1 Marginal effect definition

What is a marginal effect? If you google, you will find this definition:

Marginal effects measure the impact that an instantaneous change in one variable has on the outcome variable while all other variables are held constant.

But I would say the definition depends on the measurement level of your independent variable.

For a dichotomous independent variable:

  • The change in your dependent variable if (ceteris paribus) your independent variable x1 changes from 0 to 1.

More formally:

\[ ME = f(X|x1=1) - f(X|x1=0), \]

where \(Y = f(X)\) and \(X\) is your set of covariates.

For a metric independent variable:

  • The change in your dependent variable, Y, at some value, x, of your independent variable x1, per (ceteris paribus) small change of x1 at x of s.

More formally:

\[ ME = \frac{f(X|x1=x + s) - f(X|x1=x - s)}{2s}, \] where \(s\) is an infinitesimal small number.

This is thus the slope (or more precisely the partial derivative) of \(f(X)\) at the point where \(x1\) is \(x\):

\[ ME = f^{'}(X|x1=x), \] and \[ f^{'}(X) = \displaystyle \frac{\partial f(X)}{\partial x1} \]

Where \(X\) is your set of covariates, \(x1\) is one of your covariates, \(x\) is a value of your covariate \(x1\).

1.4.2 Average Marginal Effect / Marginal Effect at Mean

Because you can calculate the ME at different values of your other covariates, you can calculate many MEs.

If you calculate the ME for all respondents in your dataset (and their observed values of covariates) and take the mean, you will end up with the AME, the Average Marginal Effect.
If you calculate the ME for covariate values set at their mean, you will end up with the MEM, the Marginal Effect at Mean (or MME, Mean Marginal Effect).

1.4.3 Marginal interaction effect

Sometimes this is incorrectly defined as:

\[ \displaystyle \frac{\partial f(X)}{\partial (x1x2)} \] Correctly would be:

\[ \displaystyle \frac{\partial f^2(X)}{\partial(x1)\partial(x2)} \]

Interpretation: How does the effect of \(x1\) on \(Y\) change with a change in \(x2\), conditional on \(X\).

Naturally, you have analogous definitions if \(x1\) and/or \(x2\) are dichotomous.


1.5 Simulated data

set.seed(227863)  # so we all have the same data

# we start with simulating some correlated covariates
sigma <- matrix(c(4, 2, 3, 2, 3, 1.5, 3, 1.5, 5), ncol = 3)
# sigma
df <- data.frame(rmvnorm(n = 500, mean = c(6, 8, 7), sigma = sigma))
colnames(df) <- c("age", "sex", "educ")

df %>%
    mutate(sex = ifelse(sex > mean(sex) + 0.3 * sd(sex), 0, 1), sex = as.factor(sex), sex = recode_factor(sex,
        `0` = "men", `1` = "women"), age = round(age, 2), educ = round(educ, 2)) -> df

# Let us add a linear dependent variable with the following formula
df %>%
    mutate(y_lin = 10 + 5 * as.numeric(sex) + 2.4 * age + 2 * educ + 0.5 * as.numeric(sex) * educ + -0.2 *
        age * educ + rnorm(500, 0, 0.5)) -> df

head(df)
#>    age   sex educ    y_lin
#> 1 6.87 women 9.34 51.13692
#> 2 6.34 women 4.64 42.90037
#> 3 5.87 women 7.96 48.96823
#> 4 1.29 women 4.01 33.57377
#> 5 6.66   men 5.66 36.26559
#> 6 9.88   men 9.96 43.77554


This was hopefully, all quite straightforward. But let us also simulate a dichotomous dependent variable.

Thus we would like to simulate a dichotomous dependent variable (0/1), from a Bernoulli distribution, with a logit link function. So we can illustrate marginal effects with logistic regression model.

The log-odds of the probability of an event is a linear combination of the independent variables

Please remember, the logistic function is:

\[ f(X) = P(Y=1|X) = n(X) = \pi_x = \frac{e^X}{(1 + e^X)} = \frac{1} {(1+e^{-X})} \]

We can easily calculate \(P(Y=1|X)\) in R if we know \(X\) via the function plogis(X).

And if we have \(P(Y=1|X)\) we can simulate an observed variable according to a binomial distribution:

\[Y_{di} \sim {\sf Binom}(1, P(Y=1|X))\]

df %>%
    mutate(y_di = rbinom(n = 500, size = 1, prob = plogis(-2.4 + 0.3 * as.numeric(sex) + 0.4 * age +
        0.3 * educ + 0.1 * as.numeric(sex) * educ + -0.1 * age * educ))) -> df

table(df$sex, df$y_di)
#>        
#>           0   1
#>   men   152  27
#>   women 141 180
head(df)
#>    age   sex educ    y_lin y_di
#> 1 6.87 women 9.34 51.13692    0
#> 2 6.34 women 4.64 42.90037    1
#> 3 5.87 women 7.96 48.96823    1
#> 4 1.29 women 4.01 33.57377    1
#> 5 6.66   men 5.66 36.26559    1
#> 6 9.88   men 9.96 43.77554    0

2 Linear model

Estimate the model

m1 <- lm(y_lin ~ sex + age + educ, data = df)
summary(m1)
#> 
#> Call:
#> lm(formula = y_lin ~ sex + age + educ, data = df)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -7.5318 -0.5558  0.2797  0.9000  3.3035 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 19.22576    0.30004   64.08   <2e-16 ***
#> sexwomen     8.83186    0.15553   56.78   <2e-16 ***
#> age          1.05363    0.04703   22.40   <2e-16 ***
#> educ         1.71399    0.03890   44.07   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 1.455 on 496 degrees of freedom
#> Multiple R-squared:  0.9355, Adjusted R-squared:  0.9351 
#> F-statistic:  2399 on 3 and 496 DF,  p-value: < 2.2e-16


And plot the model.

colors <- ifelse(df$sex == "men", "red", "blue")

with(df, plot3d(educ, age, predict(m1), col = colors))

2.1 Marginal effects of age.

We know that \(Y = intercept + b_sex*sex + b_age*age + b_educ*educ\). We know that age is a continuous variable. Thus we calculate the derivative (no worries we will use the function D) of Y with respect to age.

ME_age <- D(expression(intercept + b_sex * sex + b_age * age + b_educ * educ), "age")
ME_age
#> b_age

Find the value of this coefficient:

# coef(m1)
b_intercept <- coef(m1)[1]
b_sex <- coef(m1)[2]
b_age <- coef(m1)[3]
b_educ <- coef(m1)[4]
b_age
#>      age 
#> 1.053627


Exercise: Calculate the average marginal effect for age by using the numerical approach to calculate the derivative

2.2 Marginal effects of sex.

Remember the definition?

ME_sex <- with(df, (b_intercept + b_sex * 1 + b_age * age + b_educ * educ) - (b_intercept + b_sex * 0 +
    b_age * age + b_educ * educ))
ME_sex
#>   [1] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#>  [11] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#>  [21] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#>  [31] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#>  [41] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#>  [51] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#>  [61] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#>  [71] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#>  [81] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#>  [91] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [101] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [111] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [121] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [131] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [141] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [151] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [161] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [171] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [181] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [191] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [201] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [211] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [221] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [231] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [241] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [251] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [261] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [271] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [281] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [291] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [301] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [311] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [321] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [331] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [341] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [351] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [361] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [371] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [381] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [391] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [401] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [411] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [421] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [431] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [441] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [451] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [461] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [471] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [481] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865
#> [491] 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865 8.831865

A bit shorter would be:

dfnews0 <- dfnews1 <- df
# construct two datasets
dfnews0$sex <- "men"
dfnews1$sex <- "women"
# let R calculate the predicted scores
mean(predict(m1, dfnews1) - predict(m1, dfnews0))
#> [1] 8.831865


So, lets calculate the AME: mean(ME_sex) = 8.8318647.

And what is the interpretation? Well, let us plot the values of the dependent variable, if we hold education constant and vary age, for men and women.

plot(c(-10:10), b_intercept + b_sex * 1 + b_age * c(-10:10) + b_educ * mean(df$educ), col = "red", type = "b",
    xlab = "age", ylab = "y_lin", xlim = c(-10, 10), ylim = c(0, 60))
par(new = TRUE)
plot(c(-10:10), b_intercept + b_sex * 0 + b_age * c(-10:10) + b_educ * mean(df$educ), col = "blue", type = "b",
    xlab = "", ylab = "", xlim = c(-10, 10), ylim = c(0, 60))


You see, that at all values of age, the difference between men and women is AME.

2.3 check with R

summary(margins(m1))  #SE via deltamethod
#>    factor    AME     SE       z      p  lower  upper
#>       age 1.0536 0.0470 22.4029 0.0000 0.9614 1.1458
#>      educ 1.7140 0.0389 44.0662 0.0000 1.6378 1.7902
#>  sexwomen 8.8319 0.1555 56.7852 0.0000 8.5270 9.1367
summary(margins(m1, vce = "bootstrap", iterations = 999))  #SE via bootstrapping
#>    factor    AME     SE       z      p  lower  upper
#>       age 1.0536 0.0538 19.5965 0.0000 0.9482 1.1590
#>      educ 1.7140 0.0471 36.4029 0.0000 1.6217 1.8063
#>  sexwomen 8.8319 0.1396 63.2743 0.0000 8.5583 9.1054

3 Logistic model

m1_d <- glm(y_di ~ sex + age + educ, data = df, family = binomial)
summary(m1_d)
#> 
#> Call:
#> glm(formula = y_di ~ sex + age + educ, family = binomial, data = df)
#> 
#> Deviance Residuals: 
#>     Min       1Q   Median       3Q      Max  
#> -2.0237  -0.9296  -0.4609   1.0013   2.1158  
#> 
#> Coefficients:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  0.63124    0.48127   1.312 0.189656    
#> sexwomen     1.43639    0.25730   5.583 2.37e-08 ***
#> age         -0.28712    0.07645  -3.756 0.000173 ***
#> educ        -0.04639    0.06069  -0.764 0.444632    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 678.28  on 499  degrees of freedom
#> Residual deviance: 560.37  on 496  degrees of freedom
#> AIC: 568.37
#> 
#> Number of Fisher Scoring iterations: 4


Let us have a look at the model:

# with(df, plot3d(sex, age, fitted.values(m1_d), col=colors))
with(df, plot3d(educ, age, fitted.values(m1_d), col = colors))

3.1 Marginal effects of age

Please remember, the logistic function is:

\[ f(X) = \frac{1} {(1+e^{-X})} \]

And the definition of the ME for a continuous variable \(x1\):

\[ ME = f^{'}(X|x1=x), \]

Thus we need to find \(f^{'}(X)\) with respect to \(x1\) (i.e., age).

You could do that yourself by using the chain-rule:

\[\displaystyle \frac{\partial f(X)}{\partial (x1)} = \displaystyle \frac{\partial f(X)}{\partial (X)}*\displaystyle \frac{\partial (X)}{\partial (x1)}\]

… or let R help you.

ME_age_f2 <- D(expression(1/(1 + exp(-1 * (b_intercept + b_sex * sex + b_age * age + b_educ * educ)))),
    "age")
ME_age_f2
#> exp(-1 * (b_intercept + b_sex * sex + b_age * age + b_educ * 
#>     educ)) * b_age/(1 + exp(-1 * (b_intercept + b_sex * sex + 
#>     b_age * age + b_educ * educ)))^2


Luckily for us, the derivative of plogis(X) is dlogis(X).

You wanna check? simply run: ?dlogis.

Retrieve the coefficients:

coef(m1_d)
#> (Intercept)    sexwomen         age        educ 
#>   0.6312358   1.4363873  -0.2871232  -0.0463920
b_intercept <- coef(m1_d)[1]
b_sex <- coef(m1_d)[2]
b_age <- coef(m1_d)[3]
b_educ <- coef(m1_d)[4]

And fill in the formula:

ME_age <- with(df, dlogis(b_intercept + b_sex * (sex == "women") + b_age * age + b_educ * educ) * b_age)
# shorter would be:
ME_age <- dlogis(predict(m1_d)) * b_age
head(ME_age)
#>           1           2           3           4           5           6 
#> -0.06976658 -0.07176243 -0.07177781 -0.04251901 -0.04164254 -0.01742796

So, you notice - hopefully - that that the ME_age is different for each respondent. Let us take the mean.

AME_age = -0.0546062.:

AME_age <- mean(ME_age)
AME_age
#> [1] -0.05460625

3.2 marginal effect sex.

Remember \(ME_{sex} = f(X|sex=1) - f(X|sex=0)\).

And once again simply fill in the formula.

ME_sex <- with(df, plogis(b_intercept + b_sex * 1 + b_age * age + b_educ * educ) - plogis(b_intercept +
    b_sex * 0 + b_age * age + b_educ * educ))
AME_sex <- mean(ME_sex)
AME_sex
#> [1] 0.290718

or,…

# new datasets
dfs0 <- dfs1 <- df
dfs0$sex <- "men"
dfs1$sex <- "women"
ME_sex <- plogis(predict(m1_d, dfs1)) - plogis(predict(m1_d, dfs0))
AME_sex <- mean(ME_sex)
AME_sex
#> [1] 0.290718

Let us plot the marginal effect at different values of the other covariates.

with(df, plot3d(educ, age, ME_sex, col = colors))

AME_sex = 0.290718.

3.3 Check with R.

# okay, now let's use R to calculate the marginal effects
summary(margins(m1_d))  #make sure that variable sex is defined as a factor not as a numeric variable. 
#>    factor     AME     SE       z      p   lower   upper
#>       age -0.0546 0.0138 -3.9608 0.0001 -0.0816 -0.0276
#>      educ -0.0088 0.0115 -0.7661 0.4436 -0.0314  0.0138
#>  sexwomen  0.2907 0.0485  5.9983 0.0000  0.1957  0.3857

Everything seems to be okay!


4 Linear model with interactions

Let us estimate a better fitting model.

m2 <- lm(y_lin ~ sex + age + educ + sex:educ + age:educ, data = df)
# note that we define the interaction within the formula. This is crucial!!
summary(m2)
#> 
#> Call:
#> lm(formula = y_lin ~ sex + age + educ + sex:educ + age:educ, 
#>     data = df)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -1.30718 -0.33026  0.00721  0.32965  1.94917 
#> 
#> Coefficients:
#>                Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)   14.866935   0.284848   52.19   <2e-16 ***
#> sexwomen       5.022406   0.197319   25.45   <2e-16 ***
#> age            2.396873   0.036563   65.56   <2e-16 ***
#> educ           2.527558   0.040563   62.31   <2e-16 ***
#> sexwomen:educ  0.497224   0.025560   19.45   <2e-16 ***
#> age:educ      -0.200642   0.004748  -42.26   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.5058 on 494 degrees of freedom
#> Multiple R-squared:  0.9922, Adjusted R-squared:  0.9922 
#> F-statistic: 1.263e+04 on 5 and 494 DF,  p-value: < 2.2e-16

Retrieve coefficients.

coef(m2)
#>   (Intercept)      sexwomen           age          educ sexwomen:educ      age:educ 
#>    14.8669349     5.0224058     2.3968728     2.5275580     0.4972235    -0.2006417
b_intercept <- coef(m2)[1]
b_sex <- coef(m2)[2]
b_age <- coef(m2)[3]
b_educ <- coef(m2)[4]
b_sex_educ <- coef(m2)[5]
b_age_educ <- coef(m2)[6]

4.1 marginal effect sex

Let’s use power of R.

ME_sex <- predict(m2, newdata = dfs1) - predict(m2, newdata = dfs0)
AME_sex <- mean(ME_sex)
AME_sex
#> [1] 8.484782

4.2 marginal effect age

ME_age_f <- expression(b_age + b_age_educ * educ)  #this is the derivative with respect to age
ME_age <- with(df, eval(ME_age_f))  #Do you see I do not fit all variables myself but use this convenient function eval()
AME_age <- mean(ME_age)
AME_age
#> [1] 0.9997207

4.3 Marginal effect education

ME_educ_f <- expression(b_educ + b_age_educ * age + b_sex_educ * (sex == "women"))
ME_educ <- with(df, eval(ME_educ_f))
AME_educ <- mean(ME_educ)
AME_educ
#> [1] 1.65055

4.4 Let us check with R

summary(margins(m2))
#>    factor    AME     SE        z      p  lower  upper
#>       age 0.9997 0.0164  61.0272 0.0000 0.9676 1.0318
#>      educ 1.6505 0.0136 121.6687 0.0000 1.6240 1.6771
#>  sexwomen 8.4848 0.0554 153.1992 0.0000 8.3762 8.5933

4.5 Marginal interaction effect educ*age

First calculate derivative to age and then to educ.

fx <- expression(intercept + b_sex * sex + b_age * age + b_educ * educ + b_sex_educ * sex * educ + b_age_educ *
    age * educ)
dfxdage <- D(fx, "age")
d2fxdagededuc <- D(dfxdage, "educ")
# check if the variable sex is in our expression we then need to be careful because it is factor.
# but it is not.
AME_age_educ <- mean(with(df, eval(d2fxdagededuc)))
AME_age_educ
#> [1] -0.2006417

And this is (of course) our regression coefficient for the interaction.

4.6 Marginal interaction effect sex*educ

First calculate derivative to sex and then to educ.

fx <- expression(intercept + b_sex * sex + b_age * age + b_educ * educ + b_sex_educ * sex * educ + b_age_educ *
    age * educ)
dfxdsex <- expression(intercept + b_sex * 1 + b_age * age + b_educ * educ + b_sex_educ * 1 * educ + b_age_educ *
    age * educ - (intercept + b_sex * 0 + b_age * age + b_educ * educ + b_sex_educ * 0 * educ + b_age_educ *
    age * educ))
dfxdsex
#> expression(intercept + b_sex * 1 + b_age * age + b_educ * educ + 
#>     b_sex_educ * 1 * educ + b_age_educ * age * educ - (intercept + 
#>     b_sex * 0 + b_age * age + b_educ * educ + b_sex_educ * 0 * 
#>     educ + b_age_educ * age * educ))
dfxdsex <- Simplify(dfxdsex)
dfxdsex
#> expression(b_sex + b_sex_educ * educ)
d2fxdsexdeduc <- D(dfxdsex, "educ")
d2fxdsexdeduc
#> b_sex_educ
# this is just a constant, namely our interaction coefficient
eval(d2fxdsexdeduc)
#> sexwomen:educ 
#>     0.4972235

5 Logistic with model interactions

Let’s estimate a better model

m2_d <- glm(y_di ~ sex + age + educ + sex:educ + age:educ, family = binomial, data = df)
summary(m2_d)
#> 
#> Call:
#> glm(formula = y_di ~ sex + age + educ + sex:educ + age:educ, 
#>     family = binomial, data = df)
#> 
#> Deviance Residuals: 
#>     Min       1Q   Median       3Q      Max  
#> -1.8203  -0.8882  -0.3042   0.9333   2.1433  
#> 
#> Coefficients:
#>               Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)   -2.73893    1.38148  -1.983  0.04741 *  
#> sexwomen       0.95791    0.98451   0.973  0.33056    
#> age            0.46274    0.18541   2.496  0.01257 *  
#> educ           0.56442    0.21555   2.619  0.00883 ** 
#> sexwomen:educ  0.06869    0.13933   0.493  0.62202    
#> age:educ      -0.12350    0.02823  -4.375 1.21e-05 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 678.28  on 499  degrees of freedom
#> Residual deviance: 533.69  on 494  degrees of freedom
#> AIC: 545.69
#> 
#> Number of Fisher Scoring iterations: 5

Plotting the estimated model is very insightful.

with(df, plot3d(sex, age, fitted.values(m2_d), col = colors))
# with(df, plot3d(educ, age, fitted.values(m2_d), col=colors)) with(df, plot3d(educ[sex=='men'],
# age[sex=='men'], fitted.values(m2_d)[sex=='men'], col=colors[sex=='men'])) with(df,
# plot3d(educ[sex=='women'], age[sex=='women'], fitted.values(m2_d)[sex=='women'],
# col=colors[sex=='women']))

Retrieve coefficients.

coef(m2_d)
#>   (Intercept)      sexwomen           age          educ sexwomen:educ      age:educ 
#>   -2.73893262    0.95790989    0.46274191    0.56442210    0.06868779   -0.12350206
b_intercept <- coef(m2_d)[1]
b_sex <- coef(m2_d)[2]
b_age <- coef(m2_d)[3]
b_educ <- coef(m2_d)[4]
b_sex_educ <- coef(m2_d)[5]
b_age_educ <- coef(m2_d)[6]

5.1 marginal effect sex

Let’s use power of R

ME_sex <- plogis(predict(m2_d, newdata = dfs1)) - plogis(predict(m2_d, newdata = dfs0))
AME_sex <- mean(ME_sex)
AME_sex
#> [1] 0.2700221

5.2 marginal effect age

ME_age <- with(df, dlogis(predict(m2_d)) * (b_age + b_age_educ * educ))
AME_age <- mean(ME_age)
AME_age
#> [1] -0.06071402

5.3 marginal effect educ

ME_educ <- with(df, dlogis(predict(m2_d)) * (b_educ + b_sex_educ * (sex == "women") + b_age_educ * age))
AME_educ <- mean(ME_educ)
AME_educ
#> [1] -0.009992943

5.4 check with R

summary(margins(m2_d))
#>    factor     AME     SE       z      p   lower   upper
#>       age -0.0607 0.0134 -4.5156 0.0000 -0.0871 -0.0344
#>      educ -0.0100 0.0111 -0.9018 0.3672 -0.0317  0.0117
#>  sexwomen  0.2700 0.0483  5.5892 0.0000  0.1753  0.3647

5.5 Marginal interaction effects.

Warning, calculating the derivatives may be difficult depending on f(x), and your mathematical background ;-)

We need the double derivative of \(f(x)\). We will call this function ddlogis.

# f(X) = P(Y=1|X) = n(X) = exp(X) / (1 + exp(X)) = 1 / (1+exp(-X))
nX <- expression(1/(1 + exp(-X)))
# similar as dlogis
dnXdx <- D(nX, "X")
# higher order derivative
d2nxdxdx <- D(dnXdx, "X")
# d2nxdxdx
ddlogis <- function(X) {
    -(exp(-X)/(1 + exp(-X))^2 - exp(-X) * (2 * (exp(-X) * (1 + exp(-X))))/((1 + exp(-X))^2)^2)
}

5.6 marginal interaction effect educ*age

MEeducage <- with(df, ddlogis(predict(m2_d)) * (b_educ + b_age_educ * age) * (b_age + b_age_educ * educ) +
    dlogis(predict(m2_d)) * (b_age_educ))

AMEeducage <- mean(MEeducage)
AMEeducage
#> [1] -0.01726894

5.6.1 Numerical approach

s <- 0.001  #define a small step

# define datasets
dfplusplus <- dfplusmin <- dfminplus <- dfminmin <- df
# add the small step to the variables
dfplusmin$age <- dfplusplus$age <- df$age + s
dfminmin$age <- dfminplus$age <- df$age - s
dfplusplus$educ <- dfminplus$educ <- df$educ + s
dfplusmin$educ <- dfminmin$educ <- df$educ - s

# calculate the predicted probabilities
p11 <- plogis(predict(m2_d, dfplusplus))
p10 <- plogis(predict(m2_d, dfplusmin))
p01 <- plogis(predict(m2_d, dfminplus))
p00 <- plogis(predict(m2_d, dfminmin))

# and the marginal effects. be aware of all the brackets. :-(
am <- (((p11 - p01)/(2 * s)) - ((p10 - p00)/(2 * s)))/(2 * s)
mean(am)
#> [1] -0.01711588


Exercise: Suppose that all estimated interaction terms are 0 in model m2_d what would be the marginal effect of the interaction effect of education with age according to the above? Thus what would be: MEeducage.

5.7 marginal interaction effect sex*educ

First calculate derivative to sex and then to educ

dfxdsex <- with(df, plogis(predict(m2_d, dfs1)) - plogis(predict(m2_d, dfs0)))

MEsexeduc <- with(df, dlogis(predict(m2_d, dfs1)) * (b_educ + b_sex_educ * 1 + b_age_educ * age) - dlogis(predict(m2_d,
    dfs0)) * (b_educ + b_sex_educ * 0 + b_age_educ * age))

AMEsexeduc <- mean(MEsexeduc)
AMEsexeduc
#> [1] -0.00386123

5.7.1 Numerical approach

s <- 0.001  #define a small step

# define datasets
dfplusplus <- dfplusmin <- dfminplus <- dfminmin <- df
# add the small step to the variables
dfplusmin$sex <- dfplusplus$sex <- "women"
dfminmin$sex <- dfminplus$sex <- "men"
dfplusplus$educ <- dfminplus$educ <- df$educ + s
dfplusmin$educ <- dfminmin$educ <- df$educ - s

# calculate the predicted probabilities
p11 <- plogis(predict(m2_d, dfplusplus))
p10 <- plogis(predict(m2_d, dfplusmin))
p01 <- plogis(predict(m2_d, dfminplus))
p00 <- plogis(predict(m2_d, dfminmin))

# and the marginal effects. be aware of all the brackets. :-(
am <- ((p11 - p01) - (p10 - p00))/(2 * s)
mean(am)
#> [1] -0.00386123

Let us plot these interaction effects against the fitted values.

plot(fitted.values(m2_d), MEsexeduc, col = colors)


You see that at some points the interaction is negative and at some it is positive.

5.8 Check with R2

results <- intEff(m2_d, vars = c("age", "educ"), data = df)
mean(results$byobs$int$int_eff)
#> [1] -0.01726894

quite close.

results <- intEff(m2_d, vars = c("sex", "educ"), data = df)
mean(results$byobs$int$int_eff)
#> [1] 0.04301771

This is not the same as above, probably because function cannot handle double interactions??


6 Take home message

  • When you use a non-linear model, the interpretation of marginal effects is difficult.
  • Only when you have a non-linear model, the use of marginal effects makes sense.
  • MEM: How useful are they?
  • The absence or presence of an estimated interaction term does not say much about the presence or absence of a (marginal) interaction effect when your link function is not linear.
  • R does not provide packages/functions that calculate marginal effects for interaction effects for non-linear models.

7 SE of marginal effects.

Well, you made it this far in the tutorial. The goal was to show you the underlying idea and interpretation of marginal effects. Now, that you understand, you can apply the standard packages in R to calculate the marginal effects for you, like margin. There is, however, one caveat. These packages are not able to calculate (correct) marginal effects of interaction effects and not all non-linear models are supported. This means you sometimes have to calculate the marginal effects yourself. Which can be a challenge but as you see, is doable. But if you calculate the ME yourself you might also want to know the standard error (SE) or confidence-interval (CI) of the marginal effect.

There are two common approaches:

  • Delta method
  • Bootstrapping

The Delta method is default in most packages but it is quite crude. Bootstrapping is slow but quite easy to implement yourself. For more background reading on bootstrapping you could start here:

Davison, A.C. and Hinkley, D.V. (1997) Bootstrap Methods and Their Application. Cambridge University Press.

7.1 Example 1: Linear without interactions

# we need to define a function in which we calculate our marginal effects. This is quite easy
# because it is simply copy paste from the above.

bootFunc <- function(data, i) {
    df <- data[i, ]  #these are the df we will calculate our statistics of interest
    m1 <- lm(y_lin ~ sex + age + educ, data = df)  #we first estimate the model on the generated datasets
    c(coef(m1)[2], coef(m1)[3], coef(m1)[4])  #we save our statistics of interest in a vector
}

b <- boot(df, bootFunc, R = 999)  #we feet the boot function our original dataset and our function
b  #names are not very nice. Well, you will manage. 

# boot.ci(b, index=1) #you could ask for the CI summary(margins(m1, vce='bootstrap',
# iterations=999)) #SE via bootstrapping. Please notice the order of the variabels is messed up.
# #you could check with if margins gives the same results. Spoiler alert: YES!
#> 
#> ORDINARY NONPARAMETRIC BOOTSTRAP
#> 
#> 
#> Call:
#> boot(data = df, statistic = bootFunc, R = 999)
#> 
#> 
#> Bootstrap Statistics :
#>     original        bias    std. error
#> t1* 8.831865 -0.0045442056  0.13365235
#> t2* 1.053627 -0.0014882377  0.05228616
#> t3* 1.713989 -0.0006844387  0.04880639


Okay, our estimates of the AME and the SE are very close. Our boot function is a lot faster than the function implemented in margin. Don’t know why.

7.2 Example 2: Logistic model with interactions

ddlogis <- function(X) {
    -(exp(-X)/(1 + exp(-X))^2 - exp(-X) * (2 * (exp(-X) * (1 + exp(-X))))/((1 + exp(-X))^2)^2)
}  #dont forget we need this one as well. 

bootFunc <- function(data, i) {
    df <- data[i, ]  #bootrap datasets

    m2_d <- glm(y_di ~ sex + age + educ + sex:educ + age:educ, family = binomial, data = df)  #estimate model

    # retrieve coefficients
    b_intercept <- coef(m2_d)[1]
    b_sex <- coef(m2_d)[2]
    b_age <- coef(m2_d)[3]
    b_educ <- coef(m2_d)[4]
    b_sex_educ <- coef(m2_d)[5]
    b_age_educ <- coef(m2_d)[6]

    # AME sex
    newdata_sex0 <- df
    newdata_sex0$sex <- "men"  #use the correct factor levels!!
    newdata_sex1 <- df
    newdata_sex1$sex <- "women"  #use the correct factor levels!!
    ME_sex <- plogis(predict(m2_d, newdata = newdata_sex1)) - plogis(predict(m2_d, newdata = newdata_sex0))
    AME_sex <- mean(ME_sex)

    # AME age
    ME_age <- with(df, dlogis(predict(m2_d)) * (b_age + b_age_educ * educ))
    AME_age <- mean(ME_age)

    # AME educ
    ME_educ <- with(df, dlogis(predict(m2_d)) * (b_educ + b_sex_educ * (sex == "women") + b_age_educ *
        age))
    AME_educ <- mean(ME_educ)

    # AME interaction effect sex*educ
    MEsexeduc <- with(df, dlogis(predict(m2_d, dfs1)) * (b_educ + b_sex_educ * 1 + b_age_educ * age) -
        dlogis(predict(m2_d, dfs0)) * (b_educ + b_sex_educ * 0 + b_age_educ * age))
    AMEsexeduc <- mean(MEsexeduc)

    # AME interaction effect age*educ
    MEeducage <- with(df, ddlogis(predict(m2_d)) * (b_educ + b_age_educ * age) * (b_age + b_age_educ *
        educ) + dlogis(predict(m2_d)) * (b_age_educ))
    AMEeducage <- mean(MEeducage)

    c(AME_sex, AME_age, AME_educ, AMEsexeduc, AMEeducage)  #save results
}

b <- boot(df, bootFunc, R = 999)


b <- summary(b)
rownames(b) <- c("AME_sex", "AME_age", "AME_educ", "AME_sex*educ", "AME_educ*age")
b
#> 
#> Number of bootstrap replications R = 999 
#>                original    bootBias    bootSE    bootMed
#> AME_sex       0.2700221 -3.4979e-03 0.0484563  0.2666374
#> AME_age      -0.0607140 -4.4300e-04 0.0137299 -0.0611598
#> AME_educ     -0.0099929 -7.8897e-05 0.0115793 -0.0103263
#> AME_sex*educ -0.0038612  1.0225e-02 0.0233875  0.0051893
#> AME_educ*age -0.0172689 -4.7421e-04 0.0041145 -0.0176206


Copyright © 2022 Jochem Tolsma

---
title: "Tutorial marginal effects"
author: '[Jochem Tolsma](https://www.jochemtolsma.nl) - Radboud University / University of Groningen, the Netherlands'
date: "Last compiled on `r format(Sys.time(), '%B, %Y')`"
output:
  html_document:
    toc:  true
    toc_float: true
    number_sections: true
    code_folding: show
    code_download: yes
    theme: flatly
    highlight: default
    
---


```{r globalsettings, echo=FALSE, warning=FALSE, message=FALSE}
library(knitr)
opts_chunk$set(tidy.opts=list(width.cutoff=100),tidy=TRUE, warning = FALSE, message = FALSE,comment = "#>", cache=TRUE, echo=TRUE, class.source=c("test"), class.output=c("test2"))
options(width = 100)
library(rgl)
rgl::setupKnitr()
knitr::knit_hooks$set(webgl = hook_webgl)
```

```{r colorize, echo=FALSE}
colorize <- function(x, color) {
  if (knitr::is_latex_output()) {
    sprintf("\\textcolor{%s}{%s}", color, x)
  } else if (knitr::is_html_output()) {
    sprintf("<span style='color: %s;'>%s</span>", color, 
            x)
  } else x
}

```

```{r klippy, echo=FALSE, include=TRUE}
klippy::klippy(position = c('top', 'right'))
#klippy::klippy(color = 'darkred')
#klippy::klippy(tooltip_message = 'Click to copy', tooltip_success = 'Done')
```

```{css style settings, echo = FALSE}
.test {
  max-height: 300px;
  overflow-y: auto;
  overflow-x: auto;
  margin: 0px;
}

.test2 {
  max-height: 300px;
  overflow-y: auto;
  overflow-x: auto;
  margin: 0px;
  #background-color: white;
  color: rgb(201, 76, 76);
}


h1, .h1, h2, .h2, h3, .h3 {
  margin-top: 24px;
}



blockquote {
    padding: 10px 20px;
    margin: 0 0 20px;
    font-size: 14px;
    border-left: 5px solid #eee;
    background-color: rgb(255,255,224,1);
}


.button1 {
  background-color: grey; /* Red */ 
  border: 2px solid black;
  color: white;
  padding: 15px 32px;
  text-align: center;
  text-decoration: none;
  display: inline-block;
  font-size: 16px;
  margin: 4px 2px;
  cursor: pointer;
  /* border-radius: 12px; */
  width: 100%;
}

.button1:hover {
  box-shadow: 0 12px 16px 0 rgba(0,0,0,0.24), 0 17px 50px 0 rgba(0,0,0,0.19);
}

.button1:active {
  border: 2px solid red;
}



```




---


#  Intro


In this tutorial I will try to explain the logic behind marginal effects.  

I will provide you with practical examples. And give you some exercises. 

To copy the code click the button in the upper right corner of the code-chunks.

Questions can be addressed to [Jochem Tolsma](mailto:jochem.tolsma@ru.nl).

---  



## Custom functions

- `package.check`: Check if packages are installed (and install if not) in R ([source](https://vbaliga.github.io/verify-that-r-packages-are-installed-and-loaded/)). 



```{r, results='hide', echo=TRUE}
fpackage.check <- function(packages) {
  lapply(packages, FUN = function(x) {
    if (!require(x, character.only = TRUE)) {
      install.packages(x, dependencies = TRUE)
      library(x, character.only = TRUE)
    }
  })
}


```

---  

## Packages


- `tidyverse`: if you can't base them, join them  
- `miniGUI`: 
- `rgl`: plotting stuf   
- `Deriv`: to let R calculate derivatives  
- `margin`: an R package that will calculate margins for you  
- `mvtnorm`: to simulate data (with correlated covariates)  
- `boot`: to calculate SE/CI of the marginal effects

```{r, results='hide', echo=TRUE}
packages = c("tidyverse", "miniGUI", "rgl", "Deriv", "margins", "mvtnorm", "boot", "DAMisc")

fpackage.check(packages)

```


## Session info

This is my setup. 

```{r}
sessionInfo()
```

--- 

## Background reading and theory

Ai and Norton, 2003. Interaction terms in logit and probit models : https://doi.org/10.1016/S0165-1765(03)00032-6


### Marginal effect definition  {#defs}

What is a marginal effect? If you google, you will find this definition: 

> Marginal effects measure the impact that an instantaneous change in one variable has on the outcome variable while all other variables are held constant. 

But I would say the definition depends on the measurement level of your independent variable. 

For a dichotomous independent variable: 

* `r colorize("The change in your dependent variable if (ceteris paribus) your independent variable x1 changes from 0 to 1", "red")`. 

More formally: 

$$ ME = f(X|x1=1) - f(X|x1=0), $$ 

where $Y = f(X)$ and $X$ is your set of covariates. 

For a metric independent variable:  

* `r colorize("The change in your dependent variable, Y, at some value, x,  of your independent variable x1, per (ceteris paribus) small change of x1 at x of s.", "red")`

More formally: 

$$ ME = \frac{f(X|x1=x + s) - f(X|x1=x - s)}{2s}, $$
where $s$ is an infinitesimal small number. 

This is thus the slope (or more precisely the partial derivative) of $f(X)$ at the point where $x1$ is $x$: 

$$ ME = f^{'}(X|x1=x), $$
and $$ f^{'}(X)  = \displaystyle \frac{\partial f(X)}{\partial x1} $$


Where $X$ is your set of covariates, $x1$ is one of your covariates, $x$ is a value of your covariate $x1$. 


### Average Marginal Effect / Marginal Effect at Mean

Because you can calculate the ME at different values of your other covariates, you can calculate many MEs. 

If you calculate the ME for all respondents in your dataset (and their observed values of covariates) and take the mean, you will end up with the **AME**, the Average Marginal Effect.  
If you calculate the ME for covariate values set at their mean, you will end up with the **MEM**, the Marginal Effect at Mean (or MME, Mean Marginal Effect). 

### Marginal interaction effect

Sometimes this is incorrectly defined as:

$$ \displaystyle \frac{\partial f(X)}{\partial (x1x2)} $$
Correctly would be:

$$ \displaystyle \frac{\partial f^2(X)}{\partial(x1)\partial(x2)} $$

`r colorize ("Interpretation: How does the effect of $x1$ on $Y$ change with a change in $x2$, conditional on $X$.", "red")` 

Naturally, you have analogous definitions if $x1$ and/or $x2$ are dichotomous. 

---  



## Simulated data  

```{r}
set.seed(227863) # so we all have the same data

# we start with simulating some correlated covariates
sigma <- matrix(c(4,2,3, 2,3,1.5, 3,1.5,5), ncol=3)
#sigma
df <- data.frame(rmvnorm(n=500, mean=c(6,8,7), sigma=sigma))
colnames(df) <- c("age", "sex", "educ")

df %>% 
  mutate(sex = ifelse(sex > mean(sex) + 0.3*sd(sex), 0,1), 
         sex = as.factor(sex),
         sex = recode_factor(sex, '0'="men", '1'="women"), 
         age = round(age, 2), 
         educ = round(educ, 2)) -> df

# Let us add a linear dependent variable with the following formula
df %>% 
  mutate(y_lin = 10 + 5*as.numeric(sex) + 2.4*age + 2*educ + 0.5*as.numeric(sex)*educ + -0.2*age*educ + rnorm(500, 0, .5)) -> df

head(df)

```
<br>

This was hopefully, all quite straightforward. But let us also simulate a dichotomous dependent variable. 


Thus we would like to simulate a dichotomous dependent variable (0/1), from a Bernoulli distribution, with a logit link function. So we can illustrate marginal effects with logistic regression model. 

The log-odds of the probability of an event is a linear combination of the independent variables

Please remember, the logistic function is:  

$$ f(X) = P(Y=1|X) = n(X) = \pi_x = \frac{e^X}{(1 + e^X)}  = \frac{1} {(1+e^{-X})} $$


We can easily calculate $P(Y=1|X)$ in R if we know $X$ via the function `plogis(X)`.  

<!--- 
do i need this:  apply the link function (inverse logistic function / logit / in R: qlogis(pi_x)): ln(pi_x / 1 - pi_x) = X
---> 

And if we have $P(Y=1|X)$ we can simulate an observed variable according to a binomial distribution: 

$$Y_{di} \sim {\sf Binom}(1, P(Y=1|X))$$

```{r}

df %>%
  mutate(y_di = rbinom(n=500, size=1, prob=plogis(-2.4 + .3*as.numeric(sex) + .4*age + .3*educ + .1*as.numeric(sex)*educ + -0.1*age*educ))) -> df

table(df$sex, df$y_di)
head(df)
```

---  


# Linear model

Estimate the model  

```{r }

m1 <- lm(y_lin ~ sex + age + educ, data=df)
summary(m1)

```

<br>
And plot the model.
```{r , webgl=TRUE}

colors <- ifelse(df$sex=="men", "red", "blue")

with(df, plot3d(educ, age, predict(m1), col=colors))
```


## Marginal effects of age. 

We know that $Y = intercept + b_sex*sex + b_age*age + b_educ*educ$. 
We know that age is a continuous variable. 
Thus we calculate the derivative (no worries we will use the function `D`) of Y with respect to age. 

```{r}
ME_age <- D(expression(intercept + b_sex*sex + b_age*age + b_educ*educ), "age") 
ME_age
```
Find the value of this coefficient: 

```{r}

#coef(m1)
b_intercept <- coef(m1)[1]
b_sex <- coef(m1)[2]
b_age <- coef(m1)[3]
b_educ <- coef(m1)[4]
b_age
```
<br> 

> Exercise: Calculate the average marginal effect for age by using the numerical approach to calculate the derivative


<!--- include a button here ---> 

<script>
function myFunction() {

            var btn = document.getElementById("myButton");
            //to make it fancier
            if (btn.value == "Click to Hide Answer") {
                btn.value = "Show Answer";
                btn.innerHTML = "Show Answer";
            }
            else {
                btn.value = "Click to Hide Answer";
                btn.innerHTML = "Hide Answer";
            }
            //this is what you're looking for
            var x = document.getElementById("myDIV");
            if (x.style.display === "none") {
                x.style.display = "block";
            } else {
                x.style.display = "none";
            }
        }
          
</script>



<button class=button1 onclick="myFunction()" id="myButton" value="Click to Show Answer">Show Answer</button>

<div style="display:none;" id="myDIV">
<br>



```{r}
s <- .01 #define a small step

dfmin <- dfplus <- df #let us copy the datasets
dfplus$age <- df$age + s #we add a small step to age in one of the datasets
dfmin$age <- df$age - s #we substract a small step to age in one of the datasets

ysplus <- predict(m1, newdata=dfplus) #we calculate the predicted values based on our model parameters and new values of covariates
ysmin <- predict(m1, newdata=dfmin) #we calculate the predicted values based on our model parameters and new values of covariates

ME_age <- (ysplus - ysmin) / (2*s) #calculate the MEs

AME_age <- mean(ME_age) #calculate the AME

AME_age
```
</div>

## Marginal effects of sex.

Remember the definition? 

```{r}
ME_sex <- with(df, (b_intercept + b_sex*1 + b_age*age + b_educ*educ) - (b_intercept + b_sex*0 + b_age*age + b_educ*educ))
ME_sex
```

A bit shorter would be: 
```{r}
dfnews0 <- dfnews1 <- df 
#construct two datasets
dfnews0$sex <- "men"
dfnews1$sex <- "women"
#let R calculate the predicted scores
mean(predict(m1, dfnews1) - predict(m1, dfnews0))
```
<br> 
So, lets calculate the AME: `mean(ME_sex)` = `r mean(ME_sex)`. 

And what is the interpretation? Well, let us plot the values of the dependent variable, if we hold education constant and vary age, for `r colorize("men", "red")` and `r colorize("women", "blue")`. 

```{r}
plot(c(-10:10), b_intercept + b_sex*1 + b_age*c(-10:10) + b_educ*mean(df$educ), col="red", type= "b", xlab="age", ylab="y_lin", xlim=c(-10,10), ylim=c(0, 60))
par(new=TRUE)
plot(c(-10:10), b_intercept + b_sex*0 + b_age*c(-10:10) + b_educ*mean(df$educ), col="blue", type= "b", xlab="", ylab="", xlim=c(-10,10), ylim=c(0, 60))

```
<br> 

You see, that at all values of age, the difference between `r colorize("men", "red")` and `r colorize("women", "blue")` is AME.  

## check with R

```{r}
summary(margins(m1)) #SE via deltamethod
summary(margins(m1, vce="bootstrap", iterations=999)) #SE via bootstrapping
```

---  

# Logistic model

```{r}
m1_d <- glm(y_di ~ sex + age + educ, data=df, family=binomial)
summary(m1_d)
```
<br>

Let us have a look at the model: 

```{r, webgl=TRUE}

#with(df, plot3d(sex, age, fitted.values(m1_d), col=colors))
with(df, plot3d(educ, age, fitted.values(m1_d), col=colors))
```


## Marginal effects of age  

Please remember, the logistic function is:  

$$ f(X) =  \frac{1} {(1+e^{-X})} $$


And the definition of the ME for a continuous variable $x1$: 


$$ ME = f^{'}(X|x1=x), $$

Thus we need to find $f^{'}(X)$ with respect to $x1$ (i.e., age). 

You could do that yourself by using the chain-rule: 

$$\displaystyle \frac{\partial f(X)}{\partial (x1)} = \displaystyle \frac{\partial f(X)}{\partial (X)}*\displaystyle \frac{\partial (X)}{\partial (x1)}$$

... or let R help you. 

```{r}
ME_age_f2 <- D(expression(1 / (1 + exp(-1*(b_intercept + b_sex*sex + b_age*age + b_educ*educ)))), "age") 
ME_age_f2

```
<br>

Luckily for us, the derivative of `plogis(X)` is `dlogis(X)`. 

You wanna check? simply run: `?dlogis`. 


Retrieve the coefficients: 

```{r}
coef(m1_d)
b_intercept <- coef(m1_d)[1]
b_sex <- coef(m1_d)[2]
b_age <- coef(m1_d)[3]
b_educ <- coef(m1_d)[4]
```
And fill in the formula: 

```{r}
ME_age <- with(df, dlogis(b_intercept + b_sex*(sex=="women")+ b_age*age + b_educ*educ)*b_age)
#shorter would be: 
ME_age <- dlogis(predict(m1_d))*b_age
head(ME_age)
```
So, you notice - hopefully - that that the ME_age is different for each respondent. Let us take the mean. 


**AME_age** = `r mean(ME_age)`.:

```{r}
AME_age <- mean(ME_age)
AME_age
```


## marginal effect sex. 

Remember $ME_{sex} = f(X|sex=1) - f(X|sex=0)$. 

And once again simply fill in the formula. 

```{r}
ME_sex <- with(df, plogis(b_intercept + b_sex*1+ b_age*age + b_educ*educ) - plogis(b_intercept + b_sex*0+ b_age*age + b_educ*educ))
AME_sex <- mean(ME_sex)
AME_sex
```

or,...
```{r}
#new datasets
dfs0 <- dfs1 <- df
dfs0$sex <- "men"
dfs1$sex <- "women"
ME_sex <- plogis(predict(m1_d, dfs1)) - plogis(predict(m1_d, dfs0))
AME_sex <- mean(ME_sex)
AME_sex
```

Let us plot the marginal effect at different values of the other covariates. 

```{r , webgl=TRUE}
with(df, plot3d(educ, age, ME_sex, col=colors))
```

AME_sex = `r AME_sex`. 

## Check with R. 

```{r}
#okay, now let's use R to calculate the marginal effects
summary(margins(m1_d)) #make sure that variable sex is defined as a factor not as a numeric variable. 
```
Everything seems to be okay! 

---   

# Linear model with interactions

Let us estimate a better fitting model. 

```{r}
m2 <- lm(y_lin ~ sex + age + educ + sex:educ + age:educ, data=df)
#note that we define the interaction within the formula. This is crucial!!
summary(m2)
```

Retrieve coefficients. 

```{r}
coef(m2)
b_intercept <- coef(m2)[1]
b_sex <- coef(m2)[2]
b_age <- coef(m2)[3]
b_educ <- coef(m2)[4]
b_sex_educ <- coef(m2)[5]
b_age_educ <- coef(m2)[6]
``` 

## marginal effect sex

Let's use power of R. 

```{r}
ME_sex <- predict(m2, newdata=dfs1) - predict(m2, newdata=dfs0)
AME_sex <- mean(ME_sex)
AME_sex
```


## marginal effect age  

```{r}
ME_age_f <- expression(b_age + b_age_educ*educ) #this is the derivative with respect to age
ME_age <- with(df, eval(ME_age_f)) #Do you see I do not fit all variables myself but use this convenient function eval()
AME_age <- mean(ME_age)
AME_age
```

## Marginal effect education  

```{r}
ME_educ_f <- expression(b_educ + b_age_educ*age + b_sex_educ*(sex=="women"))
ME_educ <- with(df, eval(ME_educ_f))
AME_educ <- mean(ME_educ)
AME_educ
```

## Let us check with R

```{r}
summary(margins(m2))
```



## Marginal interaction effect educ*age

First calculate derivative to age and then to educ. 

```{r}

fx <- expression(intercept + b_sex*sex + b_age*age + b_educ*educ + b_sex_educ*sex*educ + b_age_educ*age*educ)
dfxdage <- D(fx, "age")
d2fxdagededuc <- D(dfxdage, "educ")
#check if the variable sex is in our expression we then need to be careful because it is factor. but it is not. 
AME_age_educ <- mean(with(df, eval(d2fxdagededuc) ))
AME_age_educ
```
And this is (of course) our regression coefficient for the interaction. 

## Marginal interaction effect sex*educ

First calculate derivative to sex and then to educ. 

```{r}
fx <- expression(intercept + b_sex*sex + b_age*age + b_educ*educ + b_sex_educ*sex*educ + b_age_educ*age*educ)
dfxdsex <- expression(intercept + b_sex*1 + b_age*age + b_educ*educ + b_sex_educ*1*educ + b_age_educ*age*educ -(intercept + b_sex*0 + b_age*age + b_educ*educ + b_sex_educ*0*educ + b_age_educ*age*educ))
dfxdsex
dfxdsex <- Simplify(dfxdsex)
dfxdsex
d2fxdsexdeduc <- D(dfxdsex, "educ")
d2fxdsexdeduc
#this is just a constant, namely our interaction coefficient
eval(d2fxdsexdeduc)
```

---  


# Logistic with model interactions

Let's estimate a better model

```{r}
m2_d <- glm(y_di ~ sex + age + educ + sex:educ + age:educ, family=binomial, data=df)
summary(m2_d)
```
Plotting the estimated model is very insightful. 

```{r, results="hold", webgl=TRUE }
with(df, plot3d(sex, age, fitted.values(m2_d), col=colors))
#with(df, plot3d(educ, age, fitted.values(m2_d), col=colors))
#with(df, plot3d(educ[sex=="men"], age[sex=="men"], fitted.values(m2_d)[sex=="men"], col=colors[sex=="men"]))
#with(df, plot3d(educ[sex=="women"], age[sex=="women"], fitted.values(m2_d)[sex=="women"], col=colors[sex=="women"]))
```

Retrieve coefficients. 

```{r}
coef(m2_d)
b_intercept <- coef(m2_d)[1]
b_sex <- coef(m2_d)[2]
b_age <- coef(m2_d)[3]
b_educ <- coef(m2_d)[4]
b_sex_educ <- coef(m2_d)[5]
b_age_educ <- coef(m2_d)[6]
```

## marginal effect sex

Let's use power of R

```{r}
ME_sex <- plogis(predict(m2_d, newdata=dfs1)) - plogis(predict(m2_d, newdata=dfs0))
AME_sex <- mean(ME_sex)
AME_sex
```

## marginal effect age  

```{r}
ME_age <- with(df, dlogis(predict(m2_d))*(b_age + b_age_educ*educ))
AME_age <- mean(ME_age)
AME_age
```

## marginal effect educ 

```{r}
ME_educ <- with(df, dlogis(predict(m2_d))*(b_educ + b_sex_educ*(sex=="women") + b_age_educ*age)) 
AME_educ <- mean(ME_educ)
AME_educ
```

## check with R

```{r}
summary(margins(m2_d))

```


## Marginal interaction effects.

`r colorize("Warning, calculating the derivatives may be difficult depending on f(x), and your mathematical background ;-)", "red")`

We need the double derivative of $f(x)$. 
We will call this function `ddlogis`.

```{r}
# f(X) = P(Y=1|X) = n(X) = exp(X) / (1 + exp(X))  = 1 / (1+exp(-X))
nX <- expression(1 / (1+exp(-X))) 
#similar as dlogis
dnXdx <- D(nX, "X")
#higher order derivative 
d2nxdxdx <- D(dnXdx, "X")
#d2nxdxdx
ddlogis <- function(X){-(exp(-X)/(1 + exp(-X))^2 - exp(-X) * (2 * (exp(-X) * (1 + exp(-X))))/((1 +     exp(-X))^2)^2)}
```

## marginal interaction effect educ*age

```{r}
MEeducage <- with(df, ddlogis(predict(m2_d))*(b_educ + b_age_educ*age)*(b_age + b_age_educ * educ) + dlogis(predict(m2_d))*(b_age_educ))

AMEeducage <- mean(MEeducage)
AMEeducage
```
### Numerical approach


```{r}
s <- .001 #define a small step

#define datasets
dfplusplus <- dfplusmin <- dfminplus <- dfminmin <- df 
#add the small step to the variables
dfplusmin$age <- dfplusplus$age <- df$age + s 
dfminmin$age <- dfminplus$age <- df$age - s
dfplusplus$educ <- dfminplus$educ <- df$educ + s
dfplusmin$educ <- dfminmin$educ <- df$educ - s

#calculate the predicted probabilities
p11 <- plogis(predict(m2_d, dfplusplus))
p10 <- plogis(predict(m2_d, dfplusmin))
p01 <- plogis(predict(m2_d, dfminplus))
p00 <- plogis(predict(m2_d, dfminmin))

#and the marginal effects. be aware of all the brackets. :-(
am <- (((p11 - p01)/(2*s)) - ((p10 - p00)/(2*s))) / (2*s)
mean(am)
```



<br>

> Exercise: Suppose that all estimated interaction terms are 0 in model m2_d what would be the marginal effect of the interaction effect of education with age according to the above? Thus what would be: MEeducage. 


## marginal interaction effect sex*educ

First calculate derivative to sex and then to educ

```{r}
dfxdsex <- with(df, plogis(predict(m2_d, dfs1)) - plogis(predict(m2_d, dfs0)))

MEsexeduc  <- with(df, dlogis(predict(m2_d, dfs1))*(b_educ + b_sex_educ*1 + b_age_educ*age) - dlogis(predict(m2_d, dfs0))*(b_educ + b_sex_educ*0 + b_age_educ*age))  

AMEsexeduc <- mean(MEsexeduc)
AMEsexeduc

``` 
### Numerical approach


```{r}
s <- .001 #define a small step

#define datasets
dfplusplus <- dfplusmin <- dfminplus <- dfminmin <- df 
#add the small step to the variables
dfplusmin$sex <- dfplusplus$sex <- "women"
dfminmin$sex <- dfminplus$sex <- "men"
dfplusplus$educ <- dfminplus$educ <- df$educ + s
dfplusmin$educ <- dfminmin$educ <- df$educ - s

#calculate the predicted probabilities
p11 <- plogis(predict(m2_d, dfplusplus))
p10 <- plogis(predict(m2_d, dfplusmin))
p01 <- plogis(predict(m2_d, dfminplus))
p00 <- plogis(predict(m2_d, dfminmin))

#and the marginal effects. be aware of all the brackets. :-(
am <- ((p11 - p01) - (p10 - p00)) / (2*s)
mean(am)

```
Let us plot these interaction effects against the fitted values. 

```{r}
plot(fitted.values(m2_d), MEsexeduc, col=colors)
```
<br>
You see that at some points the interaction is negative and at some it is positive.


## Check with R2 


```{r}
results<- intEff(m2_d, vars=c("age", "educ"), data=df)
mean(results$byobs$int$int_eff)
```
quite close. 

```{r}
results<- intEff(m2_d, vars=c("sex", "educ"), data=df)
mean(results$byobs$int$int_eff)

```
This is not the same as above, probably because function cannot handle double interactions??  


---  

# Take home message

* When you use a non-linear model, the interpretation of marginal effects is difficult.  
* Only when you have a non-linear model, the use of marginal effects makes sense.  
* MEM: How useful are they?  
* The absence or presence of an estimated interaction term does not say much about the presence or absence of a (marginal) interaction effect when your link function is not linear. 
* R does not provide packages/functions that calculate marginal effects for interaction effects for non-linear models. 

---  

# SE of marginal effects. 

Well, you made it this far in the tutorial. The goal was to show you the underlying idea and interpretation of marginal effects. Now, that you understand, you can apply the standard packages in R to calculate the marginal effects for you, like `margin`. 
There is, however, one caveat. These packages are not able to calculate (correct) marginal effects of interaction effects and not all non-linear models are supported. This means you sometimes have to calculate the marginal effects yourself. Which can be a challenge but as you see, is doable. But if you calculate the ME yourself you might also want to know the standard error (SE) or confidence-interval (CI) of the marginal effect. 

There are two common approaches:  

- Delta method  
- Bootstrapping  

The Delta method is default in most packages but it is quite crude. 
Bootstrapping is slow but quite easy to implement yourself. For more background reading on bootstrapping you could start here: 

Davison, A.C. and Hinkley, D.V. (1997) *Bootstrap Methods and Their Application.* Cambridge University Press.



## Example 1: Linear without interactions

```{r, results='hold'}
#we need to define a function in which we calculate our marginal effects. This is quite easy because it is simply copy paste from the above. 

bootFunc <- function(data, i){
  df <- data[i,] #these are the df we will calculate our statistics of interest
  m1 <- lm(y_lin ~ sex + age + educ, data=df) #we first estimate the model on the generated datasets
  c(coef(m1)[2], coef(m1)[3], coef(m1)[4]) #we save our statistics of interest in a vector
}

b <- boot(df, bootFunc, R = 999) #we feet the boot function our original dataset and our function
b #names are not very nice. Well, you will manage. 

#boot.ci(b, index=1) #you could ask for the CI
#summary(margins(m1, vce="bootstrap", iterations=999)) #SE via bootstrapping. Please notice the order of the variabels is messed up. #you could check with if margins gives the same results. Spoiler alert: YES!
```
<br> 

Okay, our estimates of the AME and the SE are very close. Our boot function is a lot faster than the function implemented in `margin`. Don't know why. 

## Example 2: Logistic model with interactions
```{r}

ddlogis <- function(X){-(exp(-X)/(1 + exp(-X))^2 - exp(-X) * (2 * (exp(-X) * (1 + exp(-X))))/((1 +     exp(-X))^2)^2)} #dont forget we need this one as well. 

bootFunc <- function(data, i){
  df <- data[i,] #bootrap datasets
  
  m2_d <- glm(y_di ~ sex + age + educ + sex:educ + age:educ, family=binomial, data=df) #estimate model
  
  #retrieve coefficients
  b_intercept <- coef(m2_d)[1]
  b_sex <- coef(m2_d)[2]
  b_age <- coef(m2_d)[3]
  b_educ <- coef(m2_d)[4]
  b_sex_educ <- coef(m2_d)[5]
  b_age_educ <- coef(m2_d)[6]
  
  #AME sex
  newdata_sex0 <- df
  newdata_sex0$sex <- "men" #use the correct factor levels!!
  newdata_sex1 <- df
  newdata_sex1$sex <- "women" #use the correct factor levels!!
  ME_sex <- plogis(predict(m2_d, newdata=newdata_sex1)) - plogis(predict(m2_d, newdata=newdata_sex0))
  AME_sex <- mean(ME_sex)
  
  #AME age
  ME_age <- with(df, dlogis(predict(m2_d))*(b_age + b_age_educ*educ))
  AME_age <- mean(ME_age)
  
  #AME educ
  ME_educ <- with(df, dlogis(predict(m2_d))*(b_educ + b_sex_educ*(sex=="women") + b_age_educ*age)) 
  AME_educ <- mean(ME_educ)

  #AME interaction effect sex*educ
  MEsexeduc  <- with(df, dlogis(predict(m2_d, dfs1))*(b_educ + b_sex_educ*1 + b_age_educ*age) - dlogis(predict(m2_d, dfs0))*(b_educ + b_sex_educ*0 + b_age_educ*age))  
  AMEsexeduc <- mean(MEsexeduc)
  
  #AME interaction effect age*educ
  MEeducage <- with(df, ddlogis(predict(m2_d))*(b_educ + b_age_educ*age)*(b_age + b_age_educ * educ) + dlogis(predict(m2_d))*(b_age_educ))
  AMEeducage <- mean(MEeducage)

  
  c(AME_sex, AME_age, AME_educ, AMEsexeduc, AMEeducage) #save results
}

b <- boot(df, bootFunc, R = 999)


b<- summary(b)
rownames(b) <- c("AME_sex", "AME_age", "AME_educ", "AME_sex*educ", "AME_educ*age")
b
```




<style>
.center {
  text-align: center;
  color: red;
}
</style>

<hr>
<br>
<p class="center">Copyright &copy; 2022 Jochem Tolsma</p>

