SHAP Values of Additive Models


Posted on June 28, 2024 by Michael Mayer


Within only a few years, SHAP (SHapley Additive exPlanations) has emerged as the number one way to investigate black-box models. The basic idea is to decompose model predictions into additive contributions of the features in a fair way. Studying the decompositions of many predictions allows us to derive global properties of the model.
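As a quick, self-contained illustration of what this additive decomposition means (a sketch on a toy model, not part of the example below): the baseline plus the row sums of the SHAP matrix reproduce the model predictions.

library(kernelshap)

# Sketch: toy additive model on built-in data (illustration only)
fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length, data = iris)

# Exact permutation SHAP, using the training data as background
ps <- permshap(fit, X = iris[c("Sepal.Width", "Petal.Length")], bg_X = iris)

# Local accuracy: baseline + sum of per-feature contributions = prediction
all.equal(
  unname(ps$baseline + rowSums(ps$S)),
  unname(predict(fit, iris))
)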

What happens if we apply SHAP algorithms to additive models? Why would this ever make sense?

In the spirit of our “Lost In Translation” series, we provide both high-quality Python and R code.

The models

Let’s build the models using a dataset with three highly correlated covariates and a (deterministic) response.

R

library(lightgbm)
library(kernelshap)
library(shapviz)

#===================================================================
# Make small data
#===================================================================

make_data <- function(n = 100) {
  x1 <- seq(0.01, 1, length = n)
  data.frame(
    x1 = x1,
    x2 = log(x1),
    x3 = x1 > 0.7
  ) |>
    transform(y = 1 + 0.2 * x1 + 0.5 * x2 + x3 + 10 * sin(2 * pi * x1))
}

df <- make_data()
head(df)
cor(df) |> round(2)
#       x1    x2    x3     y
# x1  1.00  0.90  0.80 -0.72
# x2  0.90  1.00  0.58 -0.53
# x3  0.80  0.58  1.00 -0.59
# y  -0.72 -0.53 -0.59  1.00

#===================================================================
# Additive linear model and additive boosted trees
#===================================================================

# Linear regression
fit_lm <- lm(y ~ poly(x1, 3) + poly(x2, 3) + x3, data = df)
summary(fit_lm)

# Boosted trees
xvars <- setdiff(colnames(df), "y")
X <- data.matrix(df[xvars])

params <- list(
  learning_rate = 0.05,
  objective = "mse",
  max_depth = 1,
  colsample_bynode = 0.7
)

fit_lgb <- lgb.train(
  params = params,
  data = lgb.Dataset(X, label = df$y),
  nrounds = 300
)

Python

import numpy as np
import lightgbm as lgb
import shap
from sklearn.preprocessing import PolynomialFeatures
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression

#===================================================================
# Make small data
#===================================================================

def make_data(n=100):
    x1 = np.linspace(0.01, 1, n)
    x2 = np.log(x1)
    x3 = x1 > 0.7
    X = np.column_stack((x1, x2, x3))
    y = 1 + 0.2 * x1 + 0.5 * x2 + x3 + np.sin(2 * np.pi * x1)
    return X, y

X, y = make_data()

#===================================================================
# Additive linear model and additive boosted trees
#===================================================================

# Linear model with polynomial terms
poly = PolynomialFeatures(degree=3, include_bias=False)

preprocessor = ColumnTransformer(
    transformers=[
        ("poly0", poly, [0]),
        ("poly1", poly, [1]),
        ("other", "passthrough", [2]),
    ]
)

model_lm = Pipeline(
    steps=[
        ("preprocessor", preprocessor),
        ("lm", LinearRegression()),
    ]
)
_ = model_lm.fit(X, y)

# Boosted trees with single-split trees
params = dict(
    learning_rate=0.05,
    objective="mse",
    max_depth=1,
    colsample_bynode=0.7,
)

model_lgb = lgb.train(
    params=params,
    train_set=lgb.Dataset(X, label=y),
    num_boost_round=300,
)

SHAP

For both models, we use exact permutation SHAP and exact Kernel SHAP. Furthermore, the linear model is analyzed with “additive SHAP”, and the tree-based model with TreeSHAP.

Do the algorithms provide the same values?

R

system.time({  # 1s
  shap_lm <- list(
    add = shapviz(additive_shap(fit_lm, df)),
    kern = kernelshap(fit_lm, X = df[xvars], bg_X = df),
    perm = permshap(fit_lm, X = df[xvars], bg_X = df)
  )
  shap_lgb <- list(
    tree = shapviz(fit_lgb, X),
    kern = kernelshap(fit_lgb, X = X, bg_X = X),
    perm = permshap(fit_lgb, X = X, bg_X = X)
  )
})

# Consistent SHAP values for linear regression
all.equal(shap_lm$add$S, shap_lm$perm$S)
all.equal(shap_lm$kern$S, shap_lm$perm$S)

# Consistent SHAP values for boosted trees
all.equal(shap_lgb$tree$S, shap_lgb$perm$S)
all.equal(shap_lgb$kern$S, shap_lgb$perm$S)

# Linear coefficient of x3 equals slope of SHAP values
tail(coef(fit_lm), 1)                # 0.682815
diff(range(shap_lm$kern$S[, "x3"]))  # 0.682815

sv_dependence(shap_lm$add, xvars)
sv_dependence(shap_lm$add, xvars, color_var = NULL)

Python

shap_lm = {
    "add": shap.Explainer(model_lm.predict, masker=X, algorithm="additive")(X),
    "perm": shap.Explainer(model_lm.predict, masker=X, algorithm="exact")(X),
    "kern": shap.KernelExplainer(model_lm.predict, data=X).shap_values(X),
}

shap_lgb = {
    "tree": shap.Explainer(model_lgb)(X),
    "perm": shap.Explainer(model_lgb.predict, masker=X, algorithm="exact")(X),
    "kern": shap.KernelExplainer(model_lgb.predict, data=X).shap_values(X),
}

# Consistency for additive linear regression
eps = 1e-12
assert np.abs(shap_lm["add"].values - shap_lm["perm"].values).max() < eps
assert np.abs(shap_lm["perm"].values - shap_lm["kern"]).max() < eps

# Consistency for additive boosted trees
assert np.abs(shap_lgb["tree"].values - shap_lgb["perm"].values).max() < eps
assert np.abs(shap_lgb["perm"].values - shap_lgb["kern"]).max() < eps

# Linear effect of last feature in the fitted model
model_lm.named_steps["lm"].coef_[-1]  # 1.112096

# Linear effect of last feature derived from SHAP values (ignore the sign)
shap_lm["perm"][:, 2].values.ptp()  # 1.112096

shap.plots.scatter(shap_lm["add"])
[Figure: SHAP dependence plots of the additive linear model, showing the additive component of each feature.]

Yes – within each model, the three algorithms provide the same SHAP values. Furthermore, the SHAP values reconstruct the additive components of the features.

Didactically, this is very helpful when introducing SHAP as a method: Pick a white-box and a black-box model and compare their SHAP dependence plots. For the white-box model, you simply see the additive components, while the dependence plots of the black-box model show scatter due to interactions.
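To make this concrete for the linear model, one can compare the SHAP values with the centered per-term contributions returned by predict(..., type = "terms"). A sketch, assuming the objects fit_lm and shap_lm from the R code above; the two should agree up to numerical error:

# Sketch (uses fit_lm and shap_lm from above): centered additive terms of the
# linear model vs. the corresponding SHAP values
terms_lm <- predict(fit_lm, type = "terms")
colnames(terms_lm)  # "poly(x1, 3)" "poly(x2, 3)" "x3"

# E.g., the x1 component should equal its SHAP values
all.equal(
  unname(terms_lm[, "poly(x1, 3)"]),
  unname(shap_lm$add$S[, "x1"])
)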

Remark: The exact equivalence between algorithms is lost when

  • there are too many features for exact procedures (~10+ features), and/or when
  • the background data of Kernel/Permutation SHAP does not agree with the training data. This leads to slightly different estimates of the baseline value, which in turn influences the calculation of SHAP values (see the sketch below).
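A small sketch of the second point, assuming fit_lm, df and xvars from the R code above: using only a subset of the data as background changes the estimated baseline, and with it the SHAP values, slightly.

# Sketch (uses fit_lm, df, xvars from above): Kernel SHAP with a reduced
# background sample vs. the full training data as background
ks_full  <- kernelshap(fit_lm, X = df[xvars], bg_X = df)
ks_small <- kernelshap(fit_lm, X = df[xvars], bg_X = df[seq(1, nrow(df), by = 5), ])

c(full = ks_full$baseline, small = ks_small$baseline)  # baselines differ slightly
max(abs(ks_full$S - ks_small$S))                       # small but nonzero difference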

Final words

  • SHAP algorithms applied to additive models typically give identical results. Slight differences can occur when sampling versions of the algorithms are used, or when a different baseline value is estimated.
  • The resulting SHAP values describe the additive components.
  • Didactically, it helps to see SHAP analyses of white-box and black-box models side by side.

R script, Python notebook

