---
title: "xgboost - Gradient Boosting"
output: html_document
vignette: >
  %\VignetteEncoding{UTF-8}
  %\VignetteIndexEntry{xgboost - Gradient Boosting}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  chunk_output_type: console
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "vig/"
)
options(rmarkdown.html_vignette.check_title = FALSE)
```
This guide is designed as a quick reference for using some of the more popular machine learning R packages with `vivid`. In the following examples, we use the `airquality` data for regression and the `iris` data for classification.
### xgboost - eXtreme Gradient Boosting
The `xgboost` package (short for eXtreme Gradient Boosting) is an efficient implementation of gradient boosting that supports both regression and classification.
```{r, message=FALSE}
library("vivid")
library("xgboost")
```
As shown in the `Custom Predict Function` section, the `xgboost` package requires the user to supply a custom predict function to work with `vivid`. When setting the `data` argument in `xgboost`, remember to include all of the variables (including the response). The custom predict function must follow the structure shown in the example below; in particular, the argument must be named `data`, not the actual name of your data set.
```{r, eval = F}
# load the data and remove missing values
aq <- na.omit(airquality)

# build an xgboost model
gbst <- xgboost(data = as.matrix(aq[, 1:6]),
                label = as.matrix(aq[, 1]),
                nrounds = 100,
                verbose = 0)

# custom predict function for the xgboost fit
pFun <- function(fit, data, ...) predict(fit, as.matrix(data[, 1:6]))

# vivid
vi <- vivi(data = aq, fit = gbst, response = "Ozone", predictFun = pFun)
```
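The object returned by `vivi` is a numeric matrix, with variable importance on the diagonal and pairwise interaction strength on the off-diagonals, so it can be inspected directly before plotting. A minimal sketch, assuming the regression fit above has been run (the values shown will depend on the fitted model):
```{r, eval = F}
# Peek at the top-left corner of the vivi matrix:
# importance on the diagonal, interaction strength off the diagonal
round(vi[1:3, 1:3], 3)
```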
#### Heatmap
```{r, bst_r_heat, out.width = '100%', eval = F}
viviHeatmap(mat = vi)
```
```{r, echo = F, out.width = '100%'}
knitr::include_graphics("https://raw.githubusercontent.com/AlanInglis/vivid/master/vignettes/vig/bst_r_heat-1.png")
```
Figure 1: Heatmap of an xgboost regression fit displaying 2-way interaction strength on the off-diagonal and individual variable importance on the diagonal.
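The same matrix can also be displayed as a network with `vivid`'s `viviNetwork` function. A brief sketch (not run here, as no pre-rendered image is included in this vignette):
```{r, eval = F}
# Network view of the same vivi matrix: node size reflects importance,
# edge width reflects pairwise interaction strength
viviNetwork(mat = vi)
```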
#### PDP
```{r, bst_r_pdp, out.width='100%', eval = F}
pdpPairs(data = aq,
         fit = gbst,
         response = "Ozone",
         nmax = 50,
         gridSize = 4,
         nIce = 10,
         predictFun = pFun)
```
```{r, echo = F, out.width = '100%'}
knitr::include_graphics("https://raw.githubusercontent.com/AlanInglis/vivid/master/vignettes/vig/bst_r_pdp-1.png")
```
Figure 2: Generalized pairs partial dependence plot for an xgboost regression fit.
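Univariate partial dependence and ICE curves can be drawn in the same way with `vivid`'s `pdpVars` function, again supplying the custom predict function. A sketch (not run here):
```{r, eval = F}
# Univariate PDPs with ICE curves, using the same custom predict function
pdpVars(data = aq,
        fit = gbst,
        response = "Ozone",
        nIce = 10,
        predictFun = pFun)
```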
### Classification
For classification we use the `iris` data, converting the `Species` response to 0-based integer labels as required by `xgboost`'s `multi:softprob` objective; the custom predict function then converts the predicted class probabilities back into class labels.
```{r, eval = F}
# Convert the Species factor to 0-based integer labels for xgboost
iris$Species <- as.numeric(iris$Species) - 1

# Create a DMatrix object
dtrain <- xgb.DMatrix(data = as.matrix(iris[, -5]), label = iris$Species)

# Set parameters
params <- list(
  objective = "multi:softprob",
  num_class = 3,
  eval_metric = "mlogloss"
)

# Train the model
bst_model <- xgb.train(params, dtrain, nrounds = 100)

# Define the custom prediction function (the argument must be named `data`)
pFun <- function(fit, data, ...) {
  # Create a DMatrix object from the new data, dropping the response column
  dnewdata <- xgb.DMatrix(data = as.matrix(data[, -5]))
  # Use the predict method from xgboost to get class probabilities
  preds <- predict(fit, dnewdata)
  # Since xgboost returns a probability for each class,
  # we convert them to class labels (one column per class, num_class = 3)
  pred_labels <- max.col(matrix(preds, ncol = 3, byrow = TRUE)) - 1
  # If probabilities are preferred, return 'preds' instead;
  # otherwise, return the predicted class labels
  return(pred_labels)
}

# vivid
vi <- vivi(data = iris, fit = bst_model, response = "Species", class = "setosa", predictFun = pFun)
```
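As noted in the comments above, the predict function can return class probabilities rather than hard labels. A hedged variant (the name `pFunProb` is illustrative, not part of the vignette's code) that returns the probability of a single class, here label 0, i.e. 'setosa' before the numeric conversion:
```{r, eval = F}
# Variant predict function returning the probability of one class
# (label 0 = 'setosa'); a sketch only, not the function used above
pFunProb <- function(fit, data, ...) {
  dnew <- xgb.DMatrix(data = as.matrix(data[, -5]))
  # multi:softprob returns one probability per class per observation
  probs <- matrix(predict(fit, dnew), ncol = 3, byrow = TRUE)
  probs[, 1]
}
```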
```{r, bst_c_heat, eval = F}
viviHeatmap(mat = vi)
```
```{r, echo = F, out.width = '100%'}
knitr::include_graphics("https://raw.githubusercontent.com/AlanInglis/vivid/master/vignettes/vig/bst_c_heat-1.png")
```
Figure 3: Heatmap of an xgboost classification fit displaying 2-way interaction strength on the off-diagonal and individual variable importance on the diagonal.