--- title: "randomForest - Random Forest" output: html_document vignette: > %\VignetteEncoding{UTF-8} %\VignetteIndexEntry{randomForest - Random Forest} %\VignetteEngine{knitr::rmarkdown} editor_options: chunk_output_type: console --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "vig/" ) options(rmarkdown.html_vignette.check_title = FALSE) ``` This guide is designed as a quick-stop reference of how to use some of the more popular machine learning R packages with `vivid`. In the following examples, we use the air quality data for regression and the iris data for classification. ### randomForest - Random Forest The `randomForest` package in R implements the Random Forest algorithm for classification and regression, a popular ensemble method that builds multiple decision trees during training and aggregates their results for predictions. ```{r, message=FALSE} library('vivid') library("randomForest") ``` ### Regression ```{r, eval = F} # load data aq <- na.omit(airquality) # build rf model rf <- randomForest(Ozone ~ ., data = aq) # vivid vi <- vivi(data = aq, fit = rf, response = 'Ozone') ``` #### Heatmap ```{r, rf_r_heat, out.width = '100%', eval = F} viviHeatmap(mat = vi) ``` ```{r, echo = F, out.width = '100%'} knitr::include_graphics("https://raw.githubusercontent.com/AlanInglis/vivid/master/vignettes/vig/rf_r_heat-1.png") ```
Figure 1: Heatmap of a random forest regression fit displaying 2-way interaction strength on the off diagonal and individual variable importance on the diagonal.
#### PDP ```{r, rf_r_pdp, out.width='100%', eval = F} pdpPairs(data = aq, fit = rf, response = "Ozone", nmax = 500, gridSize = 20, nIce = 100) ``` ```{r, echo = F, out.width = '100%'} knitr::include_graphics("https://raw.githubusercontent.com/AlanInglis/vivid/master/vignettes/vig/rf_r_pdp-1.png") ```
Figure 2: Generalized pairs partial dependence plot for a random forest regression fit.
### Classification ```{r, eval = F} # Load the iris dataset data(iris) # Train rf <- randomForest(Species ~ ., data = iris) vi <- vivi(data = iris, fit = rf, response = 'Species', class = 'setosa') ``` #### Heatmap ```{r, rf_c_heat, out.width = '100%', eval = F} viviHeatmap(mat = vi) ``` ```{r, echo = F, out.width = '100%'} knitr::include_graphics("https://raw.githubusercontent.com/AlanInglis/vivid/master/vignettes/vig/rf_c_heat-1.png") ```
Figure 3: Heatmap of a random forest classification fit displaying 2-way interaction strength on the off diagonal and individual variable importance on the diagonal.
#### PDP ```{r, rf_c_pdp, out.width='100%', eval = F} pdpPairs(data = iris, fit = rf, response = "Species", nmax = 50, gridSize = 4, nIce = 10, class = 'setosa') ``` ```{r, echo = F, out.width = '100%'} knitr::include_graphics("https://raw.githubusercontent.com/AlanInglis/vivid/master/vignettes/vig/rf_c_pdp-1.png") ```
Figure 4: Generalized pairs partial dependence plot for a random forest classification fit.