% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/rf.R
\name{rf}
\alias{rf}
\title{Random forest models with Moran's I test of the residuals}
\usage{
rf(
  data = NULL,
  dependent.variable.name = NULL,
  predictor.variable.names = NULL,
  distance.matrix = NULL,
  distance.thresholds = NULL,
  xy = NULL,
  ranger.arguments = NULL,
  scaled.importance = FALSE,
  seed = 1,
  verbose = TRUE,
  n.cores = parallel::detectCores() - 1,
  cluster = NULL
)
}
\arguments{
\item{data}{Data frame with a response variable and a set of predictors. Default: \code{NULL}}

\item{dependent.variable.name}{Character string with the name of the response variable. Must be in the column names of \code{data}. If the dependent variable is binary with values 1 and 0, the argument \code{case.weights} of \code{ranger} is populated by the function \code{\link[=case_weights]{case_weights()}}. Default: \code{NULL}}

\item{predictor.variable.names}{Character vector with the names of the predictor variables. Every element of this vector must be in the column names of \code{data}. Alternatively, the output of \code{\link[=auto_cor]{auto_cor()}} or \code{\link[=auto_vif]{auto_vif()}}. Default: \code{NULL}}

\item{distance.matrix}{Square matrix with the distances among the records in \code{data}. The number of rows of \code{distance.matrix} and \code{data} must be the same. If not provided, the Moran's I of the residuals is not computed. Default: \code{NULL}}

\item{distance.thresholds}{Numeric vector with neighborhood distances. For each value in \code{distance.thresholds}, all distances in the distance matrix below that value are set to 0 for the computation of Moran's I. If \code{NULL}, it defaults to \code{seq(0, max(distance.matrix), length.out = 4)}. Default: \code{NULL}}

\item{xy}{(optional) Data frame or matrix with two columns named "x" and "y" containing coordinates. It is not used by this function, but it is stored in the slot \code{ranger.arguments$xy} of the model, so it can be used by \code{\link[=rf_evaluate]{rf_evaluate()}} and \code{\link[=rf_tuning]{rf_tuning()}}. Default: \code{NULL}}

\item{ranger.arguments}{Named list with \link[ranger]{ranger} arguments (other arguments of this function can also go here). All \link[ranger]{ranger} arguments are set to their default values except for \code{importance}, which is set to \code{"permutation"} rather than \code{"none"}. The ranger arguments \code{x}, \code{y}, and \code{formula} are disabled. Please consult the help file of \link[ranger]{ranger} if you are not familiar with the arguments of this function. Default: \code{NULL}}

\item{scaled.importance}{Logical, if \code{TRUE}, the function scales \code{data} with \link[base]{scale} and fits a new model to compute scaled variable importance scores. This makes variable importance scores of different models somewhat comparable. Default: \code{FALSE}}

\item{seed}{Integer, random seed to facilitate reproducibility. If set to a given number, the returned model is always the same. Default: \code{1}}

\item{verbose}{Logical. If \code{TRUE}, messages and plots generated during the execution of the function are displayed. Default: \code{TRUE}}

\item{n.cores}{Integer, number of cores to use. Default: \code{parallel::detectCores() - 1}}

\item{cluster}{A cluster definition generated with \code{parallel::makeCluster()}. This function does not use the cluster, but can pass it on to other functions when using the \verb{\%>\%} pipe. It will be stored in the slot \code{cluster} of the output list. Default: \code{NULL}}
}
\value{
A ranger model with several extra slots:
\itemize{
\item \code{ranger.arguments}: Stores the values of the arguments used to fit the ranger model.
\item \code{importance}: A list containing a data frame with the predictors ordered by their importance, a ggplot showing the importance values, and local importance scores (difference in accuracy between permuted and non-permuted variables for every case, computed on the out-of-bag data).
\item \code{performance}: Performance scores: R squared on out-of-bag data, R squared (\code{cor(observed, predicted)^2}), pseudo R squared (\code{cor(observed, predicted)}), RMSE, and normalized RMSE (NRMSE).
\item \code{residuals}: The residuals, a normality test of the residuals computed with \code{\link[=residuals_test]{residuals_test()}}, and the spatial autocorrelation of the residuals computed with \code{\link[=moran_multithreshold]{moran_multithreshold()}}.
}
}
\description{
A convenient wrapper for \link[ranger]{ranger} that extends its output with the Moran's I of the residuals for different distance thresholds, the RMSE and NRMSE (as computed by \code{\link[=root_mean_squared_error]{root_mean_squared_error()}}), and, when \code{scaled.importance = TRUE}, variable importance scores based on a scaled version of the data generated by \link[base]{scale}.
}
\details{
Please read the help file of \link[ranger]{ranger} for further details. Notice that the \code{formula} interface of \link[ranger]{ranger} is supported through \code{ranger.arguments}, but variable interactions are not allowed (see \code{\link[=the_feature_engineer]{the_feature_engineer()}} for a way to add interactions as new predictors).
}
\examples{
if(interactive()){

 #loading example data
 data("plant_richness_df")
 data("distance_matrix")

 #fitting random forest model
 out <- rf(
   data = plant_richness_df,
   dependent.variable.name = "richness_species_vascular",
   predictor.variable.names = colnames(plant_richness_df)[5:21],
   distance.matrix = distance_matrix,
   distance.thresholds = 0,
   n.cores = 1
 )

 class(out)

 #data frame with ordered variable importance
 out$importance$per.variable

 #variable importance plot
 out$importance$per.variable.plot

 #performance
 out$performance

 #spatial correlation of the residuals
 out$spatial.correlation.residuals$per.distance

 #plot of the Moran's I of the residuals for different distance thresholds
 out$spatial.correlation.residuals$plot
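
 #a minimal sketch of Moran's I for a single distance threshold, for
 #illustration only: it is not the code used by rf(). Assumptions: the
 #weights are inverse distances, pairs closer than the threshold get a
 #weight of 0, and zero distances (such as the diagonal) also get a weight
 #of 0. moran_sketch() is a hypothetical helper, not part of the package.
 moran_sketch <- function(x, distance.matrix, distance.threshold = 0){
   #inverse-distance weights (zero distances become Inf here)
   weights <- 1 / distance.matrix
   #pairs closer than the threshold get a weight of 0
   weights[distance.matrix < distance.threshold] <- 0
   #Inf weights from zero distances (e.g. the diagonal) are set to 0
   weights[is.infinite(weights)] <- 0
   x.centered <- x - mean(x)
   n <- length(x)
   #I = (n / W) * sum(w_ij * z_i * z_j) / sum(z_i^2), W = sum of weights
   (n / sum(weights)) *
     sum(weights * outer(x.centered, x.centered)) /
     sum(x.centered^2)
 }

 #toy check: four records on a line, two high values followed by two low
 toy.distances <- abs(outer(1:4, 1:4, "-"))
 moran_sketch(x = c(1, 1, -1, -1), distance.matrix = toy.distances)

 #raising the threshold removes the closest pairs from the computation
 moran_sketch(
   x = c(1, 1, -1, -1),
   distance.matrix = toy.distances,
   distance.threshold = 1.5
 )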

 #predictions for new data as done with ranger models:
 predicted <- stats::predict(
   object = out,
   data = plant_richness_df,
   type = "response"
 )$predictions

 #alternative data input methods
 ###############################

 #ranger.arguments can contain ranger arguments and any other rf argument
 my.ranger.arguments <- list(
   data = plant_richness_df,
   dependent.variable.name = "richness_species_vascular",
   predictor.variable.names = colnames(plant_richness_df)[8:21],
   distance.matrix = distance_matrix,
   distance.thresholds = c(0, 1000)
 )

 #fitting model with these ranger arguments
 out <- rf(
   ranger.arguments = my.ranger.arguments,
   n.cores = 1
 )
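
 #when the response is binary (1 and 0), rf() fills the ranger argument
 #case.weights through case_weights(). The lines below are a minimal
 #sketch of class-balancing weights, assuming each record is weighted by
 #the inverse of the number of records in its class, so both classes add
 #up to the same total weight. case_weights() may differ in its details,
 #and case_weights_sketch() is a hypothetical helper.
 case_weights_sketch <- function(y){
   n.1 <- sum(y == 1)
   n.0 <- sum(y == 0)
   ifelse(y == 1, 1 / n.1, 1 / n.0)
 }

 #one presence and three absences: the presence gets weight 1,
 #each absence gets 1/3
 case_weights_sketch(y = c(1, 0, 0, 0))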

}
}
