| Type: | Package |
| Title: | An Implementation of Isolation Forest |
| Version: | 1.1.3 |
| Description: | Isolation forest is anomaly detection method introduced by the paper Isolation based Anomaly Detection (Liu, Ting and Zhou <doi:10.1145/2133360.2133363>). |
| URL: | https://github.com/talegari/solitude |
| BugReports: | https://github.com/talegari/solitude/issues |
| Imports: | ranger (≥ 0.11.0), data.table (≥ 1.11.4), igraph (≥ 1.2.2), future.apply (≥ 0.2.0), R6 (≥ 2.4.0), lgr (≥ 0.3.4), |
| Depends: | R (≥ 3.5.0), |
| Suggests: | tidyverse, uwot, mlbench, rsample |
| License: | GPL-3 |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.1.1 |
| NeedsCompilation: | no |
| Packaged: | 2021-07-29 19:14:19 UTC; dattachidambara |
| Author: | Komala Sheshachala Srikanth [aut, cre], David Zimmermann [ctb] |
| Maintainer: | Komala Sheshachala Srikanth <sri.teach@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2021-07-29 20:00:02 UTC |
An Implementation of Isolation Forest
Description
Isolation forest is an anomaly detection method introduced by the paper Isolation based Anomaly Detection (Liu, Ting and Zhou <doi:10.1145/2133360.2133363>)
Author(s)
Srikanth Komala Sheshachala
See Also
Useful links:
Check for a single integer
Description
for a single integer
Usage
is_integerish(x)
Arguments
x |
input |
Value
TRUE or FALSE
Examples
## Not run: is_integerish(1)
Fit an Isolation Forest
Description
'solitude' class implements the isolation forest method
introduced by paper Isolation based Anomaly Detection (Liu, Ting and Zhou
<doi:10.1145/2133360.2133363>). The extremely randomized trees (extratrees)
required to build the isolation forest is grown using
ranger function from ranger package.
Design
$new() initiates a new 'solitude' object. The
possible arguments are:
-
sample_size: (positive integer, default = 256) Number of observations in the dataset to used to build a tree in the forest -
num_trees: (positive integer, default = 100) Number of trees to be built in the forest -
replace: (boolean, default = FALSE) Whether the sample of observations should be chosen with replacement when sample_size is less than the number of observations in the dataset -
seed: (positive integer, default = 101) Random seed for the forest -
nproc: (NULL or a positive integer, default: NULL, means use all resources) Number of parallel threads to be used by ranger -
respect_unordered_factors: (string, default: "partition")See respect.unordered.factors argument inranger -
max_depth: (positive number, default: ceiling(log2(sample_size))) See max.depth argument inranger
$fit() fits a isolation forest for the given dataframe or sparse matrix, computes
depths of terminal nodes of each tree and stores the anomaly scores and
average depth values in $scores object as a data.table
$predict() returns anomaly scores for a new data as a data.table
Details
Parallelization:
rangeris parallelized and by default uses all the resources. This is supported when nproc is set to NULL. The process of obtaining depths of terminal nodes (which is excuted with$fit()is called) may be parallelized separately by setting up a future backend.
Methods
Public methods
Method new()
Usage
isolationForest$new( sample_size = 256, num_trees = 100, replace = FALSE, seed = 101, nproc = NULL, respect_unordered_factors = NULL, max_depth = ceiling(log2(sample_size)) )
Method fit()
Usage
isolationForest$fit(dataset)
Method predict()
Usage
isolationForest$predict(data)
Method clone()
The objects of this class are cloneable with this method.
Usage
isolationForest$clone(deep = FALSE)
Arguments
deepWhether to make a deep clone.
Examples
## Not run:
library("solitude")
library("tidyverse")
library("mlbench")
data(PimaIndiansDiabetes)
PimaIndiansDiabetes = as_tibble(PimaIndiansDiabetes)
PimaIndiansDiabetes
splitter = PimaIndiansDiabetes %>%
select(-diabetes) %>%
rsample::initial_split(prop = 0.5)
pima_train = rsample::training(splitter)
pima_test = rsample::testing(splitter)
iso = isolationForest$new()
iso$fit(pima_train)
scores_train = pima_train %>%
iso$predict() %>%
arrange(desc(anomaly_score))
scores_train
umap_train = pima_train %>%
scale() %>%
uwot::umap() %>%
setNames(c("V1", "V2")) %>%
as_tibble() %>%
rowid_to_column() %>%
left_join(scores_train, by = c("rowid" = "id"))
umap_train
umap_train %>%
ggplot(aes(V1, V2)) +
geom_point(aes(size = anomaly_score))
scores_test = pima_test %>%
iso$predict() %>%
arrange(desc(anomaly_score))
scores_test
## End(Not run)
Depth of each terminal node of all trees in a ranger model
Description
Depth of each terminal node of all trees in a ranger model is returned as a three column tibble with column names: 'id_tree', 'id_node', 'depth'. Note that root node has the node_id = 0.
Usage
terminalNodesDepth(model)
Arguments
model |
A ranger model |
Details
This function may be parallelized using a future backend.
Value
A tibble with three columns: 'id_tree', 'id_node', 'depth'.
Examples
rf = ranger::ranger(Species ~ ., data = iris, num.trees = 100)
terminalNodesDepth(rf)
Depth of each terminal node of a single tree in a ranger model
Description
Depth of each terminal node of a single tree in a ranger model. Note that root node has the id_node = 0.
Usage
terminalNodesDepthPerTree(treelike)
Arguments
treelike |
Output of 'ranger::treeInfo' |
Value
data.table with two columns: id_node and depth
Examples
## Not run:
rf = ranger::ranger(Species ~ ., data = iris)
terminalNodesDepthPerTree(ranger::treeInfo(rf, 1))
## End(Not run)