🌲 Implementation of the Robust Random Cut Forest Algorithm for anomaly detection on streams
View the Project on GitHub kLabUM/rrcf
The (robust) random cut tree is the core data structure of the rrcf
library and is represented by the rrcf.RCTree
class.
A (robust) random cut tree is a binary search tree that can be used to detect outliers in a point set.
Points located nearer to the root of the tree are more likely to be outliers.
RCTree
RCTree
from existing dataA (robust) random cut tree can be instantiated from a point set \((n \times d)\), where \(n\) is the number of points and \(d\) is the dimension of each point.
In this case, we instantiate a tree from a collection of 6 two-dimensional points distributed according to the standard normal distribution.
import numpy as np
import rrcf
X = np.random.randn(6, 2)
tree = rrcf.RCTree(X)
The resulting tree can be inspected simply by calling the instance object:
>>> tree
─+
├───+
│ ├───+
│ │ ├──(0)
│ │ └───+
│ │ ├──(5)
│ │ └──(4)
│ └───+
│ ├──(2)
│ └──(3)
└──(1)
RCTree
A random cut tree can also be instantiated with no points
tree = rrcf.RCTree()
The tree contains the following attributes:
root
leaves
ndim
The tree is composed of two types of nodes: branches and leaves.
Note that the root node will have no parent node (i.e. the parent will be None
).
A branch is a node that represents a partition (cut) in the point set and is represented by the rrcf.Branch
class.
The branch class contains the following attributes.
q
p
l
r
u
None
)n
b
A leaf is a node that represents an individual point in the point set and is represented by the rrcf.Leaf
class.
The leaf class contains the following attributes.
i
d
u
x
n
b