🌲 Implementation of the Robust Random Cut Forest Algorithm for anomaly detection on streams
View the Project on GitHub kLabUM/rrcf
This section enumerates all the methods of the RCTree
class
insert_point
(point, index, tolerance=None)
Inserts a point into the tree, creating a new leaf with given index
Parameters:
Argument | Type | Description |
---|---|---|
point |
numpy ndarray (1 x d) |
Data point |
index |
any hashable type | Identifier for new leaf in tree |
tolerance |
float | Tolerance for determining duplicate points |
Returns:
Object | Type | Description |
---|---|---|
leaf |
Leaf instance |
New leaf in tree |
Example:
>>> tree = rrcf.RCTree()
# Insert a point
>>> x = np.random.randn(2)
>>> tree.insert_point(x, index=0)
Leaf(0)
forget_point
(index)
Delete leaf from tree
Parameters:
Argument | Type | Description |
---|---|---|
index |
any hashable type | Index of leaf in tree |
Returns:
Object | Type | Description |
---|---|---|
leaf |
Leaf instance |
Deleted leaf |
Example:
# Create empty RCTree
>>> tree = rrcf.RCTree()
# Insert a point
>>> x = np.random.randn(2)
>>> tree.insert_point(x, index=0)
# Forget point
>>> tree.forget_point(0)
Leaf(0)
query
(point, node=None)
Search for leaf nearest to point
Parameters:
Argument | Type | Description |
---|---|---|
point |
numpy ndarray (1 x d) |
Data point |
node |
Branch instance |
Node to begin traversal (defaults to root) |
Returns:
Object | Type | Description |
---|---|---|
leaf |
Leaf instance |
Leaf nearest to queried point in tree |
Example:
# Create RCTree
>>> X = np.random.randn(10, 2)
>>> tree = rrcf.RCTree(X)
# Insert new point
>>> new_point = np.array([4, 4])
>>> tree.insert_point(new_point, index=10)
# Query tree for point with added noise
>>> tree.query(new_point + 1e-5)
Leaf(10)
get_bbox
(branch=None)
Compute bounding box of all points underneath a given branch.
Parameters:
Argument | Type | Description |
---|---|---|
branch |
Branch instance |
Branch to begin traversal (defaults to root) |
Returns:
Object | Type | Description |
---|---|---|
bbox |
numpy ndarray (2 x d) |
Bounding box of all points underneath branch |
Example:
# Create RCTree and compute bbox
>>> X = np.random.randn(10, 3)
>>> tree = rrcf.RCTree(X)
>>> tree.get_bbox()
array([[-0.8600458 , -1.69756215, -1.16659065],
[ 2.48455863, 1.02869042, 1.09414144]])
find_duplicate
(point, tolerance=None)
If point is a duplicate of existing point in the tree, return the leaf containing the point, else return None.
Parameters:
Argument | Type | Description |
---|---|---|
point |
numpy ndarray (1 x d) |
Point to query in tree |
tolerance |
float | Tolerance for determining whether or not point is a duplicate |
Returns:
Object | Type | Description |
---|---|---|
duplicate |
Leaf instance or None |
If point is a duplicate, returns the leaf containing the point. If point is not a duplicate, return None. |
Example:
# Create RCTree
>>> X = np.random.randn(10, 2)
>>> tree = rrcf.RCTree(X)
# Insert new point
>>> new_point = np.array([4, 4])
>>> tree.insert_point(new_point, index=10)
# Search for duplicates
>>> tree.find_duplicate((3, 3))
>>> tree.find_duplicate((4, 4))
Leaf(10)
codisp
(leaf)
Compute collusive displacement at leaf.
Parameters:
Argument | Type | Description |
---|---|---|
leaf |
Any hashable type or Leaf instance |
Index of leaf or Leaf instance |
Returns:
Object | Type | Description |
---|---|---|
codisplacement |
float | Collusive displacement if leaf is removed |
Example:
# Create RCTree
>>> X = np.random.randn(100, 2)
>>> tree = rrcf.RCTree(X)
>>> new_point = np.array([4, 4])
>>> tree.insert_point(new_point, index=100)
# Compute collusive displacement
>>> tree.codisp(100)
31.667
disp
(leaf)
Compute displacement at leaf.
Parameters:
Argument | Type | Description |
---|---|---|
leaf |
Any hashable type or Leaf instance |
Index of leaf or Leaf instance |
Returns:
Object | Type | Description |
---|---|---|
displacement |
float | Displacement if leaf is removed |
Example:
# Create RCTree
>>> X = np.random.randn(100, 2)
>>> tree = rrcf.RCTree(X)
>>> new_point = np.array([4, 4])
>>> tree.insert_point(new_point, index=100)
# Compute displacement
>>> tree.disp(100)
12
map_leaves
(node, op=(lambda x: None), *args, **kwargs)
Traverse tree recursively, calling operation given by op on leaves
Parameters:
Argument | Type | Description |
---|---|---|
node |
Leaf or Branch instance |
Node in RCTree |
op |
function | Function to call on each leaf (defaults to no-op) |
*args |
any | Positional arguments to op |
**kwargs |
any | Keyword arguments to op |
Returns:
None
Example:
# Use map_leaves to print leaves in postorder
>>> X = np.random.randn(10, 2)
>>> tree = RCTree(X)
>>> tree.map_leaves(tree.root, op=print)
Leaf(5)
Leaf(9)
Leaf(4)
Leaf(0)
Leaf(6)
Leaf(2)
Leaf(3)
Leaf(7)
Leaf(1)
Leaf(8)
map_branches
(node, op=(lambda x: None), *args, **kwargs)
Traverse tree recursively, calling operation given by op on branches
Parameters:
Argument | Type | Description |
---|---|---|
node |
Leaf or Branch instance |
Node in RCTree |
op |
function | Function to call on each leaf (defaults to no-op) |
*args |
any | Positional arguments to op |
**kwargs |
any | Keyword arguments to op |
Returns:
None
Example:
# Use map_branches to collect all branches in a list
>>> X = np.random.randn(10, 2)
>>> tree = RCTree(X)
>>> branches = []
>>> tree.map_branches(tree.root,
op=(lambda x, stack:
stack.append(x)),
stack=branches)
>>> branches
[Branch(q=0, p=-0.53),
Branch(q=0, p=-0.35),
Branch(q=1, p=-0.67),
Branch(q=0, p=-0.15),
Branch(q=0, p=0.23),
Branch(q=1, p=0.29),
Branch(q=1, p=1.31),
Branch(q=0, p=0.62),
Branch(q=1, p=0.86)]
to_dict
()
Serializes RCTree to a nested dict that can be written to disk or sent over a network (e.g. as json).
Returns:
Object | Type | Description |
---|---|---|
obj |
dict | Nested dictionary representing all nodes in the RCTree. |
Example:
# Create RCTree
>>> X = np.random.randn(4, 3)
>>> tree = rrcf.RCTree(X)
# Write tree to dict
>>> obj = tree.to_dict()
>>> print(obj)
# Write dict to file
>>> import json
>>> with open('tree.json', 'w') as outfile:
json.dump(obj, outfile)
load_dict
(obj)
Deserializes a nested dict representing an RCTree and loads into the RCTree instance. Note that this will delete all data in the current RCTree and replace it with the loaded data.
Parameters:
Argument | Type | Description |
---|---|---|
obj |
dict | Nested dictionary representing all nodes in the RCTree. |
Returns:
None
Example:
# Load dict (see to_dict method for more info)
>>> import json
>>> with open('tree.json', 'r') as infile:
obj = json.load(infile)
# Create empty RCTree and load data
>>> tree = rrcf.RCTree()
>>> tree.load_dict(obj)
# View loaded data
>>> print(tree)
>>>
─+
├───+
│ ├──(3)
│ └───+
│ ├──(2)
│ └──(0)
└──(1)
from_dict
(obj)
Deserializes a nested dict representing an RCTree and creates a new RCTree instance from the loaded data.
Parameters:
Argument | Type | Description |
---|---|---|
obj |
dict | Nested dictionary representing all nodes in the RCTree. |
Returns:
Object | Type | Description |
---|---|---|
newinstance |
rrcf.RCTree | A new RCTree instance based on the loaded data. |
Example:
# Load dict (see to_dict method for more info)
>>> import json
>>> with open('tree.json', 'r') as infile:
obj = json.load(infile)
# Create empty RCTree and load data
>>> tree = rrcf.RCTree.from_dict(obj)
# View loaded data
>>> print(tree)
>>>
─+
├───+
│ ├──(3)
│ └───+
│ ├──(2)
│ └──(0)
└──(1)