Mondrian Forests: Efficient Online Random Forests

Lakshminarayanan B, Roy DM, Teh YW

Ensembles of randomized decision trees, usually referred to as random
forests, are widely used for classification and regression tasks in machine
learning and statistics. Random forests achieve competitive predictive
performance and are computationally efficient to train and test, making them
excellent candidates for real-world prediction tasks. The most popular random
forest variants (such as Breiman's random forest and extremely randomized
trees) operate on batches of training data. Online methods, which incorporate
training examples one at a time as they arrive, are increasingly in demand.
Existing online random forests, however, require more training data than their
batch counterparts to achieve comparable predictive performance. In
this work, we use Mondrian processes (Roy and Teh, 2009) to construct ensembles
of random decision trees we call Mondrian forests. Mondrian forests can be
grown in an incremental/online fashion and, remarkably, the distribution of
online Mondrian forests is the same as that of batch Mondrian forests. Mondrian
forests achieve predictive performance comparable to existing online random
forests and periodically re-trained batch random forests, while being more than
an order of magnitude faster, thus offering a better computation vs. accuracy
trade-off.
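
As a concrete illustration of the Mondrian process that underlies these trees, the sketch below samples a Mondrian partition of an axis-aligned box: the time to the next cut is exponential with rate equal to the sum of the box's side lengths, the cut dimension is chosen with probability proportional to its length, and the cut location is uniform within that dimension. This is a minimal Python sketch of the partition prior only, not the paper's online training or prediction procedure; the function and variable names are our own.

import random

def sample_mondrian(box, budget, time=0.0):
    """Recursively sample a Mondrian partition of an axis-aligned box.

    box    : list of (low, high) intervals, one per dimension
    budget : lifetime; cutting stops once the cumulative cut time exceeds it
    """
    lengths = [hi - lo for lo, hi in box]
    rate = sum(lengths)  # "linear dimension" of the box
    # Time to the next cut is exponentially distributed with this rate.
    time += random.expovariate(rate) if rate > 0 else float("inf")
    if time > budget:
        return {"leaf": True, "box": box}

    # Cut dimension is chosen with probability proportional to its length,
    # and the cut location is uniform along that dimension.
    dim = random.choices(range(len(box)), weights=lengths)[0]
    lo, hi = box[dim]
    cut = random.uniform(lo, hi)

    left_box, right_box = list(box), list(box)
    left_box[dim], right_box[dim] = (lo, cut), (cut, hi)
    return {
        "leaf": False, "dim": dim, "cut": cut, "time": time,
        "left": sample_mondrian(left_box, budget, time),
        "right": sample_mondrian(right_box, budget, time),
    }

if __name__ == "__main__":
    # Sample a Mondrian partition of the unit square with lifetime 2.0.
    print(sample_mondrian([(0.0, 1.0), (0.0, 1.0)], budget=2.0))

Each leaf of the returned tree is one cell of the partition; in a Mondrian forest such cells would additionally carry label statistics used for prediction.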

Keywords: stat.ML, cs.LG