CNum 0.2.1
CPU-optimized ML library for C++
CNum::Model::Tree Namespace Reference

Tree-based models. More...

Classes

class  GBModel
 A gradient-boosting model for any child of the TreeBooster class. More...
class  TreeBooster
 A decision tree used in various gradient-boosting models as a weak learner. More...
class  TreeBoosterNode
 A node in a TreeBooster, used for gathering and storing information about the decision-making process. More...
struct  Histogram
 Holds the total gradients and hessians for all bins. More...
struct  DataPartition
 A data partition for the set of samples a tree node has to work with during the tree building process. More...
struct  Split
 Holds data associated with the decision making process in a TreeBoosterNode. More...
class  XGTreeBooster
 A tree booster modeled after Chen & Guestrin's XGBoost tree booster. More...

Typedefs

using json = ::nlohmann::json
using SubsampleFunction = ::std::function< void(size_t *, size_t, size_t, size_t, ::CNum::DataStructs::Matrix<double>) >
using SplitValuePair = std::pair<double, double>
using DataMatrix = std::variant< CNum::DataStructs::Matrix<int>, CNum::DataStructs::Matrix<double> >

Enumerations

enum  SplitAlg { GREEDY , HIST }
 The algorithm used for finding splits during tree building. More...
enum  split_dir { LEFT , RIGHT }
 Signifies the direction of a node produced by a split, relative to its parent. More...

Variables

SubsampleFunction default_subsample
constexpr int N_BINS = 256
 Number of bins used in the Tree models.

Detailed Description

Tree-based models.

Typedef Documentation

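◆ DataMatrix

using CNum::Model::Tree::DataMatrix = std::variant< CNum::DataStructs::Matrix<int>, CNum::DataStructs::Matrix<double> >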
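
Because DataMatrix is a std::variant over an integer matrix and a double matrix, code that receives one usually has to dispatch on the held alternative. The following is a minimal sketch of that dispatch with std::visit. The `Matrix<T>` struct and the `describe` function here are hypothetical stand-ins written for this example; they are not CNum's `CNum::DataStructs::Matrix` class, whose interface is not shown on this page.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

// Hypothetical stand-in for CNum::DataStructs::Matrix<T> (assumption:
// the real class has its own, different interface).
template <typename T>
struct Matrix {
  std::size_t rows = 0;
  std::size_t cols = 0;
  std::vector<T> data;
};

// Same shape as the DataMatrix alias, but built on the stand-in type.
using DataMatrix = std::variant<Matrix<int>, Matrix<double>>;

// Report which alternative is held and how many elements it stores.
inline std::string describe(const DataMatrix &m) {
  return std::visit(
      [](const auto &mat) -> std::string {
        using M = std::decay_t<decltype(mat)>;
        const char *kind = std::is_same_v<M, Matrix<int>> ? "int" : "double";
        return std::string(kind) + " matrix with " +
               std::to_string(mat.data.size()) + " elements";
      },
      m);
}
```

A generic lambda keeps the visitor to one body; an overload set with one function per alternative would work equally well when the two element types need different handling.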

◆ json

using CNum::Model::Tree::json = ::nlohmann::json

◆ SplitValuePair

using CNum::Model::Tree::SplitValuePair = std::pair<double, double>

◆ SubsampleFunction

using CNum::Model::Tree::SubsampleFunction = ::std::function< void(size_t *, size_t, size_t, size_t, ::CNum::DataStructs::Matrix<double>) >

Enumeration Type Documentation

◆ split_dir

Signifies the direction of a node produced by a split, relative to its parent.

Enumerator
LEFT 
RIGHT 

◆ SplitAlg

The algorithm used for finding splits during tree building.

GREEDY is the exact greedy method (available in 0.3.0). HIST is the histogram method.

Enumerator
GREEDY 
HIST 
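
To illustrate what a histogram-based split search generally looks like, here is a minimal sketch that scans per-bin gradient and hessian totals and scores each candidate threshold with the regularized gain from Chen & Guestrin's XGBoost formulation. The `BinHistogram` layout, the `BestSplit` struct, and the `find_best_split` function are assumptions made for this example; CNum's own Histogram and Split types and its HIST implementation may be organized differently.

```cpp
#include <array>
#include <cstddef>

constexpr int N_BINS = 256;  // same value as CNum::Model::Tree::N_BINS

// Hypothetical layout: one gradient sum and one hessian sum per bin.
struct BinHistogram {
  std::array<double, N_BINS> grad{};
  std::array<double, N_BINS> hess{};
};

// Hypothetical result type: samples whose bin index is <= bin go left.
// bin == -1 means no split had positive gain.
struct BestSplit {
  int bin = -1;
  double gain = 0.0;
};

// Structure score G^2 / (H + lambda); lambda > 0 keeps the division safe.
inline double leaf_score(double g, double h, double lambda) {
  return (g * g) / (h + lambda);
}

// Scan thresholds left to right, keeping running left-side totals.
inline BestSplit find_best_split(const BinHistogram &hist, double lambda,
                                 double gamma) {
  double g_total = 0.0, h_total = 0.0;
  for (int b = 0; b < N_BINS; ++b) {
    g_total += hist.grad[b];
    h_total += hist.hess[b];
  }
  const double parent = leaf_score(g_total, h_total, lambda);

  BestSplit best;
  double g_left = 0.0, h_left = 0.0;
  for (int b = 0; b < N_BINS - 1; ++b) {
    g_left += hist.grad[b];
    h_left += hist.hess[b];
    const double g_right = g_total - g_left;
    const double h_right = h_total - h_left;
    const double gain = 0.5 * (leaf_score(g_left, h_left, lambda) +
                               leaf_score(g_right, h_right, lambda) - parent) -
                        gamma;
    if (gain > best.gain) {
      best.bin = b;
      best.gain = gain;
    }
  }
  return best;
}
```

The scan touches each bin once per feature, so its cost depends on the number of bins and not on the number of samples in the node.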

Variable Documentation

◆ default_subsample

SubsampleFunction CNum::Model::Tree::default_subsample
inline
Initial value:
= [] (size_t *pos_ptr,
      size_t low,
      size_t high,
      size_t n_samples,
      const ::CNum::DataStructs::Matrix<double> y) -> void {
  if (low == 0 && high == n_samples) {
    ::std::iota(pos_ptr, pos_ptr + n_samples, low);
  } else {
    ::CNum::Utils::Rand::generate_n_unique_rand_in_range<size_t>(low, high - 1, pos_ptr, n_samples, 1);
  }
}
void generate_n_unique_rand_in_range(size_t low_bound, size_t high_bound, T *out, size_t n, uint64_t logical_id=0)
Generate n unique random integers.
Definition RandUtils.h:3
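
The behavior of the initializer above can be approximated with the standard library alone. The sketch below is an assumption-laden stand-in, not CNum code: `subsample_sketch` is a hypothetical name, it drops the unused matrix parameter, and it replaces `generate_n_unique_rand_in_range` with a shuffle of the candidate indices, which may not match CNum's random number generation or ordering.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical stand-in for default_subsample.
// Writes n_samples indices into pos_ptr:
//   - the identity sequence 0..n_samples-1 when the whole range is used,
//   - otherwise n_samples distinct indices drawn from [low, high - 1].
inline void subsample_sketch(std::size_t *pos_ptr, std::size_t low,
                             std::size_t high, std::size_t n_samples,
                             std::uint64_t seed = 1) {
  if (low == 0 && high == n_samples) {
    std::iota(pos_ptr, pos_ptr + n_samples, low);
    return;
  }
  // The range must contain at least n_samples distinct values.
  assert(high > low && high - low >= n_samples);

  // Enumerate the candidates, shuffle them, and keep the first n_samples.
  std::vector<std::size_t> pool(high - low);
  std::iota(pool.begin(), pool.end(), low);
  std::mt19937_64 rng(seed);
  std::shuffle(pool.begin(), pool.end(), rng);
  std::copy_n(pool.begin(), n_samples, pos_ptr);
}
```

Shuffling the full candidate range costs time and memory proportional to `high - low`; that is acceptable for a sketch but would likely be replaced by a selection-sampling scheme for large ranges.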

◆ N_BINS

int CNum::Model::Tree::N_BINS = 256
constexpr

Number of bins used in the Tree models.

The value 256 was chosen for vgather optimizations. With 256 bins, every bin index fits in one byte, so more of the gradients and hessians associated with bin indices can be gathered in parallel when searching for the best split.
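
As a rough illustration of the one-byte property, the sketch below maps feature values to `std::uint8_t` bin indices and accumulates per-bin gradient sums. The names `to_bin` and `accumulate` and the equal-width binning rule are assumptions made for this example; the page does not say how CNum computes its bin boundaries, and this scalar loop does not reproduce any vectorized gather.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

constexpr int N_BINS = 256;  // same value as CNum::Model::Tree::N_BINS

// With 256 bins the largest index is 255, which fits in one unsigned byte.
static_assert(N_BINS - 1 <= std::numeric_limits<std::uint8_t>::max(),
              "bin index must fit in one byte");

// Map a finite value in [lo, hi] to a bin index using equal-width bins
// (assumption: CNum may use a different binning rule).
inline std::uint8_t to_bin(double x, double lo, double hi) {
  if (hi <= lo) return 0;
  const double t = std::clamp((x - lo) / (hi - lo), 0.0, 1.0);
  const int b = static_cast<int>(t * N_BINS);
  return static_cast<std::uint8_t>(std::min(b, N_BINS - 1));
}

// Sum the gradient of every sample into the bin it falls in.
inline std::array<double, N_BINS> accumulate(
    const std::vector<std::uint8_t> &bins, const std::vector<double> &grad) {
  std::array<double, N_BINS> out{};
  for (std::size_t i = 0; i < bins.size(); ++i) {
    out[bins[i]] += grad[i];
  }
  return out;
}
```

Storing indices as single bytes means a binned feature column takes one eighth of the memory of the original doubles, which also helps cache behavior during histogram construction.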