CNum 0.2.1
CPU-optimized ML library for C++
CNum::Data Namespace Reference

Tools used for gathering and grouping datasets.

Classes

struct  Bin
 A bin for quantile and uniform binning.
struct  Shelf
 Contains bins and the ranges of values they represent.

Functions

std::array< CNum::DataStructs::Matrix< double >, 2 > get_data (std::string data_path, char seperator=',')
 Get data from a delimiter-separated values file (e.g. CSV or TSV) whose last column holds the labels.
void PCA (std::string input_path, std::string output_path)
 Principal component analysis.
std::shared_ptr< Shelf[]> uniform_bin (const CNum::DataStructs::Matrix< double > &data, size_t num_bins=256)
 Uniform binning of data.
std::shared_ptr< Shelf[]> quantile_bin (const CNum::DataStructs::Matrix< double > &data, size_t num_bins=256)
 Approximate quantile binning (a quantile sketch, not exact quantile bins).
CNum::DataStructs::Matrix< int > apply_quantile (const CNum::DataStructs::Matrix< double > &data, std::shared_ptr< Shelf[]> shelves)
 Construct data matrix of bin values.

Detailed Description

Tools used for gathering and grouping datasets.

Function Documentation

◆ apply_quantile()

CNum::DataStructs::Matrix< int > CNum::Data::apply_quantile ( const CNum::DataStructs::Matrix< double > & data,
std::shared_ptr< Shelf[]> shelves )

Construct data matrix of bin values.

Parameters
data      The dataset
shelves   The bins and the boundaries associated with them
Returns
The matrix of bin values
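
The mapping from raw values to bin values can be illustrated without CNum. The following is a minimal sketch, not the library's implementation: it assumes a single feature column in a std::vector<double> and a sorted list of interior cut points, and the names bin_index and apply_bins are hypothetical. Each value is replaced by the index of the interval it falls into, found with a binary search.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of CNum): index of the bin that `value` falls
// into, given ascending interior cut points. Bin 0 holds values below cuts[0],
// bin i holds [cuts[i-1], cuts[i]), and the last bin holds values >= cuts.back().
int bin_index(const std::vector<double>& cuts, double value) {
  return static_cast<int>(
      std::upper_bound(cuts.begin(), cuts.end(), value) - cuts.begin());
}

// Replace every value of a column with its bin index.
std::vector<int> apply_bins(const std::vector<double>& column,
                            const std::vector<double>& cuts) {
  std::vector<int> binned;
  binned.reserve(column.size());
  for (double v : column) binned.push_back(bin_index(cuts, v));
  return binned;
}
```

With 256 bins the indices fit in a single byte, which is one reason binned datasets are compact; this sketch keeps int to match the Matrix< int > return type shown above.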

◆ get_data()

std::array< CNum::DataStructs::Matrix< double >, 2 > CNum::Data::get_data ( std::string data_path,
char seperator = ',' )

Get data from a delimiter-separated values file (e.g. CSV or TSV) whose last column holds the labels.

Parameters
data_path   The path to the data file
seperator   The delimiter of the columns
Returns
The data and labels [data, labels]
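
The file layout this function expects (numeric fields, one row per line, label in the last column) can be sketched with the standard library alone. The code below is an illustration under stated assumptions, not CNum's parser: Dataset and read_delimited are hypothetical names, rows are plain std::vector<double> rather than CNum matrices, and header lines are not handled.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container (not a CNum type): feature rows plus one label per row.
struct Dataset {
  std::vector<std::vector<double>> features;
  std::vector<double> labels;
};

// Hypothetical helper (not part of CNum): parse delimiter-separated numeric
// rows, treating the last field of each line as the label.
Dataset read_delimited(std::istream& in, char separator = ',') {
  Dataset out;
  std::string line;
  while (std::getline(in, line)) {
    if (line.empty()) continue;  // skip blank lines
    std::vector<double> row;
    std::istringstream fields(line);
    std::string field;
    while (std::getline(fields, field, separator)) {
      row.push_back(std::stod(field));  // throws on non-numeric input
    }
    if (row.empty()) continue;
    out.labels.push_back(row.back());  // last column is the label
    row.pop_back();
    out.features.push_back(std::move(row));
  }
  return out;
}
```

To read from disk, pass a std::ifstream opened on the data file; a tab-separated file would use '\t' as the separator.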

◆ PCA()

void CNum::Data::PCA ( std::string input_path,
std::string output_path )

Principal component analysis.

Available in the next release.

◆ quantile_bin()

std::shared_ptr< Shelf[]> CNum::Data::quantile_bin ( const CNum::DataStructs::Matrix< double > & data,
size_t num_bins = 256 )

Approximate quantile binning (a quantile sketch, not exact quantile bins).

Exact quantile binning is an expensive operation, so this function uses a quantile binning approximation instead.

Parameters
data       The dataset
num_bins   The number of bins to distribute the data among
Returns
The bins and the boundaries associated with them
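
The documentation does not say which approximation is used, so the following only illustrates the general trade-off and should not be read as CNum's algorithm. It is a minimal sketch that assumes one feature column in a std::vector<double>; the name approx_quantile_cuts and the max_sample parameter are hypothetical. Instead of sorting the whole column, it sorts an evenly strided subsample and reads the cut points off the sample ranks.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of CNum): approximate quantile boundaries for
// one feature column, computed from a strided subsample instead of a full sort.
// Returns num_bins - 1 interior cut points in ascending order.
std::vector<double> approx_quantile_cuts(const std::vector<double>& column,
                                         std::size_t num_bins,
                                         std::size_t max_sample = 1024) {
  std::vector<double> cuts;
  if (column.empty() || num_bins < 2 || max_sample == 0) return cuts;

  // Keep every stride-th value so the sample holds at most max_sample items.
  const std::size_t stride = (column.size() + max_sample - 1) / max_sample;
  std::vector<double> sample;
  for (std::size_t i = 0; i < column.size(); i += stride) {
    sample.push_back(column[i]);
  }
  std::sort(sample.begin(), sample.end());

  // The k-th cut is the sample value at rank k * sample_size / num_bins.
  for (std::size_t k = 1; k < num_bins; ++k) {
    cuts.push_back(sample[k * sample.size() / num_bins]);
  }
  return cuts;
}
```

A strided sample is only representative when row order is unrelated to the feature's value; dedicated quantile sketches give error bounds without that assumption.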

◆ uniform_bin()

std::shared_ptr< Shelf[]> CNum::Data::uniform_bin ( const CNum::DataStructs::Matrix< double > & data,
size_t num_bins = 256 )

Uniform binning of data.

Parameters
data       The dataset
num_bins   The number of bins to distribute the data among
Returns
The bins and the boundaries associated with them
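
Uniform (equal-width) binning can be illustrated independently of CNum. The sketch below is not the library's implementation: it assumes one feature column in a std::vector<double>, and the name uniform_edges is hypothetical. It splits the range between the column's minimum and maximum into num_bins intervals of equal width and returns their boundaries.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of CNum): equal-width bin boundaries for one
// feature column. Returns num_bins + 1 edges spanning [min, max].
std::vector<double> uniform_edges(const std::vector<double>& column,
                                  std::size_t num_bins) {
  std::vector<double> edges;
  if (column.empty() || num_bins == 0) return edges;

  const auto mm = std::minmax_element(column.begin(), column.end());
  const double lo = *mm.first;
  const double width = (*mm.second - lo) / static_cast<double>(num_bins);

  edges.reserve(num_bins + 1);
  for (std::size_t i = 0; i <= num_bins; ++i) {
    edges.push_back(lo + width * static_cast<double>(i));
  }
  return edges;
}
```

Equal-width bins are cheap to compute but can leave most bins nearly empty when the data is skewed, which is the case quantile binning is meant to handle.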