CNum 0.2.1
CPU-optimized ML library for C++
Tools used for gathering and grouping datasets.
Classes | |
| struct | Bin |
| A bin for quantile and uniform binning. | |
| struct | Shelf |
| Contains bins and the ranges of values they represent. | |
Functions | |
| std::array< CNum::DataStructs::Matrix< double >, 2 > | get_data (std::string data_path, char seperator=',') |
| Get data from a delimiter-separated values file (such as CSV or TSV) whose last column holds the labels. | |
| void | PCA (std::string input_path, std::string output_path) |
| Principal component analysis. | |
| std::shared_ptr< Shelf[]> | uniform_bin (const CNum::DataStructs::Matrix< double > &data, size_t num_bins=256) |
| Uniform binning of data. | |
| std::shared_ptr< Shelf[]> | quantile_bin (const CNum::DataStructs::Matrix< double > &data, size_t num_bins=256) |
| Approximate quantile binning (a quantile sketch, not exact quantile bins). | |
| CNum::DataStructs::Matrix< int > | apply_quantile (const CNum::DataStructs::Matrix< double > &data, std::shared_ptr< Shelf[]> shelves) |
| Construct data matrix of bin values. | |
| CNum::DataStructs::Matrix< int > CNum::Data::apply_quantile | ( | const CNum::DataStructs::Matrix< double > & | data, |
| std::shared_ptr< Shelf[]> | shelves ) |
Construct data matrix of bin values.
| data | The dataset |
| shelves | The bins and the boundaries associated with them |
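The mapping from raw values to bin indices can be sketched as below. This is a minimal illustration, not CNum's implementation: the layout of `Shelf` and `Bin` is not shown on this page, so the sketch assumes a hypothetical representation in which each column's bins are described by a sorted vector of interior cut points, and `bin_index` and `bin_column` are hypothetical helper names.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of CNum): index of the bin that contains
// `value`, given sorted interior cut points. Bin k covers the half-open
// range [cuts[k-1], cuts[k]); values below cuts[0] fall in bin 0 and
// values at or above cuts.back() fall in the last bin, cuts.size().
std::size_t bin_index(const std::vector<double>& cuts, double value) {
    // upper_bound finds the first cut point strictly greater than value.
    return static_cast<std::size_t>(
        std::upper_bound(cuts.begin(), cuts.end(), value) - cuts.begin());
}

// Hypothetical helper: replace every value in a column by its bin index.
std::vector<int> bin_column(const std::vector<double>& column,
                            const std::vector<double>& cuts) {
    std::vector<int> binned;
    binned.reserve(column.size());
    for (double value : column) {
        binned.push_back(static_cast<int>(bin_index(cuts, value)));
    }
    return binned;
}
```

Binary search keeps the lookup at O(log num_bins) per value, which matters little at 256 bins but avoids a linear scan per cell on large datasets.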
| std::array< CNum::DataStructs::Matrix< double >, 2 > CNum::Data::get_data | ( | std::string | data_path, |
| char | seperator = ',' ) |
Get data from a delimiter-separated values file (such as CSV or TSV) whose last column holds the labels.
| data_path | The path to the data file |
| seperator | The delimiter of the columns |
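To illustrate the file layout this function expects, here is a minimal sketch of parsing delimiter-separated rows and splitting off the last column as labels. It is not CNum's parser: `ParsedData` and `parse_delimited` are hypothetical names, the sketch uses `std::vector` instead of `CNum::DataStructs::Matrix`, and it assumes a file with no header row in which every field is numeric.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container (not a CNum type): feature rows plus one label per row.
struct ParsedData {
    std::vector<std::vector<double>> features;
    std::vector<double> labels;
};

// Hypothetical helper: read numeric rows from a stream, treating the last
// field of each row as the label. Assumes no header row; std::stod throws
// std::invalid_argument if a field is not numeric.
ParsedData parse_delimited(std::istream& in, char separator = ',') {
    ParsedData out;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) continue;  // skip blank lines
        std::vector<double> row;
        std::stringstream fields(line);
        std::string field;
        while (std::getline(fields, field, separator)) {
            row.push_back(std::stod(field));
        }
        if (row.empty()) continue;
        out.labels.push_back(row.back());  // last column is the label
        row.pop_back();
        out.features.push_back(std::move(row));
    }
    return out;
}
```

Passing `'\t'` as the separator handles tab-separated files in the same way.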
| void CNum::Data::PCA | ( | std::string | input_path, |
| std::string | output_path ) |
Principal component analysis.
Available in the next release.
| std::shared_ptr< Shelf[]> CNum::Data::quantile_bin | ( | const CNum::DataStructs::Matrix< double > & | data, |
| size_t | num_bins = 256 ) |
Approximate quantile binning (a quantile sketch, not exact quantile bins).
Exact quantile binning is an expensive operation, so this function uses a quantile binning approximation instead.
| data | The dataset |
| num_bins | The number of bins to distribute the data among |
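This page does not say which approximation `quantile_bin` uses, so the following is only one possible approach, shown to make the idea concrete: sort a bounded, evenly strided subsample of a column and read the cut points off the sorted sample. The names `approx_quantile_edges` and `max_sample` are hypothetical and are not part of CNum.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not CNum's algorithm): approximate the interior
// quantile cut points of a column from a strided subsample. Returns
// num_bins - 1 cut points in non-decreasing order.
std::vector<double> approx_quantile_edges(const std::vector<double>& column,
                                          std::size_t num_bins,
                                          std::size_t max_sample = 1024) {
    std::vector<double> edges;
    if (column.empty() || num_bins == 0 || max_sample == 0) return edges;

    // Pick a stride so that at most max_sample values are kept.
    const std::size_t stride = (column.size() + max_sample - 1) / max_sample;
    std::vector<double> sample;
    for (std::size_t i = 0; i < column.size(); i += stride) {
        sample.push_back(column[i]);
    }
    std::sort(sample.begin(), sample.end());

    // The k-th cut point is the sample value at rank k / num_bins.
    for (std::size_t k = 1; k < num_bins; ++k) {
        const std::size_t idx = k * sample.size() / num_bins;
        edges.push_back(sample[std::min(idx, sample.size() - 1)]);
    }
    return edges;
}
```

Sorting only the subsample bounds the cost at O(max_sample log max_sample) per column regardless of dataset size. A fixed stride can give skewed cut points when the rows are ordered or periodic, so a random sample or a streaming quantile sketch would be a more robust choice in that case.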
| std::shared_ptr< Shelf[]> CNum::Data::uniform_bin | ( | const CNum::DataStructs::Matrix< double > & | data, |
| size_t | num_bins = 256 ) |
Uniform binning of data.
| data | The dataset |
| num_bins | The number of bins to distribute the data among |
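Uniform binning divides the range between a column's minimum and maximum into `num_bins` intervals of equal width. A minimal sketch of computing those boundaries for one column follows; `uniform_edges` is a hypothetical helper working on `std::vector`, not the CNum function, and how CNum stores the result in a `Shelf` is not shown here.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of CNum): equal-width bin boundaries for
// one column. Returns num_bins + 1 edges running from the column minimum
// to the column maximum.
std::vector<double> uniform_edges(const std::vector<double>& column,
                                  std::size_t num_bins) {
    std::vector<double> edges;
    if (column.empty() || num_bins == 0) return edges;

    const auto mm = std::minmax_element(column.begin(), column.end());
    const double lo = *mm.first;
    const double hi = *mm.second;
    const double width = (hi - lo) / static_cast<double>(num_bins);

    edges.reserve(num_bins + 1);
    for (std::size_t i = 0; i <= num_bins; ++i) {
        edges.push_back(lo + width * static_cast<double>(i));
    }
    edges.back() = hi;  // pin the last edge to avoid floating-point drift
    return edges;
}
```

Unlike quantile binning, this needs only one pass to find the minimum and maximum, but heavily skewed columns will leave most bins nearly empty.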