SMILearn Tutorial 3: Discretization
This tutorial shows how to discretize a variable, that is, convert it from continuous to discrete. Discretization algorithms are implemented in the class DSL_discretizer. This class operates on the data of a variable stored in a DSL_dataset object. Hence, we first create a dataset with one continuous variable:
DSL_dataset d;
vector<float> x;
x.push_back(12.3f);
x.push_back(11.3f);
x.push_back(2.1f);
x.push_back(1.3f);
x.push_back(6.3f);
x.push_back(0.3f);
x.push_back(7.3f);
x.push_back(9.3f);
d.AddFloatVar("",&x);
PrintDataset(d);
cout << endl;
Now we can create a DSL_discretizer object, which takes a std::vector<DSL_dataElement> as input. Note that by default it assumes the elements are of type float. The code below creates a DSL_discretizer object and performs the discretization. We discretize the input data into 3 bins using the hierarchical method (DSL_discretizer::MethodType::Hierarch):
DSL_discretizer disc(d.GetVariableData(0));
disc.Discretize(3, DSL_discretizer::MethodType::Hierarch);
The result of the discretization can be retrieved as the bin edges:
cout << "Bin edges: " << endl;
vector<double> be = disc.GetBinEdges();
unsigned i;
for (i = 0; i < be.size(); i++)
    cout << be[i] << " ";
cout << endl << endl;
or by obtaining bin assignments for the input data:
cout << "Discretized values: " << endl;
vector<int> result;
result = disc.GetDiscretized();
for (i = 0; i < result.size(); i++)
    cout << result[i] << " ";
cout << endl;
The frame below presents the output of the code above:
===================
-- variable info --
number of variables = 1
Variable 0
id:
is continuous
Missing element value: -1.#IND
-- data records --
number of records = 8
12.3
11.3
2.1
1.3
6.3
0.3
7.3
9.3
Bin edges:
0.3 4.2 10.3 12.3
Discretized values:
2 2 0 0 1 0 1 1
