SMILearn Tutorial 3: Discretization
This tutorial shows how to discretize a variable, that is, convert it from continuous to discrete. Discretization algorithms are implemented in the class DSL_discretizer, which takes its input data from an object of class DSL_dataset. Hence, we first create a dataset that holds one continuous variable:
DSL_dataset d;
vector<float> x;
x.push_back(12.3f);
x.push_back(11.3f);
x.push_back(2.1f);
x.push_back(1.3f);
x.push_back(6.3f);
x.push_back(0.3f);
x.push_back(7.3f);
x.push_back(9.3f);
d.AddFloatVar("var-name");
d.SetNumberOfRecords(x.size());
for (unsigned y = 0; y < x.size(); ++y)
    d.SetFloat(0, y, x[y]);
PrintDataset(d);
cout << endl;
Now we can create a DSL_discretizer object, which takes a std::vector<DSL_dataElement> as input. Note that by default it assumes elements of type float. The code below creates a DSL_discretizer object and runs the discretization procedure. We discretize the input data into 3 bins using the Hierarchical method (DSL_discretizer::MethodType::Hierarchical).
DSL_discretizer disc(d.GetFloatData(0));
std::vector<double> be;
disc.Discretize(DSL_discretizer::MethodType::Hierarchical, 3, be);
The result of the discretization procedure can be obtained by taking the bin edges:
cout << "Bin edges: " << endl;
unsigned i;
for (i = 0; i < be.size(); i++)
    cout << be[i] << " ";
cout << endl << endl;
or by obtaining bin assignments for the input data:
cout << "Discretized values: " << endl;
vector<int> result;
disc.Discretize(DSL_discretizer::MethodType::Hierarchical, 3, result);
for (i = 0; i < result.size(); i++)
    cout << result[i] << " ";
cout << endl;
The frame below presents the output of the code above:
===================
-- variable info --
number of variables = 1
Variable 0 id: is continuous
Missing element value: -1.#IND
-- data records --
number of records = 8
12.3 11.3 2.1 1.3 6.3 0.3 7.3 9.3

Bin edges:
0.3 4.2 10.3 12.3

Discretized values:
2 2 0 0 1 0 1 1
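To see how the bin edges and the bin assignments in the output above relate to each other, the following standalone sketch maps each value to a bin index using only the standard library. It does not use SMILearn: BinIndex and AssignBins are hypothetical helper names introduced here for illustration, and the rule that a value lying exactly on an interior edge falls into the upper bin is an assumption, not documented DSL_discretizer behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper, not part of SMILearn.
// 'edges' must be sorted and include the minimum and maximum,
// so n bins are described by n + 1 edges.
int BinIndex(const std::vector<double>& edges, double value)
{
    assert(edges.size() >= 2);
    // Only the interior edges separate bins: edges[1] .. edges[size - 2].
    auto first = edges.begin() + 1;
    auto last = edges.end() - 1;
    // The bin index equals the number of interior edges <= value.
    // Assumption: a value equal to an interior edge goes to the upper bin.
    return static_cast<int>(std::upper_bound(first, last, value) - first);
}

// Hypothetical helper: bin index for every value, in input order.
std::vector<int> AssignBins(const std::vector<double>& edges,
                            const std::vector<double>& values)
{
    std::vector<int> bins;
    bins.reserve(values.size());
    for (double v : values)
        bins.push_back(BinIndex(edges, v));
    return bins;
}
```

Calling AssignBins with the edges 0.3, 4.2, 10.3, 12.3 and the eight input values from this tutorial gives 2 2 0 0 1 0 1 1, which agrees with the discretized values printed above.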
Discretization can also be performed directly on the DSL_dataset, but this modifies the dataset:
be.clear();
d.Discretize(0, DSL_dataset::DiscretizeAlgorithm::Hierarchical, 3, "State_", be);
PrintDataset(d);
for (i = 0; i < be.size(); i++)
    cout << be[i] << " ";
cout << endl << endl;