SMILearn Tutorial 3: Discretization


This tutorial shows how to discretize a variable, that is, convert it from continuous to discrete values. Discretization algorithms are implemented in the class DSL_discretizer. Its input is the data of a variable stored in a DSL_dataset. Hence, first we create a dataset that holds one continuous variable:

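  DSL_dataset d;

  // sample values for the continuous variable
  vector<float> x;
  x.push_back(12.3f);
  x.push_back(11.3f);
  x.push_back(2.1f);
  x.push_back(1.3f);
  x.push_back(6.3f);
  x.push_back(0.3f);
  x.push_back(7.3f);
  x.push_back(9.3f);
  
  // add one float variable and set the number of records to the number of values
  d.AddFloatVar("var-name");
  d.SetNumberOfRecords(x.size());
  
  // copy the values into variable 0, one record at a time
  unsigned y;
  for (y = 0; y < x.size(); ++y)
    d.SetFloat(0, y, x[y]);
  
  // print the variable info and the data records
  PrintDataset(d);
  cout << endl;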

Now we can create a DSL_discretizer object, which takes a std::vector<DSL_dataElement> as input. Note that by default it assumes elements of type float. The code below creates a DSL_discretizer object and runs the discretization. We discretize the input data into 3 bins using the Hierarchical method (DSL_discretizer::MethodType::Hierarchical); the resulting bin edges are written to the vector be.

 DSL_discretizer disc(d.GetFloatData(0));
 std::vector<double> be;
 disc.Discretize(DSL_discretizer::MethodType::Hierarchical,3,be);

The result of the discretization can be obtained by reading the bin edges:

 cout << "Bin edges: " << endl;
  
 unsigned i;
 for (i=0;i<be.size();i++)
  cout << be[i] << " ";
 cout << endl << endl;

or by obtaining bin assignments for the input data:

 cout << "Discretized values: " << endl;
 vector<int> result;
 disc.Discretize(DSL_discretizer::MethodType::Hierarchical,3,result);
  
 for (i=0;i<result.size();i++)
  cout << result[i] << " ";
 cout << endl;

The frame below presents the output of the code above:

  ===================
  -- variable info --
  number of variables = 1
  Variable 0
  id:
  is continuous
  Missing element value: -1.#IND
  -- data records --
  number of records = 8
  12.3
  11.3
  2.1
  1.3
  6.3
  0.3
  7.3
  9.3
  Bin edges:
  0.3 4.2 10.3 12.3
  Discretized values:
  2 2 0 0 1 0 1 1
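
The relation between the bin edges and the discretized values in the output above can be illustrated with a small standalone sketch. It does not use SMILearn: AssignBins is a hypothetical helper introduced here only for illustration, and it assumes that each bin is half-open, [lower edge, upper edge), with the last bin also including the top edge. How DSL_discretizer treats values that fall exactly on an edge is not stated in this tutorial, so treat that as an assumption.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper (not part of SMILearn): maps each value to the index
// of the bin that contains it, given sorted bin edges {e0, e1, ..., ek}.
// Assumption: bin j covers [e_j, e_{j+1}); the last bin also includes e_k.
std::vector<int> AssignBins(const std::vector<double>& values,
                            const std::vector<double>& edges) {
    std::vector<int> bins;
    if (edges.size() < 2) return bins;  // need at least one bin
    const int lastBin = static_cast<int>(edges.size()) - 2;
    for (double v : values) {
        // Position of the first edge strictly greater than v, minus one.
        int idx = static_cast<int>(
            std::upper_bound(edges.begin(), edges.end(), v) - edges.begin()) - 1;
        // Clamp so values on or beyond the outer edges land in the outer bins.
        idx = std::max(0, std::min(idx, lastBin));
        bins.push_back(idx);
    }
    return bins;
}
```

Under these assumptions, calling AssignBins with the eight input values and the edges 0.3 4.2 10.3 12.3 yields 2 2 0 0 1 0 1 1, which matches the bin assignments printed above.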

Discretization can also be performed directly on the DSL_dataset, but this modifies the dataset:

 be.clear();
 d.Discretize(0,DSL_dataset::DiscretizeAlgorithm::Hierarchical,3,"State_",be);
 PrintDataset(d);
  
 for (i=0;i<be.size();i++)
  cout << be[i] << " ";
 cout << endl << endl;