SMILearn Tutorial 3: Discretization

From DSL

This tutorial shows how to discretize a variable, that is, convert it from continuous to discrete. Discretization algorithms are implemented in the class DSL_discretizer, which works on data stored in an object of class DSL_dataset. We therefore first create a dataset containing one continuous variable:

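  // Assumes <vector>, <iostream>, the SMILearn headers and "using namespace std;"
  DSL_dataset d;
  vector<float> x;           // 8 values of one continuous variable
  x.push_back(12.3f);
  x.push_back(11.3f);
  x.push_back(2.1f);
  x.push_back(1.3f);
  x.push_back(6.3f);
  x.push_back(0.3f);
  x.push_back(7.3f);
  x.push_back(9.3f);
  
  d.AddFloatVar("", &x);     // add x as a float variable with an empty id
  PrintDataset(d);           // printing helper, assumed to be defined elsewhere (not shown)
  cout << endl;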

Now we can create a DSL_discretizer object, which takes a std::vector<DSL_dataElement> as input (here, the data of variable 0 in the dataset). Note that by default it assumes the elements are of type float. The code below creates the DSL_discretizer object and discretizes the input data into 3 bins using the hierarchical method (DSL_discretizer::MethodType::Hierarch).

  DSL_discretizer disc(d.GetVariableData(0));               // data of variable 0
  disc.Discretize(3, DSL_discretizer::MethodType::Hierarch); // 3 bins, hierarchical method
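
This tutorial does not describe how the hierarchical method places its edges. As a rough illustration, the standalone sketch below implements one common bottom-up (agglomerative) scheme. It assumes a particular merge rule: sort the values, start with each value in its own cluster, and repeatedly merge the two adjacent clusters whose means are closest (the leftmost pair wins ties) until the requested number of bins remains. Each interior edge is placed halfway between neighbouring clusters. The function name hierarchicalEdges is hypothetical and uses only the C++ standard library. SMILearn's actual Hierarch implementation may use a different merge criterion or edge placement.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of SMILearn): agglomerative discretization of
// 1-D data into `bins` intervals. Returned edges are
// [min, midpoints between neighbouring clusters..., max].
std::vector<double> hierarchicalEdges(std::vector<double> values, std::size_t bins) {
    std::vector<double> edges;
    if (values.empty() || bins == 0) return edges;
    std::sort(values.begin(), values.end());

    // Each cluster covers the index range [first, last) of the sorted values
    struct Cluster { std::size_t first, last; double sum; };
    std::vector<Cluster> clusters;
    for (std::size_t i = 0; i < values.size(); ++i)
        clusters.push_back({i, i + 1, values[i]});

    auto mean = [](const Cluster& c) {
        return c.sum / static_cast<double>(c.last - c.first);
    };

    // Merge the adjacent pair with the closest means until `bins` clusters remain
    while (clusters.size() > bins) {
        std::size_t best = 0;
        double bestDist = mean(clusters[1]) - mean(clusters[0]);
        for (std::size_t j = 1; j + 1 < clusters.size(); ++j) {
            double dist = mean(clusters[j + 1]) - mean(clusters[j]);
            if (dist < bestDist) { bestDist = dist; best = j; }  // strict < keeps leftmost on ties
        }
        clusters[best].last = clusters[best + 1].last;
        clusters[best].sum += clusters[best + 1].sum;
        clusters.erase(clusters.begin() + best + 1);
    }

    // Outer edges are the data range; interior edges sit halfway between the
    // last value of one cluster and the first value of the next
    edges.push_back(values.front());
    for (std::size_t j = 0; j + 1 < clusters.size(); ++j)
        edges.push_back((values[clusters[j].last - 1] + values[clusters[j + 1].first]) / 2.0);
    edges.push_back(values.back());
    return edges;
}
```

For example, hierarchicalEdges({1, 2, 10, 11, 20, 21}, 3) returns {1, 6, 15.5, 21}. Like the bin edges reported by GetBinEdges, the result includes the minimum and maximum as its outer edges.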

The result of the discretization can be obtained by retrieving the bin edges:

  cout << "Bin edges: " << endl;
  vector<double> be = disc.GetBinEdges();
  unsigned i;
  for (i = 0; i < be.size(); i++)
      cout << be[i] << " ";
  cout << endl << endl;

or by obtaining bin assignments for the input data:

  cout << "Discretized values: " << endl;
  vector<int> result = disc.GetDiscretized();  // bin index of each record
  for (i = 0; i < result.size(); i++)
      cout << result[i] << " ";
  cout << endl;
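
To make the link between bin edges and bin assignments concrete, here is a small standalone sketch that maps values to bin indices given a list of edges. The helper assignBins is hypothetical, not a SMILearn function. It assumes that bin j covers [e_j, e_{j+1}), that the last bin also includes the maximum edge, and that values outside the edge range are clamped into the first or last bin. SMILearn's handling of values that fall exactly on an edge is not shown in this tutorial.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper (not part of SMILearn): returns the bin index of each value.
// Assumed convention: bin j covers [edges[j], edges[j+1]), the last bin also
// includes edges.back(), and out-of-range values are clamped to the end bins.
std::vector<int> assignBins(const std::vector<double>& edges,
                            const std::vector<double>& values) {
    std::vector<int> bins;
    if (edges.size() < 2) return bins;  // fewer than two edges define no bin
    // Only the interior edges e1 .. e(n-1) separate bins
    auto first = edges.begin() + 1;
    auto last = edges.end() - 1;
    for (double v : values) {
        // The number of interior edges <= v is the bin index
        bins.push_back(static_cast<int>(std::upper_bound(first, last, v) - first));
    }
    return bins;
}
```

With the edges 0.3 4.2 10.3 12.3 and the eight input values of this tutorial, assignBins produces 2 2 0 0 1 0 1 1, which matches the "Discretized values" line in the output below.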

The code above produces the following output:

   ===================
  -- variable info --
  number of variables = 1
  Variable 0
        id:
        is continuous
        Missing element value: -1.#IND
  -- data records --
  number of records = 8
  12.3
  11.3
  2.1
  1.3
  6.3
  0.3
  7.3
  9.3
  
  
  Bin edges:
  0.3 4.2 10.3 12.3
  
  Discretized values:
  2 2 0 0 1 0 1 1
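
As a quick arithmetic check on this output, the outer edges are the minimum (0.3) and maximum (12.3) of the data. The two interior edges lie exactly halfway between neighbouring sorted values:

```latex
\begin{aligned}
\frac{2.1 + 6.3}{2} &= 4.2, &\qquad \frac{9.3 + 11.3}{2} &= 10.3,\\[4pt]
\text{bin } 0 &= [0.3,\ 4.2) \ni \{0.3,\ 1.3,\ 2.1\}, &
\text{bin } 1 &= [4.2,\ 10.3) \ni \{6.3,\ 7.3,\ 9.3\},\\
\text{bin } 2 &= [10.3,\ 12.3] \ni \{11.3,\ 12.3\}.
\end{aligned}
```

Reading the input records in order (12.3, 11.3, 2.1, 1.3, 6.3, 0.3, 7.3, 9.3) against these intervals gives 2 2 0 0 1 0 1 1, consistent with the printed result. The example does not show whether a value lying exactly on an edge goes to the lower or the upper bin.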