SMILearn Tutorial 2: Parsing a Text File
This tutorial shows how to import data from a text file into a DSL_dataset object. Let us assume we want to parse the text file "data.txt", which is shown below:
A B C D E
ordinal discrete ordinal discrete continuous
0 0 0 true 1.1
1 0 1 false 2.1
* * 0 true 0.1
1 1 1 * 3
The first step is to create a DSL_textParser object and set the appropriate parameters. Our text file contains a header with the names of the variables (A, B, C, ...), followed by a line that explicitly defines the types of these variables (ordinal, discrete, and continuous). We need to tell the parser to expect this information in the file. The default marker for a missing data element is "*", so we do not need to set this parameter explicitly. Here is the code that creates the parser object:
DSL_textParser parser;
parser.SetUseHeader(true);
parser.SetTypesSpecified(true);
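To make the role of these two settings concrete, here is a minimal standalone sketch of how a whitespace-separated file with an optional header line, an optional types line, and a missing-value marker could be read. It does not use SMILearn and is not how DSL_textParser is implemented; ParsedTable, splitTokens, and parseTable are hypothetical names introduced only for illustration.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical type for illustration only; not part of SMILearn.
struct ParsedTable {
    std::vector<std::string> names;              // from the header line, if present
    std::vector<std::string> types;              // from the types line, if present
    std::vector<std::vector<std::string>> rows;  // raw tokens, one vector per record
    std::vector<std::vector<bool>> missing;      // true where the token was the marker
};

// Split one line into whitespace-separated tokens.
static std::vector<std::string> splitTokens(const std::string& line) {
    std::istringstream in(line);
    std::vector<std::string> tokens;
    std::string tok;
    while (in >> tok) tokens.push_back(tok);
    return tokens;
}

// Read a table from a stream. useHeader and typesSpecified mirror the idea
// behind SetUseHeader and SetTypesSpecified: they say which leading lines
// carry metadata instead of data.
ParsedTable parseTable(std::istream& in, bool useHeader, bool typesSpecified,
                       const std::string& missingMarker = "*") {
    ParsedTable t;
    bool headerPending = useHeader;
    bool typesPending = typesSpecified;
    std::string line;
    while (std::getline(in, line)) {
        std::vector<std::string> tokens = splitTokens(line);
        if (tokens.empty()) continue;  // skip blank lines
        if (headerPending) { t.names = tokens; headerPending = false; continue; }
        if (typesPending) { t.types = tokens; typesPending = false; continue; }
        std::vector<bool> flags;
        for (const std::string& tok : tokens) flags.push_back(tok == missingMarker);
        t.rows.push_back(tokens);
        t.missing.push_back(flags);
    }
    return t;
}
```

With both flags set to true, the first line of "data.txt" would be treated as variable names and the second as variable types. With both set to false, every line would be treated as a data record.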
Once the parser object is correctly initialized, call the Parse method to read and interpret the file:
if (parser.Parse("data.txt") != DSL_OKAY)
    cout << "Parsing failed!" << endl;
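Since the check above only reports that parsing failed, it can help to rule out the most common cause first, namely a file that cannot be opened. The helper below is a small standalone sketch using only the C++ standard library; fileIsReadable is a hypothetical name and not part of SMILearn.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper, not part of SMILearn: returns true if the file at
// `path` can be opened for reading.
bool fileIsReadable(const std::string& path) {
    std::ifstream in(path);
    return in.is_open();
}
```

Calling fileIsReadable("data.txt") before Parse would let a program print a separate message for a missing or unreadable file, so that "Parsing failed!" more likely points to a problem with the file contents.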
It is always worth checking the result of this method, since file operations can easily fail. Finally, we want to create a data set object that contains the parsed data. The code below shows how to do that:
DSL_dataset d = parser.GetDataset();
If we print the contents of the data set using the PrintDataset function introduced in Tutorial 1, we obtain the following output:
===================
-- variable info --
number of variables = 5
Variable 0
id: Column_0
is continuous
Missing element value: -1.#IND
Variable 1
id: Column_1
is discrete
Missing element value: -1
State names: 0 1
Variable 2
id: Column_2
is continuous
Missing element value: -1.#IND
Variable 3
id: Column_3
is discrete
Missing element value: -1
State names: d1 d2
Variable 4
id: Column_4
is continuous
Missing element value: -1.#IND
-- data records --
number of records = 4
0 0 0 0 1.1
1 0 1 1 2.1
* * 0 0 0.1
1 1 1 * 3
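In the listing above, the discrete column that held true, false, true, * in the file appears as 0, 1, 0, *, and the missing element values are reported as -1 for discrete variables and -1.#IND for continuous ones (-1.#IND is how older Microsoft C runtimes print an indeterminate NaN). The standalone sketch below illustrates one encoding that is consistent with that output: discrete values become state indices in order of first appearance, with -1 for missing entries, and continuous values become doubles, with NaN for missing entries. The first-appearance ordering is an assumption, and EncodedColumn, encodeDiscrete, encodeContinuous, and isMissing are hypothetical names, not SMILearn functions.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Hypothetical result type for illustration only; not part of SMILearn.
struct EncodedColumn {
    std::vector<std::string> states;  // distinct values in order of first appearance
    std::vector<int> codes;           // state index per record, -1 where missing
};

// Map each distinct string to an integer state index; missing entries get -1.
EncodedColumn encodeDiscrete(const std::vector<std::string>& values,
                             const std::string& missingMarker = "*") {
    EncodedColumn col;
    for (const std::string& v : values) {
        if (v == missingMarker) { col.codes.push_back(-1); continue; }
        std::size_t idx = 0;
        while (idx < col.states.size() && col.states[idx] != v) ++idx;
        if (idx == col.states.size()) col.states.push_back(v);  // new state
        col.codes.push_back(static_cast<int>(idx));
    }
    return col;
}

// Convert tokens to doubles; missing entries become quiet NaN.
std::vector<double> encodeContinuous(const std::vector<std::string>& values,
                                     const std::string& missingMarker = "*") {
    std::vector<double> out;
    for (const std::string& v : values) {
        out.push_back(v == missingMarker
                          ? std::numeric_limits<double>::quiet_NaN()
                          : std::stod(v));
    }
    return out;
}

// NaN never compares equal to anything, so test for it with std::isnan.
bool isMissing(double x) { return std::isnan(x); }
```

A practical consequence of the NaN convention is that a missing continuous value cannot be detected with ==; a check such as isMissing is needed instead.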
