C Sharp Tutorial 1: Creating a Bayesian Network
In this tutorial we will go through the steps required to create a very simple Bayesian network. Imagine a venture capitalist who is considering a risky investment in a startup company. A major source of uncertainty about her investment is the success of the company. She is aware that only around 20% of all startup companies succeed. She can reduce this uncertainty somewhat by asking for an expert opinion. Her expert, however, is not perfect in his forecasts. Of all startup companies that eventually succeed, he judges about 40% to be good prospects, 40% to be moderate prospects, and 20% to be poor prospects. Of all startup companies that eventually fail, he judges about 10% to be good prospects, 30% to be moderate prospects, and 60% to be poor prospects.
How can our investor use the information from the expert? What is the chance of success if the expert judges the prospects for success to be good? What if he judges them to be poor? We will create a Bayesian network that will allow us to determine the exact numerical implications of the expert's opinion on the investor's expectation of success of the venture. The Bayesian network will contain two nodes representing random variables: Success of the venture and Expert forecast.
We will go step by step explaining what each line of the code does. The full code listing can be found in the appendices.
We start by declaring an instance of a network.
Network net = new Network();
Now we are going to create a node Success. The node will be of the Cpt type (see the Network class definition for details regarding node types), which means that it represents a discrete random variable.
net.AddNode(Network.NodeType.Cpt, "Success");
Following the description of the problem, we need this node to have two states. Let us name them Success and Failure.
net.SetOutcomeId("Success", 0, "Success");
net.SetOutcomeId("Success", 1, "Failure");
Note that the AddNode() method creates a default node with two states, State0 and State1. Here we simply change the Ids of those states. We could achieve the same effect by first adding two new states using the AddOutcome() method and then deleting the two default states using the DeleteOutcome() method. The order matters: a Cpt node (and, in fact, not only this node type) must have at least two states, so an attempt to delete the existing states first will cause an exception to be thrown.
The number and names of the outcomes belong to the definition of the node. We can (and must) define them before any useful inference can be made on the network. For the node Forecast we use the second approach described above: we create a new node, add the three new states, and then delete the two default ones. Note that the second default state (State1) initially has index 1, but after the first state (State0) is deleted its index changes to 0. That is why we delete the outcome at index 0 twice.
net.AddNode(Network.NodeType.Cpt, "Forecast");
net.AddOutcome("Forecast", "Good");
net.AddOutcome("Forecast", "Moderate");
net.AddOutcome("Forecast", "Poor");
net.DeleteOutcome("Forecast", 0);
net.DeleteOutcome("Forecast", 0);
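The index shift is easy to see on an ordinary list. The following sketch does not use the library at all; it only mimics the outcome list of Forecast with a List<string> to show why deleting index 0 twice removes both default states. The names OutcomeShiftDemo and BuildForecastOutcomes are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Illustration only: a plain list standing in for a node's outcomes.
// OutcomeShiftDemo and BuildForecastOutcomes are hypothetical names,
// not part of the library used in this tutorial.
public static class OutcomeShiftDemo
{
    public static List<string> BuildForecastOutcomes()
    {
        // A freshly added node starts with two default states.
        var outcomes = new List<string> { "State0", "State1" };

        // Append the three states we actually want.
        outcomes.Add("Good");
        outcomes.Add("Moderate");
        outcomes.Add("Poor");

        // Removing index 0 drops State0; State1 then slides down to index 0.
        outcomes.RemoveAt(0);
        // Removing index 0 again therefore drops State1.
        outcomes.RemoveAt(0);

        return outcomes;
    }
}
```

After the call, the list holds Good, Moderate and Poor at indices 0, 1 and 2, which is the layout we want for Forecast.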
Now, according to the model, we must add an arc from Success to Forecast to represent the conditional dependence of the latter on the former.
net.AddArc("Success", "Forecast");
Now we need to fill in the distribution of the nodes. We know that the node Success has two states and no parents, so we just need two numbers that represent the probability of each of the states coming true. These two numbers are (from the problem description):
P("Success" = Success) = 0.2
P("Success" = Failure) = 0.8
Note that these two values must add up to one.
double[] aSuccessDef = {0.2, 0.8};
net.SetNodeDefinition("Success", aSuccessDef);
Note that the SetNodeDefinition() method does not perform any checking, so we must give the array exactly the right size: one entry per state of the node. If the size is wrong, the attempt to change the definition of the node will raise an exception.
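Because the checking is left to us, it can be worth validating an array before handing it to the network. The helper below is a minimal sketch of such a check, written without any library calls; DistributionCheck, IsValidDistribution and the tolerance value are assumptions made for this example, not part of the Network class.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of the library used in this tutorial.
public static class DistributionCheck
{
    // Returns true when probs has exactly expectedStates entries,
    // every entry lies in [0, 1], and the entries sum to one
    // within the given tolerance.
    public static bool IsValidDistribution(
        double[] probs, int expectedStates, double tolerance = 1e-9)
    {
        if (probs == null || probs.Length != expectedStates)
        {
            return false;
        }
        if (probs.Any(p => p < 0.0 || p > 1.0))
        {
            return false;
        }
        return Math.Abs(probs.Sum() - 1.0) <= tolerance;
    }
}
```

For the node Success, DistributionCheck.IsValidDistribution(aSuccessDef, 2) should hold before the array is passed on.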
Now we have to fill in the distribution of the node Forecast conditioned on the node Success. The definition matrix in this case will have two dimensions: one for the states of the parent (Success) and one for the states of the child (Forecast). The probabilities we have to fill in are:
P("Forecast" = Good | "Success" = Success) = 0.4
P("Forecast" = Moderate | "Success" = Success) = 0.4
P("Forecast" = Poor | "Success" = Success) = 0.2
P("Forecast" = Good | "Success" = Failure) = 0.1
P("Forecast" = Moderate | "Success" = Failure) = 0.3
P("Forecast" = Poor | "Success" = Failure) = 0.6
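The order of these probabilities is determined by treating the state of the first parent of the node as the most significant coordinate (think of the coordinates as the digits of a number), then the second parent, then the third, and so on, with the state of the node itself as the least significant coordinate. In our case this means that the first three entries are the probabilities of Good, Moderate and Poor given Success = Success, and the last three are the same probabilities given Success = Failure.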
double[] aForecastDef = {0.4, 0.4, 0.2, 0.1, 0.3, 0.6};
net.SetNodeDefinition("Forecast", aForecastDef);
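For a node with a single parent, the ordering rule above reduces to a simple formula: the position of an entry is the parent state times the number of child states, plus the child state. The sketch below illustrates this on a plain array; CptIndex, FlatIndex and Lookup are hypothetical names introduced here and are not library calls.

```csharp
using System;

// Hypothetical helper illustrating the layout of a flattened
// definition array for a node with exactly one parent.
public static class CptIndex
{
    // The parent state is the most significant coordinate,
    // the child state the least significant one.
    public static int FlatIndex(int parentState, int childState, int childStateCount)
    {
        return parentState * childStateCount + childState;
    }

    // Reads P(child = childState | parent = parentState) from the flat array.
    public static double Lookup(
        double[] definition, int parentState, int childState, int childStateCount)
    {
        return definition[FlatIndex(parentState, childState, childStateCount)];
    }
}
```

With Success = Failure as parent state 1 and Poor as child state 2, FlatIndex(1, 2, 3) is 5, and entry 5 of aForecastDef is indeed 0.6.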
After the network has been created, we arrange it spatially and change a few of the nodes' visual attributes.
// Changing the nodes' spatial and visual attributes:
net.SetNodePosition("Success", 20, 20, 80, 30);
net.SetNodeBgColor("Success", Color.Tomato);
net.SetNodeTextColor("Success", Color.White);
net.SetNodeBorderColor("Success", Color.Black);
net.SetNodeBorderWidth("Success", 2);
net.SetNodePosition("Forecast", 30, 100, 60, 30);
We finally store the network in a file called "tutorial_a.xdsl" so we can retrieve it later. The format of the file will be XML.
net.WriteFile("tutorial_a.xdsl");
We have created our first Bayesian network.
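As a sanity check on the numbers entered above, the answers to the investor's questions can also be worked out by hand with Bayes' rule. The sketch below does this on plain arrays and does not use the library's inference; BayesByHand and PosteriorOverParent are hypothetical names, and the likelihood array is assumed to use the same parent-major layout as aForecastDef.

```csharp
using System;

// Hand computation of P(parent | child = observed state) for a
// two-node network. Hypothetical helper, not a library call.
public static class BayesByHand
{
    // prior[s]      = P(parent = s)
    // likelihood[]  = P(child = c | parent = s), flattened parent-major
    // childState    = index of the observed child state
    public static double[] PosteriorOverParent(
        double[] prior, double[] likelihood, int childState)
    {
        int parentStates = prior.Length;
        int childStates = likelihood.Length / parentStates;

        double[] posterior = new double[parentStates];
        double evidence = 0.0;

        // Joint probability P(parent = s, child = childState) for each s.
        for (int s = 0; s < parentStates; s++)
        {
            posterior[s] = prior[s] * likelihood[s * childStates + childState];
            evidence += posterior[s];
        }

        // Normalize by P(child = childState).
        for (int s = 0; s < parentStates; s++)
        {
            posterior[s] /= evidence;
        }
        return posterior;
    }
}
```

With the prior {0.2, 0.8} and the six Forecast probabilities, Bayes' rule gives P(Success | Good) = 0.08 / 0.16 = 0.5, P(Success | Moderate) = 0.08 / 0.32 = 0.25, and P(Success | Poor) = 0.04 / 0.52, which is roughly 0.077. Inference on the network we have just built should agree with these values.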