Bayesian Software User Guide

A. Introduction to Bayes Nets
Copyright © 2009 Norsys Software Corp.

1. What is a Bayes net?

A Bayes net is a model. It reflects the states of some part of a world that is being modeled, and it describes how those states are related by probabilities. The model might be of your house, your car, your body, your community, an ecosystem, a stock market, and so on. Absolutely anything can be modeled by a Bayes net. All the possible states of the model represent all the possible worlds that can exist, that is, all the possible ways that the parts or states can be configured. The car engine can be running normally or giving trouble. Its tires can be inflated or flat. Your body can be sick or healthy, and so on.

So where do the probabilities come in? Typically, some states tend to occur more frequently when other states are present. Thus, if you are sick, the chances of a runny nose are higher. If it is cloudy, the chances of rain are higher, and so on.

Here is a simple Bayes net that illustrates these concepts. In this simple world, let us say the weather can have three states (sunny, cloudy, or rainy), the grass can be wet or dry, and the sprinkler can be on or off. Now there are some causal links in this world. If it is rainy, then it will make the grass wet directly. But if it is sunny for a long time, that too can make the grass wet, indirectly, by causing us to turn on the sprinkler.

When actual probabilities are entered into this net that reflect the reality of real weather, lawn, and sprinkler-use behavior, such a net can be made to answer a number of useful questions, such as: if the lawn is wet, what are the chances it was caused by rain or by the sprinkler, and if the chance of rain increases, how does that affect my having to budget time for watering
the lawn?

Here is another simple Bayes net, called Asia. It is a popular example for introducing Bayes nets and is from Lauritzen & Spiegelhalter (1988). Note that it is for example purposes only and should not be used for real decision making.

It is a simplified version of a network that could be used to diagnose patients arriving at a clinic. Each node in the network corresponds to some condition of the patient; for example, "Visit to Asia" indicates whether the patient recently visited Asia. The arrows (also called links) between any two nodes indicate that there are probability relationships that are known to exist between the states of those two nodes. Thus, smoking increases the chances of getting lung cancer and of getting bronchitis. Both lung cancer and bronchitis increase the chances of getting dyspnea (shortness of breath). Both lung cancer and tuberculosis, but not usually bronchitis, can cause an abnormal lung X-ray. And so on.

The direction of the link arrows roughly corresponds to causality. That is, the nodes higher up in the diagram tend to influence those below, rather than, or at least more so than, the other way around.

In a Bayes net, the links may form loops (closed paths when the direction of the arrows is ignored), but they may not form cycles (closed paths that follow the arrows). This is not an expressive limitation; it does not limit the modeling power of these nets. It only means we must be more careful in building our nets. In the left diagram below, there are numerous loops. These are fine. In the right diagram, the addition of the link from D to B creates a cycle, which is not permitted.

[Figure: left, "A valid Bayes net"; right, "Not a Bayes net"]

The key advantage of not allowing cycles is that it makes possible very fast update algorithms, since there is no way for probabilistic influence to cycle around indefinitely.

To diagnose a patient, values could be entered for some of the nodes when they are known. This would allow us to recalculate the probabilities for all the other nodes. Thus, if we take a chest X-ray and the X-ray is abnormal, then the chances of the patient having TB or lung cancer rise. If we further learn that our patient visited Asia, then the chances that they have tuberculosis would rise further, and those of lung cancer would drop (since the X-ray is now better explained by the presence of TB than of lung cancer). We will see how this is done in a later section.

Summary

In this section we learned that a Bayesian network is a model, one that represents the possible states of a world. We also learned that a Bayes net possesses probability relationships between some of the states of the world.

1.1. Why are Bayes nets useful?

1.1.1 Modeling reality

A model is generally useful if it helps us to better understand the world we are modeling, and if
it allows us to make useful predictions about how the world will behave. It is often easier to experiment with the model than with reality.

In the past, when scientists, engineers, and economists wanted to build probabilistic models of worlds, so that they could attempt to predict what was likely to happen when something else happened, they would typically try to represent what is called the joint distribution. This is a table of all the probabilities of all the possible combinations of states in that world model. Such a table can become huge, since it ends up storing one probability value for every combination of states; its size is the product of the numbers of states of all the nodes. In the Weather model above, this would be 3 x 2 x 2 = 12 probabilities. In the Asia model it would be 2 x 2 x 2 x 2 x 2 x 2 x 2 x 2 = 2^8 = 256 probabilities. For models of any reasonable complexity, the joint distribution can end up with millions, trillions, or unbelievably many entries. Clearly a better way is needed.

Bayesian nets are one such way. Because a Bayes net only relates nodes that are probabilistically related by some sort of causal dependency, an enormous saving of computation can result. There is no need to store all possible configurations of states, all possible worlds, if you will. All that needs to be stored and worked with is all possible combinations of states between sets of related parent and child nodes (families of nodes, if you will). This makes for a great saving of table space and computation. (Of course, some models are still too large for today's Bayes net algorithms. But new algorithms are being developed and breakthroughs are promising. This is a hotly researched area of modern computer science.)

A second reason Bayesian nets are proving so useful
is that they are so adaptable. You can start them off small, with limited knowledge about a domain, and grow them as you acquire new knowledge. Furthermore, when you go to apply them, you don't need complete knowledge about the instance of the world you are applying them to. You can use as much knowledge as is available, and the net will do as good a job as is possible with the available knowledge.

To illustrate this, let us return to our Asia net, which we saw in section 1 above. Let us suppose that you are a newly graduated medical doctor in Los Angeles, a specialist in lung diseases, and you decide to set up a chest clinic, one that handles serious lung-related disease. From your textbook studies you know something about the rates of lung cancer, tuberculosis, and bronchitis, and their causes and symptoms, so you can set up a basic Bayes net with some of that theoretical knowledge. For example, let's say that according to your textbooks:

30% of the US population smokes.
Lung cancer can be found in about 70 people per 100,000.
TB occurs in about 10 people per 100,000.
Bronchitis can be found in about 800 people per 100,000.
Dyspnea can be found in about 10% of people, but most of that is due to asthma and causes other than TB, lung cancer, or bronchitis.

Armed with these statistics you could set up the following Bayes net:

Unfortunately, this net is not very helpful to you, because it really doesn't reflect the population of people that seek help from your clinic. Most of them have been referred by their family physicians, and so the incidence of lung disease amongst that population is much higher, you would imagine. So you really should not use the above Bayes net in your practice. You need more data.

As your clinic grows and you handle hundreds of patient cases, you learn that while the textbooks may have described the North American situation, the reality of your clinic and its population of patients is very different. This is what your data collection efforts reveal:

50% of your patients smoke.
1% have TB.
5.5% have lung cancer.
45% have some form of mild or chronic bronchitis.

You enter these new figures into your net, and now you have a practical Bayes net, one that really describes the kind of patient you typically deal with.

So, let us see how we would use this net in our daily medical practice. The first thing we should note is that the above net describes
a new patient, one who has just been referred to us, and about whom we have no knowledge whatsoever, other than that they are from our target population. As we acquire knowledge specific to each particular patient, the probabilities in the net will automatically adjust. This is the great beauty and power of Bayesian inference in action. And the great strength of the Bayes net approach is that the probabilities that result at each stage of knowledge buildup are mathematically and scientifically sound. In other words, given whatever knowledge we have about our patient, then based on the best mathematical and statistical knowledge to date, the net will tell us what we can legitimately conclude.

This is a very powerful tool indeed. Take a moment to think on it. You as a doctor are not just relying on hunches, or an intuitive sense of the likelihood of illness, as you may have in the past, but, rather, on a scientifically and provably accurate estimate of the likelihood of illness, one that gets more and more accurate as you gain knowledge about the particular patient, or about the particular population that the patient comes from.

So, let us see how adding knowledge about a particular patient adjusts the probabilities. Let us say a woman walks in, a new patient, and we begin talking to her. She tells us that she is often short of breath (dyspnea). So, we enter that finding into our net. With Netica, as we shall see, this is as simple as pointing your mouse at a node and clicking on it once, whereupon a list of available states pops up, and you then click on the correct item in the list. After doing that, this is what the net looks like.

Notice how the Dyspnea box is grayed, indicating that we have evidence for it being in one of its states. In this case, because our patient appears trustworthy, we say we are 100% certain that our patient has dyspnea. It is easy with Netica to enter an uncertain finding (also called a likelihood finding), say of 90% Present, but let's keep things simple for now.

Observe how with this new finding, that our patient has dyspnea, the probabilities for all three illnesses have increased. Why is this? Well, since all those illnesses have dyspnea as a symptom, and our patient is indeed exhibiting this symptom, it only makes sense that our belief in the possible presence of those illnesses should increase. Basically, the presence of the symptom has increased our belief that she might be seriously ill.

Let's look at those inferences more closely.

1. The most significant jump is for Bronchitis, from 45% to 83.4%. Why such a large jump? Well, bronchitis is far more common than cancer or TB. So, once we have evidence for serious lung illness, it becomes our most likely candidate diagnosis.
2. The chances that our patient is a smoker have now increased substantially, from 50% to 63.4%.
3. The chances that she recently visited Asia have increased very slightly, from 1% to 1.03%, which is insignificant.
4. The chances of our getting an abnormal X-ray from our patient have also gone up marginally, from 11% to 16%.

If you think about this expansion of our knowledge, it is truly quite helpful. We have only entered one finding, the presence of Dyspnea, and this knowledge has propagated or spread its influence around the net, accurately updating all the other possible beliefs. Some of our beliefs are increased substantially, others hardly at all. And the beauty of it is that the amounts are precisely quantified.

We still do not know what precisely is ailing our patient. Our current best belief is that she suffers from Bronchitis
(probability of Present=83.4%). However, we would like to increase our chances of a correct diagnosis. If we stop here and diagnose her with Bronchitis and she really has Cancer, we would be a poor doctor indeed. We really need more information.

So, being thorough, we run through our standard checklist of questions. We ask her if she has been to Asia recently. Surprisingly, she answers yes. Now, let us see how this knowledge affects the net.

Suddenly, the chances of tuberculosis have increased substantially, from 2% to 9%. Note, interestingly, that the chances of lung cancer, of bronchitis, and of our patient being a smoker have all decreased. Why is this? Well, this is because dyspnea is now more strongly explained by tuberculosis than before (although bronchitis still remains the best candidate diagnosis). And because cancer and bronchitis are now less probable, so is smoking. This phenomenon is called "explaining away" in Bayes net circles. It says that when you have competing possible causes for some event, and the chances of one of those causes increase, the chances of the other causes must decline, since they are being explained away by the first explanation.

To continue with our example, suppose we ask more questions and find out that our patient is indeed a smoker. Here is the updated net.

Note that our current best hypothesis still remains that the patient is suffering from Bronchitis, and not TB or lung cancer. But to be sure, we order that a diagnostic X-ray be performed. Let us say that the X-ray turns out normal. The result is:

Note how this more strongly confirms Bronchitis and disconfirms TB or lung cancer.

But suppose the X-ray were abnormal. The result is:

Note the big difference. TB or Lung Cancer has shot up enormously in probability. Bronchitis is still the most probable of the three separate illnesses, but it is less probable than the combination hypothesis of TB or Lung Cancer. So, we would then decide to perform further tests, order blood tests, lung tissue biopsies, and so forth. Our current Bayes net does not cover those tests, but it would be easy to extend it by simply adding extra nodes as we acquire new statistics for those diagnostic procedures. And we do not need to throw away any part of the previous net. This is another powerful feature of Bayes nets. They are easily extended (or reduced, simplified) to suit your changing needs and your changing knowledge.

Summary

In this section we learned that a Bayesian network is a mathematically rigo
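
Section 1 says that the weather, sprinkler, and grass net can answer questions such as how likely rain is when the lawn is wet. The following is a minimal Python sketch of that kind of query, not Netica's method: it simply enumerates every possible world and sums the matching ones. All probabilities in it are hypothetical values chosen for illustration (the guide gives none for this net), and the names `joint`, `mass`, and `posterior` are illustrative.

```python
from itertools import product

# All probabilities below are hypothetical, chosen only for illustration.
WEATHER = ("sunny", "cloudy", "rainy")
SPRINKLER = ("on", "off")
GRASS = ("wet", "dry")

# P(weather)
P_WEATHER = {"sunny": 0.6, "cloudy": 0.3, "rainy": 0.1}
# P(sprinkler = on | weather)
P_SPRINKLER_ON = {"sunny": 0.5, "cloudy": 0.1, "rainy": 0.01}


def p_grass_wet(weather, sprinkler):
    """P(grass = wet | weather, sprinkler), hypothetical values."""
    if weather == "rainy":
        return 0.99 if sprinkler == "on" else 0.90
    return 0.80 if sprinkler == "on" else 0.05


def joint(weather, sprinkler, grass):
    """Probability of one complete world: the product of the three node tables."""
    p_on = P_SPRINKLER_ON[weather]
    p_s = p_on if sprinkler == "on" else 1.0 - p_on
    p_wet = p_grass_wet(weather, sprinkler)
    p_g = p_wet if grass == "wet" else 1.0 - p_wet
    return P_WEATHER[weather] * p_s * p_g


def mass(condition):
    """Total probability of all worlds that agree with `condition`."""
    total = 0.0
    for w, s, g in product(WEATHER, SPRINKLER, GRASS):
        world = {"weather": w, "sprinkler": s, "grass": g}
        if all(world[var] == state for var, state in condition.items()):
            total += joint(w, s, g)
    return total


def posterior(query, evidence):
    """P(query | evidence), both given as {variable: state} dicts."""
    return mass({**query, **evidence}) / mass(evidence)


wet = {"grass": "wet"}
p_rain_given_wet = posterior({"weather": "rainy"}, wet)
p_sprinkler_given_wet = posterior({"sprinkler": "on"}, wet)
print(f"P(rainy | wet) = {p_rain_given_wet:.3f}")
print(f"P(sprinkler on | wet) = {p_sprinkler_given_wet:.3f}")
```

With these made-up inputs, observing wet grass raises the probability of rain above its 0.1 prior. Enumeration visits every world, so it is only practical for very small nets.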

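
The table-size arithmetic in section 1.1.1 can be checked with a short Python sketch that compares the full joint table with the per-family tables a Bayes net stores. The node names are illustrative labels, and the parent sets for Asia are an assumption: they follow the standard published Asia network, which the text above describes only in part.

```python
from math import prod

# Number of states per node, as described in the text.
WEATHER_STATES = {"Weather": 3, "Sprinkler": 2, "Grass": 2}

# Eight two-state nodes; the names are illustrative labels.
ASIA_STATES = {name: 2 for name in (
    "VisitAsia", "Smoking", "Tuberculosis", "LungCancer",
    "Bronchitis", "TbOrCancer", "XRay", "Dyspnea",
)}

# Assumed parent sets, following the standard published Asia network.
ASIA_PARENTS = {
    "Tuberculosis": ("VisitAsia",),
    "LungCancer": ("Smoking",),
    "Bronchitis": ("Smoking",),
    "TbOrCancer": ("Tuberculosis", "LungCancer"),
    "XRay": ("TbOrCancer",),
    "Dyspnea": ("TbOrCancer", "Bronchitis"),
}


def joint_table_size(states):
    """Entries in the full joint distribution: the product of all state counts."""
    return prod(states.values())


def family_table_size(states, parents):
    """Entries summed over per-node tables, each indexed by a node and its parents."""
    return sum(
        count * prod(states[p] for p in parents.get(node, ()))
        for node, count in states.items()
    )


print(joint_table_size(WEATHER_STATES))               # 12
print(joint_table_size(ASIA_STATES))                  # 256
print(family_table_size(ASIA_STATES, ASIA_PARENTS))   # 36
```

Under these assumed parent sets, the Asia net needs 36 table entries instead of 256, and the gap widens quickly as more nodes are added.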