Lecture 10: Introduction to Probabilistic Graphical Models
Chapter 10: Introduction to Probabilistic Graphical Models
Weike Pan and Congfu Xu
Institute of Artificial Intelligence, College of Computer Science, Zhejiang University
October 12, 2006
Zhejiang University College of Computer Science, Introduction to Artificial Intelligence courseware

References
- An Introduction to Probabilistic Graphical Models. Michael I. Jordan. http://www.cs.berkeley.edu/jordan/graphical.html

Outline
- Preparations
- Probabilistic Graphical Models (PGM)
- Directed PGM
- Undirected PGM
- Insights of PGM

Outline
- Preparations
  - PGM "is" a universal model
  - Different thoughts of machine learning
  - Different training approaches
  - Different data types
  - Bayesian Framework
  - Chain rules of probability theory
  - Conditional Independence
- Probabilistic Graphical Models (PGM)
- Directed PGM
- Undirected PGM
- Insights of PGM

Different thoughts of machine learning
- Statistics (modeling uncertainty, detailed information) vs. Logics (modeling complexity, high-level information)
- Unifying Logical and Statistical AI. Pedro Domingos, University of Washington. AAAI 2006.
- Speech: statistical information (acoustic model + language model + affect model) + high-level information (expert/logics)

Different training approaches
- Maximum Likelihood Training: MAP (Maximum a Posteriori) vs. Discriminative Training: Maximum Margin (SVM)
- Speech: the classical combination is Maximum Likelihood + Discriminative Training

Different data types
- Directed acyclic graph (Bayesian Networks, BN): models asymmetric effects and dependencies, i.e. causal/temporal dependence (e.g. speech analysis, DNA sequence analysis)
- Undirected graph (Markov Random Fields, MRF): models symmetric effects and dependencies, i.e. spatial dependence (e.g. image analysis)

PGM "is" a universal model
- It models both temporal and spatial data by unifying:
  - Thoughts: Statistics + Logics
  - Approaches: Maximum Likelihood Training + Discriminative Training
- Furthermore, the directed and undirected models together provide modeling power beyond that which could be provided by either alone.

Bayesian Framework
- What we care about is the conditional probability, which is a ratio of two marginal probabilities (both are marginals of the full joint distribution).
- Bayes' rule: P(class i | observation) = P(observation | class i) P(class i) / P(observation)
  - P(class i | observation): a posteriori probability
  - P(observation | class i): likelihood
  - P(class i): prior probability
  - P(observation): normalization factor
- Problem description: from an observation to a conclusion (classification or prediction)

Chain rules of probability theory

Conditional Independence

PGM
- Nodes represent random variables/states.
- The missing arcs represent conditional independence assumptions.
- The graph structure implies the decomposition.

Directed PGM (BN)
- Representation
- Conditional Independence
- Probability Distribution
- Queries
- Implementation
- Interpretation

Probability Distribution
- Definition of Joint Probability Distribution
- Check:

Representation
- Graphical models represent joint probability distributions more economically, using a set of "local" relationships among variables.

Conditional Independence (basic)
- Assert the conditional independence of a node from its ancestors, conditional on its parents.
- Interpret missing edges in terms of conditional independence.

Conditional Independence (3 canonical graphs)
- Classical Markov chain: "past", "present", "future"
- Common cause: Y "explains" all the dependencies between X and Z
- Common effect: multiple, competing explanations
- Figure labels: Marginal Independence; Conditional Independence

Conditional Independence (check)
- One incoming arrow and one outgoing arrow
- Two outgoing arrows
- Two incoming arrows
- Check through reachability
- Bayes ball algorithm (rules)

Undirected PGM (MRF)
- Representation
- Conditional Independence
- Probability Distribution
- Queries
- Implementation
- Interpretation

Probability Distribution (1)
- Clique: a clique of a graph is a fully connected subset of nodes. Local functions should not be defined on domains of nodes that extend beyond the boundaries of cliques.
- Maximal cliques: the maximal cliques of a graph are the cliques that cannot be extended to include additional nodes without losing the property of being fully connected.
- We restrict ourselves to maximal cliques without loss of generality, as they capture all possible dependencies.
- Potential function (local parameterization): a potential function on the possible realizations of the maximal clique

Probability Distribution (2)
- Maximal cliques

Probability Distribution (3)
- Joint probability distribution
- Normalization factor
- Boltzmann distribution

Conditional Independence
- It's a "reachability" problem in graph theory.

Representation

Insights of PGM (Michael I. Jordan)
- Probabilistic Graphical Models are a marriage between probability theory and graph theory.
- A graphical model can be thought of as a probabilistic database, a machine that can answer "queries" regarding the values of sets of random variables.
- We build up the database in pieces, using probability theory to ensure that the pieces have a consistent overall interpretation.
- Probability theory also justifies the inferential machinery that allows the pieces to be put together "on the fly" to answer the queries.
- In principle, all "queries" of a probabilistic database can be answered if we have in hand the joint probability distribution.

Insights of PGM (data structure & algorithm)
- A graphical model is a natural/perfect tool for representation (data structure) and inference (algorithm).

Thanks!
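
The "data structure and algorithm" view from the closing slides can be made concrete with a small sketch. It uses the common-effect graph X -> Y <- Z from the three canonical graphs. The binary variables, every probability value, and the helper names `joint` and `query` are assumptions made up for illustration; none of them come from the slides. The local conditional tables are the representation, and brute-force summation over the joint stands in for inference.

```python
from itertools import product

# Hypothetical common-effect network X -> Y <- Z with binary variables.
# All numbers are made up for illustration only.
p_x = {0: 0.7, 1: 0.3}                        # P(X)
p_z = {0: 0.8, 1: 0.2}                        # P(Z)
p_y1_given_xz = {(0, 0): 0.05, (0, 1): 0.6,   # P(Y=1 | X, Z)
                 (1, 0): 0.7, (1, 1): 0.9}


def joint(x, y, z):
    """Joint probability as a product of local factors: P(X) P(Z) P(Y | X, Z)."""
    p_y1 = p_y1_given_xz[(x, z)]
    p_y = p_y1 if y == 1 else 1.0 - p_y1
    return p_x[x] * p_z[z] * p_y


def query(target, evidence):
    """P(target | evidence), computed by summing the joint over all assignments.

    Both arguments are dicts mapping a variable name ("x", "y", "z") to a value.
    The cost is exponential in the number of variables, which is fine for a toy.
    """
    num = den = 0.0
    for x, y, z in product((0, 1), repeat=3):
        world = {"x": x, "y": y, "z": z}
        if any(world[k] != v for k, v in evidence.items()):
            continue  # assignment contradicts the evidence
        p = joint(x, y, z)
        den += p
        if all(world[k] == v for k, v in target.items()):
            num += p
    return num / den


# "Check": the factorized joint must sum to 1 over all assignments.
total = sum(joint(x, y, z) for x, y, z in product((0, 1), repeat=3))
print(f"sum of joint      = {total:.6f}")
print(f"P(X=1)            = {query({'x': 1}, {}):.4f}")
print(f"P(X=1 | Z=1)      = {query({'x': 1}, {'z': 1}):.4f}")
print(f"P(X=1 | Y=1)      = {query({'x': 1}, {'y': 1}):.4f}")
print(f"P(X=1 | Y=1, Z=1) = {query({'x': 1}, {'y': 1, 'z': 1}):.4f}")
```

With these made-up numbers, conditioning on Z alone leaves the belief in X unchanged (marginal independence). Observing the common effect Y raises it, and additionally observing Z=1 lowers it again, which is the "competing explanations" behaviour. The enumeration in `query` is only practical for tiny models; exploiting the graph structure to avoid it is what inference algorithms on graphical models are for.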