Estimation of Distribution Algorithms
Yuehui Chen, Computational Intelligence Laboratory, University of Jinan


The Idea
Crossover and mutation in a genetic algorithm can destroy individuals that are already well optimized. To avoid this, a new kind of evolutionary algorithm emerged: the estimation of distribution algorithm (EDA). An EDA has no crossover and no mutation. Its main tools are a probability model of the good individuals, and sampling from that model to produce the next generation.

GA to EDA

Population Based Incremental Learning (PBIL, Baluja, 1994)
- Population-based search, such as a GA:
  - create a probability vector by counting the number of 1s and 0s in each gene position;
  - generate the new population using the probability vector;
  - no information is carried from generation to generation!
- Supervised competitive learning, e.g. LVQ:
  - winner-take-all reinforcement learning in ANNs;
  - the winner is a kind of prototype of the sample presented.
- PBIL = GA + CL: capture the trend from the best performer.

Basic PBIL
  P <- initialize probability vector (each position = 0.5)
  while (generations++ < limit)
    for each vector i do
      for each position j do
        generate Vi(j) according to P(j)
      end-do
      evaluate f(Vi)
    end-do
    Vmax = the vector maximizing f(Vi)
    update P according to Vmax
    if random(0,1) < Pmutate
      mutate P
    end-if
  end-while

Details
- The population is replaced by a probability vector P = (p1, p2, ..., pl), where pi is the probability of a 1 in the i-th bit.
- Generate n individuals from P.
- Update P using the best individual: pi(t+1) = a*xi + pi(t)*(1 - a), i = 1, 2, ..., l, where a is the learning rate.
- Mutate P: pi(t+1) = am*U[0,1) + pi(t+1)*(1 - am), where am is the mutation shift.

PBIL Example
- t = 0, P = (0.5, 0.5, 0.5, 0.5).
- Generate 5 individuals: 1010, 1100, 0100, 0111, 0001.
- Fitness: 2, 2, 1, 3, 1. Best individual: 0111; a = 0.1.
- Update P:
    p1 = 0.1*0 + 0.5*(1 - 0.1) = 0.45
    p2 = p3 = p4 = 0.1*1 + 0.5*(1 - 0.1) = 0.55

Some applications
- Function optimization
- Job-shop scheduling
- TSP
- Bin packing
- Knapsack problem
- Neural network weight training
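The PBIL loop and update rules above can be put together in a short program. This is a sketch, not code from the slides: the parameter values and the OneMax fitness (count of 1s, as in the 4-bit example) are our own choices.

```python
import random

def pbil(fitness, n_bits, n_samples=20, alpha=0.1, p_mut=0.02,
         mut_shift=0.05, generations=200, seed=0):
    """Basic PBIL: sample from a probability vector, shift the vector
    toward the generation's best sample, occasionally mutate the vector."""
    rng = random.Random(seed)
    p = [0.5] * n_bits                       # each position starts at 0.5
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        # Sample candidate vectors from the probability vector.
        samples = [[1 if rng.random() < p[j] else 0 for j in range(n_bits)]
                   for _ in range(n_samples)]
        v_max = max(samples, key=fitness)
        if fitness(v_max) > best_fit:
            best, best_fit = v_max, fitness(v_max)
        # Update toward the best individual: p <- a*x + p*(1 - a).
        p = [alpha * x + pi * (1 - alpha) for x, pi in zip(v_max, p)]
        # Mutate the probability vector itself, not the individuals.
        p = [mut_shift * rng.random() + pi * (1 - mut_shift)
             if rng.random() < p_mut else pi for pi in p]
    return best, best_fit

# OneMax: fitness = number of 1s, as in the 4-bit example above.
best, fit = pbil(sum, n_bits=10)
```

Note that only the probability vector survives from one generation to the next; the sampled population is thrown away each time, exactly as the slides emphasise.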

EDA: the general framework
Estimation of distribution algorithms do just that! Typically they operate as follows:
  Step 0: Randomly generate a set of individuals (t = 0).
  Step 1: Evaluate the individuals.
  While (not done):
    Step 2: Select individuals to be parents; develop a probability distribution/density function, pt, based on the parents.
    Step 3: Create offspring using pt.
    Step 4: Evaluate the offspring.
    Step 5: The offspring replace the parents (t = t + 1).
    Step 6: Go to While.

Flowchart
(figure only)

What models to use?
- Start with: a probability vector for binary strings; a Gaussian distribution.
- Later: dependency tree models (COMIT); Bayesian networks.

Probability-Vector PMBGAs

EDA: probability vector
- This EDA is known as the Univariate Marginal Distribution Algorithm.
- Let's try to solve the following problem: f(x) = x^2, where -2.0 <= x <= 2.0.
- Let l = 7; our mapping function is then d(2, -2, 7, c) = 4*decode(c)/127 - 2.

EDA: probability vector
Randomly generate an initial population:
  Genotype   Phenotype  Fitness
  Person 1: 1001010    0.331   Fit: ?
  Person 2: 0100101   -0.835   Fit: ?
  Person 3: 1101010    1.339   Fit: ?
  Person 4: 0110110   -0.300   Fit: ?
  Person 5: 1001111    0.488   Fit: ?
  Person 6: 0001101   -1.591   Fit: ?

EDA: probability vector
Evaluate the population at t = 0:
  Person 1: 1001010    0.331   Fit: 0.109
  Person 2: 0100101   -0.835   Fit: 0.697
  Person 3: 1101010    1.339   Fit: 1.790
  Person 4: 0110110   -0.300   Fit: 0.090
  Person 5: 1001111    0.488   Fit: 0.238
  Person 6: 0001101   -1.591   Fit: 2.531

EDA: probability vector
Select the best 3 individuals (truncation selection) to be parents: Persons 6, 3, and 2.

EDA: probability vector
Construct a joint probability distribution function, p0, given the three parents:
  Person 2: 0100101   -0.835   Fit: 0.697
  Person 3: 1101010    1.339   Fit: 1.790
  Person 6: 0001101   -1.591   Fit: 2.531
Since our genotype has only two values, we only need to be concerned with the probability that the value of a gene is 1; the probability that it is 0 is one minus the probability that it is 1. Thus p0 is:
  p(g0=1 | Parent0) = 0.333    p(g1=1 | Parent0) = 0.667
  p(g2=1 | Parent0) = 0.000    p(g3=1 | Parent0) = 0.667
  p(g4=1 | Parent0) = 0.667    p(g5=1 | Parent0) = 0.333
  p(g6=1 | Parent0) = 0.667

Summary
(The slide showed the probability vector and the individuals sampled from it; the table did not survive extraction.)

EDA: probability vector (continued)
Evaluate the offspring:
  Child 1: 0001010   -1.685   Fit: 2.839
  Child 2: 1101101    1.433   Fit: 2.054
  Child 3: 0101101   -0.583   Fit: 0.340
  Child 4: 0001011   -1.654   Fit: 2.736
  Child 5: 1100110    1.213   Fit: 1.470
  Child 6: 0100101   -0.835   Fit: 0.697
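The construction of p0 and the sampling of offspring above take only a few lines. The helper names are our own; the decode mapping follows the d(2, -2, 7, c) rule from the slides.

```python
import random

def build_vector(parents):
    """Per-bit marginal probability of a 1 over the selected parents (UMDA)."""
    n = len(parents[0])
    return [sum(p[i] for p in parents) / len(parents) for i in range(n)]

def sample(p_vec, rng):
    """Draw one genotype from the probability vector."""
    return [1 if rng.random() < pi else 0 for pi in p_vec]

def decode(bits):
    """Map a 7-bit genotype to a phenotype in [-2, 2]: 4*decode(c)/127 - 2."""
    value = int("".join(map(str, bits)), 2)
    return 4 * value / 127 - 2

parents = [[0, 1, 0, 0, 1, 0, 1],   # Person 2
           [1, 1, 0, 1, 0, 1, 0],   # Person 3
           [0, 0, 0, 1, 1, 0, 1]]   # Person 6
p0 = build_vector(parents)          # matches the p(gi=1 | Parent0) table
rng = random.Random(1)
offspring = [sample(p0, rng) for _ in range(6)]
```

Because p(g2=1) = 0, every sampled child has a 0 in position 2, which is how the probability vector locks in what the parents agree on.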

EDA: probability vector (continued)
Replace the parents with the offspring.
  Parents:
  Person 2: 0100101   -0.835   Fit: 0.697
  Person 3: 1101010    1.339   Fit: 1.790
  Person 6: 0001101   -1.591   Fit: 2.531
  Offspring:
  Child 1: 0001010   -1.685   Fit: 2.839
  Child 2: 1101101    1.433   Fit: 2.054
  Child 3: 0101101   -0.583   Fit: 0.340
  Child 4: 0001011   -1.654   Fit: 2.736
  Child 5: 1100110    1.213   Fit: 1.470
  Child 6: 0100101   -0.835   Fit: 0.697

EDA: probability vector (continued)
Select the best 3 for Parent1 (Persons 1, 4, and 2):
  Person 1: 0001010   -1.685   Fit: 2.839
  Person 2: 1101101    1.433   Fit: 2.054
  Person 3: 0101101   -0.583   Fit: 0.340
  Person 4: 0001011   -1.654   Fit: 2.736
  Person 5: 1100110    1.213   Fit: 1.470
  Person 6: 0100101   -0.835   Fit: 0.697

Gaussian distribution
Estimation of distribution algorithms: an example run (by hand).
Randomly generate an initial population:
  Person 1:  0.331   Fit: ?
  Person 2: -0.835   Fit: ?
  Person 3:  1.339   Fit: ?
  Person 4: -0.300   Fit: ?
  Person 5:  0.488   Fit: ?
  Person 6: -1.591   Fit: ?

Gaussian distribution
Evaluate the initial population:
  Person 1:  0.331   Fit: 0.110
  Person 2: -0.835   Fit: 0.697
  Person 3:  1.339   Fit: 1.793
  Person 4: -0.300   Fit: 0.900
  Person 5:  0.488   Fit: 0.238
  Person 6: -1.591   Fit: 2.531

Gaussian distribution
Select the best 3 of 6: Persons 3, 4, and 6.

Gaussian distribution
Calculate the joint density function (Gaussian):
  Person 3:  1.339   Fit: 1.793
  Person 4: -0.300   Fit: 0.900
  Person 6: -1.591   Fit: 2.531
  x_avg = -0.184, sigma = 1.199

Gaussian distribution
Create a new population of 6:
  Child 1:  1.015   Fit: 1.030
  Child 2: -1.383   Fit: 1.913
  Child 3: -0.784   Fit: 0.615
  Child 4:  0.416   Fit: 0.173
  Child 5:  0.116   Fit: 0.013
  Child 6: -1.284   Fit: 1.649
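The Gaussian step above (fit the three selected parents, then sample a new population) can be sketched as follows; it reproduces the slide's x_avg and sigma, assuming the slide used the maximum-likelihood (population) standard deviation.

```python
import math
import random

def fit_gaussian(selected):
    """Mean and population standard deviation of the selected parents."""
    mu = sum(selected) / len(selected)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in selected) / len(selected))
    return mu, sigma

def next_population(selected, size, rng):
    """Sample a new population from the fitted Gaussian."""
    mu, sigma = fit_gaussian(selected)
    return [rng.gauss(mu, sigma) for _ in range(size)]

selected = [1.339, -0.300, -1.591]        # Persons 3, 4, and 6
mu, sigma = fit_gaussian(selected)        # -0.184 and 1.199, as on the slide
children = next_population(selected, 6, random.Random(0))
```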

Gaussian distribution
Replace the parents with the children; select the best 3 of 6 (Persons 1, 2, and 6):
  Person 1:  1.015   Fit: 1.030
  Person 2: -1.383   Fit: 1.913
  Person 3: -0.784   Fit: 0.615
  Person 4:  0.416   Fit: 0.173
  Person 5:  0.116   Fit: 0.013
  Person 6: -1.284   Fit: 1.649

Gaussian distribution
Calculate the new joint density function:
  Person 1:  1.015   Fit: 1.030
  Person 2: -1.383   Fit: 1.913
  Person 6: -1.284   Fit: 1.649
  x_avg = -0.551; sigma = 1.11

What's next?
- Use a tree model (COMIT).
- Cluster bits into groups (extended compact GA).
- Use a Bayesian network (BOA).

Beyond single bits: COMIT

How to learn a tree model?

Prim's algorithm
- Start with a graph with no edges; add an arbitrary node to the tree.
- Iterate: hang a new node onto the current tree, preferring the addition of edges with large mutual information (a greedy approach).
- Complexity: O(n^2).

Variants of PMBGAs with tree models

Beyond pairwise dependencies: ECGA

Learning the model in ECGA

How to compute model quality?

Minimum description length (MDL): basic concepts
- The MDL principle: the best theory for explaining a set of data is the one that minimizes the sum of
  - the number of bits needed to describe the theory, and
  - the number of bits needed to encode the data with the help of the theory.
  (Minimum description length is also called the stochastic complexity of the given data.)
- The MDL principle in ECGA: we seek a reasonable and small structure such that most of the training data conforms to it, encoding the non-conforming data as exceptions, so as to minimize
  - the bits needed to encode the group structure, which represents the hypothesis, and
  - the bits needed to encode the exception instances.

Sampling the model in ECGA
- Sample groups of bits at a time, based on the observed probabilities/proportions.
- One can also apply a population-based crossover, similar to uniform crossover but with respect to the model.

What's next?
- We saw: a probability vector (no edges), tree models (some edges), and marginal product models (groups of variables).
- Next: Bayesian networks, which can represent all of the above and more.

Joint probability distribution (JPD)
- A solution is a set of random variables.
- The JPD is exponential in the number of variables, and is therefore not feasible to calculate in most cases.
- It needs simplification!

Factorisation of the JPD
- Univariate model: no interaction; the simplest model.
- Bivariate model: pairwise interaction.
- Multivariate model: interaction of more than two variables.

Typical estimation and sampling of the JPD in EDAs
- Learn the interactions between the variables in the solution.
- Learn the probabilities associated with the interacting variables.
- This specifies the JPD, p(x).
- Sample the JPD (i.e. the learned probabilities).

Probabilistic graphical models
- An efficient tool to represent the factorisation of the JPD: a marriage between probability theory and graph theory.
- Two components: structure and parameters.
- Two types of PGM: directed PGMs (Bayesian networks) and undirected PGMs (Markov random fields).

Directed PGMs (Bayesian networks)
- Structure: a directed acyclic graph (DAG).
- Independence relationship: a variable is conditionally independent of the rest of the variables given its parents.
- Parameters: conditional probabilities.
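The "needs simplification" point is easy to quantify: a full joint table over n binary variables needs 2^n - 1 free parameters, while a factored DAG needs only one conditional table per node. A small sketch; the 10-variable chain structure is a made-up illustration.

```python
def jpd_size(n):
    """Free parameters in a full joint table over n binary variables."""
    return 2 ** n - 1

def bn_size(parents):
    """Free parameters of an all-binary Bayesian network: one probability
    per configuration of each node's parents."""
    return sum(2 ** len(pa) for pa in parents.values())

# A 10-variable chain X0 -> X1 -> ... -> X9 (illustrative structure).
chain = {i: ([] if i == 0 else [i - 1]) for i in range(10)}
full = jpd_size(10)        # 1023 numbers for the full table
factored = bn_size(chain)  # 1 + 9*2 = 19 numbers for the chain
```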

Bayesian networks
The factorisation of the JPD is encoded in terms of conditional probabilities; the JPD for a BN is
  p(x) = Π_i p(x_i | parents(x_i))

Estimating a Bayesian network
- Estimate the structure, then estimate the parameters.
- This completely specifies the JPD, which can then be sampled.

BN-based EDAs
1. Initialise the parent solutions.
2. Select a set from the parent solutions.
3. Estimate a BN from the selected set: estimate the structure, then estimate the parameters.
4. Sample the BN to generate the new population.
5. Replace the parents with the new set, and go to 2 until the termination criterion is satisfied.

How to estimate and sample a BN in EDAs
- Estimating the structure: score+search techniques; conditional-independence tests.
- Estimating the parameters: trivial in EDAs, since the dataset is complete; estimate the probabilities of parents before children.
- Sampling: probabilistic logic sampling (sample parents before children).

BN-based EDAs
- A well-established approach in EDAs: BOA, EBNA, LFDA, MIMIC, COMIT, BMDA.
- References: Larrañaga and Lozano 2002; Pelikan 2002.

Bayesian networks
- Computing the full joint probability is enormously complex, while naive Bayes is too simple.
- In practice we need a natural, effective way to capture and reason about uncertain knowledge.
- Independence and conditional independence between variables can greatly reduce the number of probabilities needed to define the full joint probability distribution.

Definition of a Bayesian network
- A directed acyclic graph (DAG).
- A set of random variables makes up the network's nodes; variables may be discrete or continuous.
- A set of directed edges (arrows) connects pairs of nodes.
- Each node Xi has a conditional probability table P(Xi | Parents(Xi)) that quantifies the influence of its parents on the node.

Other names for Bayesian networks
- Belief network; probability network; causal network; knowledge map; graphical model or probabilistic graphical model (PGM); decision network; influence diagram.

Independence and conditional independence
In the network with nodes Weather, Cavity, Catch, and Toothache: Weather is independent of the other three variables, and given Cavity, Toothache and Catch are conditionally independent.

Independency
(figure only)

Joint, marginal, and conditional probability functions
(Slides on the joint probability distribution of (X, Y, Z), the marginal distributions p(x), p(y), p(z), second-order joint probability functions, and conditional probability functions; the formulas did not survive extraction.)

Semantics of Bayesian networks
- A Bayesian network has two readings:
  - a representation of the joint probability distribution (used when constructing the network);
  - an encoding of a collection of conditional-dependence statements (used when designing inference procedures).
- The semantics of a Bayesian network: P(x1, ..., xn) = P(x1 | parents(X1)) ... P(xn | parents(Xn)).

Example Bayesian network
Nodes: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls (Burglary and Earthquake point to Alarm; Alarm points to JohnCalls and MaryCalls).

Worked example of the network semantics
Compute the probability that the alarm has sounded, neither a burglary nor an earthquake has occurred, and both John and Mary call:
  P(j, m, a, ~b, ~e) = P(j|a) P(m|a) P(a|~b,~e) P(~b) P(~e)
                     = 0.9 * 0.7 * 0.001 * 0.999 * 0.998
                     = 0.00062 = 0.062%

Learning Bayes nets from data

What is a Bayesian network?
- A directed acyclic graph: the nodes are variables (discrete or continuous), and the arcs indicate dependence between variables.
- Conditional probabilities (local distributions).
- Missing arcs imply conditional independence.
- Independencies + local distributions = a modular specification of a joint distribution.

Why Bayesian networks?
- An expressive language: finite mixture models, factor analysis, HMMs, Kalman filters.
- An intuitive language: causal knowledge can be used in constructing models, and domain experts are comfortable building a network.
- General-purpose "inference" algorithms, e.g. P(Bad Battery | Has Gas, Won't Start):
  - exact: the modular specification leads to large computational efficiencies;
  - approximate: "loopy" belief propagation.
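The worked alarm-network calculation above can be checked with the factorised semantics directly. The five numbers used on the slide appear below; the remaining CPT entries, which this query does not touch, are the usual textbook values for this network and are included only to make the sketch self-contained.

```python
def joint(assignment, cpts, parents):
    """Network semantics: P(x) = product over nodes of P(value | parents)."""
    prob = 1.0
    for node, value in assignment.items():
        key = tuple(assignment[p] for p in parents[node])
        p_true = cpts[node][key]            # probability the node is True
        prob *= p_true if value else 1 - p_true
    return prob

parents = {"B": (), "E": (), "A": ("B", "E"), "J": ("A",), "M": ("A",)}
cpts = {
    "B": {(): 0.001},                       # so P(~b) = 0.999
    "E": {(): 0.002},                       # so P(~e) = 0.998
    "A": {(False, False): 0.001, (False, True): 0.29,
          (True, False): 0.94, (True, True): 0.95},
    "J": {(False,): 0.05, (True,): 0.90},
    "M": {(False,): 0.01, (True,): 0.70},
}
# P(j, m, a, ~b, ~e) from the slide: alarm on, no burglary, no earthquake.
p = joint({"B": False, "E": False, "A": True, "J": True, "M": True},
          cpts, parents)                    # about 0.00063, i.e. 0.062%
```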

Overview
- Learning probabilities (local distributions): an introduction to Bayesian statistics (learning a probability); learning probabilities in a Bayes net.
- Learning Bayes-net structure.
- Bayesian model selection/averaging.
- Applications.

Maximum likelihood estimation
- The maximum likelihood idea: there are two marksmen, one with a hit rate of 0.9 and the other with a hit rate of 0.1. One of them fires a single shot at the target and hits it. Which one do you estimate fired the shot?
- In general, the probability that an event A occurs depends on a parameter θ; different values of θ give different values of P(A), so we write the probability as P(A|θ). If A has occurred, we take the estimate of θ to be the value that maximises P(A|θ). This is the maximum likelihood idea.

Maximum likelihood estimation
- The likelihood function of a sample X1, ..., Xn from a population with density f(x; θ) is L(θ) = Π_{i=1..n} f(Xi; θ).
- Definition: if there exists a θ̂ such that L(θ̂) = max_θ L(θ), then θ̂ is called the maximum likelihood estimate of θ.

Steps for finding the maximum likelihood estimate
(1) Form the likelihood function L(θ).
(2) Form the log-likelihood ln L(θ).
(3) Set up the likelihood equation d ln L(θ)/dθ = 0; if the equation has a solution, that solution is θ̂.

Example 1
Let X1, ..., Xn be a sample from a Poisson population with parameter λ, and find the maximum likelihood estimate of λ. Setting d ln L/dλ = 0 gives λ̂ = X̄, the sample mean.

Example 2
Let X1, ..., Xn be a sample from a normal population N(μ, σ²), and find the maximum likelihood estimates of μ and σ². Setting the partial derivatives of ln L to zero gives μ̂ = X̄ and σ̂² = (1/n) Σ_{i=1..n} (Xi - X̄)².

Evaluation criteria for estimators
- Unbiasedness: E(θ̂) = θ.
- Efficiency: a smaller variance D(θ̂) means a more efficient estimator.
- Consistency: as the sample size tends to infinity, θ̂ converges to θ in probability.

Bayesian estimation: maximum a posteriori
- A sample set K = {x1, x2, ..., xN} is used to estimate an unknown parameter θ.
- The unknown parameter is viewed as a random variable with prior distribution p(θ); given that the sample set K has been observed, the posterior probability is p(θ|K).
- The maximum a posteriori (MAP) estimate maximises p(θ|K).

Bayesian (minimum-risk) estimation
- The conditional risk of a parameter estimate: the expected loss of the estimator, given x.
- The risk of a parameter estimate: the expectation of the estimator's conditional risk.
- The Bayes estimate is the estimate that minimises the risk.

Bayesian (minimum-risk) estimation
- Define the loss function as the squared error (θ - θ̂)².
- Theorem: if the loss function is the squared-error function, the Bayes estimate is the posterior mean, θ* = E[θ|K] = ∫ θ p(θ|K) dθ.

Steps of Bayesian estimation
- Determine the prior distribution p(θ).
- From the sample set K = {x1, x2, ..., xN}, form the joint sample distribution p(K|θ).
- Compute the posterior distribution p(θ|K) = p(K|θ) p(θ) / ∫ p(K|θ) p(θ) dθ.
- Compute the Bayes estimate θ̂ = ∫ θ p(θ|K) dθ.

Worked example: the univariate normal distribution
- The population density is normal with unknown mean μ, and the prior for μ is also normal.
- Using Bayesian estimation, find the estimate of μ from the sample set K = {x1, x2, ..., xN}: compute the posterior of μ, then compute its Bayes estimate. (The slide's derivation formulas did not survive extraction.)

Learning probabilities: classical approach
- Simple case: flipping a thumbtack. The true probability q is unknown.
- Given iid data, estimate q using an estimator with good properties: low bias, low variance, consistency (e.g., the ML estimate).
- iid: independent and identically distributed.

Learning probabilities: Bayesian approach
- The true probability q is unknown; maintain a Bayesian probability density for q.
- We observed M iid toss results, D = (x1, x2, ..., xM), and we want to estimate p(q|D).

Bayesian approach: use Bayes' rule to compute a new density for q given the data
  p(q|D) = p(D|q) p(q) / p(D)   (posterior = likelihood × prior / evidence)

The likelihood
The "binomial distribution": for h heads and t tails, p(D|q) = q^h (1 - q)^t.

Example
Application of Bayes' rule to the observation of a single heads.

The probability of heads on the next toss
- How good is a particular q? The likelihood indicates how likely it is that the observed data was generated. ML: maximum likelihood.

Overview (agenda slide repeated)
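The Bayesian updating of q sketched above is conjugate: with a Beta prior, the posterior after h heads and t tails is again a Beta. The slides do not name a prior, so the uniform Beta(1, 1) here is an assumption.

```python
def update(alpha, beta, heads, tails):
    """Posterior of a Beta(alpha, beta) prior after the observed tosses:
    Beta(alpha + heads, beta + tails)."""
    return alpha + heads, beta + tails

def prob_heads_next(alpha, beta):
    """Predictive probability of heads on the next toss = posterior mean."""
    return alpha / (alpha + beta)

# Uniform prior Beta(1, 1); observe a single heads, as on the slide.
a, b = update(1, 1, heads=1, tails=0)
p_next = prob_heads_next(a, b)    # 2/3
```

With a Beta(1, 1) prior this reproduces Laplace's rule of succession, (h + 1) / (M + 2).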

From thumbtacks to Bayes nets
The thumbtack problem can be viewed as learning the probability for a very simple BN: a single node X with values heads/tails.

The next simplest Bayes net
(Four figure slides on a two-variable network; the key assumption highlighted is parameter independence.)

A bit more difficult
- Three probabilities to learn: qX=heads, qY=heads|X=heads, and qY=heads|X=tails.
- (Figure slides: ΘX with tosses X1, X2; ΘY|X=heads and ΘY|X=tails with tosses Y1, Y2 for cases 1 and 2.) These are 3 separate thumbtack-like problems.

In general
Learning probabilities in a BN is straightforward if:
- the likelihoods are from the exponential family (multinomial, Poisson, gamma, ...);
- parameter independence holds;
- conjugate priors are used; and
- the data are complete.

Incomplete data makes parameters dependent
(Figure: ΘX, X1, ΘY|X=heads, Y1, ΘY|X=tails.)

Incomplete data
Incomplete data makes the parameters dependent. Parameter learning for incomplete data:
- Monte-Carlo integration: investigate properties of the posterior and perform prediction.
- Large-sample approximations (Laplace/Gaussian approximation).
- The expectation-maximization (EM) algorithm, with inference to compute means and variances.
- Variational methods.

Overview (agenda slide repeated)

Difficulties
- If the network structure is unknown, constructing a Bayesian model is very difficult: with n attributes, the number of possible structures is at least exponential in n, and searching such a huge space for a reasonable structure is extremely time-consuming, so candidates must be selected with scoring criteria.
- Common criteria are MDL (minimum description length), BIC (Bayesian information criterion), and Be (Bayesian likelihood equivalent). MDL and BIC are both grounded in information theory; their core idea is to construct short encodings. The Be measure applies when the parameters follow particular distributions; its best-known instances are BDe (Bayesian Dirichlet likelihood equivalent) and BGe (Bayesian Gaussian likelihood equivalent).
- Common search algorithms include greedy search, simulated annealing, and best-first search.
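The score+search idea above can be illustrated with a minimal BIC scorer for complete binary data. The decomposable score and the (log N)/2 penalty are the standard BIC form; the two-variable dataset is invented for the illustration.

```python
import math
from collections import Counter

def bic_score(data, parents):
    """BIC = log-likelihood of the data minus (log N / 2) * #parameters,
    summed node by node over an all-binary network."""
    n_rows = len(data)
    score = 0.0
    for node, pa in parents.items():
        # Count (parent configuration, node value) pairs and configurations.
        counts = Counter((tuple(row[p] for p in pa), row[node]) for row in data)
        totals = Counter(tuple(row[p] for p in pa) for row in data)
        for (cfg, _val), c in counts.items():
            score += c * math.log(c / totals[cfg])      # log-likelihood term
        score -= (math.log(n_rows) / 2) * 2 ** len(pa)  # complexity penalty
    return score

# Toy dataset over (X0, X1) in which X1 copies X0: the structure with the
# edge X0 -> X1 should score higher than the empty structure.
data = [(0, 0), (1, 1), (0, 0), (1, 1), (1, 1), (0, 0)]
with_edge = bic_score(data, {0: (), 1: (0,)})
no_edge = bic_score(data, {0: (), 1: ()})
```

A greedy structure search, as named on the slide, would simply keep applying the single edge addition, deletion, or reversal that most improves this score.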
