Basic Idea of the CLIQUE Algorithm (CLIQUE算法的基本思路.ppt)

Uploaded by: 小飞机 | Document ID: 5421902 | Uploaded: 2023-07-05 | Format: PPT | Pages: 25 | Size: 233 KB

Basic idea of the CLIQUE algorithm

CLIQUE takes a density-based approach to clustering. A cluster is a region in which the density of points is higher than in the regions adjacent to it. The data space is partitioned into grid units, and the number of points that fall into a unit is taken as the density of that unit. Given a threshold value, a unit is called dense when the number of points in it exceeds that value. A cluster is then defined as the set of all dense units that are connected to one another.

Basic concepts

Let A = {A_1, A_2, ..., A_d} be a set of d domains. Then S = A_1 × A_2 × ... × A_d is a d-dimensional space, and A_1, A_2, ..., A_d are regarded as the dimensions (attributes) of S.

The input to the algorithm is a set of points in this d-dimensional space, V = {v_1, v_2, ..., v_m}, where v_i = (v_i1, v_i2, ..., v_id) and the j-th component of v_i satisfies v_ij ∈ A_j.

An input parameter ξ divides every dimension of S into ξ intervals of equal length, so the whole space is partitioned into a finite number of non-overlapping rectangular units. Each unit can be written as {u_1, u_2, ..., u_d}, where u_i = [l_i, h_i) is a left-closed, right-open interval.

A point v = (v_1, v_2, ..., v_d) falls into a unit u = {u_1, u_2, ..., u_d} if and only if l_i ≤ v_i < h_i holds for every u_i.

The selectivity of a unit is the fraction of all data points that fall into it. A unit u is dense if selectivity(u) > τ, where the density threshold τ is another input parameter.

For any subspace of S, for example Sub = A_t1 × A_t2 × ... × A_tk (with k < d, and t_i < t_j whenever i < j), units, selectivity, and the other concepts are defined in the same way.

A cluster is a maximal set of connected dense units in a k-dimensional space.

Two k-dimensional units u1 and u2 are connected if and only if:
(1) the two units have a common face; or
(2) u1 and u2 are both connected to another unit u3.

Two units u1 = {r_t1, r_t2, ..., r_tk} and u2 = {r'_t1, r'_t2, ..., r'_tk} have a common face if there are k-1 dimensions (say A_t1, A_t2, ..., A_t(k-1)) with r_tj = r'_tj for j = 1, 2, ..., k-1, and in the remaining dimension A_tk either h_tk = l'_tk or h'_tk = l_tk.

A region is a rectangular shape whose every side is parallel to a coordinate axis. Such a region is made up of units and has a regular shape, so it can be expressed as an intersection of intervals.

A region R is contained in a cluster C if and only if R ∩ C = R. Such an R is maximal if and only if no proper superset R' of R is also contained in C.

A minimal description of a cluster C is a set ℛ of such maximal regions that exactly covers C and is non-redundant, that is, no proper subset of ℛ covers C.

Example

Terms illustrated in the example:
- d-dimensional space
- number of intervals
- unit
- selectivity of a unit
- density threshold
- dense unit
- cluster
- region
- maximal region
- minimal description of a cluster

Example

- subspace

Problem statement

Given a set of data points and the input parameters ξ and τ, find clusters in all subspaces of the original data space and present a minimal description of each cluster in the form of a DNF expression.

The CLIQUE algorithm

1. Identification of subspaces that contain clusters
2. Identification of clusters
3. Generation of minimal descriptions for the clusters

Step 1: Identifying subspaces that contain clusters

A bottom-up algorithm finds the dense units:
- Determine the 1-dimensional dense units by making a pass over the data.
- Having determined the (k-1)-dimensional dense units, determine the candidate k-dimensional units with the candidate generation procedure.

MDL-based pruning decides which subspaces (and the corresponding dense units) are interesting. MDL stands for Minimal Description Length.

Candidate generation procedure

Input: D_(k-1), the set of all (k-1)-dimensional dense units
Output: a superset of the set of all k-dimensional dense units
Algorithm: (given as a figure in the slides; not preserved in this text)

MDL-based pruning

- Compute the coverage of each subspace S_j, i.e. the number of points that fall inside its dense units.
- Sort the subspaces in descending order of their coverage.
- Divide the sorted list of subspaces into two sets: the selected set I and the pruned set P.
- To arrive at the cut point, minimize the code length; the minimum determines the optimal cut point i.

Step 2: Identifying clusters

Input: a set of dense units D, all in the same k-dimensional space S
Output: a partition of D into D^1, ..., D^q such that all units in D^i are connected and no two units u^i ∈ D^i, u^j ∈ D^j with i ≠ j are connected. Each such partition is a cluster.
Method: depth-first search algorithm
- Start with some unit u in D, assign it the first cluster number, and find all the units it is connected to.
- If there are still units in D that have not yet been visited, pick one and repeat the procedure.

Step 3: Generating minimal cluster descriptions

Input: disjoint sets of connected k-dimensional units in the same subspace; each such set is a cluster
Output: a concise description for each cluster
Method:
1. Covering with maximal regions
2. Minimal cover

Concept: cover of a cluster

For a cluster C in a k-dimensional subspace S, a set W of regions in the same subspace S is a cover of C if every region R ∈ W is contained in C and each unit in C is contained in at least one of the regions in W.

1. Covering with maximal regions

Input: a set C of connected dense units in the same k-dimensional space S
Output: a set W of maximal regions such that W is a cover of C
Method: greedy growth algorithm

Greedy growth algorithm

- Begin with an arbitrary dense unit u1 ∈ C and greedily grow a maximal region R1 that covers u1. Add R1 to W.
- Find another unit u2 ∈ C that is not yet covered by any maximal region in W. Greedily grow a maximal region R2 that covers u2. Add R2 to W.
- Repeat this procedure until all units in C are covered by some maximal region in W.

Obtaining a maximal region covering a dense unit u

- Start with u and grow it along dimension a1 as much as possible in both directions (to the left and to the right of the unit), using connected dense units contained in C.
- Grow this region along a2 in the same way.
- Repeat for all the dimensions, yielding a maximal region covering u.
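
The greedy growth procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the slides' own implementation: a unit is assumed to be a tuple of integer interval indices (one per dimension), a cluster is a set of such tuples, and a region is a tuple of inclusive (low, high) index ranges. The function names `units_in`, `grow_maximal_region`, and `cover_with_maximal_regions` are hypothetical.

```python
from itertools import product


def units_in(region):
    """Enumerate every unit (tuple of interval indices) inside a region.

    A region is a sequence of inclusive (low, high) index ranges,
    one per dimension (an assumed representation).
    """
    return product(*(range(lo, hi + 1) for lo, hi in region))


def grow_maximal_region(u, cluster):
    """Greedily grow a region from unit u, one dimension at a time."""
    region = [(i, i) for i in u]              # start with the single unit u
    for dim in range(len(u)):
        for step in (-1, +1):                 # grow to the left, then to the right
            while True:
                lo, hi = region[dim]
                new_edge = lo - 1 if step < 0 else hi + 1
                # The slab of units that this extension would add.
                slab = list(region)
                slab[dim] = (new_edge, new_edge)
                if not all(v in cluster for v in units_in(slab)):
                    break                     # some unit in the slab is not in C
                region[dim] = (new_edge, hi) if step < 0 else (lo, new_edge)
    return tuple(region)


def cover_with_maximal_regions(cluster):
    """Build a cover W of the cluster from greedily grown maximal regions."""
    cover = []
    covered = set()
    # sorted() only makes the "arbitrary" choice of the next unit deterministic.
    for u in sorted(cluster):
        if u in covered:
            continue
        region = grow_maximal_region(u, cluster)
        cover.append(region)
        covered.update(units_in(region))
    return cover
```

For an L-shaped cluster such as {(0, 0), (1, 0), (2, 0), (0, 1), (0, 2)}, this sketch yields two regions, one per arm of the L, which overlap in unit (0, 0). The order in which dimensions are grown affects which maximal region is found, so different orders can give different covers.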

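2. Minimal cover

Input: a cover for each cluster
Output: a minimal cover (minimality is defined in terms of the number of maximal regions required to cover the cluster)
Method:
- Remove from the cover the smallest (in number of units) maximal region that is redundant.
- Repeat the procedure until no maximal region can be removed.

Algorithm summary

Step 1: Divide each dimension of the source data table into equal intervals according to the value of delta; save the interval definitions of each dimension in the "Interval_Define" table.
Step 2: Set n = 1; at this point every unit is a candidate dense unit.
Step 3: Scan the source data table and count, for each candidate dense unit in the n-dimensional subspaces, the data points that fall into it.
Step 4: Using the value of select thresh, find the dense units in the n-dimensional subspaces.
Step 5: Prune the subspaces with the MDL-based algorithm.
Step 6: From the set of dense units in the n-dimensional subspaces, derive the set of candidate dense units in the (n+1)-dimensional subspaces; if this candidate set is not empty, set n = n + 1 and go back to step 3.
Step 7: Use the depth-first search algorithm to find the clusters in the n-dimensional space.
Step 8: Use the greedy growth algorithm to find the set of maximal regions covering each cluster.
Step 9: Compute the minimal cover of each cluster.
Step 10: Save the cluster information in the "Minning_Result_XB" table.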
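
The bottom-up search for dense units in the summary (partition the grid, count points per candidate unit, threshold, generate higher-dimensional candidates) can be sketched as below. Several things here are assumptions rather than content from the slides: the candidate generation is an Apriori-style join and prune, since the slides' own algorithm figure is not preserved; the threshold `tau` is applied to the fraction of points in a unit; the grid is built from the observed minimum and maximum of each dimension; MDL-based pruning is left out; and all function names are hypothetical. A unit in a subspace is represented as a tuple of (dimension, interval index) pairs sorted by dimension.

```python
from collections import Counter
from itertools import combinations


def to_cell(point, lows, highs, xi):
    """Map a point to its interval index in every dimension (xi intervals each)."""
    cell = []
    for x, lo, hi in zip(point, lows, highs):
        width = (hi - lo) or 1.0                  # guard against a constant dimension
        idx = int((x - lo) / width * xi)
        cell.append(min(idx, xi - 1))             # the maximum goes into the last interval
    return tuple(cell)


def generate_candidates(dense_prev):
    """Assumed Apriori-style step: join (k-1)-dim dense units, then prune."""
    candidates = set()
    for u1, u2 in combinations(sorted(dense_prev), 2):
        # Join units that agree on all but the last pair and end in different dimensions.
        if u1[:-1] == u2[:-1] and u1[-1][0] < u2[-1][0]:
            cand = u1 + (u2[-1],)
            # Keep the candidate only if every (k-1)-dim projection is dense.
            if all(cand[:i] + cand[i + 1:] in dense_prev for i in range(len(cand))):
                candidates.add(cand)
    return candidates


def find_dense_units(points, xi, tau):
    """Return {k: set of dense k-dimensional units} for all k that have any."""
    d = len(points[0])
    lows = [min(p[j] for p in points) for j in range(d)]
    highs = [max(p[j] for p in points) for j in range(d)]
    cells = [to_cell(p, lows, highs, xi) for p in points]

    # n = 1: every 1-dimensional unit is a candidate.
    candidates = {((j, i),) for j in range(d) for i in range(xi)}
    dense_by_dim = {}
    k = 1
    while candidates:
        counts = Counter()
        for cell in cells:                        # one pass over the data per level
            for cand in candidates:
                if all(cell[j] == i for j, i in cand):
                    counts[cand] += 1
        dense = {u for u, c in counts.items() if c / len(points) > tau}
        if not dense:
            break
        dense_by_dim[k] = dense
        candidates = generate_candidates(dense)
        k += 1
    return dense_by_dim
```

The nested loop over cells and candidates keeps the sketch short but costs time proportional to the number of points times the number of candidates per level; a practical implementation would index candidates by subspace instead.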
