
Uploaded by: 小飞机 | Document ID: 5421902 | Uploaded: 2023-07-05 | Format: PPT | Pages: 25 | Size: 233 KB





[...] a cluster C if and only if R ∩ C = R. Such a region R is called maximal if and only if no proper superset R' of R is also contained in C. A minimal description of a cluster C is a set ℛ of such maximal regions that exactly covers C and contains no redundancy, that is, no proper subset of ℛ covers C.

Example
- d-dimensional space
- Number of intervals
- Unit
- Selectivity of a unit
- Density threshold
- Dense unit
- Cluster
- Region
- Maximal region
- Minimal description of a cluster

Example
- Subspace

Problem statement
Given a set of data points and the input parameters ξ and τ, find clusters in all subspaces of the original data space and present a minimal description of each cluster in the form of a DNF expression.

The CLIQUE algorithm
1. Identification of subspaces that contain clusters
2. Identification of clusters
3. Generation of minimal descriptions for the clusters

Step 1: Identifying subspaces that contain clusters
A bottom-up algorithm to find dense units:
- Determine the 1-dimensional dense units by making a pass over the data.
- Having determined the (k-1)-dimensional dense units, determine the candidate k-dimensional units using the candidate generation procedure.
MDL-based pruning:
- Decides which subspaces (and the corresponding dense units) are interesting.
- MDL: Minimum Description Length.

Candidate generation procedure
Input: D_{k-1}, the set of all (k-1)-dimensional dense units
Output: a superset of the set of all k-dimensional dense units
Algorithm: [figure not preserved]

MDL-based pruning
- Compute the coverage of each subspace S_j.
- Sort the subspaces in descending order of their coverage.
- Divide the sorted list of subspaces into two sets: the selected set I and the pruned set P.
- How to arrive at the cut point: the code length is minimized to determine the optimal cut point i.

Step 2: Identifying clusters
Input: a set of dense units D, all in the same k-dimensional space S
Output: a partition of D into D^1, ..., D^q such that all units in D^i are connected and no two units u^i ∈ D^i, u^j ∈ D^j with i ≠ j are connected. Each such partition is a cluster.
Method: depth-first search algorithm
- Start with some unit u in D, assign it the first cluster number, and find all the units it is connected to.
- If there are still units in D that have not yet been visited, pick one and repeat the procedure.

Step 3: Generating minimal cluster descriptions
Input: disjoint sets of connected k-dimensional units in the same subspace; each such set is a cluster
Output: a concise description for each cluster
Method:
1. Covering with maximal regions
2. Minimal cover

Concept: cover of a cluster
For a cluster C in a k-dimensional subspace S, a set W of regions in the same subspace S is a cover of C if every region R ∈ W is contained in C and each unit in C is contained in at least one of the regions in W.

1. Covering with maximal regions
Input: a set C of connected dense units in the same k-dimensional space S
Output: a set W of maximal regions such that W is a cover of C
Method: greedy growth algorithm

Greedy growth algorithm
- Begin with an arbitrary dense unit u1 ∈ C and greedily grow a maximal region R1 that covers u1. Add R1 to W.
- Find another unit u2 ∈ C that is not yet covered by any of the maximal regions in W. Greedily grow a maximal region R2 that covers u2. Add R2 to W.
- Repeat this procedure until all units in C are covered by some maximal region in W.

Obtaining a maximal region covering a dense unit u
- Start with u and grow it along dimension a1 as much as possible in both directions (to the left and to the right of the unit), using connected dense units contained in C.
- Grow the resulting region along a2.
- Repeat for all the dimensions, yielding a maximal region covering u.
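The greedy growth procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the slides' own implementation: each unit is assumed to be a tuple of integer interval indices (one per dimension of the subspace), a region is a tuple of inclusive (lo, hi) index ranges, dimensions are grown in their natural order, and the function names `units_in`, `slab_is_dense`, `grow_maximal_region` and `greedy_cover` are hypothetical.

```python
from itertools import product


def units_in(region):
    """Yield every unit (index tuple) inside a region of inclusive (lo, hi) ranges."""
    return product(*(range(lo, hi + 1) for lo, hi in region))


def slab_is_dense(region, d, index, cluster):
    """True if the slice of `region` at position `index` along dimension d lies in the cluster."""
    slab = list(region)
    slab[d] = (index, index)
    return all(unit in cluster for unit in units_in(slab))


def grow_maximal_region(u, cluster):
    """Grow a region from unit u, one dimension at a time, in both directions."""
    region = [(i, i) for i in u]          # start with the single unit u
    for d in range(len(u)):
        lo, hi = region[d]
        while slab_is_dense(region, d, hi + 1, cluster):   # grow to the right
            hi += 1
        while slab_is_dense(region, d, lo - 1, cluster):   # grow to the left
            lo -= 1
        region[d] = (lo, hi)
    return tuple(region)


def greedy_cover(cluster):
    """Return a list W of maximal regions whose union is the cluster."""
    cover, covered = [], set()
    for u in sorted(cluster):             # sorted only to make the result deterministic
        if u not in covered:
            region = grow_maximal_region(u, cluster)
            cover.append(region)
            covered.update(units_in(region))
    return cover
```

For an L-shaped cluster such as `{(0, 0), (1, 0), (2, 0), (0, 1), (0, 2)}`, this sketch produces one horizontal and one vertical region that overlap in the corner unit. The result depends on the starting unit and on the order in which dimensions are grown, which is why a separate minimal-cover step follows.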

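2. Minimal cover
Input: a cover for each cluster
Output: a minimal cover (minimality is defined in terms of the number of maximal regions required to cover the cluster)
Method:
- Remove from the cover the smallest (in number of units) maximal region that is redundant.
- Repeat the procedure until no maximal region can be removed.

Algorithm summary
Step 1: Using the value of delta, divide each dimension of the original data table into equal intervals; save the interval definitions for each dimension in the "Interval_Define" table.
Step 2: Set n = 1; at this point every unit is a candidate dense unit.
Step 3: Scan the original data table and count the data points that fall into each candidate dense unit of the n-dimensional subspaces.
Step 4: Using the value of select thresh, find the dense units in the n-dimensional subspaces.
Step 5: Prune the subspaces with the MDL-based algorithm.
Step 6: From the set of dense units in the n-dimensional subspaces, derive the set of candidate dense units in the (n+1)-dimensional subspaces; if this set is not empty, go to Step 3 (with n increased by 1).
Step 7: Use the depth-first search algorithm to find the clusters in the n-dimensional space.
Step 8: Use the greedy growth algorithm to find the set of maximal regions covering each cluster.
Step 9: Compute the minimal cover of each cluster.
Step 10: Save the cluster information in the "Minning_Result_XB" table.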
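The first part of the summary (partitioning, counting, thresholding, candidate generation, then depth-first search) can be sketched as follows. Several things here are assumptions rather than details taken from the slides: the parameter `xi` is taken to be the number of equal-width intervals per dimension (the role delta appears to play in the summary), `tau` is the density threshold as a fraction of all points (the role of select thresh), a unit is a tuple of (dimension, interval) pairs sorted by dimension, candidates are generated with an Apriori-style join plus subset check, two units are treated as connected when they share a face or are linked through a chain of such units, and MDL-based pruning and the database tables are left out. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def to_cells(points, xi):
    """Map each point to its tuple of interval indices (xi equal-width intervals per dimension)."""
    dims = range(len(points[0]))
    lows = [min(p[d] for p in points) for d in dims]
    highs = [max(p[d] for p in points) for d in dims]

    def index(value, d):
        width = (highs[d] - lows[d]) / xi or 1.0   # guard against a constant column
        return min(int((value - lows[d]) / width), xi - 1)

    return [tuple(index(p[d], d) for d in dims) for p in points]


def generate_candidates(dense):
    """Join (k-1)-dimensional dense units that differ only in their last dimension."""
    out = set()
    for a in dense:
        for b in dense:
            if a[:-1] == b[:-1] and a[-1][0] < b[-1][0]:
                cand = a + (b[-1],)
                # keep the candidate only if all its (k-1)-dimensional projections are dense
                if all(sub in dense for sub in combinations(cand, len(cand) - 1)):
                    out.add(cand)
    return out


def dense_units(points, xi, tau):
    """Return {k: set of k-dimensional dense units}, found bottom-up."""
    cells = to_cells(points, xi)
    min_count = tau * len(points)
    candidates = {((d, i),) for d in range(len(cells[0])) for i in range(xi)}
    result, k = {}, 1
    while candidates:
        counts = Counter()
        for cell in cells:                          # one pass over the data per level
            for unit in candidates:
                if all(cell[d] == i for d, i in unit):
                    counts[unit] += 1
        dense = {u for u in candidates if counts[u] > min_count}
        if not dense:
            break
        result[k] = dense
        candidates = generate_candidates(dense)
        k += 1
    return result


def find_clusters(units):
    """Group dense units into connected components with an iterative depth-first search."""
    unvisited, clusters = set(units), []
    while unvisited:
        stack = [unvisited.pop()]
        component = set(stack)
        while stack:
            u = stack.pop()
            for pos, (d, i) in enumerate(u):
                for j in (i - 1, i + 1):            # neighbours sharing a face along dimension d
                    v = u[:pos] + ((d, j),) + u[pos + 1:]
                    if v in unvisited:
                        unvisited.remove(v)
                        component.add(v)
                        stack.append(v)
        clusters.append(component)
    return clusters
```

A call such as `find_clusters(dense_units(points, xi=4, tau=0.2)[2])` would then return the clusters formed by the 2-dimensional dense units. The counting loop is deliberately naive; a practical version would index candidates by subspace instead of testing every unit against every point.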




