Introduction to Spatial Data Mining.ppt

Uploaded by: 文库蛋蛋多 | Document ID: 2574287 | Uploaded: 2023-02-20 | Format: PPT | Pages: 65 | Size: 1.61 MB

Introduction to Spatial Data Mining
- 7.1 Pattern Discovery
- 7.2 Motivation
- 7.3 Classification Techniques
- 7.4 Association Rule Discovery Techniques
- 7.5 Clustering
- 7.6 Outlier Detection

Learning Objectives
- LO1: Understand the concept of spatial data mining (SDM)
  - Describe the concepts of patterns and SDM
  - Describe the motivation for SDM
- LO2: Learn about patterns explored by SDM
- LO3: Learn about techniques to find spatial patterns
- Focus on concepts, not procedures!
- Mapping sections to learning objectives
  - LO1: 7.1
  - LO2: 7.2.4
  - LO3: 7.3-7.6

Examples of Spatial Patterns
- Historic examples (section 7.1.5, p. 186)
  - 1854 Asiatic cholera in London: a water pump identified as the source
  - Fluoride and healthy gums near the Colorado river
  - Theory of Gondwanaland: continents fit like pieces of a jigsaw puzzle
- Modern examples
  - Cancer clusters to investigate environmental health hazards
  - Crime hotspots for planning police patrol routes
  - Bald eagles nest on tall trees near open water
  - West Nile virus spreading from the north-east USA to the south and west
  - Unusual warming of the Pacific ocean (El Nino) affects weather in the USA

What is a Spatial Pattern?
- What is not a pattern?
  - Random, haphazard, chance, stray, accidental, unexpected
  - Without definite direction, trend, rule, method, design, aim, purpose
  - Accidental: without design, outside the regular course of things
  - Casual: absence of pre-arrangement, relatively unimportant
  - Fortuitous: what occurs without known cause
- What is a pattern?
  - A frequent arrangement, configuration, composition, regularity
  - A rule, law, method, design, description
  - A major direction, trend, prediction
  - A significant surface irregularity or unevenness

What is Spatial Data Mining?
- Metaphors
  - Mining nuggets of information embedded in large databases
  - Nuggets = interesting, useful, unexpected spatial patterns
  - Mining = looking for nuggets
  - Needle in a haystack
- Defining spatial data mining
  - Search for spatial patterns
  - Non-trivial search: as "automated" as possible, to reduce human effort
  - Interesting, useful and unexpected spatial patterns

What is Spatial Data Mining? - 2
- Non-trivial search for interesting and unexpected spatial patterns
- Non-trivial search
  - Large (e.g. exponential) search space of plausible hypotheses
  - Example: Figure 7.2, p. 186
  - Ex. Asiatic cholera: causes: water, food, air, insects, ...; water delivery mechanisms: numerous pumps, rivers, ponds, wells, pipes, ...
- Interesting
  - Useful in a certain application domain
  - Ex. Shutting off the identified water pump saved human lives
- Unexpected
  - Pattern is not common knowledge
  - May provide a new understanding of the world
  - Ex. The water pump-cholera connection led to the "germ" theory

What is NOT Spatial Data Mining?
- Simple querying of spatial data
  - Find neighbors of Canada given names and boundaries of all countries
  - Find the shortest path from Boston to Houston in a freeway map
  - Search space is not large (not exponential)
- Testing a hypothesis via a primary data analysis
  - Ex. Female chimpanzee territories are smaller than male territories
  - Search space is not large!
  - SDM: secondary data analysis to generate multiple plausible hypotheses
- Uninteresting or obvious patterns in spatial data
  - Heavy rainfall in Minneapolis is correlated with heavy rainfall in St. Paul, given that the two cities are 10 miles apart
  - Common knowledge: nearby places have similar rainfall
- Mining of non-spatial data
  - Diaper sales and beer sales are correlated in evenings
  - GPS product buyers are of 3 kinds: outdoors enthusiasts, farmers, technology enthusiasts

Why Learn about Spatial Data Mining?
- Two basic reasons for new work
  - Consideration of use in certain application domains
  - Provide fundamental new understanding
- Application domains
  - Scale up secondary spatial (statistical) analysis to very large datasets
    - Describe/explain locations of human settlements in the last 5000 years
    - Find cancer clusters to locate hazardous environments
    - Prepare land-use maps from satellite imagery
    - Predict habitat suitable for endangered species
  - Find new spatial patterns
    - Find groups of co-located geographic features
- Exercise. Name 2 application domains not listed above.

Why Learn about Spatial Data Mining? - 2
- New understanding of geographic processes for critical questions
  - Ex. How is the health of planet Earth?
  - Ex. Characterize effects of human activity on environment and ecology
  - Ex. Predict the effect of El Nino on weather and economy
- Traditional approach: manually generate and test hypotheses
  - But spatial data is growing too fast to analyze manually
    - Satellite imagery, GPS tracks, sensors on highways, ...
  - Number of possible geographic hypotheses is too large to explore manually
    - Large number of geographic features and locations
    - Number of interacting subsets of features grows exponentially
    - Ex. Find teleconnections between weather events across ocean and land areas
- SDM may reduce the set of plausible hypotheses
  - Identify hypotheses supported by the data
  - For further exploration using traditional statistical methods

Spatial Data Mining: Actors
- Domain expert
  - Identifies SDM goals and the spatial dataset
  - Describes domain knowledge, e.g. well-known patterns, e.g. correlates
  - Validates new patterns
- Data mining analyst
  - Helps identify pattern families and the SDM techniques to be used
  - Explains the SDM outputs to the domain expert
- Joint effort
  - Feature selection
  - Selection of patterns for further exploration

The Data Mining Process
- Fig. 7.1, p. 184

Choice of Methods
- 2 approaches to mining spatial data
  1. Pick spatial features; use classical DM methods
  2. Use novel spatial data mining techniques
- Possible approach:
  - Define the problem: capture special needs
  - Explore data using maps and other visualization
  - Try reusing classical DM methods
  - If classical DM methods perform poorly, try new methods
  - Evaluate chosen methods rigorously
  - Performance tuning as needed

Learning Objectives
- LO1: Understand the concept of spatial data mining (SDM)
- LO2: Learn about patterns explored by SDM
  - Recognize common spatial pattern families
  - Understand unique properties of spatial data and patterns
- LO3: Learn about techniques to find spatial patterns
- Focus on concepts, not procedures!
- Mapping sections to learning objectives
  - LO1: 7.1
  - LO2: 7.2.4
  - LO3: 7.3-7.6

7.2.4 Families of SDM Patterns
- Common families of spatial patterns
  - Location Prediction: Where will a phenomenon occur?
  - Spatial Interaction: Which subsets of spatial phenomena interact?
  - Hot spots: Which locations are unusual?
- Note: Other families of spatial patterns may be defined
  - SDM is a growing field, which should accommodate new pattern families

7.2.4 Location Prediction
- Question addressed
  - Where will a phenomenon occur?
  - Which spatial events are predictable?
  - How can a spatial event be predicted from other spatial events?
    - Equations, rules, other methods
- Examples:
  - Where will an endangered bird nest?
  - Which areas are prone to fire given maps of vegetation, drought, etc.?
  - What should be recommended to a traveler in a given location?
- Exercise: List two prediction patterns.

7.2.4 Spatial Interactions
- Question addressed
  - Which spatial events are related to each other?
  - Which spatial phenomena depend on other phenomena?
- Examples:
  - Exercise: List two interaction patterns.

7.2.4 Hot spots
- Question addressed
  - Is a phenomenon spatially clustered?
  - Which spatial entities or clusters are unusual?
  - Which spatial entities share common characteristics?
- Examples:
  - Cancer clusters (CDC) to launch investigations
  - Crime hot spots to plan police patrols
- Defining unusual
  - Comparison group: neighborhood, entire population
  - Significance: probability of being unusual is high

7.2.4 Categorizing Families of SDM Patterns
- Recall spatial data model concepts from Chapter 2
  - Entities: categories of distinct, identifiable, relevant things
  - Attribute: properties, features, or characteristics of entities
  - Instance of an entity: individual occurrence of an entity
  - Relationship: interactions or connections among entities, e.g. neighbor
    - Degree: number of participating entities
    - Cardinality: number of instances of an entity in an instance of a relationship
    - Self-referencing: interaction among instances of a single entity
  - Instance of a relationship: individual occurrence of a relationship
- Pattern families (PF) in entity-relationship models
  - Relationships among entities, e.g. neighbor
  - Value-based interactions among attributes, e.g. the value of Student.age is determined by Student.date-of-birth

7.2.4 Families of SDM Patterns
- Common families of spatial patterns

  - Location Prediction: the value of a special attribute of an entity is determined by the values of other attributes of the same entity
  - Spatial Interaction:
    - N-ary interaction among subsets of entities
    - N-ary interactions among categorical attributes of an entity
  - Hot spots: self-referencing interaction among instances of an entity
- Note: Other families of spatial patterns may be defined
  - SDM is a growing field, which should accommodate new pattern families

Unique Properties of Spatial Patterns
- Items in traditional data are independent of each other, whereas properties of locations in a map are often "auto-correlated".
- Traditional data deals with simple domains, e.g. numbers and symbols, whereas spatial data types are complex.
- Items in traditional data describe discrete objects, whereas spatial data is continuous.
- First law of geography [Tobler]: Everything is related to everything, but nearby things are more related than distant things.
  - People with similar backgrounds tend to live in the same area
  - Economies of nearby regions tend to be similar
  - Changes in temperature occur gradually over space (and time)

Example: Clustering and Auto-correlation
- Note clustering of nest sites and smooth variation of spatial attributes (Figure 7.3, p. 188, includes maps of two other attributes)
- Also see Fig. 7.4 (p. 189) for distributions with no autocorrelation

Moran's I: A measure of spatial autocorrelation
- Given $x = \{x_1, \ldots, x_n\}$ sampled over $n$ locations, Moran's I is defined as $I = \dfrac{z W z^T}{z z^T}$
- where $z = \{x_1 - \bar{x}, \ldots, x_n - \bar{x}\}$, $\bar{x}$ is the mean of $x$, and $W$ is a normalized contiguity matrix.
- Fig. 7.5, p. 190

Moran's I: example
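As a rough illustration, the sketch below computes Moran's I for two hypothetical 4x4 grids that hold the same set of pixel values in different arrangements. It assumes the common matrix form I = z W z^T / (z z^T), where z holds deviations from the mean and W is a row-normalized contiguity matrix built from edge-sharing (rook) neighbors. The grids, the neighbor rule, and the function names are illustrative assumptions and are not taken from Fig. 7.5.

```python
def rook_neighbors(rows, cols):
    """Map each cell index to the indices of its edge-sharing (rook) neighbors."""
    neighbors = {}
    for r in range(rows):
        for c in range(cols):
            adjacent = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    adjacent.append(rr * cols + cc)
            neighbors[r * cols + c] = adjacent
    return neighbors


def morans_i(grid):
    """Moran's I = (z W z^T) / (z z^T) with a row-normalized contiguity matrix W."""
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    mean = sum(values) / len(values)
    z = [v - mean for v in values]  # deviations from the mean
    neighbors = rook_neighbors(rows, cols)
    # Row normalization: every neighbor of cell i gets weight 1 / (neighbor count of i),
    # so (W z^T)_i is simply the mean deviation over the neighbors of i.
    numerator = sum(
        z[i] * sum(z[j] for j in adj) / len(adj)
        for i, adj in neighbors.items()
    )
    denominator = sum(zi * zi for zi in z)
    return numerator / denominator


# Two hypothetical grids with identical value sets (eight 1s, eight 0s).
clustered = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
checkerboard = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]

i_clustered = morans_i(clustered)
i_checkerboard = morans_i(checkerboard)
print(i_clustered, i_checkerboard)
```

Under these assumptions the clustered grid gives a positive value (about 0.71) and the checkerboard gives -1, so the same pixel values can show strong positive or strong negative spatial autocorrelation depending only on their arrangement.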

- The pixel value sets in (b) and (c) are the same, but Moran's I is different.
- Q? Which dataset, (b) or (c), has higher spatial autocorrelation?
- Figure 7.5, p. 190

Basics of Probability Calculus
- Given a set of events $\Omega$, the probability $P$ is a function from $\Omega$ into $[0, 1]$ which satisfies the following two axioms:
  - $P(\Omega) = 1$
  - If $A$ and $B$ are mutually exclusive events, then $P(A \cup B) = P(A) + P(B)$
- Conditional probability: given that an event $B$ has occurred, the conditional probability that event $A$ will occur is $P(A \mid B)$. A basic rule is $P(A \cap B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$
- Bayes rule: $P(A \mid B) = P(B \mid A)\,P(A) / P(B)$ allows inversions of probabilities
- Well-known regression equation: $y = X\beta + \varepsilon$ allows derivation of linear models

Learning Objectives
- LO1: Understand the concept of spatial data mining (SDM)
- LO2: Learn about patterns explored by SDM
- LO3: Learn about techniques to find spatial patterns
  - Mapping SDM pattern families to techniques
  - Classification techniques
  - Association rule techniques
  - Clustering techniques
  - Outlier detection techniques
- Focus on concepts, not procedures!
- Mapping sections to learning objectives
  - LO1: 7.1
  - LO2: 7.2.4
  - LO3: 7.3-7.6

Mapping Techniques to Spatial Pattern Families
- Overview
  - There are many techniques to find a spatial pattern family
  - Choice of technique depends on feature selection, spatial data, etc.
- Spatial pattern families vs. techniques
  - Location Prediction: classification, function determination
  - Interaction: correlation, association, colocations
  - Hot spots: clustering, outlier detection
- We discuss these techniques now
  - With emphasis on spatial problems
  - Even though these techniques apply to non-spatial datasets too

Location Prediction as a classification problem
- Given:
  1. Spatial framework
  2. Explanatory functions
  3. A dependent class
  4. A family of function mappings
- Find: classification model
- Objective: maximize classification accuracy
- Constraints: spatial autocorrelation exists
- Figure panels (color version of Fig. 7.3, p. 188): nest locations, distance to open water, vegetation durability, water depth

Techniques for Location Prediction
- Classical methods: logistic regression, decision trees, Bayesian classifier
  - Assume learning samples are independent of each other
  - Spatial auto-correlation violates this assumption!
  - Q? What will a map look like where the properties of a pixel were independent of the properties of other pixels? (see Fig. 7.4, p. 189)
- New spatial methods
  - Spatial auto-regression (SAR)
  - Markov random field Bayesian classifier

Spatial AutoRegression (SAR)
- Spatial Autoregression Model (SAR): $y = \rho W y + X\beta + \varepsilon$
  - $W$ models neighborhood relationships
  - $\rho$ models the strength of spatial dependencies
  - $\varepsilon$ is the error vector
- Solutions
  - $\rho$ and $\beta$ can be estimated using maximum likelihood (ML) or Bayesian statistics
  - e.g., the spatial econometrics package uses a Bayesian approach based on sampling-based Markov Chain Monte Carlo (MCMC) methods
  - Likelihood-based estimation requires $O(n^3)$ operations
  - Other alternatives: divide and conquer, sparse matrix, LU decomposition, etc.

Model Evaluation
- Confusion matrix M for 2-class problems
  - 2 rows: actual nest (True), actual non-nest (False)
  - 2 columns: predicted nest (Positive), predicted non-nest (Negative)
  - 4 cells listing the number of pixels in the following groups (Figure 7.7, p. 196)
    - Nest is correctly predicted: True Positive (TP)
    - Model predicts a nest where there was none: False Positive (FP)
    - No-nest is correctly classified: True Negative (TN)
    - No-nest is predicted at a nest: False Negative (FN)

Model evaluation (cont.)
- Outcomes of classification algorithms are typically probabilities
- Probabilities are converted to class labels by choosing a threshold level b
- For example, a probability above b is "nest" and a probability below b is "no-nest"
- TPR is the True Positive Rate, TP / (TP + FN); FPR is the False Positive Rate, FP / (FP + TN)

Comparing Linear and Spatial Regression
- The further the curve is from the line TPR = FPR, the better
- SAR provides better predictions than the regression model (Fig. 7.8, p. 197)

MRF Bayesian Classifier
- Markov Random Field based Bayesian classifiers
  - $\Pr(l_i \mid X, L_i) = \Pr(X \mid l_i, L_i)\,\Pr(l_i \mid L_i) / \Pr(X)$
  - $\Pr(l_i \mid L_i)$ can be estimated from training data
  - $L_i$ denotes the set of labels in the neighborhood of $s_i$, excluding the label at $s_i$
  - $\Pr(X \mid l_i, L_i)$ can be estimated using kernel functions
- Solutions
  - Stochastic relaxation [Geman]
  - Iterated conditional modes [Besag]
  - Graph cut [Boykov]

Comparison (MRF-BC vs. SAR)
- SAR can be rewritten as $y = (QX)\beta + Q\varepsilon$, where $Q = (I - \rho W)^{-1}$ is a spatial transform
- SAR assumes linear separability of classes in the transformed feature space
- The MRF model may yield better classification accuracies than SAR if classes are not linearly separable in the transformed space
- The relationship between SAR and MRF is analogous to the relationship between logistic regression and Bayesian classifiers

MRF vs. SAR (Summary)

Learning Objectives
- LO1: Understand the concept of spatial data mining (SDM)
- LO2: Learn about patterns explored by SDM
- LO3: Learn about techniques to find spatial patterns
  - Mapping SDM pattern families to techniques
  - Classification techniques
  - Association rule techniques
  - Clustering techniques
  - Outlier detection techniques
- Focus on concepts, not procedures!
- Mapping sections to learning objectives
  - LO1: 7.1
  - LO2: 7.2.4
  - LO3: 7.3-7.6

Techniques for Association Mining
- Classical method:
  - Association rules, given item types and transactions
  - Assumes spatial data can be decomposed into transactions
  - However, such decomposition may alter spatial patterns
- New spatial methods
  - Spatial association rules
  - Spatial co-locations
- Note: Association rules or co-location rules are fast filters to reduce the number of pairs for rigorous statistical analysis, e.g. correlation analysis, cross-K-function for spatial interaction, etc.

Motivating
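To make the point about decomposition concrete, here is a minimal sketch that turns a handful of point features into transactions by overlaying square grid cells, then compares the result with a direct count of nearby pairs. The points, the feature types "A" and "B", the cell size, the distance threshold, and the function names are all hypothetical; grid cells are only one possible way to form transactions and are an assumption of this sketch.

```python
from math import hypot

# Hypothetical point features: (feature type, x, y).
points = [
    ("A", 0.9, 0.5), ("B", 1.1, 0.5),  # close pair split by the cell boundary x = 1
    ("A", 0.2, 0.2), ("B", 0.4, 0.3),  # close pair inside a single cell
    ("A", 1.9, 1.5), ("B", 2.1, 1.5),  # close pair split by the cell boundary x = 2
]


def cell_transactions(points, cell_size):
    """Decompose space into square cells; each non-empty cell becomes one transaction."""
    cells = {}
    for kind, x, y in points:
        key = (int(x // cell_size), int(y // cell_size))
        cells.setdefault(key, set()).add(kind)
    return list(cells.values())


def close_pairs(points, max_dist):
    """Count A-B pairs whose Euclidean distance is at most max_dist."""
    a_points = [(x, y) for kind, x, y in points if kind == "A"]
    b_points = [(x, y) for kind, x, y in points if kind == "B"]
    return sum(
        1
        for ax, ay in a_points
        for bx, by in b_points
        if hypot(ax - bx, ay - by) <= max_dist
    )


transactions = cell_transactions(points, cell_size=1.0)
cells_with_both = sum(1 for t in transactions if {"A", "B"} <= t)
support_ab = cells_with_both / len(transactions)  # support of {A, B} over cell transactions
pairs_nearby = close_pairs(points, max_dist=0.3)

print(cells_with_both, support_ab, pairs_nearby)
```

In this toy data three A-B pairs lie within 0.3 units of each other, yet only one cell transaction contains both types, because two of the pairs straddle a cell boundary. A neighborhood-based measure avoids that loss, which is the motivation usually given for spatial association rules and co-locations.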
