Information Search Strategies and Heuristics (资讯检索策略与技巧.ppt)

Uploaded by: 牧羊曲112 | Document ID: 4947394 | Upload date: 2023-05-25 | Format: PPT | Pages: 36 | Size: 225.99 KB

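Slide 1. Information Search Strategies and Heuristics
Readings: 黃慕萱 (Huang Mu-hsuan), Chap. 6; Harter, Chap. 7

Slide 2. Search Strategy vs. Search Heuristics
- The terminology was originally military usage.
- Views of different authors:
  - Marcia Bates, "Information Search Tactics" (1979)
  - Hartly: methods for avoiding irrelevant articles, and possible countermeasures when too many or too few relevant articles are found
  - Palmer: the term refers to building blocks searching and citation pearl growing
  - Pao: the term refers to Boolean logic, citation, and probabilistic search strategies
- Search strategy: the overall consideration or comprehensive plan for a search problem, such as the building blocks approach or citation pearl growing.
- Search heuristics: the actions taken to accomplish a specific purpose.

Slide 3. Briefsearch
- The most common way of searching
- Quick and simple: fast and inexpensive
- But often low recall and low precision
- Suitable when:
  - the topic is clear-cut
  - the searcher wants to learn the descriptors and index terms used by the database producer
  - bibliographic data need to be verified
  - the title, author, or similar data are already known

Slide 4. Building Blocks Search
- Also called "block building" or "building block"
- Method:
  - Break the search problem down into several facets.
  - Determine the relationships among the facets. Facets are usually related by AND; OR and NOT occur less often.
  - Find search terms that can represent each facet.
  - Combine those terms with Boolean OR (union) for completeness.
- The most widely used strategy; early reference interview forms were often designed around it.

Slide 5. Building Blocks Search Strategy 1/4
- Conduct reference interviews
- Formulate search objectives
  - High recall
  - High precision
  - Moderate levels of recall and precision
- Select database(s) and search system
- Identify major concepts or facets and their logical relationships with one another

Slide 6. Building Blocks Search Strategy 2/4
- Identify search strings that represent the concepts
  - Words
  - Full-text phrases
  - Pieces of words
  - Descriptors
  - Identifiers
  - Codes
  - Non-semantic bibliographic characteristics (fields unrelated to the subject, such as document type, language, and year)
- Include synonyms, near synonyms, narrower terms, and related terms
- Fields to be searched

Slide 7. Building Blocks Search Strategy 3/4
- For each distinct facet of the search, a set of postings will be created for each search string within that facet. The sets are then combined into a single set representing that facet using Boolean OR.
- Following step #6, the facet sets themselves will be combined with Boolean AND and NOT.
- Plan alternatives

Slide 8. Building Blocks Search Strategy 4/4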

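- Formulate the initial statements of the search in the command language of the system
- Log on and put the search to the system
- Evaluate the intermediate results
- Iterate: use the interactive features of the system to carry out search heuristics (tactics, maneuvers, strategies, tricks, devices, approaches) to try to improve search results

Slide 9. Building blocks approach
[Diagram: Facet A (Term A1 OR Term A2 OR ... Term Ap), Facet B (Term B1 OR Term B2 OR ... Term Bq), and Facet C (Term C1 OR Term C2 OR ... Term Cr) feed a Boolean combination of facets (AND, OR, NOT) that yields the Answer Set.]

Slide 10. Building Blocks search sample
- Topic: Measurement of Risk Tendencies (looking for high recall)
- Boolean combination: ((RISK AND MEASUREMENT) OR RISK AVERSION OR BEHAVIORAL DECISION THEORY) NOT INSURANCE

Slide 11. Reviewing the results and searching again
- To increase recall:
  - find additional concepts or search terms to add to one or more facets
  - delete a facet
- To increase precision:
  - delete some of the broader or more ambiguous terms in the facets
  - add an additional facet to be intersected with the others

Slide 12. Successive facet strategies 1/3
- Other names:
  - fewest postings first
  - most specific concept first
  - successive fractions (a successive search that does not start from a subject facet)
- Building blocks vs. successive facets:
  - The building blocks approach uses all facets.
  - The successive facet approach tries to use as few facets as possible.
- After deciding on the facets of the search problem, set their order of priority, then decide from the results whether to continue searching.

Slide 13. Successive facet strategies 2/3
[Diagram: First Facet AND Second Facet (optional) AND Other Facets (optional), leading to the Solution Set.]
- Example 1: "members and activities of 4-H clubs"
- Example 2: "the emotional, physical, and intellectual characteristics of children who have studied violin with the Suzuki method"

Slide 14. Successive facet strategies 3/3
Suitable when:
- combining all facets with Boolean operators is likely to produce zero records
- one or two facets of the search problem are quite vague in meaning
- the search problem has non-subject criteria, such as document type, language, or year of publication, which can be treated as the first search concept
- the searcher would rather tolerate false drops than lose relevant articles
- the time and money spent on adding further facets may exceed the cost of simply printing the results
- relevant literature is scarce and the searcher is willing to look through less relevant articles

Slide 15. Pairwise Facets 1/3
- Pair the facets two at a time, intersect each pair, then take the union of the intersections.
- Suitable when:
  - all facets are equally important
  - the facets differ little in specificity or vagueness
  - combining all facets would produce zero records
- Note: when there are too many facets, try to use 3-4 facets as the basic unit for intersection, to avoid confusion.

Slide 16. Pairwise Facets 2/3
- Building blocks search: A AND B AND C
- Pairwise facets search: (A AND B) OR (A AND C) OR (B AND C)

Slide 17. Pairwise Facets 3/3
[Diagram: Facet #1, Facet #2, and Facet #3 are ANDed in pairs to give Solution Set A, Solution Set B, and Solution Set C. Final solution set: A OR B OR C.]
- Sample: A doctoral student wants a high recall bibliography prepared on the relationship between facial musculature and the physiological (autonomic) responding of emotions, e.g., fear.

Slide 18. Citation Pearl Growing
- Aims at high precision.
- Starts from 100% precision (articles known to be relevant) and works backward toward recall.
- Repeatedly takes the descriptors, identifiers, and words needed for searching from documents already known to be relevant, then searches again.
- Suitable when:
  - the database has no thesaurus or vocabulary list
  - the discipline is newly emerging
- Often requires many repeated searches, so it is not suitable for beginners.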
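The citation pearl growing loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records in `RECORDS`, the helper names `search` and `pearl_grow`, and the `min_count` threshold are all hypothetical, and the sketch accepts every retrieved record as relevant, whereas a real searcher would judge each one before harvesting its terms.

```python
from collections import Counter

# Hypothetical toy records: id -> index terms (descriptors, identifiers, words).
# Not taken from any real database.
RECORDS = {
    "d1": {"suzuki method", "violin", "children"},
    "d2": {"suzuki method", "music education", "children"},
    "d3": {"music education", "talent education", "violin"},
    "d4": {"talent education", "preschool"},
    "d5": {"insurance", "risk"},
}


def search(terms):
    """Return ids of records indexed with at least one term (Boolean OR)."""
    return {doc_id for doc_id, index in RECORDS.items() if index & terms}


def pearl_grow(seed_ids, rounds=2, min_count=1):
    """Grow a result set outward from known relevant records (the pearls)."""
    relevant = set(seed_ids)
    for _ in range(rounds):
        # Harvest index terms from the records currently treated as relevant.
        counts = Counter(t for d in relevant for t in RECORDS[d])
        # A higher min_count keeps only terms shared by several records.
        terms = {t for t, n in counts.items() if n >= min_count}
        found = search(terms)
        if found <= relevant:  # nothing new was retrieved, so stop
            break
        relevant |= found
    return relevant
```

Starting from the single pearl `d1`, two rounds reach `d1` to `d4` and never touch `d5`. Raising `min_count` to 2 with seeds `d1` and `d2` stops at once, which mirrors the trade-off between staying precise and gaining recall.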

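The three facet-based strategies (building blocks, successive facets, pairwise facets) differ only in how the facet sets are combined, which Python sets make easy to compare. The postings below are invented for illustration, and `facet` and `successive` are hypothetical helper names, not commands of any search system.

```python
# Hypothetical postings: search term -> set of record numbers.
POSTINGS = {
    "facial muscles": {1, 2, 3},
    "facial expression": {2, 3, 4},
    "autonomic response": {3, 5, 6},
    "heart rate": {5, 7},
    "emotion": {1, 5, 8},
    "fear": {4, 7, 9},
}


def facet(*terms):
    """OR together the postings of every term that represents one facet."""
    result = set()
    for term in terms:
        result |= POSTINGS.get(term, set())
    return result


A = facet("facial muscles", "facial expression")  # {1, 2, 3, 4}
B = facet("autonomic response", "heart rate")     # {3, 5, 6, 7}
C = facet("emotion", "fear")                      # {1, 4, 5, 7, 8, 9}

# Building blocks: A AND B AND C (empty for this toy data).
building_blocks = A & B & C

# Pairwise facets: (A AND B) OR (A AND C) OR (B AND C).
pairwise = (A & B) | (A & C) | (B & C)


def successive(facets, stop_at):
    """AND facets in the given priority order; stop once the set is small enough."""
    result = set(facets[0])
    for next_facet in facets[1:]:
        if len(result) <= stop_at:
            break
        result &= next_facet
    return result
```

With this data the full intersection is empty, the pairwise combination returns records 1, 3, 4, 5, and 7, and `successive([A, B, C], stop_at=2)` stops after the second facet with record 3. That is the situation in which the slides recommend the successive or pairwise approach over building blocks.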
Slide 19. Other facet strategies
- Multiple Briefsearch: use different databases to obtain recall that is as high as possible
- Interactive Scanning: the most time-consuming and interactive strategy, for example using classification codes or natural language
- Implied Concepts: identify the implicit concepts and choose different vocabulary depending on the subject nature of the database
  - Example: possible health hazards from foods cooked using microwave ovens

Slide 20. Citation indexing strategies
- Build the search strategy on the relationship between citing and cited documents
- Offer highly interdisciplinary and multidisciplinary approaches to online searching
- Search strategies: Cited Publication, Cited Author, Cocited Authors
- Humanities citation database (THCI) of the National Science Council Research Center for Humanities (國科會人文學研究中心人文學引用文獻資料庫): http:/

Slide 21. Non-subject, fact, and multiple database searching
- Non-subject searching
  - Document type, year of publication, language, author, corporate source
  - Double limiting
- Fact searching
  - Search for a known item
- Multiple database searching
  - Pay attention to which fields each database covers and how its controlled language is used

Slide 22. Search Heuristics
- Language Heuristics
- Command Language, Database and File Structure Heuristics
- Recall and Precision Heuristics
  - Heuristics for Increasing Recall
  - Heuristics for Increasing Precision
- Personal Heuristics

Slide 23. Language Heuristics 1/2
Search in natural language when:
- One or more of the concepts of interest involves a subtle nuance of meaning
- One or more of the concepts of interest is highly specific
- One or more of the concepts is relatively new and appropriate terms in the controlled vocabulary do not exist
- A highly comprehensive search is desired (high recall)
- The literature to be searched is "soft"

Slide 24. Language Heuristics 2/2
Search with controlled vocabulary when:
- The concepts of interest can be expressed precisely and unambiguously in the controlled vocabulary
- A limited search retrieving a limited number of highly pertinent items is desired
- The literature to be searched is "hard"

Slide 25. Command Language, Database and File Structure Heuristics 1/2
- Know the stop words used by the search system
- Know the sort order associated with the binary coding system used by the host computer
- Know which fields are searched by default, if search fields are not explicitly specified

Slide 26. Command Language, Database and File Structure Heuristics 2/2
- Know the parsing rule used to index each field searched
- Understand which fields the basic index contains
- Always question null sets
- Note the indexing method used for the fields searched, such as word or phrase indexing
- Understand Boolean operations with the null set and make use of this knowledge in reformulating search statements

Slide 27. Questions to ask in low recall 1/2
- Am I in the correct database?
- Have I overspecified the search problem?
- Is there anything done on the topic or problem? Is there a literature on this search problem?
- Have sufficient search terms been included to properly represent each concept of the search?

Slide 28. Questions to ask in low recall 2/2
- Were the proximity specifications placed on the search terms too restrictive?
- Was Boolean logic used correctly?
- Did I make a technical error, e.g., in spelling or command syntax?
- Should I be searching in natural language fields?
- Have all word forms of search terms been used? Should truncation be employed?

Slide 29. Heuristics for Increasing Recall 1/2
- Use additional synonyms and near synonyms combined with Boolean OR to represent search concepts
- Use more generic terms in addition to specific terms to represent search concepts
- Use natural language in addition to controlled vocabulary terms
- Search additional subject fields

Slide 30. Heuristics for Increasing Recall 2/2
- Delete AND and NOT facets from the formulation
- Increase term truncation
- Use less restrictive proximity operators, e.g., require that terms appear in the same paragraph rather than the same sentence
- Remove any restrictions from the formulation, e.g., language, date of publication, type of publication

Slide 31. Questions to ask in low precision 1/2
- Am I in the correct database?
- Have I underspecified the search problem?
- Do I need to disambiguate a concept of the problem?
- Have I used Boolean logic correctly?
- Have I included vague or ambiguous terms, or terms that are too generic?

Slide 32. Questions to ask in low precision 2/2
- Should I restrict search terms to elements of a controlled vocabulary?
- Were the proximity specifications too loosely placed on the search terms?
- Are false drops resulting from concepts having an unintended relationship with one another?
- Has a search term been truncated too severely?

Slide 33. Heuristics for Increasing Precision 1/2
- Delete near synonyms and potentially ambiguous terms
- Use more specific terms to represent concepts
- Use controlled vocabulary terms if a concept is precisely represented by them; delete controlled vocabulary terms that do not describe a concept precisely
- If multiple meaning does not appear to be a major problem, search natural language terms that represent the concepts of interest precisely

Slide 34. Heuristics for Increasing Precision 2/2
- If none of the above conditions applies, search fewer subject fields, deleting fields in the approximate order: full text, abstract, title, identifier, and descriptor
- Add additional facets with AND and NOT
- Decrease term truncation
- Use more restrictive proximity operators
- Add restrictions to the formulation, e.g., by date of publication, type of publication, language, etc.

Slide 35. Personal Heuristics 1/2
- Be flexible; stay loose; be willing to look at a search in more than one way. Avoid rigidity in thought and action.
- Browse samples of retrieved citations to assess relevancy.
- Browse samples of retrieved citations to generate additional search terms.
- Be heuristic, interactive. Don't do "fast batch" searching.

Slide 36. Personal Heuristics 2/2
- Evaluate one's own work critically.
- Always be skeptical of search output.
- A mindless faith in controlled vocabularies is not always justified. Be critical of the adequacy of artificial languages for the representation of concepts in documents.
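Recall and precision drive most of the heuristics above, so a small worked calculation may help. The sketch uses the standard definitions (precision is the share of retrieved records that are relevant, recall is the share of relevant records that were retrieved); the function name and the record numbers are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Guard against empty sets so a null result does not divide by zero.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


relevant = {1, 2, 3, 4}              # records a user would judge relevant
narrow = {1, 2}                      # tight formulation, e.g. an extra AND facet
broad = {1, 2, 3, 4, 5, 6, 7, 8}     # loose formulation, e.g. many ORed terms

print(precision_recall(narrow, relevant))  # (1.0, 0.5)
print(precision_recall(broad, relevant))   # (0.5, 1.0)
```

The narrow search is fully precise but misses half of the relevant records, while the broad search finds all of them at the cost of reading four irrelevant ones.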

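Truncation and proximity operators appear on both the recall and the precision lists, pulling in opposite directions. The sketch below imitates them on plain text so the effect is visible. It is an assumption-laden illustration: real search systems have their own operator syntax and indexing rules, and `tokens`, `matches_truncated`, and `within` are hypothetical helpers.

```python
import re


def tokens(text):
    """Split text into lowercase words, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_truncated(text, stem):
    """Right truncation: true if any word begins with the stem."""
    return any(word.startswith(stem.lower()) for word in tokens(text))


def within(text, a, b, max_gap):
    """Proximity: true if a and b occur with at most max_gap words between them."""
    words = tokens(text)
    pos_a = [i for i, w in enumerate(words) if w == a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == b.lower()]
    return any(abs(i - j) - 1 <= max_gap for i in pos_a for j in pos_b)


title = "Measurement of attitudes toward financial risk"

print(matches_truncated(title, "measur"))         # True
print(within(title, "risk", "measurement", 0))    # False: four words intervene
print(within(title, "risk", "measurement", 4))    # True
```

Shortening the stem or widening `max_gap` admits more records and tends to raise recall; lengthening the stem or tightening the gap does the opposite and tends to raise precision.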