Keyword Analysis in Corpus Research and Its Extension (PPT)

Uploaded by: sccc | Document no.: 5642454 | Upload date: 2023-08-05 | Format: PPT | Pages: 21 | Size: 831.54 KB

Liang Maocheng (梁茂成), National Research Centre for Foreign Language Education
An extension to the keyword approach in corpus analysis

Outline
- Keywords
- Applications of corpus comparison
- Limitations to the keyword approach
- Keywords+
- Demo

Keywords

Keywords are words whose frequency is unusually high (or low) in comparison with some norm (Scott, 2003).

Positive keywords: words that occur more often than would be expected by chance in comparison with the reference corpus.

Negative keywords: words that occur less often than would be expected by chance in comparison with the reference corpus.

Positive and negative keywords:
- In a corpus of business English, words such as business, profit and companies are likely to be positive keywords when the corpus is compared with a general corpus.
- In a corpus of academic English, words such as morning, afternoon and evening are likely to be negative keywords when the corpus is compared with a general corpus.

Calculating keyness (Rayson et al. 2004; Oakes 1998):
- Chi-square
- Chi-square with Yates correction
- Log-likelihood (reference: http://ucrel.lancs.ac.uk/llwizard.html)

Previous research has shown that log-likelihood is a more reliable measure than chi-square for comparing word frequencies across corpora, particularly for low-frequency words.

Ways to find keywords:
- Top-down: corpus-based
- Bottom-up: corpus-driven

Applications of corpus comparison
- Comparison across users
- Comparison across genres
- Comparison across time
- Comparison across (varieties of) languages

- Compiling a specialized dictionary
- Detecting the topic
- Genre analysis
- Contrastive Interlanguage Analysis

Limitations to the keyword approach
- Do keywords have to be single words? Phraseology seems more interesting!
- Do keywords have to be lexical words? POS tag sequences may also be interesting.
- Can we bring together the bottom-up approach and the top-down approach?

Top-down: the problem is that I do not know in advance what may be interesting.

Bottom-up: the problem is that I am given a long list of keywords. Only some of them are interesting, and they are buried among many others that do not seem interesting at all.

Keywords+
- Support multiword sequences
- Support online search
- Support POS tag sequences
- Support regex search

Demo

Thank you.
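The slides listed chi-square, chi-square with Yates correction and log-likelihood as keyness measures, but the formulas themselves were images and did not survive extraction. The standard textbook forms for one word's 2×2 contingency table are given below as a sketch.

The symbols a, b, c, d, N_1, N_2, E_1 and E_2 are notation chosen here, not taken from the slides. The log-likelihood line follows the two-term form used by the UCREL log-likelihood wizard. The results are compared against the chi-square distribution with 1 degree of freedom, where 3.84 corresponds to p < 0.05.

```latex
% a, b : frequency of the word in the study corpus (N_1 tokens) and the reference corpus (N_2 tokens)
% c = N_1 - a,  d = N_2 - b,  N = N_1 + N_2
\chi^2 = \frac{N\,(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}

\chi^2_{\mathrm{Yates}} = \frac{N\left(\lvert ad - bc\rvert - \tfrac{N}{2}\right)^2}{(a+b)(c+d)(a+c)(b+d)}

E_1 = \frac{N_1\,(a+b)}{N}, \qquad E_2 = \frac{N_2\,(a+b)}{N}, \qquad
G^2 = 2\left(a \ln\frac{a}{E_1} + b \ln\frac{b}{E_2}\right) \quad (0 \ln 0 := 0)

% positive keyword if a / N_1 > b / N_2, negative keyword otherwise
```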
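A minimal Python sketch of the bottom-up (corpus-driven) procedure follows. It scores every word in either corpus, keeps those above a significance threshold, and marks each one as a positive (+) or negative (-) keyword.

Several details are assumptions for illustration, not the presenter's tool:
- the function names `keyness` and `keywords`;
- the 3.84 cut-off;
- whitespace tokenization;
- the toy corpora;
- clipping the Yates term at zero.

```python
import math
from collections import Counter


def keyness(a, b, n1, n2):
    """Return (log-likelihood, chi-square, Yates chi-square) for one word.

    a, b   -- frequency of the word in the study / reference corpus
    n1, n2 -- total number of tokens in the study / reference corpus
    """
    n = n1 + n2
    c, d = n1 - a, n2 - b
    # Expected frequencies if the word were equally frequent in both corpora
    e1 = n1 * (a + b) / n
    e2 = n2 * (a + b) / n
    # Two-term log-likelihood (UCREL wizard form); a zero count contributes 0
    ll = 2 * sum(o * math.log(o / e) for o, e in ((a, e1), (b, e2)) if o > 0)
    # Pearson chi-square for the 2x2 table
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Yates-corrected version, clipped at 0 (assumed safeguard) so it never over-corrects
    yates = n * max(abs(a * d - b * c) - n / 2, 0) ** 2 / denom
    return ll, chi2, yates


def keywords(study_tokens, ref_tokens, min_ll=3.84):
    """Bottom-up keyword list ranked by log-likelihood.

    3.84 is the chi-square critical value for p < 0.05 with 1 degree of freedom.
    '+' marks a positive keyword, '-' a negative one.
    """
    f1, f2 = Counter(study_tokens), Counter(ref_tokens)
    n1, n2 = len(study_tokens), len(ref_tokens)
    results = []
    for w in f1.keys() | f2.keys():
        a, b = f1[w], f2[w]
        ll, _, _ = keyness(a, b, n1, n2)
        if ll >= min_ll:
            sign = "+" if a / n1 > b / n2 else "-"
            results.append((w, sign, round(ll, 2)))
    return sorted(results, key=lambda r: -r[2])


# Hypothetical toy corpora, whitespace-tokenized
study = "profit profit profit profit the the the the the business".split()
ref = "morning morning morning morning morning the the the business evening".split()
print(keywords(study, ref))  # [('morning', '-', 6.93), ('profit', '+', 5.55)]
```

A top-down (corpus-based) check would instead call `keyness` only for a predefined list of candidate words, rather than for every word in both corpora.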
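The Keywords+ slide asks for support for multiword sequences, POS tag sequences and regex search. The Python sketch below shows one hypothetical way to combine them; it is not the tool shown in the demo. It works as follows:
- It counts n-grams instead of single words.
- The tokens can be POS tags when tag sequences are wanted.
- An optional regular expression acts as a top-down filter on the bottom-up keyword list.

`keywords_plus`, `ngrams`, the Penn Treebank-style tags and the toy data are illustrative assumptions. Corpus size is taken as the number of n-grams in each corpus, which is another assumption.

```python
import math
import re
from collections import Counter


def log_likelihood(a, b, n1, n2):
    """Two-term log-likelihood (UCREL wizard form); zero counts contribute 0."""
    e1 = n1 * (a + b) / (n1 + n2)
    e2 = n2 * (a + b) / (n1 + n2)
    return 2 * sum(o * math.log(o / e) for o, e in ((a, e1), (b, e2)) if o > 0)


def ngrams(tokens, n):
    """Contiguous n-token sequences, each joined into one space-separated string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def keywords_plus(study, ref, n=2, pattern=None, min_ll=3.84):
    """Keyness of n-grams; tokens may be words or POS tags.

    pattern -- optional regex: only n-grams it matches are scored, so a
               top-down query narrows the bottom-up keyword list.
    """
    g1, g2 = ngrams(study, n), ngrams(ref, n)
    f1, f2 = Counter(g1), Counter(g2)
    n1, n2 = len(g1), len(g2)  # corpus size measured in n-grams
    rx = re.compile(pattern) if pattern else None
    out = []
    for g in f1.keys() | f2.keys():
        if rx is not None and not rx.search(g):
            continue
        a, b = f1[g], f2[g]
        ll = log_likelihood(a, b, n1, n2)
        if ll >= min_ll:
            out.append((g, "+" if a / n1 > b / n2 else "-", round(ll, 2)))
    return sorted(out, key=lambda r: -r[2])


# Hypothetical POS-tagged toy data (Penn Treebank-style tags)
study_tags = ["MD", "VB"] * 4 + ["DT", "NN"]
ref_tags = ["DT", "NN"] * 4 + ["MD", "VB"]
# Top-down query: only modal-initial tag bigrams; threshold lowered for toy data
print(keywords_plus(study_tags, ref_tags, n=2, pattern=r"^MD\b", min_ll=0))
# [('MD VB', '+', 1.93)]
```

On this toy data no bigram that matches the pattern reaches 3.84, so `min_ll` is set to 0 to show the filter at work. Passing word tokens instead of tags gives multiword (phraseological) keywords with the same function.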
