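Graduation Project (Thesis): Translation of Foreign-Language Material

Title: Speech Communication and Speech Signal Processing
School/Department: School of Information
Major and class: Telecom (电信) Class 1105
Student: Miao Yunlong (苗云龙)
Student ID: 201116910524
Supervisor: Qiao Lihong (乔丽红)
Supervisor's title: Associate Professor
Dates:
Location:
Attachments: 1. Translation of the foreign-language material; 2. Original foreign-language text.

Supervisor's comments: The foreign-language text is well chosen and matches the topic of the graduation thesis. The work shows a good command of technical English; the translation was done conscientiously and with fairly high accuracy, it conveys the actual content of the original, and its wording basically meets the requirements.
Signature: Qiao Lihong, 31 March 2015

Attachment 1: Translation of the foreign-language material

Speech Communication and Speech Signal Processing

Foreword

Communicating with a machine in a natural mode such as speech is not only a technological challenge; it also exposes the limits of our understanding of how people communicate so effortlessly. The key is to understand the difference between speech processing (as it takes place in human communication) and speech signal processing (as it is carried out by a machine). When people hear speech, they draw on their accumulated knowledge of speech in relation to a language in order to capture the message. It is interesting to note that, in this process, the incoming speech is processed selectively using knowledge sources acquired over a long period, such as sound units, acoustic-phonetics, prosody, lexicon, syntax, semantics and pragmatics. This processing differs from person to person, and it is very hard for any individual to state precisely what mechanism he or she uses when processing the incoming speech. This in turn makes it hard to write a program that lets a machine extract the message carried by a speech signal. It should be noted that a machine has access only to the speech signal, in the form of a sequence of samples; identifying the knowledge sources and invoking them on the input signal remains a scientific challenge. Speech signal processing is therefore one of the most interesting challenges, and it arouses the curiosity of many different scientific groups, including linguists, phoneticians, (psycho)acousticians, electrical engineers, computer scientists and application engineers. The editorial board of SADHANA has rightly identified this topic as one that deserves a special issue. They asked me to take the initiative in collecting the views of leading scientific groups, in the form of articles for this special issue. I am indeed fortunate to have been able to persuade several highly accomplished scientists to contribute papers from their fields. Here I give a brief overview of the special issue.

The paper by Sarah Hawkins rightly questions how informative the acoustic speech signal is, and explains the strengths and limitations of the standard representation of the speech signal in terms of phonological features and phonemes. The author proposes an alternative approach, called Firthian prosodic analysis, which places more emphasis on the formation of an utterance. This approach suggests a formalism that forces us to recognize that every perceptual decision depends on context and task, so that there can be no predetermined or rigid sequence of the kind implied by assuming that speech processing proceeds in strictly serial order. In the next paper, Peri Bhaskararao highlights another important aspect of speech production: the false impression that text-to-sound rule sets are similar across Indian languages. Using several illustrations, the author shows how the phonetic realization of a given letter diverges across Indian languages, and why this matters for developing speech systems in these languages.

The next three papers focus on voice source analysis, and in particular on the significance of the glottal closure instants (GCI), or epochs. Christophe and Nicolas propose time-scale Lines of Maximum Amplitude (LoMA) for detecting GCIs. The time-scale analysis is implemented using wavelet transforms. Using LoMA, the authors estimate voice source parameters such as the open quotient, the amplitude of voicing and the strength of excitation. Paavo Alku provides a critical review of glottal inverse filtering (GIF) methods for estimating the glottal volume velocity waveform. The paper also discusses the parametrization methods developed to quantify the estimated glottal excitation, as well as potential applications of GIF. The third paper in the voice source analysis category is on "Epoch-based analysis of speech signals". In it, Yegnanarayana and Suryakanth review different epoch extraction methods and describe how epoch locations help in estimating the instantaneous fundamental frequency, analysing Lombard-effect speech, and so on. The authors also discuss several possible applications of epoch-based analysis, such as speech enhancement and prosody manipulation.

Representation of speech is an important issue in many speech applications. In their paper on "Auditory-like filter bank: An optimal speech processor for efficient human speech communication", Ghosh et al argue that the auditory filter bank in the human ear is a near-optimal speech processor for efficient communication between human beings. They use a mutual information criterion to design the optimal filter bank, namely the one that provides maximum information about the talkers' articulatory gestures derived from an X-ray microbeam speech production database. In the next paper, Kawahara and Morise give a comprehensive account of the technical foundations of the highly successful speech modification tools STRAIGHT and TANDEM-STRAIGHT. Another effective representation of speech information is the modulation spectrum, which describes the temporal dynamics of the spectral envelope of the short-time speech spectrum. Hynek Hermansky, in his paper on "Speech recognition from spectral dynamics", reviews the efforts made to exploit features of the modulation spectrum in automatic speech recognition (ASR).

Fourier transform (FT) phase is usually ignored in spectral representations of speech information. Hema Murthy and Yegnanarayana provide a comprehensive review that emphasizes the importance of phase in speech processing through the use of group delay functions. They discuss the effectiveness of group delay functions in capturing spectral information, and show several applications of group delay functions in speech systems. The effectiveness of nonlinear models in capturing information in speech is demonstrated for various applications by Sreenivasa Rao in the paper on "Role of neural network models for developing speech systems". The author explores the possibility of capturing prosodic features with neural network models, which can in turn be used for text-to-speech synthesis, speech recognition, speaker recognition, language identification and characterization of emotion.

Statistical parametric speech synthesis based on hidden Markov models has become popular over the past few years, and appears to offer a good alternative to the well-established concatenative techniques. Simon King gives a tutorial introduction to this practical approach to speech synthesis in his paper on "An introduction to statistical parametric speech synthesis".

The last three papers deal with issues in realizing practical ASR systems. Umesh addresses the issue of inter-speaker variability in speech and reviews the studies made on this topic in the context of ASR. The paper gives an overview of vowel normalization studies as well as the universal warping approach to speaker normalization. The next paper, by Herve Bourlard et al, addresses trends in multilingual speech processing. They emphasize that the prime mover behind the current trends has been the rise of statistical machine translation. The last paper of the issue is by Steve Renals, on "Automatic analysis of multiparty meetings". The author discusses the issues and challenges in recognizing and interpreting multiparty meetings captured as audio, video and other signals. It is not only a multimodal, multiparty and multispeaker problem; the challenge also lies in dealing with spontaneous, conversational interaction among a number of participants.

The special issue thus deals with many challenging problems in speech processing. As guest editor, I am very fortunate to have been able to gather this material from my friends, who readily accepted my invitation to contribute papers to this special issue. I am truly grateful to all the authors for their efforts. I am also grateful to the following reviewers, whose timely reviews helped the authors to improve the presentation of the material: G V Anand, S Chandrasekhar, C Chandra Sekhar, Rohit Sinha, Rajesh Hegde, K Sri Rama Murty, S R Mahadeva Prasanna, K Samudravijay, S P Kishore, Peri Bhaskararao, Douglas N Honorof, Louis Ten Bosch, Paavo Alku and Mark Hasegawa-Johnson. Finally, I would like to thank Prof. R Narayana Iyengar, former Editor, and Prof. G V Anand, Associate Editor of SADHANA, for conceiving the idea of a special issue on the topic of "Speech Communication and Signal Processing".

Attachment 2: Original foreign-language text (photocopy)

Speech Communication and Signal Processing

FOREWORD

Communicating with a machine in a natural mode such as speech brings out not only several technological challenges, but also limitations in our understanding of how people communicate so effortlessly. The key is to understand the distinction between speech processing (as is done in human communication) and speech signal processing (as is done in a machine). When people listen to speech, they apply their accumulated knowledge of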
speech in relation to a language to capture the message. In this process, it is interesting to note that the input speech is processed selectively using the knowledge sources acquired over a period of time, such as sound units, acoustic-phonetics, prosody, lexicon, syntax, semantics and pragmatics. This
processing varies from person to person, and it is difficult for any individual to articulate the mechanism he/she is using in processing the input speech. This makes it difficult to write a program to perform the task of extracting the message in speech by a machine. It should be noted that, for a machine, only the speech signal is available, in the form of a sequence of samples; the rest of the mechanism, involving identification of knowledge sources and invoking them on the input signal, is a scientific challenge. Thus speech signal processing is one of the most interesting challenges that arouses curiosity among different scientific groups, such as linguists, phoneticians, (psycho)acousticians, electrical engineers, computer scientists and application engineers. The editorial board of SADHANA has rightly identified this topic to be addressed in a special issue. They have asked me to take the initiative to collect views of leading scientific groups, in the form of articles for this special issue. I am indeed fortunate to have been able to persuade several highly accomplished scientists in their field to contribute papers to this special issue. Here I present a brief overview of this special issue.

The paper by Sarah Hawkins rightly questions the informativeness of the acoustic speech signal, and explains the strengths and limitations of the standard representation of the speech signal in the phonological features and phonemes. The author proposes an alternative approach, called Firthian prosodic analysis, which places more emphasis on the formation of an utterance. This approach suggests a formalism that forces us to recognize that every perceptual decision is context- and task-dependent, and hence there cannot be any predetermined or rigid sequence that is a result of the assumption that speech processing proceeds in a strictly serial order. In the next paper, Peri Bhaskararao highlights another important aspect of speech production, namely the false impression of similarity of text-to-sound rule sets across Indian languages. Using several illustrations, the author shows the divergence in the phonetic realizations of a given letter across Indian languages, and the importance of this for developing speech systems in these languages.

The next three papers focus on the voice source analysis, and in particular, the significance of the glottal closure instants (GCI) or epochs. Christophe and Nicolas propose time-scale Lines of Maximum Amplitude (LoMA) for the detection of GCI. The time-scale analysis is implemented using wavelet transforms. Using the LoMA, the authors estimate the voice source parameters such as open quotient, amplitude of voicing and strength of excitation. Paavo Alku provides a critical review of the methods of glottal inverse filtering (GIF) for estimating the glottal volume velocity waveform. The paper also discusses the parametrization method developed for quantification of the estimated glottal excitations, and also potential applications of the GIF method.

The third paper in the voice source analysis category is on "Epoch-based analysis of speech signals", in which the authors Yegnanarayana and Suryakanth review different epoch extraction methods, and describe how epoch locations can help in the estimation of instantaneous fundamental frequency, analysis of Lombard effect speech, etc. The authors also discuss several possible applications of the epoch-based analysis, such as in speech enhancement and prosody manipulation.

Representation of speech is an important issue in many speech applications. In their paper on "Auditory-like filter bank: An optimal speech processor for efficient human speech communication", Ghosh et al argue that the auditory filter bank in the human ear is a near-optimal speech processor for efficient speech communication between human beings. They use a mutual information criterion to design the optimal filter bank that provides maximum information on talkers' articulatory gestures derived from an X-ray microbeam speech production database. In the next paper, Kawahara and Morise comprehensively discuss the technical foundations of the highly successful speech modification tools STRAIGHT and TANDEM-STRAIGHT. Another effective representation of speech information is through the modulation spectrum, which describes the temporal dynamics of the spectral envelope of the short-time speech spectrum. Hynek Hermansky, in his paper on "Speech recognition from spectral dynamics", reviews the efforts to exploit the features of modulation spectrum
in automatic speech recognition (ASR).

Fourier transform (FT) phase is normally ignored in the spectral representation of speech information. Hema Murthy and Yegnanarayana provide a comprehensive review emphasizing the importance of phase in speech processing using group delay functions. They discuss the effectiveness of group-delay functions in capturing the spectral information, and show several applications of the group delay functions in speech systems. The effectiveness of nonlinear models in capturing information in speech is demonstrated for various applications by Sreenivasa Rao in the paper
on "Role of neural network models for developing speech systems". The author explores the possibility of capturing the prosody features using neural network models, which in turn could be used for text-to-speech synthesis, speech recognition, speaker recognition, language identification and characterization of emotion.

Statistical parametric speech synthesis based on hidden Markov models has become popular over the past few years, and seems to provide a good alternative to the well-established concatenative techniques. Simon King provides a tutorial introduction to this practical approach to speech synthesis in his paper on "An introduction to statistical parametric speech synthesis".

The last three papers deal with issues on realizing practical ASR systems. Umesh addresses the issue of interspeaker variability in speech, and reviews the studies made on this topic in the context of ASR. The paper provides an overview of vowel normalization studies as well as the universal warping approach to speaker normalization. The next paper, by Herve Bourlard et al, addresses the trends in multilingual speech processing. They emphasize that the prime mover behind the current trends has been the rise of statistical machine translation. The last paper of this issue is by Steve Renals on "Automatic analysis of multiparty meetings", in which the author discusses the issues and challenges in the recognition and interpretation of multiparty meetings captured as audio, video and other signals. It is not only a multimodal,
multiparty and multispeaker problem, but the challenge is in dealing with spontaneous and conversational interaction among a number of participants.

The special issue thus deals with many challenging issues in speech processing. As guest editor, I am very fortunate to have been able to gather this information from my friends who have readily accepted my invitation to contribute papers for this special issue. I am indeed grateful to all the authors for their efforts. I am also grateful to the following reviewers who have provided a timely review of the papers, which helped the authors to improve the presentation of the material: G V Anand, S Chandrasekhar, C Chandra Sekhar, Rohit Sinha, Rajesh Hegde, K Sri Rama Murty, S R Mahadeva Prasanna, K Samudravijay, S P Kishore, Peri Bhaskararao, Douglas N Honorof, Louis Ten Bosch, Paavo Alku and Mark Hasegawa-Johnson. Finally, I would like to thank Prof. R Narayana Iyengar, former Editor, and Prof. G V Anand, Associate Editor of SADHANA, for conceiving the idea of the special issue on the topic of "Speech Communication and Signal Processing".

October 2011
B YEGNANARAYANA
Guest Editor
International Institute of Information Technology, Gachibowli, Hyderabad 500 032, India
email: yegna@iiit.ac.in