
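Audio Signal Processing (Fundamentals)

References
1) Academic development of the field
2) Technical development of the field

0 Appetizer

References: the web

Which qualities (abilities) matter?
The R&D process of a project: what is there, what it is, why, how to do it
* English
* Mathematics
* Tools
* "Physical" conceptual thinking

1 Getting started: raw material for experiments

WAV files
Example: keep friends with.wav

Offset  Bytes  Type  Content
00H     4      char  "RIFF" marker
04H     4      long  File length - 8, i.e. data length + 0x24 (file length = data length + 0x2C)
08H     4      char  "WAVE" marker
0CH     4      char  "fmt " marker
10H     4      long  Size of the fmt chunk (variable: 16 for plain PCM; a larger value shifts every offset below)
14H     2      int   Format category (1 = PCM sound data)
16H     2      int   Number of channels: 1 for mono, 2 for stereo
18H     4      long  Sample rate (samples per second)
1CH     4      long  Average data rate in bytes/s = channels × sample rate × bits per sample / 8; playback software can use it to estimate buffer size
20H     2      int   Block alignment in bytes = channels × bits per sample / 8; playback software processes data in multiples of this size and uses it to align buffers
22H     2      int   Bits per sample in each channel; with several channels, every channel has the same sample size
24H     4      char  "data" marker
28H     4      long  Length of the audio data in bytes

(In this table "int" is a 16-bit and "long" a 32-bit integer; all multi-byte values are little-endian.)

    /* Canonical 44-byte PCM WAV header. Fixed-width types keep it 44 bytes
       on 64-bit compilers, where unsigned long may be 8 bytes (on VC6.0 it
       is 4). Reading a file straight into it with fread assumes this exact
       layout and a little-endian machine. */
    #include <stdint.h>

    typedef struct {
        char     Riff[4];          /* "RIFF" */
        uint32_t sizeOfFile;       /* file length - 8 */
        char     WAVEfmt[8];       /* "WAVEfmt " */
        uint32_t sizeOfFmt;        /* 16 for plain PCM */
        uint16_t wFormatTag;       /* 1 = PCM */
        uint16_t nChannels;        /* 1 = mono, 2 = stereo */
        uint32_t nSamplesPerSec;   /* sample rate */
        uint32_t navgBytesPerSec;  /* nChannels * nSamplesPerSec * nBitPerSample / 8 */
        uint16_t nBlockAlign;      /* nChannels * nBitPerSample / 8 */
        uint16_t nBitPerSample;    /* bits per sample, per channel */
        char     Cdata[4];         /* "data" */
        uint32_t sizeOfData;       /* audio data length in bytes */
    } HeadOfWave;

A few notes:
* File length and data length
* Key quantities: sample rate / number of channels / quantization mode / quantization bits
* Computing navgBytesPerSec and nBlockAlign
* Example program and explanation

2 Basic concepts
Sample rate, quantization bits

2.1 Sample rate
48k / 44k / 32k / 22k / 16k / 11k / 8 kHz
Two families: 44k / 22k / 11k and 32k / 16k / 8k (44k, 22k and 11k are shorthand for 44.1, 22.05 and 11.025 kHz)
Why these values?
A sample rate fs can represent frequencies up to fs/2: 32 kHz covers up to 16 kHz, 44.1 kHz up to about 22 kHz.

2.2 Bandwidth of audio signals
[Figure: spectrum of keep_friend_with.wav (44 kHz sample rate), marked at 4 kHz, 7 kHz and 22 kHz]
[Figure: spectrum of keep_friend_with_8k.wav (8 kHz sample rate), marked at 4 kHz]
These files are special: they were recorded in a very good (quiet) environment.
The generally accepted figures are:
* Speech: 300-3400 Hz, sample rate 8 kHz
* Wideband speech: 7 kHz bandwidth (50 Hz-7 kHz), sample rate 16 kHz
* Audio: 20 kHz bandwidth (20 Hz-20 kHz), sample rate 44.1 kHz or 48 kHz

Why are the sample rates these values?
Nyquist Sampling Theorem: fs must exceed twice the highest frequency in the signal.
Why 44.1 kHz?
20 kHz → (Nyquist) 40 kHz → (roll-off from passband to stopband) 44 kHz → 44.1 kHz?
At the time the choice was made, the only recorders capable of storing such high rates were VCRs:
NTSC: 490 lines/frame × 3 samples/line × 30 frames/s = 44100 samples/s
PAL: 588 lines/frame × 3 samples/line × 25 frames/s = 44100 samples/s
(Prof. Brian L. Evans, Dept. of Electrical and Computer Engineering, The University of Texas at Austin)

Listen to the sounds:
keep_friends_with(44k_mono).wav
keep_friends_with(22k_mono).wav
keep_friends_with(16k_mono).wav
keep_friends_with(11k_mono).wav
keep_friends_with(8k_mono).wav
For speech, 8 kHz and 11 kHz sampling sound about the same. Every rate from 16 kHz up also sounds about the same.
So for speech signals it is enough to distinguish voice (narrowband speech) from wideband speech.

2.3 Quantization bits
Linear / nonlinear quantization
Quantization SNR: about 6 dB per bit, i.e. roughly 6b dB for b bits.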

More precisely, for a full-scale sine wave the quantization SNR is 6.02b + 1.76 dB.
Repeater (language-learning player) specification: when sound is copied from tape into the chip and the chip's playback is heard through headphones, the standard specifies a 34 dB level difference between the useful signal and the noise.

Listen to the sounds:
keep_friends_with(16k_mono).wav
keep_friends_with(16k_mono)_8b.wav
The 8-bit linearly quantized file clearly carries background noise.
From experience, how many quantization bits are acceptable?

Getting started: raw material for experiments
* Speech files sampled at 16 kHz or 8 kHz
* 16-bit or 14-bit linear quantization
* Music files sampled at 44.1 kHz

3 Audio processing tools I use regularly
VC6.0 (using C)
MATLAB
Cool Edit

MATLAB (MathWorks)
Math environment
Signal Processing Toolbox: filter design, spectral analysis, waveform generation, linear prediction
voicebox
Pros: open, powerful, scripting, excellent plotting
Cons: poor speech community and standards, not designed for big files

Other speech analysis tools?
* Goldwave (audio editor)
* ESPS/xwaves (routines + visualization)
* Praat (speech analysis)
* WaveSurfer (speech editor)
* Transcriber (annotation tool)
* OGI speech tools (routines + application development)
* WinPitch, PitchWorks, Phonedit

Goldwave
Self-described as a "top rated, professional digital audio editor"
Pros: editing (good memory management for big files), many effects, noise reduction, real-time spectrum and VU meters, various formats, batch conversion, chained effects, easy interface
Cons: nothing for speech (pitch, formants), Windows only, no scripting
Good for file editing, not for speech

ESPS/waves
Developed by Entropic + AT&T; now public
The comp.speech FAQ says: ESPS is a comprehensive set of speech analysis/processing tools. Waves is a graphical front-end for speech processing (waveforms, spectrograms, pitch) and includes a signal labeling utility.
Pros: powerful, designed for big files
Cons: UNIX only (free, BSD), non-standard formats, requires programming skills, development has stopped

Praat
Developed by P. Boersma and D. Weenink at the Institute of Phonetic Sciences, University of Amsterdam
General-purpose speech tool: editing, segmentation and labeling, prosodic manipulation
Pros: designed for speech analysis (not only sound editing or spectrogram visualization), nice GUI, scripting, active development and community, prosodic manipulation
Cons: limited scripting language, native formats for transcription and pitch files

WaveSurfer
Open-source tool for sound visualization and manipulation: speech/sound analysis and sound annotation/transcription
Platform for more advanced/specialized applications: extend WaveSurfer with custom plug-ins, or embed WaveSurfer visualization components in other applications
Requires the Snack toolkit

Transcriber
Authors: C. Barras, E. Geoffrois
Relies on Snack (Tcl/Tk)
Good for annotation; nice, simple GUI; no speech analysis

OGI speech tools / CSLU Toolkit
Development started in 1992, in C on Unix, at the Center for Spoken Language Understanding (CSLU) at OGI
Includes:
* An X Windows display tool (LYRE) to display and edit speech signals, spectrograms, phoneme labels and other information
* A set of C library routines (LIBNSPEECH)
* Utilities for converting file formats, filtering, neural network training and vector quantization
* A database utility to automate speech-database-related queries
* A set of Perl scripts, used mainly to automate the use of the OGI Speech Tools
* Man pages
RAD (rapid application development)
Points of entry: package (C), script (Tcl), GUI (Tk) levels
Free for research use

Summary
(Comparison table of the tools above; "=" means yes, but requires some development.)
