
Optimization of EM Algorithm and Its Application to Spike Classification

Yin Haibing, Hu Dewen, Liu Yadong, Li Ming, Wan Yucheng
Department of Automatic Control, College of Mechatronics and Automation, National University of Defense Technology, Changsha (410073)
E-mail: dwhu

Abstract
Recently, several methods have been developed for automatic spike classification, including Expectation Maximization (EM) clustering based on multivariate t-distribution mixture models. Studies have shown, however, that the EM algorithm converges only linearly, so for spike classification the method is too time-consuming to be used in practice. In this study we introduce an optimized EM algorithm for spike classification with multivariate t-distribution finite mixture models. Our algorithm augments the EM iteration with classic gradient ascent in the high-dimensional feature space of the spikes, so that convergence improves beyond the original linear rate. Applied to real spike data, the optimized EM algorithm converged faster and more robustly and classified spikes more accurately.

Keywords: Spike Classification, EM Algorithm, Ascent Gradient

1 Introduction
Over the last three decades much progress has been made in the recording and analysis of neuronal activity [1, 2], and a large part of it concentrates on spike sorting, the key technique for exploring neurons and their activity [3]. Surveying the literature of the past decade, one of the primary problems troubling neuroscientists is spike sorting, which covers the detection and classification of action potentials, the decomposition of overlaps, and the assignment of spike shapes [2]. It is well known that there are plenty of difficulties

in these spike sorting tasks, and the causes are thought to lie mainly in three areas: threshold detection, the various types of detection error, and the ubiquitous misclassification error due to overlaps [3]. In recent years several methods have been developed for spike classification, namely template matching [4, 5], the wavelet transform [6], independent component analysis (ICA) [7, 8], and finite mixture models fitted with the Expectation Maximization (EM) algorithm [9]. Considering that spike sorting can be cast as a statistical likelihood-estimation problem, and given the power of the EM algorithm within statistical frameworks, this study concentrates on EM clustering with multivariate finite mixture models.

The EM algorithm, first put forward by Dempster et al. [10], has been applied broadly in statistical analysis. Recently, based on the EM algorithm and its extensions, the multivariate mixture model has found many applications in cluster analysis and spike sorting [9, 11-13]. Classification methods based on the EM algorithm have many merits, the first being monotone convergence to a local maximum. The convergence rate, however, is the primary drawback of EM algorithms, and many researchers have worked hard to improve it [14-16]. Some concentrated on the EM algorithm with Gaussian mixture models and achieved several gains in convergence rate [14]. For spike classification, meanwhile, recent research suggests that the multivariate t-distribution describes spike statistics more accurately than multivariate Gaussian models [9, 17]. An investigation of how to improve the convergence of the primal EM algorithm is therefore needed when the multivariate t-distribution mixture model replaces the Gaussian one.

We optimized the original EM algorithm by means of classic gradient ascent with multivariate t-distribution models, and applied the algorithm to real spike data. The remainder of this study is organized as follows. Section 2 briefly reviews the original EM algorithm with multivariate t-distribution mixture models, establishes our optimized EM algorithm, gives a theoretical proof of its efficiency, and reports a simulation experiment demonstrating its validity. Section 3 applies the method to real spike data. Sections 4 and 5 are the Discussion and the Conclusion, respectively.

2 Theory
In a spike classification problem we may consider that the spikes come from a number of particular neurons and that the number of neurons is unknown. The aim of classification is to determine which spike trains are of the same kind, that is, which come from the same neuron, and with what probability an arbitrary spike comes from a given neuron.

2.1 The statistical characteristics of spikes
In statistical terms, every spike waveform can be regarded as a point in a high-dimensional space, so the distribution of spikes can be characterized by a multivariate probability density function, the number of samples per spike giving the dimension of the multivariate space. That is, we have N spikes, each with m sample points; we suppose these spikes come from g neurons over the whole experiment, the spikes from each particular neuron being described by one multivariate probability density function. We use the m-dimensional vector x_i to represent a sampled spike waveform or a vector of its features. We assume that neuron j (1 ≤ j ≤ g) contributes a proportion π_j (0 ≤ π_j ≤ 1) of the N spikes and that the distribution of spikes from neuron j has parameter θ_j. The probability of each spike is then [3, 9, 11]:

P(x_i) = \sum_{j=1}^{g} \pi_j \, p(x_i \mid \theta_j)    (1)
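As a concrete illustration of equation (1), the mixture probability with multivariate t components (the density the paper defines below in equations (3) and (4)) can be evaluated as follows. This is our own sketch, not code from the paper; the function names are ours, and NumPy/SciPy are assumed.

```python
import numpy as np
from scipy.special import gammaln

def mvt_pdf(x, mu, Sigma, v):
    """Multivariate t density p(x | mu, Sigma, v); delta is the squared
    Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu)."""
    m = len(mu)
    diff = x - mu
    delta = diff @ np.linalg.solve(Sigma, diff)
    log_norm = (gammaln((v + m) / 2.0) - gammaln(v / 2.0)
                - 0.5 * m * np.log(v * np.pi)
                - 0.5 * np.linalg.slogdet(Sigma)[1])
    return np.exp(log_norm - 0.5 * (v + m) * np.log1p(delta / v))

def mixture_pdf(x, pis, mus, Sigmas, v):
    """P(x_i) = sum_j pi_j p(x_i | theta_j), i.e. equation (1)."""
    return sum(pi * mvt_pdf(x, mu, S, v)
               for pi, mu, S in zip(pis, mus, Sigmas))
```

For example, a two-component mixture with weights 0.5/0.5 assigns each point the average of the two component densities.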

Fortunately, recent evidence has shown that the t-distribution mixture model describes spike data better than the normal one under examination of the Mahalanobis distances [9, 17]. In this study we therefore use the multivariate t-distribution for the spike distribution, setting θ_j = (μ_j, Σ_j, ν), where μ_j is the mean of the spikes belonging to the j-th component, Σ_j is the covariance of the spikes belonging to the j-th component, and ν is the degree of freedom (DOF) of the multivariate t-distribution. We then have the following probabilistic model [9]:

P(x) = \prod_{i=1}^{N} p(x_i) = \prod_{i=1}^{N} \sum_{j=1}^{g} \pi_j \, p(x_i \mid \mu_j, \Sigma_j, \nu)    (2)

where

p(x_i \mid \mu_j, \Sigma_j, \nu) = \frac{\Gamma\left(\frac{\nu+m}{2}\right) |\Sigma_j|^{-1/2}}{(\pi\nu)^{m/2} \, \Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{\delta(x_i, \mu_j; \Sigma_j)}{\nu}\right)^{-\frac{\nu+m}{2}}    (3)

\delta(x, \mu; \Sigma) = (x - \mu)^T \Sigma^{-1} (x - \mu)    (4)

Γ is the Gamma function, π_j ≥ 0 with \sum_{j=1}^{g} \pi_j = 1, and m is the dimension of the spike data x_i. Here and elsewhere in this study, the superscript T denotes the transpose of a vector or matrix.

2.2 The EM algorithm
Considering the probabilistic model of equation (1), the best-fitting parameters π_1, …, π_g, θ_1, …, θ_g can be obtained by maximizing the model likelihood, or equivalently its logarithm. Thanks to the structure of the EM algorithm, the maximization admits an analytic iteration. For convenience of computation we use, instead of equation (2), the log-likelihood

L = \log P(x \mid \Theta) = \sum_{i=1}^{N} \log \sum_{j=1}^{g} \pi_j \, p(x_i \mid \mu_j, \Sigma_j, \nu)    (5)

which can be maximized via the following iterative EM algorithm.

Expectation step (E-step): update z_ij (the membership of spike i to unit j), P_ij (the probability of spike i under unit j), and u_ij (the weight indicating the typicality of spike i to unit j) [11]:

z_{ij} = \frac{\pi_j P_{ij}}{\sum_{l=1}^{g} \pi_l P_{il}}    (6)

P_{ij} = p\left(x_i \mid \mu_j^{(k-1)}, \Sigma_j^{(k-1)}, \nu^{(k-1)}\right)    (7)

u_{ij} = \frac{m + \nu^{(k-1)}}{\delta\left(x_i, \mu_j^{(k-1)}; \Sigma_j^{(k-1)}\right) + \nu^{(k-1)}}    (8)

Maximization step (M-step): update π_j^{(k)} (the proportion of unit j), μ_j^{(k)} (the mean of unit j), Σ_j^{(k)} (the covariance of unit j), and ν^{(k)} (the estimated degree of freedom) [9, 11]:

\pi_j^{(k)} = \frac{\sum_{i=1}^{n} z_{ij}}{n}    (9)

\mu_j^{(k)} = \frac{\sum_{i=1}^{n} z_{ij} u_{ij} x_i}{\sum_{i=1}^{n} z_{ij} u_{ij}}    (10)

\Sigma_j^{(k)} = \frac{\sum_{i=1}^{n} z_{ij} u_{ij} \left(x_i - \mu_j^{(k)}\right)\left(x_i - \mu_j^{(k)}\right)^T}{\sum_{i=1}^{n} z_{ij}}    (11)

The DOF ν^{(k)} has no closed-form update; following [9] it is approximated by

\nu^{(k)} = \frac{2}{y + \log y - 1} + 0.0416\left[1 + \mathrm{erf}\left(0.6594 \log \frac{2.1971}{y + \log y - 1}\right)\right]    (12)

where ψ denotes the digamma function and

y = -\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{g} z_{ij} \left[\psi\left(\frac{m + \nu^{(k-1)}}{2}\right) + \log \frac{2}{\delta\left(x_i, \mu_j^{(k-1)}; \Sigma_j^{(k-1)}\right) + \nu^{(k-1)}} - u_{ij}\right]

Examining the formulas above, the iterative calculation becomes tedious for large classification data sets. To reduce the time consumed, we optimize the calculation in the next section.

2.3 The optimized EM algorithm
Ever since the EM algorithm appeared, plenty of people have studied its convergence rate with the hope of improving it, for the sake of computational efficiency and hardware utilization. Through their work great improvements have been achieved, such as the asymptotic convergence rate of the EM algorithm for Gaussian mixtures [15] and the acceleration of linear convergence to super-linear convergence via gradient ascent [14].
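For concreteness, the E- and M-steps of equations (6)-(11) can be sketched in code as follows. This is our own minimal illustration, not the authors' implementation: it assumes NumPy/SciPy, holds the degree of freedom ν fixed rather than applying the update of equation (12), and omits convergence checks.

```python
import numpy as np
from scipy.special import gammaln

def em_t_mixture(X, pis, mus, Sigmas, v, n_iter=50):
    """Plain (unaccelerated) EM for a multivariate t mixture.

    E-step: responsibilities z_ij and weights u_ij; M-step: proportions,
    means, covariances. The DOF v is held fixed here for brevity."""
    N, m = X.shape
    g = len(pis)
    pis, mus, Sigmas = pis.copy(), mus.copy(), Sigmas.copy()
    for _ in range(n_iter):
        P = np.empty((N, g))   # component densities p(x_i | theta_j)
        U = np.empty((N, g))   # typicality weights u_ij
        for j in range(g):
            diff = X - mus[j]
            delta = np.einsum('ni,ni->n', diff @ np.linalg.inv(Sigmas[j]), diff)
            log_norm = (gammaln((v + m) / 2.0) - gammaln(v / 2.0)
                        - 0.5 * m * np.log(v * np.pi)
                        - 0.5 * np.linalg.slogdet(Sigmas[j])[1])
            P[:, j] = np.exp(log_norm - 0.5 * (v + m) * np.log1p(delta / v))
            U[:, j] = (m + v) / (delta + v)
        Z = pis * P
        Z /= Z.sum(axis=1, keepdims=True)                     # memberships z_ij
        for j in range(g):
            w = Z[:, j] * U[:, j]
            pis[j] = Z[:, j].mean()                           # proportions
            mus[j] = (w[:, None] * X).sum(axis=0) / w.sum()   # means
            diff = X - mus[j]
            Sigmas[j] = (w[:, None] * diff).T @ diff / Z[:, j].sum()  # covariances
    return pis, mus, Sigmas
```

Note that the covariance denominator is the sum of the memberships z_ij alone, as in equation (11), not the sum of the products z_ij u_ij used for the mean.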

Drawing on the combination of gradient ascent with the EM algorithm in Gaussian mixture models, we optimized the computation by building gradient ascent into the primal EM algorithm. In the rest of this section we present our optimization and the proof of its efficiency for the optimized EM algorithm with the multivariate t-distribution in spike classification.

At each iteration of the EM algorithm, we optimize the M-step as follows:

A^{(k+1)} - A^{(k)} = P_A^{(k)} \nabla_A L    (13)

\mu_j^{(k+1)} - \mu_j^{(k)} = P_{\mu_j}^{(k)} \nabla_{\mu_j} L    (14)

\mathrm{Vec}\,\Sigma_j^{(k+1)} - \mathrm{Vec}\,\Sigma_j^{(k)} = P_{\Sigma_j}^{(k)} \nabla_{\mathrm{Vec}\,\Sigma_j} L    (15)

where

P_A^{(k)} = \frac{1}{n}\left[\mathrm{diag}\left(\pi_1^{(k)}, \pi_2^{(k)}, \ldots, \pi_g^{(k)}\right) - A^{(k)} A^{(k)T}\right]    (16)

P_{\mu_j}^{(k)} = \frac{\Sigma_j^{(k)}}{\sum_{i=1}^{n} z_{ij} u_{ij}}    (17)

P_{\Sigma_j}^{(k)} = \frac{2}{\sum_{i=1}^{n} z_{ij}} \left(\Sigma_j^{(k)} \otimes \Sigma_j^{(k)}\right)    (18)

Here the vector A denotes the mixing proportions (π_1, π_2, …, π_g)^T, j indexes the mixture components (j = 1, 2, …, g), k denotes the iteration number, "Vec B" denotes the vector obtained by stacking the column vectors of the matrix B, and "⊗" denotes the Kronecker product. Moreover, given the constraints π_j^{(k)} ≥ 0 and \sum_{j=1}^{g} \pi_j^{(k)} = 1, P_A^{(k)} is a positive definite matrix, and the matrices P_{μ_j}^{(k)} and P_{Σ_j}^{(k)} are positive definite with probability one for n sufficiently large. Since the DOF has no comparable closed-form expression, it keeps its original update, equation (12), in the computation [9]. The rest of this section proves the optimized algorithm theoretically. Concerning the proof, equations (13) and (14) are treated exactly as in the Gaussian case [14], so here we show only the last one.
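As a numerical sanity check on equations (14) and (17), the preconditioned gradient step for the mean reproduces the EM update of equation (10) exactly. The sketch below is our own illustration (hypothetical helper names, NumPy assumed): with the gradient ∇_{μ_j} L = Σ_i z_ij u_ij Σ_j^{-1} (x_i − μ_j), adding P_{μ_j} ∇_{μ_j} L to μ_j yields the weighted mean of equation (10).

```python
import numpy as np

def em_mean_update(X, z, u):
    """EM M-step for one component's mean, equation (10)."""
    w = z * u
    return (w[:, None] * X).sum(axis=0) / w.sum()

def gradient_mean_update(X, z, u, mu, Sigma):
    """Equation (14): mu + P_mu @ grad, with the preconditioner
    P_mu = Sigma / sum_i(z_i u_i) of equation (17) and the gradient
    grad = Sigma^{-1} sum_i z_i u_i (x_i - mu)."""
    grad = np.linalg.inv(Sigma) @ ((z * u)[:, None] * (X - mu)).sum(axis=0)
    P = Sigma / (z * u).sum()
    return mu + P @ grad
```

The two functions agree for any positive weights and any positive definite Sigma, which is exactly the sense in which equation (14) rewrites the M-step as a preconditioned gradient ascent step.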

The gradient of the log likelihood with respect to Σ_j can be calculated as

\frac{\partial L}{\partial \Sigma_j} = \sum_{i=1}^{n} \frac{\pi_j \, \partial p(x_i \mid \theta_j) / \partial \Sigma_j}{\sum_{l=1}^{g} \pi_l \, p(x_i \mid \theta_l)} = \sum_{i=1}^{n} z_{ij} \left[\frac{1}{2} u_{ij} \Sigma_j^{-1} (x_i - \mu_j)(x_i - \mu_j)^T \Sigma_j^{-1} - \frac{1}{2} \Sigma_j^{-1}\right]    (19)

From the original EM algorithm described by Peel and McLachlan [11], the covariance of unit j is updated as

\Sigma_j^{(k+1)} = \Sigma_j^{(k)} + \left[\frac{\sum_{i=1}^{n} z_{ij} u_{ij} \left(x_i - \mu_j^{(k)}\right)\left(x_i - \mu_j^{(k)}\right)^T}{\sum_{i=1}^{n} z_{ij}} - \Sigma_j^{(k)}\right]
= \Sigma_j^{(k)} + \frac{1}{\sum_{i=1}^{n} z_{ij}} \Sigma_j^{(k)} \left[\sum_{i=1}^{n} z_{ij} \left(u_{ij} \Sigma_j^{(k)-1} \left(x_i - \mu_j^{(k)}\right)\left(x_i - \mu_j^{(k)}\right)^T \Sigma_j^{(k)-1} - \Sigma_j^{(k)-1}\right)\right] \Sigma_j^{(k)}
= \Sigma_j^{(k)} + \frac{2}{\sum_{i=1}^{n} z_{ij}} \Sigma_j^{(k)} \frac{\partial L}{\partial \Sigma_j^{(k)}} \Sigma_j^{(k)}    (20)

Since \mathrm{Vec}(AXB) = (B^T \otimes A)\,\mathrm{Vec}\,X, we obtain the matrix P_{\Sigma_j}^{(k)}:

P_{\Sigma_j}^{(k)} = \frac{2}{\sum_{i=1}^{n} z_{ij}} \left(\Sigma_j^{(k)} \otimes \Sigma_j^{(k)}\right)    (21)

Furthermore, for an arbitrary matrix U,

\mathrm{Vec}(U)^T P_{\Sigma_j}^{(k)} \mathrm{Vec}(U) = \frac{2}{\sum_{i=1}^{n} z_{ij}} \mathrm{Vec}(U)^T \left(\Sigma_j^{(k)} \otimes \Sigma_j^{(k)}\right) \mathrm{Vec}(U) = \frac{2}{\sum_{i=1}^{n} z_{ij}} \mathrm{tr}\left(U^T \Sigma_j^{(k)} U \Sigma_j^{(k)}\right)
= \frac{2}{\sum_{i=1}^{n} z_{ij}} \mathrm{Vec}\left(\Sigma_j^{(k)1/2} U \Sigma_j^{(k)1/2}\right)^T \mathrm{Vec}\left(\Sigma_j^{(k)1/2} U \Sigma_j^{(k)1/2}\right) \ge 0    (22)

Thus the matrix P_{\Sigma_j}^{(k)} is positive definite with probability one. By classic optimization theory, the updated covariance therefore moves legally along the gradient ascent direction.

3 Results
We applied our optimized EM algorithm to simulated data and to real spike data. For convenience of comparison, the primal EM algorithm was applied to the identical data in each figure or table.

3.1 Simulation
The simulation data were generated in the following way: four planar t-distributed components, each with the same probability 0.25, mean centers (-4, 0), (0, -4), (+4, 0), (0, +4) respectively, DOF 10, and covariance matrices

\Sigma_1 = \Sigma_3 = \begin{pmatrix} 1 & 0.7 \\ 0.7 & 1 \end{pmatrix}, \qquad \Sigma_2 = \Sigma_4 = \begin{pmatrix} 1 & -0.7 \\ -0.7 & 1 \end{pmatrix}

The original simulation data are illustrated in Fig. 1, together with the classification results of the EM algorithm and of our optimized algorithm. Each data set has N = 1000 samples; the ellipses indicate the 2σ areas, so that about 95% of the data of each class lie within the ellipse of its component; the label "+" marks the center position of each kind. The later figures in this study follow the same conventions.
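The simulation setup above can be reproduced with a short script. This is our own sketch, assuming NumPy; the −0.7 off-diagonal sign for Σ2 and Σ4 is our reading of the garbled covariance matrices. It uses the standard construction of a multivariate t draw as a Gaussian draw divided by the square root of a scaled chi-square variable.

```python
import numpy as np

def simulate_spikes(n=1000, seed=0):
    """Draw n samples from the four-component planar t mixture of
    Section 3.1: equal weights 0.25, DOF 10, means (-4,0), (0,-4),
    (+4,0), (0,+4)."""
    rng = np.random.default_rng(seed)
    v = 10.0
    mus = np.array([[-4.0, 0.0], [0.0, -4.0], [4.0, 0.0], [0.0, 4.0]])
    S_a = np.array([[1.0, 0.7], [0.7, 1.0]])     # components 1 and 3
    S_b = np.array([[1.0, -0.7], [-0.7, 1.0]])   # components 2 and 4 (sign assumed)
    Sigmas = [S_a, S_b, S_a, S_b]
    labels = rng.integers(0, 4, size=n)          # equal mixing proportions
    X = np.empty((n, 2))
    for i, j in enumerate(labels):
        gauss = rng.multivariate_normal(np.zeros(2), Sigmas[j])
        w = rng.chisquare(v) / v                 # t draw = Gaussian / sqrt(w)
        X[i] = mus[j] + gauss / np.sqrt(w)
    return X, labels
```

The returned labels allow the clustering result of either algorithm to be scored against the ground truth.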
