Machine Learning
Friday, November 11, 2022

Machine learning, as a branch of artificial intelligence, is a general term for a family of analytical methods. It mainly uses computers to simulate or realize human learning behavior.

Google search index of the three concepts since 2004: 1) machine learning keeps marching ahead like a true champion; 2) pattern recognition is in decline and dying out; 3) deep learning is a brand-new and rapidly rising field.

A computer-based machine learning system contains two core parts: representation and generalization. The first step in learning from data is to represent it, i.e. to detect the patterns in the data. A generalized model of the data space is then built from a group of known data in order to predict new data. The core goal of machine learning is to generalize from known experience: generalization is the ability of a system, trained on known data, to make predictions about new data.

Supervised learning
The input data carries labels. The most common learning algorithm of this kind is classification. A model is trained from the correspondence between the features and labels of the input
data. When unknown data that has features but no label arrives, its label can then be predicted from the trained model.

Unsupervised learning
The input data has no labels. This setting corresponds to another learning algorithm, clustering. By its basic definition, clustering is a
process that partitions a collection of physical or abstract objects into multiple classes of similar objects.

If the output labels come from a finite set of classes or nominal variables, the task is a classification problem. If the output label is a continuous variable, the task is a regression problem.

Classification steps
Raw data → feature extraction → feature selection → model training → classification and prediction on new data.

Feature selection (feature reduction)
Curse of dimensionality: this usually refers to problems involving vector computation; as the dimension increases, the amount of computation grows exponentially. Cortical features of different brain regions contribute differently during classification, and some features may be redundant.
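The curse of dimensionality can be illustrated numerically (a sketch, not from the slides): the unit ball inscribed in a cube occupies a vanishing fraction of the cube's volume as the dimension grows, so uniformly sampled high-dimensional data lies almost entirely "in the corners" and becomes extremely sparse.

```python
import math

def ball_to_cube_volume_ratio(d: int) -> float:
    """Volume of the unit ball inscribed in the cube [-1, 1]^d,
    divided by the cube's volume 2^d."""
    # V_ball(d, r=1) = pi^(d/2) / Gamma(d/2 + 1)
    return math.pi ** (d / 2) / (math.gamma(d / 2 + 1) * 2 ** d)

for d in (2, 5, 10, 20):
    # d=2 -> ~0.785, d=10 -> ~0.0025: the occupied fraction collapses fast
    print(d, ball_to_cube_volume_ratio(d))
```

The same collapse is why distance-based methods and density estimates degrade in high dimensions unless the feature count is reduced first.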
In particular, after multimodal fusion the increase in feature dimension can cause the "curse of dimensionality".

Principal Component Analysis (PCA)
PCA is the most common linear dimension-reduction method. Its goal is to map high-dimensional data into a low-dimensional space by a linear projection such that the variance of the projected data along each retained dimension is maximal. It uses fewer dimensions while retaining the major characteristics of the raw data.

Linear Discriminant Analysis (LDA)
The basic idea of LDA is projection: map the N-dimensional data into a low-dimensional space so that the classes are separated as well as possible, i.e. with optimal separability in the new space. The criterion is that the new subspace should have maximal between-class distance and minimal within-class distance.

Independent Component Analysis (ICA)
The basic idea
of ICA is to extract independent source signals from a set of mixed observed signals, or to represent other signals in terms of independent signals.

Recursive Feature Elimination (RFE)
RFE is a greedy algorithm that selects features by removing insignificant ones step by step. First, the features are repeatedly ranked by their weights in the classifier, and the lowest-ranked features are removed one by one. Then, following the final ranking, feature subsets of different sizes are selected from front to back. Finally, assess the classification performance
of each candidate subset to obtain the optimal feature subset.

Classification algorithms

Decision tree
A decision tree is a tree structure in which each non-leaf node represents a test on a feature, each branch represents an outcome of that test over a certain range of values, and each leaf node stores a class. Decision-making starts from the root node: test the corresponding feature of the object to be classified, follow the branch matching its value, and repeat until a leaf node is reached; the class stored at that leaf is taken as the decision
result.

Naive Bayes (NB)
NB is a classification method from statistics that uses probability theory for classification. The algorithm scales to large databases and offers high classification accuracy and speed.

Artificial Neural Network (ANN)
An ANN is a mathematical model that processes information using a structure similar to synaptic connections: a large number of nodes form a network, the neural network, to carry out information processing. A neural network usually needs to be trained; the process
of training is network learning: training adjusts the connection weights of the network's nodes so that it acquires the ability to classify, and the trained network is then applied to recognize objects.

k-Nearest Neighbors (kNN)
kNN is an instance-based classification method. To classify an unknown sample x, find the k training samples nearest to x and determine which class most of those k samples belong to; x is assigned to that class. kNN is a lazy learning method: it stores the samples and defers classification until it is actually needed. If the sample set is complex, this can lead to a large computational overhead, so kNN is unsuitable for strongly real-time applications.

Support Vector Machine (SVM)
Data that are linearly inseparable in a low-dimensional space are mapped into a high-dimensional space where they become linearly separable.

Cross-validation (CV)
The basic idea of CV is to split the raw data into groups: one part is used as the training set and the other as the validation set. The classifier is first trained on the training set, and the resulting model is then tested on the validation set.

K-fold cross-validation
In k-fold cross-validation, the original sample is randomly partitioned into k equal-sized subsamples. Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k-1 subsamples are used as training data. The cross-validation process is then repeated k times (the folds), with each of the k subsamples used exactly once as the validation data. The k results from the folds can then be averaged to produce a single estimate. The advantage of this method over repeated random sub-sampling is that all observations are used for both training and validation, and each observation is used for validation exactly once. 10-fold cross-validation is commonly used.

Leave-one-out cross-validation (LOOCV)
When k = n (the number of observations), k-fold cross-validation is exactly leave-one-out cross-validation.

Confusion matrix
TP: the gold standard and the test both indicate the illness; TN: the gold standard and the test both indicate no illness; FP: the gold standard indicates no illness but the test indicates illness; FN: the gold standard indicates illness but the test indicates no illness.

Accuracy: (TP+TN)/(TP+TN+FP+FN)
Sensitivity: TP/(TP+FN)
Specificity: TN/(FP+TN)

ROC and AUC
TPR = TP/(TP+FN), FPR = FP/(FP+TN). The closer the ROC curve approaches the upper-left corner, the better the classifier performs.

Area Under the Curve (AUC)
AUC is defined as the area under the ROC curve; its value clearly cannot exceed 1. AUC is used as an evaluation criterion because the ROC curve alone does not always make clear which classifier is better, whereas, as a single number, the classifier with the larger AUC performs better.

THANKS!
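Backup note (not in the original slides): a minimal sketch of the evaluation metrics above. The confusion-matrix counts are made up for illustration, and AUC is computed here via its rank interpretation, the probability that a randomly chosen positive scores above a randomly chosen negative.

```python
def confusion_metrics(tp: int, tn: int, fp: int, fn: int):
    """Accuracy, sensitivity (TPR) and specificity from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (fp + tn)   # true negative rate
    return accuracy, sensitivity, specificity

def auc(pos_scores, neg_scores):
    """AUC as the probability that a random positive outranks a random negative
    (ties count as half)."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical counts: 40 TP, 45 TN, 5 FP, 10 FN.
acc, sens, spec = confusion_metrics(tp=40, tn=45, fp=5, fn=10)
print(acc, sens, spec)  # -> 0.85 0.8 0.9

# Hypothetical classifier scores for 3 positives and 3 negatives.
print(auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))  # 8 of 9 pairs ranked correctly
```

The rank form of AUC avoids constructing the ROC curve explicitly and makes the "larger AUC is better" reading on the AUC slide concrete.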
26、ensitivity: TP/(TP+FN),Specificity: TN/(FP+TN),2022年11月11日星期五,ROC与AUC,TPR=TP/(TP+FN)FPR=FP/(FP+TN),The ROC curve more approach upper left, the performance of classifier exhibits well.,2022年11月11日星期五,Area Under Curve, AUC,AUC is defined as the area under ROC curve. Obviously, its value cannot greater than 1. Take AUC as evaluation criterion because ROC cannot clearly illustrate which classifier has a better effect. But as a value, the classifier corresponding to the bigger AUC has a better effect.,2022年11月11日星期五,THANKS!,2022年11月11日星期五,