Rough translation of the UCI data sets overview (uci数据集大致情况翻译.doc)


Source: http://archive.ics.uci.edu/ml/datasets.html?format=&task=&att=&area=&numAtt=&numIns=&type=&sort=nameUp&view=list

206 Data Sets

1. Abalone: Predict the age of abalone from physical measurements.
2. Abscisic Acid Signaling Network: The objective is to determine the set of boolean rules that describe the interactions of the nodes within this plant signaling network. The dataset includes 300 separate boolean pseudodynamic simulations using an asynchronous update scheme.
3. Acute Inflammations: The data was created by a medical expert as a data set to test the expert system, which will perform the presumptive diagnosis of two diseases of the urinary system.
4. Adult: Predict whether income exceeds $50K/yr based on census data. Also known as the Census Income dataset. (Translator's note: income prediction.)
5. Annealing: Steel annealing data.
6. Anonymous Microsoft Web Data: Log of anonymous users of the Microsoft web site; predict areas of the web site a user visited based on data on other areas the user visited. (Translator's note: predicting web navigation paths.)
7. Arcene: ARCENE's task is to distinguish cancer versus normal patterns from mass-spectrometric data. This is a two-class classification problem with continuous input variables. This dataset is one of 5 datasets of the NIPS 2003 feature selection challenge. (Translator's note: outliers.)
8. Arrhythmia: Distinguish between the presence and absence of cardiac arrhythmia and classify it in one of the 16 groups.
9. Artificial Characters: Dataset artificially generated by using first order theory which describes the structure of ten capital letters of the English alphabet.
10. Audiology (Original): Nominal audiology dataset from Baylor.
11. Audiology (Standardized): Standardized version of the original audiology database.
12. Australian Sign Language signs: This data consists of samples of Auslan (Australian Sign Language) signs. Examples of 95 signs were collected from five signers with a total of 6650 sign samples.
13. Australian Sign Language signs (High Quality): This data consists of samples of Auslan (Australian Sign Language) signs. 27 examples of each of 95 Auslan signs were captured from a native signer using high-quality position trackers.
14. Auto MPG: Revised from the CMU StatLib library; data concerns city-cycle fuel consumption.
15. Automobile: From the 1985 Ward's Automotive Yearbook.
16. AutoUniv: AutoUniv is an advanced data generator for classification tasks. The aim is to reflect the nuances and heterogeneity of real data. Data can be generated in .csv, ARFF or C4.5 formats.
17. Bach Chorales: Time-series data based on chorales; the challenge is to learn a generative grammar; data in Lisp. (Translator's note: sequential pattern analysis.)
18. Badges: Badges labeled with a + or - as a function of a person's name.
19. Bag of Words: This data set contains five text collections in the form of bags-of-words.
20. Balance Scale: Balance scale weight & distance database.
21. Balloons: Data previously used in a cognitive psychology experiment; 4 data sets represent different conditions of an experiment.
22. Blood Transfusion Service Center: Data taken from the Blood Transfusion Service Center in Hsin-Chu City in Taiwan. This is a classification problem. (Translator's note: classification problem.)
23. Breast Cancer: Breast Cancer Data (Restricted Access).
24. Breast Cancer Wisconsin (Diagnostic): Diagnostic Wisconsin Breast Cancer Database.
25. Breast Cancer Wisconsin (Original): Original Wisconsin Breast Cancer Database.
26. Breast Cancer Wisconsin (Prognostic): Prognostic Wisconsin Breast Cancer Database.
27. Breast Tissue: Dataset with electrical impedance measurements of freshly excised tissue samples from the breast.
28. CalIt2 Building People Counts: This data comes from the main door of the CalIt2 building at UCI.
29. Car Evaluation: Derived from a simple hierarchical decision model, this database may be useful for testing constructive induction and structure discovery methods. (Translator's note: evaluation models, e.g. churn/loyalty models.)
30. Cardiotocography: The dataset consists of measurements of fetal heart rate (FHR) and uterine contraction (UC) features on cardiotocograms classified by expert obstetricians.
31. Census Income: Predict whether income exceeds $50K/yr based on census data. Also known as the Adult dataset.
32. Census-Income (KDD): This data set contains weighted census data extracted from the 1994 and 1995 current population surveys conducted by the U.S. Census Bureau. (Translator's note: prediction.)
33. Challenger USA Space Shuttle O-Ring: Task: predict the number of O-rings that experience thermal distress on a flight at 31 degrees F, given data on the previous 23 shuttle flights.
34. Character Trajectories: Multiple, labelled samples of pen tip trajectories recorded whilst writing individual characters. All samples are from the same writer, for the purposes of primitive extraction. Only characters with a single pen-down segment were considered.
35. Chess (Domain Theories): 6 different domain theories for generating legal moves of chess.
36. Chess (King-Rook vs. King): Chess Endgame Database for White King and Rook against Black King (KRK).
37. Chess (King-Rook vs. King-Knight): Knight Pin Chess End-Game Database Creator.
38. Chess (King-Rook vs. King-Pawn): King+Rook versus King+Pawn on a7 (usually abbreviated KRKPA7).
39. Cloud: Little documentation.
40. CMU Face Images: This data consists of 640 black and white face images of people taken with varying pose (straight, left, right, up), expression (neutral, happy, sad, angry), eyes (wearing sunglasses or not), and size.
41. Coil 1999 Competition Data: This data set is from the 1999 Computational Intelligence and Learning (COIL) competition. The data contains measurements of river chemical concentrations and algae densities.
42. Communities and Crime: Communities within the United States. The data combines socio-economic data from the 1990 US Census, law enforcement data from the 1990 US LEMAS survey, and crime data from the 1995 FBI UCR.
43. Communities and Crime Unnormalized: Communities in the US. Data combines socio-economic data from the '90 Census, law enforcement data from the 1990 Law Enforcement Management and Admin Stats survey, and crime data from the 1995 FBI UCR.
44. Computer Hardware: Relative CPU performance data, described in terms of its cycle time, memory size, etc.
45. Concrete Compressive Strength: Concrete is the most important material in civil engineering. The concrete compressive strength is a highly nonlinear function of age and ingredients. (Translator's note: prediction, function.)
46. Concrete Slump Test: Concrete is a highly complex material. The slump flow of concrete is not only determined by the water content, but is also influenced by other concrete ingredients. (Translator's note: prediction, function.)
47. Congressional Voting Records: 1984 United States Congressional Voting Records; classify as Republican or Democrat.
48. Connect-4: Contains connect-4 positions.
49. Connectionist Bench (Nettalk Corpus): The file "nettalk.data" contains a list of 20,008 English words, along with a phonetic transcription for each word. The task is to train a network to produce the proper phonemes.
50. Connectionist Bench (Sonar, Mines vs. Rocks): The task is to train a network to discriminate between sonar signals bounced off a metal cylinder and those bounced off a roughly cylindrical rock.
51. Connectionist Bench (Vowel Recognition - Deterding Data): Speaker-independent recognition of the eleven steady state vowels of British English using a specified training set of LPC-derived log area ratios.
52. Contraceptive Method Choice: Dataset is a subset of the 1987 National Indonesia Contraceptive Prevalence Survey.
53. Corel Image Features: This dataset contains image features extracted from a Corel image collection. Four sets of features are available based on the color histogram, color histogram layout, color moments, and co-occurrence.
54. Covertype: Forest CoverType dataset.
55. Credit Approval: This data concerns credit card applications; good mix of attributes. (Translator's note: loyalty?)
56. Cylinder Bands: Used in decision tree induction for mitigating process delays known as cylinder bands in rotogravure printing.
57. Demospongiae: Marine sponges of the Demospongiae class classification domain.
58. Dermatology: The aim for this dataset is to determine the type of Erythemato-Squamous Disease.
59. Dexter: DEXTER is a text classification problem in a bag-of-word representation. This is a two-class classification problem with sparse continuous input variables. This dataset is one of five datasets of the NIPS 2003 feature selection challenge.
60. DGP2 - The Second Data Generation Program: Generates application domains based on specific parameters, number of features, and proportion of positive to negative examples.
61. Diabetes: This diabetes dataset is from AIM 94.
62. Document Understanding: Five concepts, expressed as predicates, to be learned.
63. Dodgers Loop Sensor: Loop sensor data was collected for the Glendale on-ramp for the 101 North freeway in Los Angeles.
64. Dorothea: DOROTHEA is a drug discovery dataset. Chemical compounds represented by structural molecular features must be classified as active (binding to thrombin) or inactive. This is one of 5 datasets of the NIPS 2003 feature selection challenge.
65. E. Coli Genes: Data giving characteristics of each ORF (potential gene) in the E. coli genome. Sequence, homology (similarity to other genes) and structural information, and function (if known) are provided.
66. EBL Domain Theories: Assorted small-scale domain theories.
67. Echocardiogram: Data for classifying whether patients will survive for at least one year after a heart attack.
68. Ecoli: This data contains protein localization sites.
69. Economic Sanctions: Domain theory on economic sanctions; undocumented.
70. EEG Database: This data arises from a large study to examine EEG correlates of genetic predisposition to alcoholism. It contains measurements from 64 electrodes placed on the scalp, sampled at 256 Hz. (Translator's note: prediction.)
71. El Nino: The data set contains oceanographic and surface meteorological readings taken from a series of buoys positioned throughout the equatorial Pacific.
72. Entree Chicago Recommendation Data: This data contains a record of user interactions with the Entree Chicago restaurant recommendation system. (Translator's note: related recommendations.)
73. Flags: From Collins Gem Guide to Flags, 1986.
74. Forest Fires: This is a difficult regression task, where the aim is to predict the burned area of forest fires, in the northeast region of Portugal, by using meteorological and other data (see details at: http://www.dsi.uminho.pt/pcortez/forestfires). (Translator's note: prediction.)
75. Function Finding: Cases collected mostly from investigations in physical science; the intention is to evaluate function-finding algorithms.
76. Gisette: GISETTE is a handwritten digit recognition problem. The problem is to separate the highly confusible digits 4 and 9. This dataset is one of five datasets of the NIPS 2003 feature selection challenge.
77. Glass Identification: From the USA Forensic Science Service; 6 types of glass, defined in terms of their oxide content (i.e. Na, Fe, K, etc). (Translator's note: classification.)
78. Haberman's Survival: Dataset contains cases from a study conducted on the survival of patients who had undergone surgery for breast cancer.
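
Items 19 (Bag of Words) and 59 (Dexter) both describe text stored in a bag-of-words representation. As a rough sketch of what that representation means, the Python snippet below counts word occurrences per document and lines the counts up against a shared vocabulary. The function name `bag_of_words`, the regex tokenizer, and the two sample sentences (adapted from the descriptions of items 1 and 4) are illustrative assumptions; this is not the file format actually used by the UCI data sets.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text, keep runs of letters as tokens, and count each token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)


# Hypothetical mini-corpus, adapted from two catalog descriptions.
docs = [
    "Predict the age of abalone from physical measurements",
    "Predict whether income exceeds 50K per year based on census data",
]

# One bag (word -> count) per document; word order is discarded.
bags = [bag_of_words(d) for d in docs]

# Shared vocabulary: every distinct word seen in any document, sorted.
vocabulary = sorted(set().union(*bags))

# Document-term matrix: one row per document, one column per vocabulary word.
# A Counter returns 0 for words that a document does not contain.
matrix = [[bag[word] for word in vocabulary] for bag in bags]

if __name__ == "__main__":
    print(vocabulary)
    for row in matrix:
        print(row)
```

Most entries in each row are zero because a single document uses only a small part of the vocabulary, which is consistent with the "sparse" input variables mentioned for Dexter.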
