SAS BASE Key Points Notes.doc

Uploaded by: 文库蛋蛋多   Document ID: 2396044   Upload date: 2023-02-17   Format: DOC   Pages: 119   Size: 719KB

Topic: Accessing Data and Creating Data Structures
1. Reading raw data files using INFILE and INPUT statements
2. Writing a _NULL_ data set
3. Assigning and changing variable attributes
4. Importing a database table or data file into a SAS data set
5. Labeling variables
6. Reading existing SAS data sets
7. Restricting observations while reading data
8. Creating temporary and permanent SAS data sets
9. Exporting data to different files
10. Displaying the contents of a data set
11. Restricting the observations and variables of a SAS data set that is processed

1. Reading Raw Data Files Using INFILE and INPUT Statements

1.1 Introduction

1.1.1 Common Step Boundary Keywords: DATA, PROC, CARDS, DATALINES, QUIT, RUN

1.1.2 Data Step Flow

data sales;
  infile rawin;
  input name $ 1-10 division $ 12 years 15-16 sales 19-25;
run;
proc print data=sales;
run;

Note: Ending each step with a RUN statement is highly recommended.

A. The Compilation Phase
When you submit a DATA step for execution, SAS checks the syntax of the SAS statements and compiles them, that is, translates the statements into machine code. In this phase, SAS identifies the type and length of each new variable and determines whether a type conversion is necessary for each subsequent reference to a variable. During the compilation phase, SAS creates the following three items:
- Input buffer: a logical area in memory into which SAS reads each record of raw data when it executes an INPUT statement. This buffer is created only when the DATA step reads raw data. (When the DATA step reads a SAS data set, SAS reads the data directly into the program data vector.)
- Program data vector (PDV): a logical area in memory where SAS builds a data set, one observation at a time. When a program executes, SAS reads data values from the input buffer or creates them by executing SAS language statements. The values are assigned to the appropriate variables in the PDV, and from there SAS writes them to the SAS data set as a single observation. Along with data set variables and computed variables, the PDV contains two automatic variables, _N_ and _ERROR_. _N_ counts the number of times the DATA step begins to iterate. _ERROR_ signals that an error caused by the data occurred during execution. Its value is either 0 (no errors) or 1 (one or more errors have occurred). SAS does not write these automatic variables to the output data set.
- Descriptor information: information that SAS creates and maintains about each SAS data set, including data set attributes and variable attributes. It contains, for example, the name of the data set and its member type, the date and time that the data set was created, and the number, names, and data types (character or numeric) of the variables.

B. The Execution Phase
By default, a simple DATA step iterates once for each observation that is being created. The flow of action in the execution phase of a simple DATA step is as follows:
1. The DATA step begins with a DATA statement. Each time the DATA statement executes, a new iteration of the DATA step begins, and the _N_ automatic variable is incremented by 1.
2. SAS sets the newly created program variables to missing in the PDV.
3. SAS reads a data record from a raw data file into the input buffer, or it reads an observation from a SAS data set directly into the PDV. You can use an INPUT, MERGE, SET, MODIFY, or UPDATE statement to read a record.
4. SAS executes any subsequent programming statements for the current record.
5. At the end of the statements, an output, return, and reset occur automatically. SAS writes an observation to the SAS data set and returns to the top of the DATA step. It then resets the values of variables created by INPUT and assignment statements to missing in the PDV. Variables that you read with a SET, MERGE, MODIFY, or UPDATE statement are not reset to missing here.
6. SAS counts another iteration, reads the next record or observation, and executes the subsequent programming statements for the current observation.
7. The DATA step terminates when SAS encounters the end-of-file in a SAS data set or a raw data file.
Note: This is the default processing of the DATA step. You can code data-reading statements (such as INPUT or SET) and data-writing statements (such as OUTPUT) in any order in your program.

(Figure: Flow of Action in the DATA Step)

Diagnosing Errors in the Compilation Phase
Now that you know how a DATA step is processed, you can use that knowledge to correct errors. Errors detected during the compilation phase include:
- misspelled keywords and data set names
- missing semicolons
- unbalanced quotation marks
- invalid options
During the compilation phase, SAS can interpret some syntax errors (such as the keyword DATA misspelled as DAAT). If it cannot interpret an error, SAS does the following:
- It prints the word ERROR followed by an error message in the log.
- It compiles but does not execute the step where the error occurred.
- It prints the following message to warn you:
  NOTE: The SAS System stopped processing this step because of errors.
Some errors are explained fully by the message that SAS prints, but other messages are harder to interpret. For example, SAS statements are free-format. So when you fail to end a statement with a semicolon, SAS does not always detect the error at the point where it occurs.

Diagnosing Errors in the Execution Phase
As you have seen, errors can occur in the compilation phase, resulting in a DATA step that is compiled but not executed. Errors can also occur during the execution phase. When SAS detects an error in the execution phase, the following can occur, depending on the type of error:
- A note, warning, or error message is displayed in the log.
- The values that are stored in the PDV are displayed in the log.
- The processing of the step either continues or stops.

1.2 Basic Forms of the INPUT Statement
The most common way to create a new data set is by submitting a DATA step. The INPUT statement describes the data that your new data set will contain. It reads data from an external source or from data lines included in your SAS program.

1.2.1 List Input
Use list input to read data in which each field is separated from the next by at least one blank. Missing values are represented by a period (.). This is the simplest form of input, also called freeform or format-free list input.
DATA Census;
  INPUT State $ Pop;
  CARDS;
NC 5.085
SC 2.590
VA 1.360
MA 3.450
PA .
;
RUN;

1.2.2 Column Input
Use column input to read the following types of data:
- character and numeric data
- data values that are entered in fixed column positions
- character values longer than eight characters
- character values that contain embedded blanks
Each variable's column range is given explicitly. As a result, the fields can be read in any order and do not need to be separated by blanks.
Form: INPUT variable $ start-column-end-column; (the $ is used for character variables only)
DATA Census;
  INPUT State $ 1-2 Pop 3-7;
  CARDS;
NC5.082
SC2.590
VA1.360
;
RUN;
DATA Census2;
  INPUT State $ 1-10 Pop 11-15;
  CARDS;
New York  5.082
New Jersey2.590
;
RUN;
The numbers after each variable name in the INPUT statement indicate the columns in which that variable can be found. The $ indicates that the variable is character. The primary difference between column input and freeform list input is these column ranges. They tell SAS where in each data record to find the value of each variable.

1.2.3 Formatted Input
Formatted input is a technique for reading data. You move the pointer to the beginning column of a field and then read the field with an informat, which specifies its type and width. Use formatted input to read standard and nonstandard data in fixed fields.
Form: INPUT pointer-control variable informat. ;
Example:
data inpt;
  input @1 custom_id $6. @7 custom_name $6. @14 rental_fee;
  datalines;
240442Smith  950
240910Yang   1120
240808Andrew 1050
;
run;
Limitations and advantages of formatted input include:
- Data must be in fixed columns.
- Data fields may be read in any order.
- Blanks are not needed to separate fields.
- You can input only the variables you need and skip the rest.
- Values are read with informat specifications; formats control how they are displayed.

Format      Specifies values                                                              Example
COMMAw.d    that contain commas and decimal places                                        comma8.2
DOLLARw.d   that contain dollar signs and commas                                          dollar6.2
MMDDYYw.    as date values of the form 09/12/97 (MMDDYY8.) or 09/12/1997 (MMDDYY10.)      mmddyy10.
w.          rounded to the nearest integer in w spaces                                    7.
w.d         rounded to d decimal places in w spaces                                       8.2
$w.         as character values in w spaces                                              $12.
DATEw.      as date values of the form 16OCT99 (DATE7.) or 16OCT1999 (DATE9.)             date9.

1.2.4 Named Input
You can use named input to read records in which each data value is preceded by the name of its variable and an equal sign (=). The following INPUT statement reads data lines that contain equal signs:
data games;
  input name=$ score1= score2=;
  datalines;
name=riley score1=1132 score2=1187
;
run;
proc print data=games;
run;
Note: All forms of input except named input can be used in any combination. Once an INPUT statement starts reading named input, SAS expects all remaining values on the record to be in name=value form.

1.2.5 Multiple Styles in a Single INPUT Statement
data mul;
  input idno name $18. team $ 24-30 startwght endwght;
  cards;
023 David Shaw         red    189 165
049 Amelia Serrano     yellow 189 165
;
run;
The values of IDNO, STARTWGHT, and ENDWGHT are read with list input. NAME is read with formatted input, and TEAM with column input.

1.3 Pointer Controls
As SAS reads values from the input data records into the input buffer, it keeps track of its position with a pointer. The INPUT statement provides three ways to control the movement of the pointer:
1.3.1 Column pointer controls reset the pointer's column position when the data values in the data records are read, for example @n and +n.
1.3.2 Line pointer controls reset the pointer's line position when the data values in the data records are read, for example #n and the slash (/).
1.3.3 Line-hold specifiers hold an input record in the input buffer so that another INPUT statement can process it. By default, the INPUT statement releases the previous record and reads another record. The line-hold specifiers are @ and @@.

Summary of pointer controls:
@n   go to column n
+n   move the pointer n positions
@    hold the current input line (for example, to re-read certain variables with another INPUT statement)
@@   hold the current input line across iterations; useful when each input line contains values for several observations
/    skip to the next line of raw data
#n   go to line n of the raw data for the current observation

Example:
data p;
  input @1 x 3. +1 y z 3.;
  cards;
101 29 169
102 30 174
103 35 172
;
run;

More Examples for Multiple Observations per Line or Multiple Lines per Observation:
If more than one observation exists on a line, use the @@ line-hold specifier at the end of the INPUT statement.
data pc;
  input a b c @@;
  cards;
1 1 1 2 2 2 3 3 3
4 4 4
;
run;
data mm;
  input age gender $ / weight height #3 education $ income;
  datalines;
35 M
145 174
College 58000
32 F
120 163
High 34000
;
run;
A single observation may be spread across more than one line. In that case, list input with the default FLOWOVER behavior keeps reading the following lines until every variable in the INPUT statement has a value. SAS then writes a NOTE to the log saying that it went to a new line.
data pc2;
  input a b c;
  datalines;
1 1
1
2 2 2
3
3
3
4 4
4
;
run;

1.4 DATALINES and CARDS
DATALINES and CARDS perform the same task. Use the DATALINES statement with an INPUT statement to read data that you enter directly in the program rather than data stored in an external file. You can use only one DATALINES statement in a DATA step.

1.5 Line-Hold Specifiers
Line-hold specifiers keep the line and column pointers on the current line of the input file, either across multiple INPUT statements or across multiple iterations of a single DATA step. Placed at the end of an INPUT statement, they instruct SAS not to read a new record when the next INPUT statement is executed. This capability is the key element of techniques used to read more complex files and to improve efficiency.
- @ (trailing at sign) keeps this record current until one of two things happens: an INPUT statement without a trailing @ or @@ is executed, or this iteration of the DATA step is completed.
- @@ (double trailing at sign) holds the record across iterations of the DATA step. It is released when the pointer moves past the end of the record or when an INPUT statement without a line-hold specifier executes.

1.6 INFILE Statement
- Identifies an external file to read with an INPUT statement.
- Is valid only in a DATA step.
- MUST appear BEFORE the INPUT statement that reads data from the file.
- Tells SAS where the data file is located and what it is named.
Syntax: INFILE file-specification <options>;

Reading an external file with the INFILE statement and common options

/* Example 1: Reading an external file with the DLM= option */
data my2;
  length custom_id $ 6 department $ 20;
  infile 'a:mydata2.txt' dlm=',' firstobs=2;
  input custom_id $ department $;
run;
Note: DLM= tells SAS which character is used as the delimiter in a file. If this option is not specified, SAS assumes that the delimiter is a blank. Common delimiters are the comma, vertical bar (pipe), semicolon, and tab. For example, DLM=',' indicates that commas separate the values within the text file. If the data had tab characters between values instead of commas, you could use the following program to read the file:
data tab;
  infile 'a:tab.txt' dlm='09'x;
  input x y z;
run;

/* Example 2: Reading an external file with the DSD option */
DSD (delimiter-sensitive data) has three functions when reading delimited files:
1. It strips off the quotation marks that surround values in the text file.
2. It changes how missing values are handled. By default, SAS treats consecutive delimiters as a single delimiter. Consecutive delimiters in a file usually mean that a value is missing between them. DSD tells SAS to treat each delimiter separately, so a value missing between consecutive delimiters is read as a missing value.
3. It makes the comma the default delimiter. If DSD is specified and the delimiter is a comma, the DLM= option is not necessary. If another delimiter is used, the DLM= option must be specified as well.
data temp;
  infile cards dsd dlm=',';   /* strip off quotation marks */
  input a b c d;
  cards;
54,75,253,44
87,3,55,46
5905,66,354
;
data temp2;
  infile cards dsd;   /* treat consecutive delimiters separately */
  input a b c;
  cards;
54,44
87,55,46
90,35
;
data temp3; infile a:phone.txt d
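The automatic variables _N_ and _ERROR_ from the compilation and execution phases (1.1.2) are never written to the output data set. You can still copy them into ordinary variables to watch them change. The sketch below does that. The data set `scores`, its values, and the variable names `iteration` and `err_flag` are all hypothetical. The third record deliberately holds a non-numeric score. SAS should set `score` to missing there and set _ERROR_ to 1 for that iteration only. It should also write an "Invalid data" note and the PDV contents to the log, then continue processing.

```sas
/* Hypothetical data: the third record has an invalid numeric value */
data scores;
  input name $ score;
  /* Copy the automatic variables into regular ones so they are kept */
  iteration = _n_;      /* one value per DATA step iteration          */
  err_flag  = _error_;  /* 1 only in the iteration with invalid data  */
  datalines;
Ann 90
Bob 85
Cy abc
;
run;

proc print data=scores;
run;
```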
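Step 5 of the execution phase says that variables created by assignment statements are reset to missing at the top of each iteration. Variables read with SET are not reset. A small sketch to observe this follows; the data sets `base` and `demo` and their values are hypothetical. The first SET statement runs only on iteration 1, yet `first_x` keeps its value on every row. By contrast, `flag` is created by an assignment statement and is blank after the first row.

```sas
/* Hypothetical input data */
data base;
  input x;
  datalines;
1
2
3
;
run;

data demo;
  /* Executed only on the first iteration; first_x is NOT reset afterwards */
  if _n_ = 1 then set base(rename=(x=first_x));
  set base;                         /* reads one observation per iteration */
  if _n_ = 1 then flag = 'first';   /* assignment variable: reset to missing each iteration */
run;

proc print data=demo;
run;
```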
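Column input (1.2.2) names an explicit column range for every variable. Because of that, the fields can be listed in any order, and the same column can be read more than once. The sketch below reuses the fixed-column layout of the Census example. The data set name `census3` and the variable `initial` are hypothetical.

```sas
data census3;
  /* Pop is read before State, and column 1 is read a second time */
  input Pop 3-7 State $ 1-2 initial $ 1;
  datalines;
NC5.082
SC2.590
;
run;

proc print data=census3;
run;
```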
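The names in the 1.2.3 table serve two roles. They act as informats when SAS reads nonstandard values and as formats when it displays them. The sketch below assumes a hypothetical fixed-column record and the hypothetical names `pay`, `id`, `salary` and `hired`. COMMA8. reads a value that has a thousands separator. MMDDYY10. converts a date string into a SAS date value, which counts days from 1 January 1960. The FORMAT statement changes only how the stored numbers are displayed.

```sas
data pay;
  /* Formatted input: @n column pointers followed by informats */
  input @1 id $3. @5 salary comma8. @14 hired mmddyy10.;
  /* Formats affect display only, not the stored values */
  format salary dollar10.2 hired date9.;
  datalines;
001 1,234.50 09/12/1997
;
run;

proc print data=pay;
run;
```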
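A common use of the trailing @ from 1.5 works in three stages. First, read a record-type field and hold the record. Then decide whether the rest of that same line should be read. The record layout is hypothetical: a type code in column 1 (H for header lines, D for detail lines), with detail fields starting in column 3. The data set and variable names are hypothetical as well.

```sas
data detail;
  input type $1. @;             /* read the type code and hold the record */
  if type ne 'D' then delete;   /* ends this iteration; the held record is released */
  input @3 item $ qty;          /* same record: read the remaining fields */
  drop type;
  datalines;
H 2023
D apple 3
D pear 5
;
run;

proc print data=detail;
run;
```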
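The sketch below contrasts the default delimiter handling with DSD (1.6). It uses in-stream data instead of an external file; `infile datalines` behaves like `infile cards`. MISSOVER appears in the non-DSD step only. There, it makes a short line set the remaining variable to missing rather than continue onto the next line. The data set names and values are hypothetical.

```sas
/* With DSD: quotes are stripped, the embedded comma is kept,
   and ",," yields a missing value */
data dsd_demo;
  infile datalines dsd;              /* DSD implies a comma delimiter */
  length name $ 12;
  input name $ a b;
  datalines;
"Smith, J",10,20
Lee,,30
;
run;

/* Without DSD: ",," is treated as a single delimiter,
   so 30 is read into a and b is left missing */
data nodsd_demo;
  infile datalines dlm=',' missover;
  input name $ a b;
  datalines;
Lee,,30
;
run;
```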
