SAS Part II: Introduction to DATA step programming.

SAS for Windows/X-Windows– Part II:
Introduction to DATA step programming.

VDC –Last update: 02FEB2004

ALL STATISTICAL SEMINARS REQUIRE that you have WINDOWS knowledge
or
experience at INTERMEDIATE Level.

SUMMARY

In this seminar, we cover…
- More on reading in files
- Programming in the DATA step
  - Subsetting IF
  - IF-THEN statements.
  - DO loops.
- Dataset manipulation
- Combining datasets
- Brief introduction to V8
  Features
PLATFORM; Windows, and UNIX.
DURATION: Two hours
PREREQUISITES: MSWindows familiarity and,
Introduction to SAS for Windows I.
HANDS ON: Yes – Examples are located in the network directory
N:\clasfile\stats\

Part I REVIEW

Elementary DATA step and PROC step outputs.
- Commands, Operators,and Functions
- Basic Program:
```
	 libname hdir 'h:\'; 
         libname cdir 'c:\temp\';
         
         data hdir.test1;
         infile 'n:\clasfile\stats\ex1.dat';
         input region $ 1-2 citysize $ 4-4 pop 6-12 product $ 14-17
		saletype $ 19 quantity 21-23 @26 amount 8.2;
         run;
         proc print data=hdir.test1 nobs; run;
```
  Reading data.
  1. Defining External Files:
    - FILENAME statements
      This statement is used to identify an external text [ASCII] file to
      SAS. Can specify typical file options like LRECL, BLKSIZE, and RECFM.
      The format for this command is…
      
      Filename fileref "a:\myfile";
    - LIBNAME statements
      LIBNAME is used to associate a library reference [LIBREF] with the
      physical name of a permanent SAS data library. The LIBREF serves as a
      nickname for the directory location specified in the LIBNAME command. A
      SAS data library is
      a collection of SAS files under the same directory, or the same file type
      depending the operating system.
      The format for this command is…
      
      libname libref [engine] "N:\clasfile\stats\";
  2. The SET Command.
  3. INFILE statement and options
  4. Types of INPUT statements
    - There are 3 ways of writing an INPUT statement: list, column,
      formatted.
      
      List input.
      This method assumes that there is, at least, a space, or other delimiter,
      between each column of data to be read.
      
      Format: input ; E.g.: input name $ pretime ;
      
      Column input
      Sometimes the data is entered as one continues string of numbers without
      any separation between columns of data. In this case, we need to know
      the starting and ending column for each variable. If you do not know
      this information, you will have to get a copy of the first few cases of
      the data and count the column position where each variable starts and
      where it ends. We still need to include the dollar sign if the variable is
      alphanumeric.
      
      Format: input start-end ; E.g.: input name $ 1-12 pretime 13-16;
      
      Alternatively, we can use the @ sign to indicate the starting column for
      that variable. A space, or delimiter, is assumed as ending character.
      
      Format: input @# ; E.g.: input @1 name $ @13 pretime ;
      
      Formatted input: the INFORMATs
      INFORMATs control the way the data is going to be read in.
      There are
      many INFORMATs that can be used. The most common is the numeric one with
      the form
      w.d where,
      w [width] equals the number of places inclusive of decimal places and the
      decimal point, and d [decimal] equals the number of places after decimal
      point.
      
      Format: input w.d; E.g.: input @1 name $12. @13 pretime 4.2;
      
      w.d entered… Will yield…
      
      5.0 ^^^40
      
      5.1 ^40.3
      
      5.2 40.30
      
      5.3 40.30
      
      6.3 40.300
      
      Of course, you can combine all methods.
      
      input @1 name $12.0 pretime 13-16 4.2;
      
      Note: INFORMATs affect the way the data is read in. If you want to change
      the way the data is displayed then you need to use a FORMAT statement.
      There are many FORMATS that can be used depending the data and what you
      like it to look like. For example:
      
      data new; set cdir.test1; format amount dollar10.2; run;
    - INPUT statement options
      Below is a list of some of the options you can use with the
      INPUT command:
      
      @n --> moves column pointer to indicated column. +n --> advances column pointer the indicated number of columns. #n --> advances column pointer to the indicated line, column 1. / --> moves the line pointer to the next record, column 1. @ --> hold the current input line and release the input line at the bottom of the observation loop. @@ --> hold the current input line and carry over to the next repetition of the observation loop.
- Reading in multiple records.
  File Manipulation.
  1. Reading in a single external text file:
    libname cdir 'c:\temp\'; data test1; infile 's:\clasfile\stats\ex1.dat'; input region $ citysize $ pop product $ saletype $ quantity amount; run;
    Alternatively, I could have used the FILENAME command to use a nickname
    for the text input file. This is advantageous if you have multiple text
    files to deal with at the same time.
    libname cdir 'c:\temp\'; data test1; filename myfile 's:\clasfile\stats\ex1.dat'; infile myfile; input region $ citysize $ pop product $ saletype $ quantity amount; run;
  2. Creating a temporary dataset with the first 20 observations, reading
    the data from a permanent SAS dataset.
    data new20; set cdir.test1(obs=20); run;
  3. Overwriting a permanent dataset leaving 20 observations:
    data cdir.test1; set cdir.test1(obs=20); run;
  4. Creating a temporary dataset with selected variables and observations
    from permanent dataset:
    data subset; set cdir.test1(obs=20 keep=name pretime age score); run;
  5. Creating a permanent dataset based on subset of input dataset
    with 20 observations:
    data cdir.subset; set cdir.myfile(obs=20 keep=name pretime age score); if age>20 and age<40 then output; run;
  6. Combining datasets
    - Concadenating
      This method yields one file staked on top of the other.
      All the cases of the first file area read, and the second file is read in.
      This example assumes that both files have the same variables. Else, you
      will have missing data for those rows containing data for the other file’s
      variables.
      
      data combined; set cdir.myfile1 cdir.myfile2; run;
      
      If the same information is listed in both files but with different
      variables names, you can rename the variables of interest. Then:
      
      data combined; set cdir.myfile1(rename=(var1=varA)) cdir.myfile2; run;
    - Interleaving
      In this case, we want to combine two files by some identifier variable.
      Note that both files to be combined have to be sorted before doing the
      following DATA step.
      
      proc sort data=cdir.myfile; by social; run; proc sort data=cdir.myfile2; by ss; run; data combo2; set cdir.myfile1 cdir.myfile2(rename(ss=social)); by social; run;
    - Merging – side by side.
      This method is similar to concadenating two files with the SET command,
      except that here the first row of myfile1 is read, then the first row of
      myfile2 is read, and then the second row.. and so forth.
      
      data merged; merge cdir.myfile1 cdir.myfile2; run;
      
      Matched merging can be obtained by adding a BY statement. Note that here
      you have the option of using ASCENDING, or DESCENDING option:
      
      data merged; merge cdir.myfile1 cdir.myfile2; by descending name; run;
  7. Swaping data in rows to colums
    Sometimes after reading in a spreadsheet file, you might need to swap the
    rows and the colums because the variable names were entered as rows
    instead of column names. To do this, we use PROC TRANPOSE.
    proc transpose data=purple.data2 out=purple.data2t ; id idnum; run;
  Conditional Statements.
  1. Subsetting IF
    Think of the subsetting IF as a wall which can only be crossed if the
    conditional statement is satisfied. This will create the new variable only
    in those cases where x=5, else it will not do it. The output contains only
    those cases that satisfied the conditional.
    index1=index1+1; if region="NE"; var3=var1*(var2); flag=1;
  2. IF THEN statements:
    This a more traditional conditional statement.
    There are several ways of using this. For example,
```
	if x=5 then y=2;
```
    Or…
```
	if x=5 then y=2;
	else y=7;
```
    Also, it can be used with a cascading effect…
```
	if x=5 then y=2;
	else if x=3 then y=3;	
	else  y=7;
```
  3. IF statement with DO loops:
    A DO; END; loop is like an aside, where a group of tasks are performed.
    After completion of the tasks, the program continues where it left.
    Because this IF statement is not subsetting additional lines of code after
    the conditional will be performed.
```
	If x=5 then do;
		y=2;
		index=index+1;
		index2=amount/quantity;
	end;
	newvar=index2/index1;
```
  4. Types of DO loops.
    - do; end;
    - do while(); end;
    - do until(); end;
ODS
Looking ahead… more advanced information and more examples
1. Seminar SAS II: Examples
  and More sample Data
2. Books, documentation and other Help
  and examples resources
3. SAS
  Resources
4. Reading database/spreadsheet files
  Please note that SAS 6.12 will not open EXCELL 97 or higher.
```
PROC ACCESS DBMS=EXCEL;
CREATE purple.booknew;
PATH='C:\DATATEMP\Booki.xls';
GETNAMES YES;
SCANTYPE=YES;
CREATE WORK._IMEX_.VIEW;
SELECT ALL;
RUN;
```
5. Reading SPSS files
6. Advanced programming example
7. Advanced Topics on SAS Data Step
  Programing: What are
  your options?
Where else can I go for more information?
- Your best solution is always the help system of the SAS program and
  to use an engine like
  GOOGLE
  to search the web for what you are looking
  for. In addition,
  
  here are some suggestions.
- SAS: Intermediate/Advanced Level
  Handouts of which
  Notes on: New Features in
  Version 8 of the SAS System offers a self-guided introduction to cool
  features of SAS.
- Frequently Asked Questions for The SAS System Products.
2004-8-3 VDC: WWWSTATS@uic.edu

PurpuraInk Consulting- Chicago, IL & San Juan, PR

From your desktop to ours… linking technology and research.

SAS Part II: Introduction to DATA step programming.

SAS for Windows/X-Windows– Part II:
Introduction to DATA step programming.

w.d entered…	Will yield…
5.0	^^^40
5.1	^40.3
5.2	40.30
5.3	40.30
6.3	40.300

SAS for Windows/X-Windows– Part II: Introduction to DATA step programming.

Share this:

SAS for Windows/X-Windows– Part II:
Introduction to DATA step programming.