
SAS for Windows/X-Windows– Part II:
Introduction to DATA step programming.
VDC –Last update: 02FEB2004

ALL STATISTICAL SEMINARS REQUIRE that you have WINDOWS knowledge or experience at INTERMEDIATE Level. |
- SUMMARY
- Part I REVIEW
- Reading data.
- File Manipulation.
- Conditional Statements.
- ODS
- Looking ahead… more advanced information and more examples
- Where else can I go for more information?
- In this seminar, we cover…
- More on reading in files
- Programming in the DATA step
- Subsetting IF
- IF-THEN statements.
- DO loops.
- Dataset manipulation
- Combining datasets
- Brief introduction to V8
Features
- PLATFORM; Windows, and UNIX.
- DURATION: Two hours
- PREREQUISITES: MSWindows familiarity and,
Introduction to SAS for Windows I. - HANDS ON: Yes – Examples are located in the network directory
N:\clasfile\stats\
Elementary DATA step and PROC step outputs.
- Commands, Operators,and Functions
- Basic Program:
libname hdir 'h:\'; libname cdir 'c:\temp\'; data hdir.test1; infile 'n:\clasfile\stats\ex1.dat'; input region $ 1-2 citysize $ 4-4 pop 6-12 product $ 14-17 saletype $ 19 quantity 21-23 @26 amount 8.2; run; proc print data=hdir.test1 nobs; run;
- Defining External Files:
- FILENAME statements
This statement is used to identify an external text [ASCII] file to
SAS. Can specify typical file options like LRECL, BLKSIZE, and RECFM.
The format for this command is…
Filename fileref "a:\myfile";
- FILENAME statements
- Defining External Files:
- LIBNAME statements
LIBNAME is used to associate a library reference [LIBREF] with the
physical name of a permanent SAS data library. The LIBREF serves as a
nickname for the directory location specified in the LIBNAME command. A
SAS data library is
a collection of SAS files under the same directory, or the same file type
depending the operating system.
The format for this command is…
libname libref [engine] "N:\clasfile\stats\";
- The SET Command.
-
This commands substitutes INFILE, INPUT and CARDS commands since file is
already a SAS dataset and not a text file. The options listed, are used to
control the number of variables or observations being read in.
Format: set <(keep=)>; /* read in vars listed */ set <(drop=)>; /* read all vars except listed */ set <(obs=)>; /* read in specified number of obs */ E.g.: data new; set cdir.file1(keep=region quantity amount)(obs=20); run;
- Below is a list of some of the options you can use with the command
INFILE:
MISSOVER STOPOVER END=variable DELIMITER FIRSTOBS= OBS= DLM= --> use to select a delimiter other than a space. For example, INFILE mydile DLM=','; DSD= --> use to select multiple and consecutive delimiters.
- There are 3 ways of writing an INPUT statement: list, column,
formatted.- List input.
This method assumes that there is, at least, a space, or other delimiter,
between each column of data to be read.Format: input ; E.g.: input name $ pretime ;
- Column input
Sometimes the data is entered as one continues string of numbers without
any separation between columns of data. In this case, we need to know
the starting and ending column for each variable. If you do not know
this information, you will have to get a copy of the first few cases of
the data and count the column position where each variable starts and
where it ends. We still need to include the dollar sign if the variable is
alphanumeric.Format: input start-end ; E.g.: input name $ 1-12 pretime 13-16;
Alternatively, we can use the @ sign to indicate the starting column for
that variable. A space, or delimiter, is assumed as ending character.Format: input @# ; E.g.: input @1 name $ @13 pretime ;
- Formatted input: the INFORMATs
INFORMATs control the way the data is going to be read in.
There are
many INFORMATs that can be used. The most common is the numeric one with
the form
w.dwhere,
w [width] equals the number of places inclusive of decimal places and the
decimal point, and d [decimal] equals the number of places after decimal
point.Format: input w.d; E.g.: input @1 name $12. @13 pretime 4.2;
w.d entered… Will yield… 5.0 ^^^40 5.1 ^40.3 5.2 40.30 5.3 40.30 6.3 40.300
Of course, you can combine all methods.
input @1 name $12.0 pretime 13-16 4.2;
Note: INFORMATs affect the way the data is read in. If you want to change
the way the data is displayed then you need to use a FORMAT statement.
There are many FORMATS that can be used depending the data and what you
like it to look like. For example:
data new; set cdir.test1; format amount dollar10.2; run;
- List input.
- INPUT statement options
- Below is a list of some of the options you can use with the
INPUT command:@n --> moves column pointer to indicated column. +n --> advances column pointer the indicated number of columns. #n --> advances column pointer to the indicated line, column 1. / --> moves the line pointer to the next record, column 1. @ --> hold the current input line and release the input line at the bottom of the observation loop. @@ --> hold the current input line and carry over to the next repetition of the observation loop.
-
If you have multiple records per subject, you want to read in variables
whose values are across multiple rows, or records. In the following
example, there are a total of 7 records.
/*Comments go here */ input idnum 1-3 /* = line 1 */ /// salary 20-27 /* = line 2,3,4 --read in line 4 */ // gender $ /* = line 5,6 -- read in line 6 */ / ; /* = line 7 -- jump 7 */
By the way, notice that the way of commenting out text is by using pairs
of
/* */ , where the text to be commented out is inside ofthe pair of these characters.
- Reading in a single external text file:
libname cdir 'c:\temp\'; data test1; infile 's:\clasfile\stats\ex1.dat'; input region $ citysize $ pop product $ saletype $ quantity amount; run;
Alternatively, I could have used the FILENAME command to use a nickname
for the text input file. This is advantageous if you have multiple text
files to deal with at the same time.
libname cdir 'c:\temp\'; data test1; filename myfile 's:\clasfile\stats\ex1.dat'; infile myfile; input region $ citysize $ pop product $ saletype $ quantity amount; run;
the data from a permanent SAS dataset.
data new20; set cdir.test1(obs=20); run;
data cdir.test1; set cdir.test1(obs=20); run;
from permanent dataset:
data subset; set cdir.test1(obs=20 keep=name pretime age score); run;
with 20 observations:
data cdir.subset; set cdir.myfile(obs=20 keep=name pretime age score); if age>20 and age<40 then output; run;
- Concadenating
This method yields one file staked on top of the other.
All the cases of the first file area read, and the second file is read in.
This example assumes that both files have the same variables. Else, you
will have missing data for those rows containing data for the other file’s
variables.
data combined; set cdir.myfile1 cdir.myfile2; run;
If the same information is listed in both files but with different
variables names, you can rename the variables of interest. Then:
data combined; set cdir.myfile1(rename=(var1=varA)) cdir.myfile2; run;
- Interleaving
In this case, we want to combine two files by some identifier variable.
Note that both files to be combined have to be sorted before doing the
following DATA step.
proc sort data=cdir.myfile; by social; run; proc sort data=cdir.myfile2; by ss; run; data combo2; set cdir.myfile1 cdir.myfile2(rename(ss=social)); by social; run;
- Merging – side by side.
This method is similar to concadenating two files with the SET command,
except that here the first row of myfile1 is read, then the first row of
myfile2 is read, and then the second row.. and so forth.
data merged; merge cdir.myfile1 cdir.myfile2; run;
Matched merging can be obtained by adding a BY statement. Note that here
you have the option of using ASCENDING, or DESCENDING option:
data merged; merge cdir.myfile1 cdir.myfile2; by descending name; run;
Sometimes after reading in a spreadsheet file, you might need to swap the
rows and the colums because the variable names were entered as rows
instead of column names. To do this, we use PROC TRANPOSE.
proc transpose data=purple.data2 out=purple.data2t ;
id idnum;
run;
-
Conditional processing is used to select subgroups of data or limit when a
- Subsetting IF
Think of the subsetting IF as a wall which can only be crossed if the
conditional statement is satisfied. This will create the new variable only
in those cases where x=5, else it will not do it. The output contains only
those cases that satisfied the conditional.
index1=index1+1; if region="NE"; var3=var1*(var2); flag=1;
- IF THEN statements:
This a more traditional conditional statement.
There are several ways of using this. For example,if x=5 then y=2;
Or…
if x=5 then y=2; else y=7;
Also, it can be used with a cascading effect…
if x=5 then y=2; else if x=3 then y=3; else y=7;
- IF statement with DO loops:
A DO; END; loop is like an aside, where a group of tasks are performed.
After completion of the tasks, the program continues where it left.
Because this IF statement is not subsetting additional lines of code after
the conditional will be performed.If x=5 then do; y=2; index=index+1; index2=amount/quantity; end; newvar=index2/index1;
- Types of DO loops.
- do; end;
- do while(); end;
- do until(); end;
certain task will be performed.
-
The Output Delivery System [ODS] is a new module with serves as a
- Generate SAS data files [tables] as the output of PROCedures.
- Costumize reports.
- Send output to a variety of destinations. There are two types of
destinations: Report and Data. Report destinations
include: LISTING, HTML,
and PRINTER -which can be either postscript or PCL. Data - Select, and/or exclude, any part [components] of your output.
- If more than one component is generated, organize and manage your
output.
mediatior between the procedures and the output. Detailed information on
ODS is availbable from the handout
New Features of SAS V8. Briefly,
By invoking ODS, you can:
For example:
ods listing close; ods html path='c:\temp\' (url=none) body='printout.html' style=sasweb; options pageno=1 linesize=120 pagesize=65; proc print data=cdir.test1 nobs; run; ods html close ; ods listing;
Looking ahead… more advanced information and more examples
- Seminar SAS II: Examples
and More sample Data - Books, documentation and other Help
and examples resources - SAS
Resources - Reading database/spreadsheet files
Please note that SAS 6.12 will not open EXCELL 97 or higher.
PROC ACCESS DBMS=EXCEL; CREATE purple.booknew; PATH='C:\DATATEMP\Booki.xls'; GETNAMES YES; SCANTYPE=YES; CREATE WORK._IMEX_.VIEW; SELECT ALL; RUN;
- Reading SPSS files
-
A caveat with reading SPSS files into SAS is that SAS ONLY
reads SPSS
portable files. Therefore, you have to save the SPSS file from
within SPSS
as a .por file.READING FILES USING PROC CONVERT filename in 'spss-file-name'; proc convert spss=in out=temp; run; libname purple "c:\temp\"; filename file1 "c:\datatemp\myfile1.por"; filename file2 "c:\datatemp\myfile2.por"; proc convert spss=file1 out=purple.new1; run; proc convert spss=file2 out=purple.new2; run; READING FILES USING ENGINES libname spssfile spss 'spss-file-name'; proc copy in=spssfile out=work; run;
- Advanced programming example
- Advanced Topics on SAS Data Step
Programing: What are
your options?
Where else can I go for more information?
- Your best solution is always the help system of the SAS program and
to use an engine like
GOOGLE
to search the web for what you are looking
for. In addition,
here are some suggestions. - SAS: Intermediate/Advanced Level
Handouts of which
Notes on: New Features in
Version 8 of the SAS System offers a self-guided introduction to cool
features of SAS.
Frequently Asked Questions for The SAS System Products.
| 2004-8-3 VDC: |
|
|