SAS Part II: Introduction to DATA step programming.


SAS for Windows/X-Windows– Part II:
Introduction to DATA step programming.

 

VDC –Last update: 02FEB2004


ALL STATISTICAL SEMINARS REQUIRE that you have WINDOWS knowledge
or
experience at INTERMEDIATE Level.

 

 
 
 

 

 
 
 

SUMMARY

 

  • In this seminar, we cover…
    • More on reading in files
    • Programming in the DATA step
      • Subsetting IF
      • IF-THEN statements.
      • DO loops.
    • Dataset manipulation
    • Combining datasets
    • Brief introduction to V8
      Features
  • PLATFORM; Windows, and UNIX.
  • DURATION: Two hours
  • PREREQUISITES: MSWindows familiarity and,
    Introduction to SAS for Windows I.
  • HANDS ON: Yes – Examples are located in the network directory
    N:\clasfile\stats\


     

     
     
     

    Part I REVIEW

     

    Elementary DATA step and PROC step outputs.

    • Commands, Operators,and Functions
    • Basic Program:
      	 libname hdir 'h:\'; 
               libname cdir 'c:\temp\';
               
               data hdir.test1;
               infile 'n:\clasfile\stats\ex1.dat';
               input region $ 1-2 citysize $ 4-4 pop 6-12 product $ 14-17
      		saletype $ 19 quantity 21-23 @26 amount 8.2;
               run;
               proc print data=hdir.test1 nobs; run;
      

       

       
       
       

      Reading data.

       

      1. Defining External Files:
        • FILENAME statements

          This statement is used to identify an external text [ASCII] file to
          SAS. Can specify typical file options like LRECL, BLKSIZE, and RECFM.
          The format for this command is…

          Filename fileref "a:\myfile";

        • LIBNAME statements

          LIBNAME is used to associate a library reference [LIBREF] with the
          physical name of a permanent SAS data library. The LIBREF serves as a
          nickname for the directory location specified in the LIBNAME command. A
          SAS data library is
          a collection of SAS files under the same directory, or the same file type
          depending the operating system.
          The format for this command is…

          libname libref [engine] "N:\clasfile\stats\";

      2. The SET Command.
          This commands substitutes INFILE, INPUT and CARDS commands since file is
          already a SAS dataset and not a text file. The options listed, are used to
          control the number of variables or observations being read in.

          Format:	set  <(keep=)>; /* read in vars listed */
          	set  <(drop=)>; /* read all vars except listed */
          	set  <(obs=)>;   /* read in specified number of obs */
          
          E.g.:	data new;
          	set cdir.file1(keep=region quantity amount)(obs=20);
          	run;
          
      3. INFILE statement and options
          Below is a list of some of the options you can use with the command
          INFILE:

          MISSOVER
          STOPOVER
          END=variable
          DELIMITER
          FIRSTOBS=
          OBS=		
          DLM=		--> use to select a delimiter other than a space. 
          		For example, INFILE mydile DLM=',';
          DSD=		--> use to select multiple and consecutive delimiters.
          

      4. Types of INPUT statements
        • There are 3 ways of writing an INPUT statement: list, column,
          formatted.

          • List input.

            This method assumes that there is, at least, a space, or other delimiter,
            between each column of data to be read.

            Format:		input   ;
            
            E.g.:		input name $ pretime ;
            
          • Column input

            Sometimes the data is entered as one continues string of numbers without
            any separation between columns of data. In this case, we need to know
            the starting and ending column for each variable. If you do not know
            this information, you will have to get a copy of the first few cases of
            the data and count the column position where each variable starts and
            where it ends. We still need to include the dollar sign if the variable is
            alphanumeric.

            Format:		input   start-end ; 
            
            E.g.:		input name $ 1-12 pretime 13-16;
            

            Alternatively, we can use the @ sign to indicate the starting column for
            that variable. A space, or delimiter, is assumed as ending character.

            Format:		input @#   ; 
            
            E.g.:		input @1 name $  @13 pretime ;
            
          • Formatted input: the INFORMATs

            INFORMATs control the way the data is going to be read in.
            There are
            many INFORMATs that can be used. The most common is the numeric one with
            the form
            w.d where,
            w [width] equals the number of places inclusive of decimal places and the
            decimal point, and d [decimal] equals the number of places after decimal
            point.

            Format:		input  w.d;
            
            E.g.:		input @1 name $12. @13 pretime 4.2;
            
            w.d entered… Will yield…
            5.0 ^^^40
            5.1 ^40.3
            5.2 40.30
            5.3 40.30
            6.3 40.300

          Of course, you can combine all methods.

          	input @1 name $12.0  pretime 13-16 4.2;
          

          Note: INFORMATs affect the way the data is read in. If you want to change
          the way the data is displayed then you need to use a FORMAT statement.
          There are many FORMATS that can be used depending the data and what you
          like it to look like. For example:

          	data new;
          	set cdir.test1;
          	format amount dollar10.2;	
          	run;
          

        • INPUT statement options
            Below is a list of some of the options you can use with the
            INPUT command:

            @n	--> moves column pointer to indicated column.
            +n	--> advances column pointer the indicated number of columns.
            #n	--> advances column pointer to the indicated line, column 1.
            /	--> moves the line pointer to the next record, column 1.
            @	--> hold the current input line and release the input line 
            		at the bottom of the observation loop. 
            @@	--> hold the current input line and carry over to the next
            		repetition of the observation loop.
            

    • Reading in multiple records.
        If you have multiple records per subject, you want to read in variables
        whose values are across multiple rows, or records. In the following
        example, there are a total of 7 records.

        			/*Comments go here   */	
        input idnum 1-3		/*  = line 1			  */
        /// salary 20-27		/*  = line 2,3,4 --read in line 4 */
        // gender $		/*  = line 5,6 -- read in line 6  */
        / ;			/* = line 7 -- jump 7		  */
        


        By the way, notice that the way of commenting out text is by using pairs
        of /* */ , where the text to be commented out is inside of
        the pair of these characters.

       

       
       
       

      File Manipulation.

       

      1. Reading in a single external text file:

        	libname cdir 'c:\temp\';
                data test1;
                infile 's:\clasfile\stats\ex1.dat';
                input  region $ citysize $ pop product $ saletype $ 
        		quantity amount;
                run;
        


        Alternatively, I could have used the FILENAME command to use a nickname
        for the text input file. This is advantageous if you have multiple text
        files to deal with at the same time.

        	libname cdir 'c:\temp\';
                data test1;
         	filename myfile 's:\clasfile\stats\ex1.dat';
                infile myfile;
                input  region $ citysize $ pop product $ saletype $ 
        		quantity amount;
        
                run;
        

      2. Creating a temporary dataset with the first 20 observations, reading
        the data from a permanent SAS dataset.

        	data new20;
        	set cdir.test1(obs=20);
        	run;
        

      3. Overwriting a permanent dataset leaving 20 observations:

        	data cdir.test1;
        	set cdir.test1(obs=20);
        	run;
        

      4. Creating a temporary dataset with selected variables and observations
        from permanent dataset:

        	data subset;
        	set cdir.test1(obs=20 keep=name pretime age score);
        	run;
        

      5. Creating a permanent dataset based on subset of input dataset
        with 20 observations:

        	data cdir.subset;
        	set cdir.myfile(obs=20 keep=name pretime age score);
        	if age>20 and age<40 then output;
        	run;
        

      6. Combining datasets
        • Concadenating
          This method yields one file staked on top of the other.
          All the cases of the first file area read, and the second file is read in.
          This example assumes that both files have the same variables. Else, you
          will have missing data for those rows containing data for the other file’s
          variables.

          	data combined;
          	set cdir.myfile1 cdir.myfile2;
          	run;
          
          


          If the same information is listed in both files but with different
          variables names, you can rename the variables of interest. Then:

          	data combined;
          	set cdir.myfile1(rename=(var1=varA)) cdir.myfile2;
          	run;
          

        • Interleaving
          In this case, we want to combine two files by some identifier variable.
          Note that both files to be combined have to be sorted before doing the
          following DATA step.

          proc sort data=cdir.myfile; by social; run;
          proc sort data=cdir.myfile2; by ss; run;
          data combo2;
          	set cdir.myfile1 cdir.myfile2(rename(ss=social));
          	by social;
          	run;
          

        • Merging – side by side.

          This method is similar to concadenating two files with the SET command,
          except that here the first row of myfile1 is read, then the first row of
          myfile2 is read, and then the second row.. and so forth.

          	data merged;
          	merge cdir.myfile1 cdir.myfile2;
          	run;
          


          Matched merging can be obtained by adding a BY statement. Note that here
          you have the option of using ASCENDING, or DESCENDING option:

          	data merged;
          	merge cdir.myfile1 cdir.myfile2;
          	by descending name;
          	run;
          

      7. Swaping data in rows to colums

        Sometimes after reading in a spreadsheet file, you might need to swap the
        rows and the colums because the variable names were entered as rows
        instead of column names. To do this, we use PROC TRANPOSE.

        proc transpose data=purple.data2 out=purple.data2t ;
                id idnum;
                run;
        

       

       
       
       

      Conditional Statements.

       

        Conditional processing is used to select subgroups of data or limit when a
        certain task will be performed.

      1. Subsetting IF

        Think of the subsetting IF as a wall which can only be crossed if the
        conditional statement is satisfied. This will create the new variable only
        in those cases where x=5, else it will not do it. The output contains only
        those cases that satisfied the conditional.

        	index1=index1+1;
        	if region="NE";
        		var3=var1*(var2);
        		flag=1;
        

      2. IF THEN statements:

        This a more traditional conditional statement.
        There are several ways of using this. For example,

        	if x=5 then y=2;

        Or…

        	if x=5 then y=2;
        	else y=7;
        

        Also, it can be used with a cascading effect…

        	if x=5 then y=2;
        	else if x=3 then y=3;	
        	else  y=7;
        
      3. IF statement with DO loops:

        A DO; END; loop is like an aside, where a group of tasks are performed.
        After completion of the tasks, the program continues where it left.
        Because this IF statement is not subsetting additional lines of code after
        the conditional will be performed.

        	If x=5 then do;
        		y=2;
        		index=index+1;
        		index2=amount/quantity;
        	end;
        	newvar=index2/index1;
        
      4. Types of DO loops.
        • do; end;
        • do while(); end;
        • do until(); end;

     

     
     
     

    ODS

     

      The Output Delivery System [ODS] is a new module with serves as a
      mediatior between the procedures and the output. Detailed information on
      ODS is availbable from the handout
      New Features of SAS V8. Briefly,

      By invoking ODS, you can:

      • Generate SAS data files [tables] as the output of PROCedures.
      • Costumize reports.
      • Send output to a variety of destinations. There are two types of
        destinations: Report and Data. Report destinations
        include: LISTING, HTML,
        and PRINTER -which can be either postscript or PCL. Data
      • Select, and/or exclude, any part [components] of your output.
      • If more than one component is generated, organize and manage your
        output.

      For example:

        ods listing close;
        ods html path='c:\temp\' (url=none) body='printout.html' style=sasweb;
        options pageno=1 linesize=120 pagesize=65;
        
        proc print data=cdir.test1 nobs;  run;
        
        
        ods html close ; ods listing;
        

     

     
     
     

    Looking ahead… more advanced information and more examples

     

    1. Seminar SAS II: Examples
      and More sample Data
    2. Books, documentation and other Help
      and examples resources
    3. SAS
      Resources
    4. Reading database/spreadsheet files

      Please note that SAS 6.12 will not open EXCELL 97 or higher.

      PROC ACCESS DBMS=EXCEL;
      CREATE purple.booknew;
      PATH='C:\DATATEMP\Booki.xls';
      GETNAMES YES;
      SCANTYPE=YES;
      CREATE WORK._IMEX_.VIEW;
      SELECT ALL;
      RUN;
      

    5. Reading SPSS files
        A caveat with reading SPSS files into SAS is that SAS ONLY
        reads SPSS
        portable files
        . Therefore, you have to save the SPSS file from
        within SPSS
        as a .por file.

        READING FILES USING PROC CONVERT
        
        filename in 'spss-file-name';
        proc convert spss=in out=temp;
        run;
        
        
        libname purple "c:\temp\";
        filename file1 "c:\datatemp\myfile1.por";
        filename file2 "c:\datatemp\myfile2.por";
        proc convert spss=file1 out=purple.new1; run;
        proc convert spss=file2 out=purple.new2; run;
        
        
        READING FILES USING ENGINES
        
        libname spssfile spss 'spss-file-name';
        proc copy in=spssfile out=work;
        run;
        
        



    6. Advanced programming example
    7. Advanced Topics on SAS Data Step
      Programing: What are
      your options?

     

     
     
     

    Where else can I go for more information?

     

     

     


      2004-8-3  VDC: mailboxWWWSTATS@uic.edu


      UIC Home Page Search UIC Pages Contact UIC