Introduction to SAS PROCedures: Data Analysis


 

last updated: 13SEP03


 

 
 
 

 

 
 
 

Let's PROCeed… the Saga!

 

    As you know, a SAS program is built from two kinds of steps: DATA steps
    and PROC steps. By now you are probably familiar with the DATA step, which
    is discussed in detail in our Introductory SAS seminars. You should also
    refer to Introduction to SAS PROCedures for a brief
    listing of the most common procedures used for statistical data analysis.


    NOTE: These examples assume that you are using temporary data files. For
    permanent data files, add the library name to the file names. Only the
    <options> needed to get the desired results have been written in; other
    options are not mentioned here. For convenience, SAS syntax is in capital
    letters and user input in small letters. In some cases the DATA= option
    has been omitted, on the assumption that the PROC is being run on the
    most recently created data set.
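
    As a minimal sketch of the permanent-file case: the library name mylib
    and the folder path below are placeholders, not names used elsewhere in
    these notes. A LIBNAME statement assigns the library, and the two-level
    name mylib.mydata then refers to the permanent data set.

        /* mylib and the path are hypothetical; substitute your own */
        LIBNAME mylib 'c:\mysasfiles';

        PROC UNIVARIATE DATA=mylib.mydata NORMAL PLOT;
         VAR v1 v2;
        RUN;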

     

     
     
     

    Data sets to practice with
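
    If you have no data at hand, a small simulated file is enough to try the
    examples. The sketch below is only an illustration: the values are random
    numbers, not real data, and the sample size, seed, and coefficients are
    arbitrary choices. It creates the temporary file mydata with two numeric
    variables (v1, v2) and one categorical variable (v3).

        DATA mydata;
         DO i = 1 TO 100;
          v1 = 50 + 10*RANNOR(12345);            /* roughly normal, mean 50, sd 10 */
          v2 = 0.5*v1 + 5*RANNOR(12345);         /* built to be correlated with v1 */
          IF RANUNI(12345) < 0.5 THEN v3 = 'A';  /* two arbitrary groups */
          ELSE v3 = 'B';
          OUTPUT;
         END;
         DROP i;
        RUN;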

     

     

     
     
     

    Data description and simple inference

     

    1. Examine variable distributions and tests of normality

        PROC UNIVARIATE DATA=mydata NORMAL PLOT ;
         VAR v1 v2;
        RUN;
        
        Options:
        
      • NORMAL - runs tests of normality
      • PLOT - generates stem-and-leaf, box, and normal probability plots.
    2. Histograms and Scatterplots
      • Histogram

        PROC GCHART;
         VBAR v1 v2;
        RUN;
        
        
      • Scatterplot

        PROC GPLOT;
         PLOT v1*v2;
        RUN;
        
        
    3. Correlations

        PROC CORR DATA=mydata ;
         VAR v1 v2;
        RUN;
        
        
    4. Subgroups: Overlay of plots
      • Includes setup of 2 symbols to be used

        SYMBOL1 V=dot; SYMBOL2 V=triangle;
        
        PROC GPLOT;
         PLOT v1*v2=v3;
        RUN;
        
        Where v3 is a categorical variable. 
        
    5. Subgroups: Separate Analyses

        PROC SORT DATA=mydata; BY v3; RUN;
        
        PROC CORR DATA=mydata ;
         VAR v1 v2; BY v3;
        RUN;
        
        
        

       

       
       
       

      Multiple Regression

       

      • Create a new output data set with predicted values, standardized
        (studentized) residuals, and Cook's distances.

          PROC REG DATA=mydata ;
           MODEL y1=x1--x10/SELECTION=STEPWISE;
           OUTPUT OUT=mydatanew PREDICTED=rhat STUDENT=residstd COOKD=cookdist;
          RUN;
          
          
      • Examination of Residuals

          PROC UNIVARIATE DATA=mydatanew NORMAL PLOT;
           VAR residstd;
          RUN;
          
          
      • Plot residuals vs. model included variables.

          PROC GPLOT DATA=mydatanew ;
           PLOT residstd * (x1 x2 x3);
           PLOT residstd * rhat;
          RUN;
          
          
           	
        • where (x1 x2 x3) are the variables kept in the regression model.
      • Plot Cook's distances: first create a variable holding the observation
        number, to be used as the horizontal axis of the plot.

          DATA plotcook; 
            SET mydatanew;
            obsnum=_N_ ; 
          RUN;
          
          SYMBOL1 INTERPOL=needle;
          PROC GPLOT;
           PLOT cookdist * obsnum;
          RUN;
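
        A common follow-up to the plot is to list the observations with the
        largest Cook's distances. The sketch below is only one way to do it:
        the cutoff 4/n is a rule of thumb chosen here for illustration, and
        the data set name influential is hypothetical. It reuses plotcook
        and the variables created above.

          DATA influential;
            SET plotcook NOBS=n;     /* n = number of observations in plotcook */
            IF cookdist > 4/n;       /* keep only rows above the assumed cutoff */
          RUN;

          PROC PRINT DATA=influential;
           VAR obsnum cookdist residstd rhat;
          RUN;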
          
          
          
          

       

       
       
       

      ANOVA- balanced design

       

      • Includes interaction effects and multiple comparisons of means.

          PROC ANOVA DATA=mydata ;
           CLASS x1 x2;
           MODEL y1= x1 x2 x1*x2;
           MEANS x1 x2/SCHEFFE;
          RUN;
          
      • Side-by-side box plots of y1 across the levels of each factor. The
        data must be sorted by the BY variable before each PROC UNIVARIATE.

          PROC SORT DATA=mydata; BY x1; RUN;
          
          PROC UNIVARIATE data=mydata PLOT;
           VAR y1;
           BY x1;
          RUN;
          
          PROC SORT DATA=mydata; BY x2; RUN;
          
          PROC UNIVARIATE data=mydata PLOT;
           VAR y1;
           BY x2;
          RUN;
          
          

       

       
       
       

      ANOVA- UNbalanced design

       

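      • Type I (sequential) sums of squares depend on the order of the effects
        in the MODEL statement, so the model is fitted twice with the order
        of x1 and x2 switched to get the desired Type I SS.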

          PROC GLM;
           CLASS x1 x2;
           MODEL y1= x1 x2 x1*x2;
          RUN;
          
          PROC GLM;
           CLASS x1 x2;
           MODEL y1= x2 x1 x1*x2;
          RUN;
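
          Type III SS, unlike Type I SS, do not depend on the order of the
          effects. As a sketch, assuming the same variables and the file
          mydata, the SS1 and SS3 options request both tables in one run,
          and LSMEANS compares adjusted means (the unbalanced counterpart of
          the MEANS statement used with PROC ANOVA).

          PROC GLM DATA=mydata;
           CLASS x1 x2;
           MODEL y1= x1 x2 x1*x2 / SS1 SS3;       /* print Type I and Type III SS */
           LSMEANS x1 x2 / PDIFF ADJUST=SCHEFFE;  /* Scheffe-adjusted comparisons */
          RUN;
          QUIT;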
          
          

       

       
       
       

      Repeated Measures

       

      • Syntax:

        PROC MEANS DATA=mydata  <options> ; 
         VAR <variables>;
        RUN;
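
        A repeated measures analysis itself is commonly run with the
        REPEATED statement of PROC GLM. The following is only a sketch under
        assumptions: the variables t1, t2, t3 (the same measure taken at
        three occasions, one row per subject) and the grouping factor group
        are hypothetical names, not part of the practice data above.

        PROC GLM DATA=mydata;
         CLASS group;
         MODEL t1 t2 t3 = group / NOUNI;   /* NOUNI skips the separate univariate ANOVAs */
         REPEATED time 3 / PRINTE;         /* within-subject factor; PRINTE adds the sphericity test */
        RUN;
        QUIT;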

         

         
         
         

        MANOVA

         

        • Syntax:

          PROC MEANS DATA=mydata  <options> ; 
           VAR <variables>;
          RUN;
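
          A MANOVA is commonly requested with the MANOVA statement of PROC
          GLM. The following is only a sketch under assumptions: y1, y2, y3
          are hypothetical dependent variables and x1 a hypothetical
          classification factor.

          PROC GLM DATA=mydata;
           CLASS x1;
           MODEL y1 y2 y3 = x1;
           MANOVA H=x1 / PRINTE PRINTH;   /* multivariate tests for x1, with E and H matrices */
          RUN;
          QUIT;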

           

           


            2003-9-14  VDC: mailboxWWWSTATS@uic.edu

