Introduction to SAS PROCedures: Data Analysis

last updated: 13SEP03

NOTE: These examples assume that you are using temporary data files. For
permanent data files, add the library name to the file names. The options
needed to get the desired results are already written into each example;
other options are not covered here. For convenience, SAS syntax is in
capital letters and user input is in small letters. In some cases the DATA=
option has been omitted, on the assumption that the PROC is being run on the
most recently created data set.
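
As a minimal sketch of the difference between temporary and permanent files: a one-level name refers to a temporary data set in the WORK library, while a two-level name (library.dataset) refers to a permanent one. The library name mylib and the folder path below are hypothetical placeholders, not names used elsewhere in this document.

```sas
/* Hypothetical library name and folder; replace both with your own */
LIBNAME mylib 'path-to-your-data-folder';

/* Temporary data set: one-level name, stored in WORK */
PROC CORR DATA=mydata;
 VAR v1 v2;
RUN;

/* Permanent data set: two-level name, library.dataset */
PROC CORR DATA=mylib.mydata;
 VAR v1 v2;
RUN;
```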

Data sets to practice with

Datasets from the book:
A Handbook of Small Data Sets edited by D.J. Hand, et al.
STATS@uic.edu:
Statistics, Data Resources, and Advanced topics
STATS@uic.edu:
SAS II Seminar– Looking ahead and Examples

Data description and simple inference

Examine variable distributions and tests of normality

PROC UNIVARIATE DATA=mydata NORMAL PLOT ;
 VAR v1 v2;
RUN;

Options:

 	NORMAL - runs tests of normality.

 	PLOT - generates stem-and-leaf, box, and normal probability plots.

Histograms and Scatterplots

Histogram
```
PROC GCHART;
 VBAR v1 v2;
RUN;
```

Scatterplot
```
PROC GPLOT;
 PLOT v1*v2;
RUN;
```

Correlations

PROC CORR DATA=mydata ;
 VAR v1 v2;
RUN;

Subgroups: Overlay of plots

Includes setup of 2 symbols to be used

SYMBOL1 V=dot; SYMBOL2 V=triangle;

PROC GPLOT;
 PLOT v1*v2=v3;
RUN;

Where v3 is a categorical variable.

Subgroups: Separate Analyses

PROC SORT DATA=mydata; BY v3; RUN;

PROC CORR DATA=mydata ;
 VAR v1 v2; BY v3;
RUN;

Multiple Regression

Create a new output file with predicted values, standardized residuals,
and Cook's distances.

PROC REG DATA=mydata ;
 MODEL y1=x1--x10/SELECTION=STEPWISE;
 OUTPUT OUT=mydatanew PREDICTED=rhat STUDENT=residstd COOKD=cookdist;
RUN;

Examination of Residuals

PROC UNIVARIATE DATA=mydatanew NORMAL PLOT;
 VAR residstd;
RUN;

Plot the residuals against the variables included in the model and
against the predicted values.

PROC GPLOT DATA=mydatanew ;
 PLOT residstd * (x1 x2 x3);
 PLOT residstd * rhat;
RUN;

 	where (x1 x2 x3) are the variables kept in the regression model.

Plot Cook's distances: first create a variable with the observation
number to be used for the plot.

DATA plotcook; 
  SET mydatanew;
  obsnum=_N_ ; 
RUN;

SYMBOL1 I=needle;
PROC GPLOT;
 PLOT cookdist * obsnum;
RUN;

ANOVA- balanced design

Includes interaction effects and multiple comparisons of means.

PROC ANOVA DATA=mydata ;
 CLASS x1 x2;
 MODEL y1= x1 x2 x1*x2;
 MEANS x1 x2/SCHEFFE;
RUN;

Side-by-side box plots of y1 at each level of each factor. The data must
be sorted by the BY variable before each PROC UNIVARIATE.

PROC SORT DATA=mydata; BY x1; RUN;

PROC UNIVARIATE DATA=mydata PLOT;
 VAR y1;
 BY x1;
RUN;

PROC SORT DATA=mydata; BY x2; RUN;

PROC UNIVARIATE DATA=mydata PLOT;
 VAR y1;
 BY x2;
RUN;

ANOVA- UNbalanced design

The order of the categorical variables in the MODEL statement is changed
to get the desired Type I (sequential) SS.

PROC GLM;
 CLASS x1 x2;
 MODEL y1= x1 x2 x1*x2;
RUN;

PROC GLM;
 CLASS x1 x2;
 MODEL y1= x2 x1 x1*x2;
RUN;
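
Because Type I SS are sequential, each effect is adjusted only for the terms listed before it, so the two runs above can give different sums of squares for x1 and x2. As a hedged sketch that is not part of the original examples, both Type I and Type III SS can be requested explicitly in one run for comparison; Type III SS do not depend on the order of the terms. The DATA=mydata reference is an assumption, matching the data set name used in the earlier examples.

```sas
/* Sketch: request Type I and Type III sums of squares side by side */
PROC GLM DATA=mydata;
 CLASS x1 x2;
 MODEL y1 = x1 x2 x1*x2 / SS1 SS3;
RUN;
QUIT;
```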

Repeated Measures

Syntax:

PROC MEANS DATA=mydata  <options> ; 
 VAR <variables>;
RUN;
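
The template above is only generic syntax. One common way to fit a repeated measures model in SAS is PROC GLM with a REPEATED statement; the following is a minimal sketch under stated assumptions, not a complete analysis. It assumes the data are in wide format with one column per occasion, and the names t1, t2, t3, grp, and time are hypothetical.

```sas
/* Hypothetical names: t1-t3 hold the repeated measurements       */
/* (one column per occasion); grp is a between-subjects factor.   */
PROC GLM DATA=mydata;
 CLASS grp;
 MODEL t1 t2 t3 = grp / NOUNI;  /* NOUNI suppresses the separate univariate ANOVAs */
 REPEATED time 3;               /* within-subjects factor named time, 3 levels */
RUN;
QUIT;
```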

MANOVA

Syntax:

PROC MEANS DATA=mydata  <options> ; 
 VAR <variables>;
RUN;
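
The template above is only generic syntax. A MANOVA is commonly run in SAS with PROC GLM and a MANOVA statement; the following is a minimal sketch, assuming three response variables and one factor. The names y1, y2, y3, and x1 follow the naming pattern of the earlier examples but are assumptions here.

```sas
/* Assumed names: y1-y3 are the response variables; x1 is a factor */
PROC GLM DATA=mydata;
 CLASS x1;
 MODEL y1 y2 y3 = x1;
 MANOVA H=x1;          /* multivariate tests for the x1 effect */
RUN;
QUIT;
```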

2003-9-14 VDC:

WWWSTATS@uic.edu

PurpuraInk Consulting- Chicago, IL & San Juan, PR

From your desktop to ours… linking technology and research.