Introduction to SAS PROCedures

last updated: 08SEP03 …and ‘in progress’


Let's PROCeed…

 

    As you know, a SAS program is built from two kinds of steps: DATA steps
    and PROC steps. By now you are probably familiar with the DATA step, which
    is discussed in detail in our Introductory SAS seminars.

    PROCedures are used to generate output, whether graphical, printed, or
    computational. The list of PROCedures is extensive, so I include here only
    a brief list. Rather than attempting to document syntax, which can easily
    be found in the SAS OnlineDoc once you know what you are looking for, I am
    trying to list the PROCs that are most useful for the statistical analyses
    you crave. I think the hard part is knowing what to look for given a
    statistical analysis design. For details on any of the PROCedures below,
    please refer to the SAS OnlineDoc on your installation of SAS, or to any of
    the sites listed next.

     

     
     
     

    More information on how to use SAS PROCedures

     

     

     
     
     

    Automation of libraries

     

      We are going to build on our basic knowledge of
      macros and use it when writing our procedures.

      The following macro variable allows multiple lines of code to run
      smoothly if we need to change directories.

       %let lib=fall00;

      Now we can write our code with the reference data=&lib.. in our
      procedures. The first period ends the macro variable reference and the
      second separates the library name from the dataset name, so
      data=&lib..myfile resolves to data=fall00.myfile. By just changing the
      value of this macro variable, which we can write at the top of our SAS
      program file, we alter the library, and with it the directory location,
      used by all the procedures we want to run at once.
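      A minimal sketch of the idea, assuming the fall00 files live in a
      directory such as C:\sasdata\fall00 (a hypothetical path) and that the
      library contains a dataset named myfile. The LIBNAME statement also uses
      &lib, so the libref is defined in one place:

```sas
/* Change this one line to point every step at another library. */
%let lib=fall00;

/* Hypothetical path: replace with the directory that holds your data. */
libname &lib 'C:\sasdata\fall00';

/* &lib..myfile resolves to fall00.myfile */
proc contents data=&lib..myfile;
run;

/* Print only the first 10 observations. */
proc print data=&lib..myfile (obs=10);
run;
```

      To switch to another term's data, only the %let line and the path need
      editing.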

     

     
     
     

    Options and Formats

     

      PROC GDEVICE

      • Examines and changes the parameters of graphics device driver entries.

      PROC GOPTIONS

      • Lists the values of graphic options and global statement definitions
        currently in effect for your session.

      PROC OPTIONS

      • Lists in the LOG window the current values of all SAS system
        options.

      PROC FORMAT

      • Defines and prints formats and informats which are used in other DATA
        or PROC steps.
      • Syntax
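        proc format;
        /* character format: the format name starts with $ */
        value $gender 'F'='1' 'M'='2';
        /* numeric format for a Likert scale; 9 = not applicable */
        value  likert 1='Str Disagree'
                      2='Disagree'
                      3='Neutral'
                      4='Agree'
                      5='Str Agree'
                      9='NA';
        run;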
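      Formats change only how values are displayed, not the stored values. A
      short sketch of applying the $gender format above, using SASHELP.CLASS,
      the sample dataset that ships with SAS (its SEX variable holds 'F' and
      'M'):

```sas
/* Define the $gender format shown above. */
proc format;
  value $gender 'F'='1' 'M'='2';
run;

/* The FORMAT statement attaches the format for this step only. */
proc freq data=sashelp.class;
  tables sex;
  format sex $gender.;
run;
```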

     

     
     
     

    Data Management

     

      PROC APPEND

        This procedure is equivalent to running a DATA step with a SET
        statement to append SAS datasets, but it is more efficient when
        processing time matters, because it reads only the observations being
        added rather than rereading and rewriting the ‘main’ dataset.

        By default, PROC APPEND will not combine files whose common variables
        differ in type or are longer in the appended file; with FORCE, it
        appends anyway, dropping variables that are not in the ‘main’ file and
        truncating longer values. Something to note, which can be a problem, is
        that PROC APPEND changes the ‘main’ file itself each time records are
        added. As a result, you need to keep track of not appending the same
        data more than once.

        The FORCE option, which is optional, must be used if variable
        attributes such as length do not match between the ‘main’ and the new
        ‘appended’ file, or when variables are added to the ‘appended’ file.
        You might want to use FORCE in the beginning and check the LOG window
        for problems in appending the data. This will flag problems or
        discrepancies that you might not have noticed in advance.

        The format of this procedure is as follows (square brackets mark the
        optional FORCE option):

        proc append base=maindata data=more1 [force]; run;
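        A minimal sketch using two hypothetical datasets, maindata and more1,
        built here from SASHELP.CLASS (the sample dataset shipped with SAS) so
        that their variable attributes match:

```sas
/* Hypothetical 'main' file: the girls in SASHELP.CLASS. */
data maindata;
  set sashelp.class (where=(sex='F'));
run;

/* Hypothetical new records with the same variables: the boys. */
data more1;
  set sashelp.class (where=(sex='M'));
run;

/* Adds the observations of MORE1 to the end of MAINDATA.
   The attributes match, so FORCE is not needed here. */
proc append base=maindata data=more1;
run;
```

        Submitting the PROC APPEND step a second time would add the same
        records again, which is the duplication risk described above.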

      PROC CATALOG

      • Manages entries in SAS catalogs (for example, listing, copying,
        renaming, or deleting them).

      PROC COMPARE

      • Compares contents of 2 SAS datasets.
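      A small sketch: class_copy (a hypothetical name) is a copy of
      SASHELP.CLASS with one value changed on purpose, and PROC COMPARE reports
      the mismatch. Without an ID statement, observations are matched by
      position:

```sas
/* Copy the sample data and change one value on purpose. */
data class_copy;
  set sashelp.class;
  if name = 'Alfred' then age = age + 1;
run;

/* Reports the differing value of AGE. */
proc compare base=sashelp.class compare=class_copy;
run;
```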

      PROC CONTENTS

      • Describes the variable contents of a SAS Dataset.
      • Syntax
        proc contents data=&lib..myfile; run;

      PROC FORMS

      • Prints data in rectangular form (mailing lists).

      PROC RANK

      • Generates ranks for one or more numeric variables.

      PROC DATASETS

      • Executes changes to SAS datasets.

      PROC DISPLAY

      • Executes an AF entry.

      PROC SORT

      • Sorts observations in a SAS dataset.
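      Many procedures with a BY statement need the data sorted first. A
      sketch using SASHELP.CLASS; since SASHELP is read-only, OUT= writes the
      sorted copy to a new WORK dataset (class_sorted is a hypothetical name):

```sas
/* Sort by SEX, and by decreasing HEIGHT within each SEX. */
proc sort data=sashelp.class out=class_sorted;
  by sex descending height;   /* DESCENDING applies only to HEIGHT */
run;
```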

      PROC STANDARD

      • Standardizes variables to a given mean and standard deviation.
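      A sketch that rescales two variables of SASHELP.CLASS to mean 0 and
      standard deviation 1 (z-scores), saving the result in a hypothetical
      WORK dataset class_std:

```sas
/* Replace HEIGHT and WEIGHT by their standardized values. */
proc standard data=sashelp.class mean=0 std=1 out=class_std;
  var height weight;
run;

proc print data=class_std;
run;
```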

      PROC SPELL

      • Checks the spelling of an external file.

      PROC TRANSPOSE

      • Transposes datasets; converts observations to variables
        and variables to observations.

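        Example (myfile1 must first be sorted by region for the BY statement):

        proc transpose data=myfile1 out=myfile2
                       name=newvarwithcolumnnames  /* holds the old variable names */
                       prefix=varn;                /* prepended to the new column names */
          by region;     /* transpose separately within each region */
          id pstatus;    /* values of pstatus name the new columns */
          var inf fas;   /* variables to transpose */
        run;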

       

       
       
       

      Basic Data Analysis

       

        PROC CORR

        • Computes correlation coefficients.
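        A sketch computing Pearson and Spearman correlations among the numeric
        variables of SASHELP.CLASS, the sample dataset shipped with SAS:

```sas
/* Correlation matrices for three numeric variables. */
proc corr data=sashelp.class pearson spearman;
  var age height weight;
run;
```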

        PROC FREQ

        • Prints tables of frequencies and computes chi-square goodness-of-fit
          tests for one-way tables.
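        A sketch using SASHELP.CLASS: the first TABLES request is a one-way
        table whose CHISQ option tests the null hypothesis of equal
        proportions, and the second is a two-way crosstabulation:

```sas
proc freq data=sashelp.class;
  /* One-way table; CHISQ tests equal proportions of F and M. */
  tables sex / chisq;
  /* Two-way table of SEX by AGE. */
  tables sex*age;
run;
```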

        PROC MEANS

        • Compute descriptive statistics for numeric variables.
        • Equivalent to the SUMMARY procedure with the PRINT option.
        • Syntax:
          PROC MEANS DATA=mydata <options> ;
           VAR <variables>;
          RUN;
        • Options

            N
            NMISS
            MEAN
            STD
            STDERR
            CLM
            LCLM
            UCLM
            MIN
            MAX
            SUM
            VAR
            CV
            SKEWNESS
            KURTOSIS
            T
            PRT
            MAXDEC=n
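        A sketch combining several of these options, with a CLASS statement so
        that the statistics are computed separately for each SEX in
        SASHELP.CLASS:

```sas
/* Selected statistics, rounded to 2 decimals, for each SEX. */
proc means data=sashelp.class n mean std min max maxdec=2;
  class sex;
  var height weight;
run;
```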

        PROC SUMMARY

        • Computes descriptive statistics and frequencies [similar to MEANS].
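        The practical difference from MEANS is that SUMMARY prints nothing
        unless asked, so results are usually saved with an OUTPUT statement. A
        sketch, where height_stats and mean_height are hypothetical names:

```sas
/* NWAY keeps only the rows for each SEX (no overall row). */
proc summary data=sashelp.class nway;
  class sex;
  var height;
  output out=height_stats mean=mean_height;
run;

proc print data=height_stats;
run;
```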

        PROC TABULATE

        • Generates hierarchical tables of descriptive statistics and
          frequencies.
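        A sketch of a TABLE statement, where the comma separates the row
        dimension (each SEX plus an ALL total) from the column dimension
        (statistics of HEIGHT), using SASHELP.CLASS:

```sas
proc tabulate data=sashelp.class;
  class sex;
  var height;
  /* rows: SEX and ALL; columns: N, MEAN, STD of HEIGHT */
  table sex all, height*(n mean std);
run;
```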

        PROC TTEST

        • Computes t statistics for hypothesis tests about group means.
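        A sketch of a two-sample t test comparing mean HEIGHT between the sexes
        in SASHELP.CLASS; the CLASS variable must have exactly two levels:

```sas
/* SEX has two levels, F and M. */
proc ttest data=sashelp.class;
  class sex;
  var height;
run;
```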

        PROC UNIVARIATE

        • Computes univariate statistics. This includes tests of normality,
          stem-leaf plots, box plots.
        • Syntax:
          PROC UNIVARIATE DATA=mydata <options> ;
           VAR <variables>;
           ID <keyvariable>;
          RUN;
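        A sketch filling in the syntax above with SASHELP.CLASS; the NORMAL and
        PLOT options request the normality tests and the text plots mentioned
        above:

```sas
/* NORMAL: tests of normality; PLOT: stem-and-leaf, box, and
   normal probability plots. ID labels the extreme observations. */
proc univariate data=sashelp.class normal plot;
  var height;
  id name;
run;
```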

       

       
       
       

      Advanced Data Analysis

       

        PROC ANOVA

        • Analysis of variance models.

        PROC CALIS

        • Linear structural equation modeling using covariance structure
          analysis.

        PROC CANCORR

        • Canonical correlation analysis.

        PROC CATMOD

        • Categorical data modeling: fits linear models to functions of
          response frequencies.

        PROC GENMOD

        • Generalized linear models.

        PROC GLM

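        • General linear models: regression, analysis of variance and
          covariance, and multivariate analysis of variance, including
          random-effects and repeated-measures models. This includes analysis
          of main effects, interactions, nested effects and contrasts.
        • Syntax:
          PROC GLM DATA=mydata <options> ;
           CLASS <classvariables>;
           MODEL <dependent> = <effects>;
          RUN;
          QUIT;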
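        A sketch of an analysis of covariance with SASHELP.CLASS, comparing
        WEIGHT between the sexes after adjusting for HEIGHT. GLM keeps running
        after RUN, so QUIT ends it:

```sas
proc glm data=sashelp.class;
  class sex;
  model weight = sex height;
  lsmeans sex / pdiff;   /* adjusted means and their difference */
run;
quit;
```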

        PROC LATTICE

        • Analysis of variance for special balanced lattice designs.
        • Features available in PROC MIXED.

        PROC LOGISTIC

        • Categorical data modeling: logistic regression, which models the
          log odds of the response; results include odds ratios.
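        A sketch modeling the probability that SEX is 'F' from HEIGHT and
        WEIGHT in SASHELP.CLASS; EVENT= names the response level being modeled,
        and the output includes odds ratio estimates:

```sas
/* Logit of P(SEX = 'F') as a linear function of HEIGHT and WEIGHT. */
proc logistic data=sashelp.class;
  model sex(event='F') = height weight;
run;
```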

        PROC MIXED

        • Linear mixed models with fixed and random effects.

        PROC NESTED

        • Analysis of variance and covariance for purely nested (hierarchical)
          random-effects designs, using specialized algorithms.
        • Features available in PROC MIXED.

        PROC NLMIXED

        • Handles models in which the fixed or random effects enter nonlinearly.

        PROC NPAR1WAY

        • Analysis of variance on ranks.
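        A sketch of a rank-based comparison of HEIGHT between the sexes in
        SASHELP.CLASS; with two groups the WILCOXON option gives the Wilcoxon
        rank-sum test:

```sas
proc npar1way data=sashelp.class wilcoxon;
  class sex;
  var height;
run;
```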

        PROC PRINQUAL

        • Linear and nonlinear fits optimizing covariance or correlation matrix
          of transformed variables.

        PROC PROBIT

        • Probit analysis. Often used by economists as an alternative to
          LOGISTIC. Computes maximum-likelihood estimates of regression and
          threshold parameters for binomial and multinomial data and other
          discrete event data.

        PROC REG

        • Linear regression analysis.
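        A sketch of a multiple regression using SASHELP.CLASS. Like GLM, REG
        stays active after RUN, so QUIT ends it:

```sas
/* Regress WEIGHT on HEIGHT and AGE. */
proc reg data=sashelp.class;
  model weight = height age;
run;
quit;
```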

        PROC VARCOMP

        • Estimates variance components models.
        • Features available in PROC MIXED.

       

       
       
       

      Factor and Principal Components, and Discriminant Analysis

       

        PROC CANDISC

        • Canonical discriminant analysis.

        PROC DISCRIM

        • Discriminant analysis.

        PROC FACTOR

        • Factor and component analysis.

        PROC PRINCOMP

        • Principal component analysis.
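        A sketch extracting principal components from three variables of
        SASHELP.CLASS; by default the correlation matrix is analyzed, and OUT=
        (here the hypothetical pc_scores) saves the component scores:

```sas
/* Scores are added to the output data as Prin1, Prin2, Prin3. */
proc princomp data=sashelp.class out=pc_scores;
  var age height weight;
run;
```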

        PROC STEPDISC

        • Stepwise discriminant analysis.

       

       
       
       

      Clustering

       

        PROC ACECLUS

        • Approximate estimates of the pooled within-cluster covariance matrix
          for clusters assumed to be multivariate normal with equal
          covariance matrices.

        PROC CLUSTER

        • Hierarchical clustering.

        PROC FASTCLUS

        • Disjoint clustering of large datasets.

        PROC MODECLUS

        • Clustering based on nonparametric density estimates.

        PROC TREE

        • Draws tree diagrams (dendrograms) from the output of CLUSTER or
          VARCLUS.

        PROC VARCLUS

        • Correlation or covariance matrix clustering.
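        A sketch of how CLUSTER and TREE work together: CLUSTER saves the merge
        history with OUTTREE= (tree is a hypothetical dataset name) and TREE
        draws the dendrogram. Average linkage and standardized variables are
        choices made for illustration, using SASHELP.CLASS:

```sas
/* STD standardizes HEIGHT and WEIGHT before clustering. */
proc cluster data=sashelp.class method=average std outtree=tree;
  var height weight;
  id name;
run;

/* Draw the dendrogram from the saved tree. */
proc tree data=tree;
run;
```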

       

       
       
       

      Survey and Survival Data Analysis

       

        PROC LIFEREG

        • Fits parametric regression models to failure-time data, which may
          be censored.

        PROC LIFETEST

        • Compute and plot the estimate of the distribution of the survival
          time.

        PROC PHREG

        • Fit the Cox proportional hazards regression model.

        PROC SURVEYSELECT

        • Provides a variety of methods for selecting probability-based random
          samples.

        PROC SURVEYMEANS

        • Estimates population totals, means, and ratios (SAS 8.2 and later),
          with estimates of their variances, confidence limits, and other
          descriptive statistics, under sample designs that may
          include stratification, clustering, and unequal weighting.

        PROC SURVEYREG

        • Estimates regression coefficients by generalized least squares, using
          elementwise regression, assuming that the regression coefficients are the
          same across strata and PSUs.

       

       
       
       

      Printing and Reporting

       

        PROC CALENDAR

        • Prints a SAS dataset in the form of a calendar.

        PROC PRINT

        • Prints the observations of a SAS dataset to the output window.
        • Example:
          proc print data=mydir.myfile; run;

        PROC PRINTTO

        • Redirects the log or the procedure output (standard print file).
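        A sketch that sends the log of one step to a file (mylog.log is a
        hypothetical name) and then restores the default destination:

```sas
/* NEW replaces the file instead of appending to it. */
proc printto log='mylog.log' new;
run;

proc means data=sashelp.class;
  var height;
run;

/* With no options, PROC PRINTTO restores the default destinations. */
proc printto;
run;
```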

        PROC REPORT

        • The REPORT procedure “combines features of the PRINT, MEANS, and
          TABULATE procedures with features of the DATA step in a single
          report-writing tool that can produce a variety of reports”.
        • Example:
          options pageno=1;
          proc report data=&lib..byprint2 nowindows headline headskip missing nocenter;
          column printer dept amount nuser,(n pctn) nover,(n pctn);
            define dept / 'Dept' group width=35;      /* grouping variables */
            define printer / 'Printer' group;
            define amount / 'Amount' sum format=dollar9.2;
            define nuser / 'N user' sum;
            define nover / 'N Over' sum;
            define pctn / '% ' format=percent8.2;
            /* subtotal after each printer, overlined, then a blank line */
            break after printer / skip ol summarize;
            /* label the grand-total row */
            compute after; dept = 'Totals:'; endcomp;
            /* grand total at the end of the report */
            rbreak after / ol summarize;
          run;

       

       
       
       

      Graphics

       

        PROC CHART

        • Generates text character charts.
        • Syntax:
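          PROC CHART DATA=dataset <options>;
            BY varname;                 * data must be sorted by varname;
            VBAR varname / <options>;   * VBAR, HBAR, and PIE share most options;
            HBAR varname / <options>;
          RUN;

          Options for VBAR and HBAR (PIE accepts most of them):
            DISCRETE            treats a numeric variable as discrete
            LEVELS=n            number of midpoints
            MIDPOINTS=list      midpoints to use
            SUBGROUP=variable   subdivides each bar by a variable
            SUMVAR=variable     variable summarized by the bar length
            TYPE=FREQ or PCT    statistic shown (frequency or percent)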
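        A sketch using SASHELP.CLASS: a vertical bar chart of counts for each
        AGE, and horizontal bars showing mean WEIGHT for each SEX:

```sas
proc chart data=sashelp.class;
  /* one bar per distinct AGE value */
  vbar age / discrete;
  /* bar length = mean WEIGHT within each SEX */
  hbar sex / sumvar=weight type=mean;
run;
```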
          
          

        PROC GPLOT

        • Generates two-dimensional graphs.
        • Module: SAS/GRAPH

        PROC G3D

        • Generates three-dimensional graphs
        • High resolution
        • Module: SAS/GRAPH

        PROC G3GRID

        • Interpolates data onto a rectangular grid for use by G3D and
          GCONTOUR.
        • High resolution
        • Module: SAS/GRAPH

        PROC GCHART

        • Generates vertical and horizontal histograms,
          block charts, pie and donut charts, and star charts.
        • High resolution
        • Module: SAS/GRAPH

        PROC GCONTOUR

        • Contour plots.
        • High resolution
        • Module: SAS/GRAPH

        PROC GMAP

        • Two-dimensional (choropleth) and three-dimensional (surface,
          block, and prism) color maps.
        • High resolution
        • Module: SAS/GRAPH

        PROC GREPLAY

        • Displays and manages stored graphics output and templates.
        • High resolution
        • Module: SAS/GRAPH

        PROC PLOT

        • Produces scatter graphs using text characters.
        • Low resolution
        • Module: SAS/BASE
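        A sketch of a text scatter plot with SASHELP.CLASS; the =SEX plotting
        symbol marks each point with F or M:

```sas
/* WEIGHT on the vertical axis, HEIGHT on the horizontal axis. */
proc plot data=sashelp.class;
  plot weight*height=sex;
run;
```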

       

       
       
       

      SQL

       

        PROC SQL

        • Implements the Structured Query Language (SQL) for the SAS System.
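        A sketch of a query that creates a new WORK table (tall_students is a
        hypothetical name) from SASHELP.CLASS. Unlike most PROCs, SQL runs each
        statement as it is submitted and ends with QUIT:

```sas
proc sql;
  /* Students with HEIGHT over 60, tallest first. */
  create table tall_students as
    select name, sex, height
    from sashelp.class
    where height > 60
    order by height desc;
quit;
```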

       

       
       
       

      PROCedures for Converting to/from SAS files

       

        PROC EXPORT

        • Copies SAS datasets to DLM, TAB, & CSV formats.
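        A sketch writing SASHELP.CLASS to a CSV file; the output path is
        hypothetical:

```sas
/* REPLACE overwrites the file if it already exists. */
proc export data=sashelp.class
            outfile='class.csv'
            dbms=csv
            replace;
run;
```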
      • The handout
        Converting to/from SAS files
        covers the following procedures:

        PROC CPORT -creates transport files.
        PROC COPY -creates copies of SAS datasets.
        PROC CIMPORT -used to convert a transport file to a SAS dataset.
        PROC ACCESS
        PROC CONVERT -used to import BMDP, OSIRIS and SPSS.
        PROC DBLOAD
        PROC DBF 
        PROC DIF 
        PROC DATASETS -executes changes to SAS datasets.