A Brief Introduction to the SAS System and Language

The SAS System is a computer program for entering and analysing data. However, it is much more than just a program for 'doing' Analysis of Variance. SAS can be used for entering, storing and manipulating data, much like a spreadsheet program (using SAS/CALC). SAS can read in data from common spreadsheets and databases, and can output the results of statistical analyses as tables and graphs, in Rich Text Format (.rtf, for input to Word), in Portable Document Format (.pdf, for reading with Adobe Acrobat Reader) and in HTML format (for publishing directly on the 'Web'), and much more.

The main use of SAS in this course is for statistical analyses. There are versions of SAS for most types of computers, from mainframes to PCs. SAS for Windows is available on the Faculty Local Area Network (LAN). In this course the datasets used and the sizes of the models are relatively small and can easily be handled using Windows: the amount of computer RAM is not a limitation, the SAS jobs do not take long to run, and the temporary work space required on the hard disk is small. However, for the analysis of research data, if you have much data and/or large statistical models you may well find that you run out of space when using Windows. You can easily see this; SAS will print error messages in the Log saying that there was not enough space for the analysis. This is often a constraint of the Windows operating system (particularly 32-bit Windows). A much better operating system for using SAS is Linux (a Unix-like operating system). If you have serious statistical analyses you might well consider using Linux or a Unix OS.

Whether you use SAS under Windows or any other operating system it is useful to calculate the amount of RAM that you need for an analysis. There are formulae for PROC GLM and PROC MIXED; they depend upon the size of the model and the number of observations. In the relevant parts of this course we will see how much RAM is required for various models and we will refer to the chapters and sections of the SAS manuals which show how one can compute the RAM one needs.

Starting the SAS program from the computer desktop is just the same as invoking any other program: one 'double clicks' with the left mouse button on the SAS icon. Alternatively (under Windows), one can open the Start, Programs menu and navigate from there to start SAS.

This will probably open up a submenu, which contains a list of the various SAS programs and utilities; select the SAS program (normally this will be SAS 9.2 (English)).

Within SAS there are basically 3 'Windows', which you can select by clicking on the 'Window' pulldown menu :

  1. Program Editor.
    Where you can enter SAS program commands and data.
  2. Log.
    Where SAS writes a 'LOG', a written record of what SAS has interpreted your commands to be, whether the procedure worked correctly and how long it took to run, together with any messages warning you of errors or possible problems.
  3. Output.
    Your results!

You should read the SAS Language and Procedures: Introduction manual; there is lots of useful information about the SAS system. SAS provide all of their documentation on-line at http://support.sas.com/documentation. I suggest that you make a habit of saving your program statements (from the Program Editor window), the log (from the Log window) and the actual results (from the Output window). If you save only the output (results) then you will often find that you are missing a record of any data manipulation steps, that you do not have an accurate record of just which files you read, and that you are not sure of just what statistical model you actually fitted. The Log will tell you what you did, the Output will show you what you got, and the program statements from the Program Editor will allow you to easily re-run everything if you need to, or if you need to make only a small change!


1. Basic concepts

SAS is made up of DATA steps (for reading data into SAS) and PROCedures (for analyses, sorting, etc).

Usually the first thing that one does is to read data into a SAS data set using the SAS DATA step. One can either read the data into SAS from an external file (for example a file of data on diskette), or include the data directly 'in-line' with the actual SAS statements (using the CARDS command). We will use this method of including data 'in-line' because it means that you can easily see exactly what data is being used in any of my examples.

Let us suppose that we have collected data on a group of cows being fed diets with varying Energy Density and Acid Base Balance (Chlorine % and Potassium %). We are recording the milk yield and milk composition (Fat % and Protein %) for each cow. We have the following layout of our data :

 Cow Id    Energy Density    Chlorine %    Potassium %    Milk Yield    Fat %    Protein %  
   1      3.05      1.45      5.67      0.34      3.71      3.21  
   2      4.22      1.35      4.86      0.11      3.71      3.21  
   3      3.34      0.26      4.19      0.38      3.71      3.21  
   4      3.77      0.23      4.42      0.68      3.71      3.21  
   5      3.52      1.10      3.17      0.18      3.71      3.21  
   6      3.54      0.76      2.76      0.0      3.71      3.21  
   7      3.74      1.59      3.81      0.08      3.71      3.21  
   8      3.78      0.39      3.23      0.11      3.71      3.21  
   9      2.92      0.39      5.44      1.53      3.71      3.21  
   10      3.10      0.64      6.16      0.77      3.71      3.21  
   11      2.86      0.82      5.48      1.17      3.71      3.21  
   12      2.78      0.64      4.62      1.01      3.71      3.21  
   13      2.22      0.85      4.49      0.89      3.71      3.21  
   14      2.67      0.90      5.59      1.40      3.71      3.21  
   15      3.12      0.92      5.86      1.05      3.71      3.21  
   16      3.03      0.97      6.60      1.15      3.71      3.21  
   17      2.45      0.18      4.51      1.49      3.71      3.21  
   18      4.12      0.62      5.31      0.51      3.71      3.21  
   19      4.61      0.51      5.16      0.18      3.71      3.21  
   20      3.94      0.45      4.45      0.34      3.71      3.21  
   21      4.12      1.79      6.17      0.36      3.71      3.21  
   22      2.93      0.25      3.38      0.89      3.71      3.21  
   23      2.66      0.31      3.51      0.91      3.71      3.21  
   24      3.17      0.20      3.08      0.92      3.71      3.21  
   25      2.79      0.24      3.98      1.35      3.71      3.21  
   26      2.61      0.20      3.64      1.33      3.71      3.21  
   27      3.74      2.27      6.50      0.23      3.71      3.21  
   28      3.13      1.48      4.28      0.26      3.71      3.21  
   29      3.49      0.25      4.71      0.73      3.71      3.21  
   30      2.94      2.22      4.58      0.23      3.71      3.21  

So we have 1 line (record) per cow; each cow is a separate animal and observation. The 7 columns of data correspond to the 7 'variables' (Cow Id, ED, Cl%, K%, MY, F%, P%).

We will show the SAS statements and data layout just as they will be entered into the Program Editor window of SAS.

For each cow we have her milk yield as well as the composition of her milk (in terms of fat % and protein %). We can calculate the fat yield and protein yield of each cow as :

fat yield = milk yield * fat % / 100
protein yield = milk yield * protein % / 100

Rather than having to make this calculation by hand we can do this whilst we are inputting the data into SAS. It is quicker, less prone to errors and provides us with an audit trail so that we (you and your supervisor) can see exactly what was done; a most important point if you come to ask me questions during this course, or if you go and see your supervisor later with your research results.

a) Read data into a SAS data set. The SAS DATA step.


data cow1;          /* name of data set to create
                              in this data step */
input cowid  ed  cl  k  my  fpc  ppc;   /* variables */
fy = my * fpc / 100;           /* create new variable */
py = my * ppc / 100;
/* cards statement indicates that data follows according
   to the layout described in the input statement,
   columns/variables must be separated by at least 1 space */
cards;
1 3.05 1.45 5.67 0.34 3.71 3.21
2 4.22 1.35 4.86 0.11 3.71 3.21
3 3.34 0.26 4.19 0.38 3.71 3.21
4 3.77 0.23 4.42 0.68 3.71 3.21
5 3.52 1.10 3.17 0.18 3.71 3.21
6 3.54 0.76 2.76 0.0  3.71 3.21
7 3.74 1.59 3.81 0.08 3.71 3.21
8 3.78 0.39 3.23 0.11 3.71 3.21
9 2.92 0.39 5.44 1.53 3.71 3.21
10 3.10 0.64 6.16 0.77 3.71 3.21
11 2.86 0.82 5.48 1.17 3.71 3.21
12 2.78 0.64 4.62 1.01 3.71 3.21
13 2.22 0.85 4.49 0.89 3.71 3.21
14 2.67 0.90 5.59 1.40 3.71 3.21
15 3.12 0.92 5.86 1.05 3.71 3.21
16 3.03 0.97 6.60 1.15 3.71 3.21
17 2.45 0.18 4.51 1.49 3.71 3.21
18 4.12 0.62 5.31 0.51 3.71 3.21
19 4.61 0.51 5.16 0.18 3.71 3.21
20 3.94 0.45 4.45 0.34 3.71 3.21
21 4.12 1.79 6.17 0.36 3.71 3.21
22 2.93 0.25 3.38 0.89 3.71 3.21
23 2.66 0.31 3.51 0.91 3.71 3.21
24 3.17 0.20 3.08 0.92 3.71 3.21
25 2.79 0.24 3.98 1.35 3.71 3.21
26 2.61 0.20 3.64 1.33 3.71 3.21
27 3.74 2.27 6.50 0.23 3.71 3.21
28 3.13 1.48 4.28 0.26 3.71 3.21
29 3.49 0.25 4.71 0.73 3.71 3.21
30 2.94 2.22 4.58 0.23 3.71 3.21
; /* indicates end of data */

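Names such as cow1, cowid, ed, cl, k, my, fpc, ppc, fy and py are data set and variable names that you (the user) can assign. They should not be more than 8 characters long (SAS Version 7 and later accept names of up to 32 characters, but short names remain a safe habit).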

In addition, you should not have SAS statement lines more than 80 columns wide. If you have many input variables then go to a new line and continue your statement there.

You could type all these SAS statements and lines of data into SAS directly in the Program Editor window. However, if you are like me you are almost certain to make some typing mistakes! I find it easier and more convenient to type all the SAS statements and data into a plain ASCII text file with a normal text editor, and then simply read the file into the Program Editor in SAS.

See the SAS Introductory guide for more details on creating data sets and variables.

Importing SAS statements into the Program Editor

Suppose that you have typed the SAS statements and data into a text file (ASCII) and saved it on diskette in a file named example1.sas. I use a file extension of .sas for my SAS input files (of SAS statements and data), a file extension of .log when I save the log results of a SAS job and a file extension of .out (or .lst) when I save the output results of a SAS job.
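The log and the output can also be written to files from within the program itself, rather than saved by hand from each window. The following is a minimal sketch using PROC PRINTTO; the c:\temp paths and the file names are assumptions chosen for illustration and should be changed to suit your own setup.

```sas
/* Route the log and the procedure output to files.
   The paths below are hypothetical; change them to suit. */
proc printto log='c:\temp\example1.log'
             print='c:\temp\example1.lst' new;
run;

/* ... the DATA steps and PROCs to be recorded go here ... */

/* Restore the default destinations (the Log and Output windows) */
proc printto;
run;
```

The new option replaces any existing contents of the two files rather than appending to them.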

Having entered the SAS statements and data into a file example1.sas, how do we get them into the Program Editor window of SAS? When you invoke SAS you will normally see the initial window layout. You will note that the Program Editor window is highlighted, that is to say that it is the 'active' window of the 3 SAS windows (Program Editor, Output and Log). Just to the left of the Program Editor is a small icon. If you 'click' once on it you will pull down a list, one of the options being Menu. Selecting this ('clicking' with the left mouse button) opens up a further menu with File as a choice. Selecting this brings up a menu with Open; this in turn brings up the system Open Filelist command. This Open Filelist varies from operating system to operating system (Unix, OS/2, Windows 98, Windows 2003, Linux, etc). It is a function of the Graphical User Interface (GUI) and not SAS per se.

This is the means by which one can read a prepared file into the Program Editor window.

Note one very important point. When you use the cards statement, with the data immediately following, the data lines should not be more than 80 columns (spaces) wide. This is not a problem in my various examples; there should never be that many variables. However, if and when you are analysing data from your own research it is quite likely that you will have many variables, observation numbers, identifiers, etc., and you may well have data more than 80 columns wide. You cannot input this using the cards facility of SAS. You have to use the 'infile' statement; see the SAS Language Guide for more details.
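As a rough sketch of what reading from an external file might look like, the DATA step below reads the same 7 variables with an INFILE statement instead of cards. The file name c:\temp\cows.dat and the record length of 200 are assumptions made purely for illustration; the file is assumed to contain the same space-separated layout as the in-line data.

```sas
data cow1;
  /* hypothetical file name; lrecl= sets the maximum record length read */
  infile 'c:\temp\cows.dat' lrecl=200;
  input cowid ed cl k my fpc ppc;   /* same layout as the in-line data */
  fy = my * fpc / 100;              /* fat yield */
  py = my * ppc / 100;              /* protein yield */
run;
```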

Why cannot we have more than 80 columns of SAS statements or data input to SAS with the cards command?

Well, because when SAS was first created the cards command meant just that : computer punch cards (80 columns wide)! This meaning has not changed.

The DATA step in (a) has now created a temporary SAS data set called cow1, with 9 variables (the 7 read in plus fy and py) and 30 records.
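One way to confirm that the cow1 data set was built as intended is to list its contents and print a few records. This is only a sketch of a possible check, not part of the original example. For cow 1 the computed fat yield should be 0.34 * 3.71 / 100 = 0.012614.

```sas
proc contents data=cow1;        /* lists the variables and the number of observations */
run;

proc print data=cow1 (obs=5);   /* first 5 records only */
  var cowid my fpc ppc fy py;
run;
```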

b) Print summary, simple statistics about a data set.


ods rtf file='c:\temp\fred1.rtf';   /* file= names the .rtf output file */

proc means data=cow1;
var my fy py ed cl k;
run;

ods rtf close;

proc gplot data=cow1;
 plot my*ed;
run;

The var(iable) statement lets us tell SAS that we only want summary statistics about the specified variables (my, fy, py, ed, cl, and k). The ODS statement allows us to produce, as well as output on the screen (in the Output window), a quite high quality Rich Text Format (.rtf) output, which can be incorporated into a Word document or another word-processing program. The PROCedure PROC GPLOT allows us to produce very high quality plots (graphs). In this example we plot milk yield (my), our 'Y' variable, against Energy Density (ed), our 'X' variable.
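The statistics that PROC MEANS reports can also be chosen explicitly by listing statistic keywords on the PROC statement. The sketch below asks for the same statistics that PROC MEANS gives by default (n, mean, std, min, max) and limits the printed decimals with maxdec=; the choice of 4 decimals is an assumption made here because fy and py are small numbers.

```sas
proc means data=cow1 n mean std min max maxdec=4;
  var my fy py ed cl k;
run;
```

Other keywords, such as median or cv, can be added to the list in the same way.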

c) Different statistical analysis procedures


ods pdf file='c:\temp\fred2.pdf';   /* file= names the .pdf output file */

proc glm data=cow1;
model my fy py = ed cl k/xpx i solution;
estimate 'Energy density' ed 1/e;
estimate 'Chlorine %' cl 1/e;
estimate 'Potassium' k 1/e;
run;

ods pdf close;

In the model statement (of either PROC REG or PROC GLM) we can have 3 separate dependent variables (my, fy and py), as in the PROC GLM example above. Specifying 3 dependent variables provides 3 analyses at once, rather than specifying the same model 3 times (once for my, once for fy and once for py). Note also that we are once again making use of the ODS output facility, only this time we are generating a Portable Document Format (.pdf) output file.

PROC REG allows analysis of regression problems, where the independent variables are continuous, quantitative traits.
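The PROC REG statements themselves are not shown in these notes. As a sketch only, assuming the same three dependent variables and the same three continuous independent variables as in the PROC GLM example, the equivalent PROC REG step could look like this:

```sas
proc reg data=cow1;
  model my fy py = ed cl k;   /* one regression per dependent variable */
run;
quit;                         /* PROC REG stays active until quit */
```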

PROC GLM allows analysis of models with both regression (continuous) and classification (categorical) independent variables.

In the model statement of PROC GLM after the independent variables we have

/xpx i solution
The / separates the independent variables of the model from xpx i solution, which are options, requesting the X'X matrix, the Inverse and the solution vector, all quite useful!

The estimate statement, after fitting the model, allows us to request SAS to compute, and write out, the estimate of the parameter associated with the particular factor, in this case the estimate of ed, cl and k. The coefficient 1 in the 3 estimate statements relates to the question of statistical estimability and the general matrix k' that is central to Linear Models and Estimable Functions.
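To see what the coefficient does, it can help to vary it. The sketch below is an illustration added here, not one of the original examples: it fits the same model for milk yield alone and requests the effect of a 1 unit change in ed, the effect of a 0.5 unit change in ed, and the sum of the ed and cl regression coefficients. The labels in quotes are arbitrary.

```sas
proc glm data=cow1;
  model my = ed cl k / solution;
  estimate 'ED, 1 unit'   ed 1;        /* the regression coefficient itself */
  estimate 'ED, 0.5 unit' ed 0.5;      /* half the regression coefficient */
  estimate 'ED plus Cl'   ed 1 cl 1;   /* a linear combination of two coefficients */
run;
quit;
```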



2. Additional techniques

If you are accessing these files using a Web browser of some sort running under a graphical operating system you may be able to 'cut-and-paste'. For example, if you access some of the files that contain only SAS data and code you can save these to a "notepad" or "clipboard" and then change to SAS and then "import" the saved text from the notepad or clipboard directly into the SAS Program Editor. This is possible using OS/2, Unix and X-windows, and Windows (Microsoft). Thus, for example, you could be looking at these notes, concurrently start up SAS, and then cut-and-paste the SAS data and code to the Program Editor, run (submit) the SAS statements and look at the output and then swap back to these notes to compare what you obtained with the comments, questions and suggestions in each of the various topics covered.

Note: a useful feature to know. If you are planning to print the output of the log and/or output (from the File, Print menu), then it can usually be sensible to set the page width and print length of SAS, since the default for SAS is often the old-style 120 columns paper width, not the more common 80 columns for regular portrait orientation letter paper.

So. Invoke SAS. From the Tools menu select Options, then System, and then log and procedure output control. This will bring up another submenu with one of the choices being SAS log and procedures control. Double-click on that, which will bring up a further submenu which will have the options for page/line width and length. Double-click on linesize; the linesize (length) will be shown (it may be 96 or 110 or 120). Change it to 80 and then click on OK, which will bring you back to the submenu. Double-click on Page size (the number of lines of SAS output to print per page), and change it to 49. Click on OK, and then select OK again from the main sub-menu, to set the system output options for your SAS session.
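The same two settings can also be made with an OPTIONS statement placed at the top of a program, which avoids the menus and keeps a record of the settings with the code. A minimal sketch using the values suggested above:

```sas
options linesize=80 pagesize=49;   /* 80 columns wide, 49 lines per page */
```

The settings stay in effect for the rest of the SAS session, or until another OPTIONS statement changes them.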

R. I. Cue ©
Department of Animal Science, McGill University
last updated : 2010 May 6