Monday, February 11, 2013

DOSUB and DOSUBL - Data Driven Development

I have always been a fan of data driven applications, where data (including parameter files) drives or defines the code to be executed. The DOSUB and DOSUBL functions, experimental in SAS 9.3, are a great addition to the SAS toolset for building data driven applications. My (upcoming) ebook (SAS® Server Pages: Generating Dynamic Content) has lots of examples that use these functions so SAS code can be executed from a SAS Server Page. Both functions take a single character argument:
  1. The argument to DOSUB is a fileref that points to the code to be executed.
  2. The argument to DOSUBL is the line (or lines) of code to be executed.
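To make the difference between the two arguments concrete, here is a minimal sketch that runs the same PROC PRINT step both ways. The fileref name mycode and the choice of PROC PRINT are just assumptions for illustration; they are not from the book.

```sas
/* mycode is a hypothetical fileref; TEMP gives us a scratch file */
filename mycode temp;

/* write one line of SAS code to the file behind the fileref */
data _null_;
 file mycode;
 put 'proc print data=sashelp.class(obs=3); run;';
run;

data _null_;
 /* DOSUB: the argument is the NAME of the fileref */
 rc1 = dosub('mycode');
 /* DOSUBL: the argument is the code itself */
 rc2 = dosubl('proc print data=sashelp.class(obs=3); run;');
 /* write both return codes to the log */
 put rc1= rc2=;
run;
```

Both calls return a numeric code that can be checked in the DATA step.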
What I would like to describe here is how they can be used for data driven development. In Chapter 4 of my book there is an example of a mail-merge application. This is a very simple example of data driven development: for each observation in a SAS data set run some code to create a letter or a report. The logic is fairly straightforward:
  1. Determine how many observations there are in the input data set.
  2. Use a macro to loop from 1 to the number of observations and do the following in each iteration:
    1. Read the ith observation
    2. Load the values of the needed variables into macro variables
    3. Invoke PROC STREAM to process a SAS Server Page that references the macro variables in the text of the letter
In other words, we use a macro to run some code for each observation in a SAS data set. The input SAS Server Page and one sample generated letter are shown below in Figures 1 and 2.
Figure 1. Input SAS Server Page
Figure 2. Generated Letter for John
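The macro-driven logic listed above can be sketched roughly as follows. This is not the code from the book: the macro name mailmerge and its data= parameter are made up for illustration, and the PROC STREAM step is replaced by a %PUT placeholder so the sketch runs without the Server Page.

```sas
/* mailmerge and its data= parameter are hypothetical names */
%macro mailmerge(data=sashelp.class);
 %local i nobs dsid rc;
 /* 1. determine how many observations there are */
 %let dsid = %sysfunc(open(&data));
 %let nobs = %sysfunc(attrn(&dsid, nlobs));
 %let rc   = %sysfunc(close(&dsid));
 /* 2. loop from 1 to the number of observations */
 %do i = 1 %to &nobs;
  data _null_;
   /* read the ith observation */
   set &data(firstobs=&i obs=&i);
   /* load the needed values into local macro variables */
   call symputx('name', name, 'L');
   call symputx('age', age, 'L');
  run;
  /* placeholder: the PROC STREAM step would go here */
  %put NOTE: letter &i of &nobs is for &name (age &age);
 %end;
%mend mailmerge;

%mailmerge()
```

Here the Macro Language owns the loop, and a separate DATA step runs once per observation just to fetch the values.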
What DOSUB and DOSUBL allow us to do is invert this process: for each observation in an input SAS data set, we run some code. The DATA step becomes the driver instead of a Macro Language DO loop. The following program demonstrates this approach using the DOSUBL function.

proc format;
 /* map the value of sex to daughter/son */
 value $gender 'F' = 'daughter'
               'M' = 'son'
;
run;
data _null_;
 set sashelp.class;
 /* associate formats with sex and age */
 format sex $gender. age words.;
 /* create macro vars from the data step vars */
 /* vvalue uses the formatted value */
 call symputx('name',vvalue(name));
 call symputx('height',vvalue(height));
 call symputx('weight',vvalue(weight));
 call symputx('sex',vvalue(sex));
 call symputx('age',vvalue(age));
 /* define the code to run for each observation */
 code = 'filename letter "&root\letters\&name..html" '
      ||'lrecl=32767; '
      ||'proc stream outfile=letter quoting=both; '
      ||'begin '
      ||'&streamdelim; %include srvrpgs(class.html); '
      ||';;;;'
      ;
 /* run the code */
 rc = dosubl(code);
run;

Each execution of the DATA step invokes PROC STREAM to generate the desired letter.

CALL EXECUTE vs DOSUB and DOSUBL

Like CALL EXECUTE, the DOSUB and DOSUBL functions allow you to generate code to be executed. However, unlike CALL EXECUTE, both DOSUB and DOSUBL execute the code immediately, while code passed to CALL EXECUTE is executed after the DATA step completes. In this example, if CALL EXECUTE had been used, all the generated letters would have used the values of the macro variables from the last observation in our input data set, since each execution of the DATA step overwrites the macro variables. Because code passed to the DOSUB/DOSUBL functions is executed immediately, the macro variables in our letter resolve to the values from the current observation.
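A small side-by-side test makes the timing difference easy to see in the log. This is only a sketch: the %PUT statements stand in for the letter-generating code, and %NRSTR is used on the CALL EXECUTE side so that the macro reference is not resolved until the queued code actually runs.

```sas
data _null_;
 set sashelp.class(obs=3);
 /* overwrite the macro variable on every observation */
 call symputx('name', name);
 /* queued: this %put runs after the DATA step ends, */
 /* %nrstr defers resolution of &name until then     */
 call execute('%nrstr(%put CALL EXECUTE sees: &name;)');
 /* runs right now, while this observation is current */
 rc = dosubl('%put DOSUBL sees: &name;');
run;
```

If the behavior is as described above, the DOSUBL lines should show a different name for each observation, while the CALL EXECUTE lines, which appear after the DATA step finishes, should all show the name from the last observation read.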

A Best Practice for the DOSUBL Argument

The length of the code stream passed to the DOSUBL function cannot exceed 32,767 characters. In the example above, the code was included inline to make the example easier to review. As a Best Practice, code to be executed should not be included inline. Instead, the code can be packaged as a macro, with the macro call as the argument to the DOSUBL function. Alternatively, it can be stored in an external file that is pointed to by a fileref, with that fileref as the argument to the DOSUB function.
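One possible shape for the macro version is sketched below. The macro name make_letter and its parameters are hypothetical, and the body is a %PUT placeholder where the FILENAME and PROC STREAM statements would go. As a variation on the example above, the values are passed as macro parameters instead of being loaded with CALL SYMPUTX.

```sas
/* make_letter, name= and age= are hypothetical names */
%macro make_letter(name=, age=);
 /* placeholder for the FILENAME and PROC STREAM statements */
 %put NOTE: would generate a letter for &name (age &age);
%mend make_letter;

data _null_;
 set sashelp.class;
 length code $200;
 /* build a short macro call from the current observation */
 code = cats('%make_letter(name=', name, ', age=', age, ')');
 /* run the macro immediately for this observation */
 rc = dosubl(code);
run;
```

The string passed to DOSUBL stays short no matter how much code the macro generates, so the 32,767 character limit is much less of a concern. Passing raw data values as parameters assumes they contain no commas, parentheses or macro triggers, which holds for SASHELP.CLASS.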

This Best Practice reinforces the paradigm shift that comes with the DOSUB/DOSUBL functions. Instead of using the Macro Language to loop and execute multiple DATA and/or PROC steps, we can now have the DATA step do the looping and execute a macro in each iteration.