Monday, March 19, 2012

Processing External Files with PROC STREAM

In the examples posted in previous blog entries (A Gentle Introduction to SAS Server Pages and PROC STREAM: Extending the Macro Language to create more than just SAS code), the input text being processed was included in the SAS job stream and was delimited by:

• the token BEGIN (case insensitive)
• four semicolons with no intervening spaces starting in column 1 (;;;;)

PROC STREAM utilizes the SAS word-scanner and tokenization facilities to resolve and execute macro variable references, macro functions and macro calls for all the text delimited by BEGIN .... ;;;; and directs the output to a specified external file:

proc stream outfile= ... ;
BEGIN
/* Input text to be processed */
;;;;
run;
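As a minimal sketch of the inline form, the following defines a macro variable and lets PROC STREAM resolve it inside a fragment of HTML. The fileref out, the macro variable name, and the HTML text are all made up for illustration; they are not from the earlier posts.

```sas
/* Hypothetical output target: a temporary file */
filename out temp;

/* Hypothetical macro variable referenced in the input text */
%let name = World;

proc stream outfile=out;
BEGIN
<p>Hello &name</p>
;;;;
run;
```

The text between BEGIN and ;;;; is not SAS code; it is passed through the tokenizer, so &name is replaced by its value before the line is written to the output file.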

Quite often the text that you want PROC STREAM to process is contained in an external file (what I would refer to as a SAS Server Page: text, e.g., HTML, along with commands interpreted by SAS to generate additional data-driven content). So the question becomes how to do that, since there is no infile option. The answer is to use a %INCLUDE statement. PROC STREAM recognizes the %INCLUDE statement and will use the contents of that file as its input. So our PROC STREAM statement looks like this:

proc stream outfile= ... ;
BEGIN
/* Include text file to be processed */
&streamDelim; %include fileref-or-path-to-file;
;;;;

Note that fileref-or-path-to-file can reference a fileref, a physical path or use SAS aggregate syntax. I prefer to use aggregate syntax as typically my SAS Server Pages are organized in one or more directories (and you can define your fileref so it points to concatenated directories).

The content of the file being included need not be SAS code; it can be any text (e.g., HTML, XML, CSV, SAS code, and more). Whatever text it contains will be processed by the SAS tokenizer.
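Here is a sketch of the aggregate-syntax approach described above. It assumes two hypothetical directories holding SAS Server Pages and a release in which PROC STREAM defines &streamDelim for you. The directory paths and the output location are placeholders, not real locations.

```sas
/* Hypothetical directories; the fileref concatenates them in the order listed */
filename srvrpgs ("/path/to/project/pages" "/path/to/shared/pages");

/* Hypothetical output location */
filename out "/path/to/output/HelloWorld.html";

proc stream outfile=out;
BEGIN
&streamDelim; %include srvrpgs(HelloWorld.html);
;;;;
run;
```

Pointing one fileref at several directories lets a project-specific page take precedence over a shared one of the same name without changing the %INCLUDE statement.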

But now you ask, what is &streamDelim; and why is it there? It is a delimiter that is needed because certain SAS statements must appear on statement boundaries (i.e., they must be the very first statement in your program or they must immediately follow a semicolon). So in order for PROC STREAM to recognize the %INCLUDE, it needs to follow a semicolon. But since you typically don't want the ; in the output file, we need a way to tell PROC STREAM to ignore it, thus &streamDelim. There are a number of other things you can do with &streamDelim, and I'll have examples and blog postings between now and SAS Global Forum.

NOTE: In the SAS TS1M2 release, PROC STREAM will create the &streamDelim macro variable if it does not already exist. Until then, you can slightly modify the syntax above as follows to create a value for streamDelim that is a valid SAS name token but whose text value is not in your input SAS Server Page, for example:

%let streamDelim = __%sysfunc(datetime(),z18.);
proc stream . . . . resetDelim = "&streamDelim";
/* Include text file to be processed */
BEGIN
&streamDelim; %include srvrpgs(HelloWorld.html);
;;;;

Input HTML (and other) files may have an additional wrinkle: named HTML entities. For example, the HelloWorld.html file contains &reg; for the registered trademark symbol in the text:

. . . input SAS&reg; Server Page . . .

When this text is processed we will likely get a warning or, depending on the context, an error message for &reg, since the SAS tokenizer interprets it as a macro variable reference. The comparable numeric HTML entity (&#174;) for the registered trademark symbol does not have this problem, since the SAS tokenizer does not see #174 as a macro variable reference. To avoid editing the input files to convert named entities to numeric ones, we can let the SAS tokenizer do the work for us. Because the content of the input file is tokenized and macro variable references are replaced with their values, we can define the standard HTML entities with a series of %let statements like the following before invoking PROC STREAM:

%let reg = &#174;

This lets the tokenizer do the substitution for us: &reg will be replaced by &#174, and so &reg; will be resolved to &#174;.
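A short sketch of what such a series of definitions might look like follows. Only reg comes from this post. The other entity names are added for illustration, using their standard numeric HTML codes, and the output fileref and sample HTML line are hypothetical.

```sas
/* The semicolon ends each %LET, so the stored value is, e.g., &#174 */
%let reg   = &#174;
%let trade = &#8482;
%let nbsp  = &#160;
%let euro  = &#8364;

/* Hypothetical output target and sample input text */
filename out temp;

proc stream outfile=out;
BEGIN
<p>SAS&reg; software&nbsp;demo</p>
;;;;
run;
```

Because the stored values have no trailing semicolon, the semicolon that follows each named entity in the input text is what completes the numeric entity in the output.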

You can see both %include and this substitution in action on my server using:

• the SAS/IntrNet Application Dispatcher
• the Stored Process Server

both of which use the same code and the same input SAS Server Page (our Hello World example).

I will have more examples in future blog entries (and, of course, in the book) that take advantage of %INCLUDE, including input SAS Server Pages that have %INCLUDE statements.