Thursday, March 31, 2011

Comparisons among infile options (missover truncover stopover scanover)

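TRUNCOVER lets an INPUT statement read variable-length records in which some records are shorter than the statement expects, typically because the last variable has varying lengths. Whatever is available up to the end of the record is assigned to the variable, and any remaining variables are set to missing.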

FLOWOVER causes an INPUT statement to continue reading on the next input data record if it does not find values in the current input line for all the variables in the statement. This is the default behavior.

STOPOVER causes the DATA step to stop if an INPUT statement reaches the end of the current record without finding values for all variables in the statement.

SCANOVER causes the INPUT statement to scan the input data records until the character string that is specified in the @'character-string' expression is found.

MISSOVER prevents an INPUT statement from reading a new input data record if it does not find values in the current input line for all the variables in the statement. When an INPUT statement reaches the end of the current input data record, variables without any values assigned are set to missing.
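
The difference between MISSOVER and TRUNCOVER is easiest to see on a record that ends before the last field does. The following is a minimal sketch: the fileref demo, the data set names and the sample records are made up for illustration, and it assumes a host where PUT writes variable-length records without trailing blanks. An external file is used because in-stream DATALINES records are typically padded to a fixed length, which hides the difference.

```sas
/* Hypothetical temporary file with one full and one short record */
filename demo temp;

data _null_;
  file demo;
  put 'Ann 12345';   /* id occupies columns 5-9 */
  put 'Bob 12';      /* record ends at column 6 */
run;

/* MISSOVER: the short record ends before column 9, */
/* so id is expected to be missing for Bob          */
data with_missover;
  infile demo missover;
  input name $ 1-3 id $ 5-9;
run;

/* TRUNCOVER: id is expected to take whatever is    */
/* available in columns 5-9, here the value 12      */
data with_truncover;
  infile demo truncover;
  input name $ 1-3 id $ 5-9;
run;

proc print data=with_missover;  run;
proc print data=with_truncover; run;
```

With the default FLOWOVER, the same INPUT statement would instead move on to the next record to look for the id value.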

SORT and BY

When you use BY-group processing (without options) in the DATA step, the data set has to be either sorted by the BY variables or indexed by the BY variables.
The BY statement is usually coupled with SET, MERGE, UPDATE or MODIFY.
The exception to this rule is the NOTSORTED option of the BY statement, which tells SAS that observations with the same BY values are grouped together but not necessarily in sorted order. NOTSORTED cannot be used with MERGE or UPDATE. The DESCENDING option does not lift the requirement; it only states that the data are sorted in descending order by the variable that follows it.
Note: if you use the NOTSORTED or DESCENDING option, SAS does not use an index to process the BY statement.
You can still use the FIRST. and LAST. variables when you use the NOTSORTED option.
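
As a small sketch of NOTSORTED with FIRST. and LAST., the step below flags the start and end of each run of consecutive identical values. The data set visits and its variables are invented for the example.

```sas
/* Hypothetical data: site values are grouped in runs, not sorted */
data visits;
  input site $ count;
  datalines;
B 1
B 2
A 3
B 4
;
run;

data runs;
  set visits;
  by site notsorted;       /* no PROC SORT needed */
  run_start = first.site;  /* 1 on the first row of each run */
  run_end   = last.site;   /* 1 on the last row of each run  */
run;

proc print data=runs; run;
```

Because the data are not sorted, the last B row starts a new BY group of its own instead of joining the first two B rows.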

SYMPUT and SYMPUTX

When you use CALL SYMPUT to create a macro variable, note that the variable is
assigned to the most local, non-empty symbol table.

Call Symput:

Use CALL SYMPUT if you need to assign a DATA step value to a macro variable.

Syntax: Call Symput ('Macro variable', character value)

The first argument to the SYMPUT routine is the name of the macro variable that receives the value of the second argument.

The second argument is the character value that will be assigned to the macro variable. It should always be a character value; a numeric value should be converted to character before it is assigned to the macro variable. If you skip the conversion, SAS converts the numeric value to character automatically and prints a note in the log saying that the conversion happened, which can lead to problems.

See the example:

data _null_;
count=1978;
call symput('count',count);
run;
%put &count;

19 data _null_;
20 count=1978;
21 call symput('count',count);
22 run;

NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).
21:21
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds


23 %put &count;
1978

Even though the macro variable count resolves to the expected value, 1978, SAS printed the note "Numeric values have been converted to character values at the places given by: (Line):(Column). 21:21".

To avoid the note, convert the numeric value to character yourself before assigning it to the macro variable:

data _null_;
count=1978;
call symput('count',strip(put(count,8.)));
run;
%put &count;

29 data _null_;
30 count=1978;
31 call symput('count',left(put(count,8.)));
32 run;

NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds


33 %put &count;
1978

Note:

1) A macro variable created with CALL SYMPUT cannot be referenced as &name in the same DATA step. Macro variable references are resolved while the DATA step is being compiled, but CALL SYMPUT does not assign the value until the step executes, so the value does not exist yet when the reference is resolved.
(If you need the value inside the same DATA step, use one of the two DATA step functions that retrieve it at execution time: RESOLVE or SYMGET.)

2) When SAS converts a numeric value to character automatically, it uses the BEST12. format, so the result is right-aligned and padded with leading blanks. Use the STRIP function to remove those blanks, as in the example above.
3) If CALL SYMPUT is used outside a macro (i.e., in open code), it creates a global macro variable. Inside a macro it creates a local macro variable, provided the variable does not already exist in an outer symbol table and the macro's local symbol table is not empty.
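
A minimal sketch of note 1, reading a macro variable back in the step that creates it. The names cutoff and same_step are hypothetical.

```sas
data _null_;
  /* Assigned at execution time */
  call symputx('cutoff', 1978);

  /* SYMGET also runs at execution time, so it sees the new value. */
  /* A reference written as &cutoff here would be resolved while   */
  /* the step is compiled, before CALL SYMPUTX has run.            */
  same_step = symget('cutoff');

  put same_step=;
run;
```

SYMGET returns a character value, so wrap it in INPUT if a numeric result is needed.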

CALL SYMPUTX:

SAS introduced CALL SYMPUTX in version 9 to address the pitfalls of CALL SYMPUT.

Advantages of CALL SYMPUTX over CALL SYMPUT include:

1) CALL SYMPUTX automatically converts numeric values to character before assigning them to the macro variable, without writing a conversion note to the log. (There is no need for manual conversion with the PUT function as in the example above.)

2) CALL SYMPUTX strips leading and trailing blanks, so there is no need to use the STRIP or LEFT function to remove the leading spaces.

Syntax: Call Symputx ('Macro Variable', Value, 'Symbol Table')

The first and second arguments are the same as in CALL SYMPUT. The third argument (symbol table) is optional, and its valid values are G, L, and F. With G, the macro variable is stored in the global symbol table. With L, it is stored in the most local symbol table. With F, SAS uses the most local symbol table in which the macro variable already exists, or the most local symbol table if it does not exist anywhere. If the argument is omitted, SAS follows the same rules as for CALL SYMPUT.

Example:

data _null_;
count=1978;
call symputx('count',put(count,8.),'G');
run;
%put &count;

29 data _null_;
30 count=1978;
31 call symputx('count',put(count,8.));
32 run;

NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds


33 %put &count;
1978

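Note: The PUT function is not required with CALL SYMPUTX; passing the numeric variable count directly gives the same result. If you pass count directly to CALL SYMPUT instead, the program still runs, but a note is written to the SAS log about the conversion of numeric values to character values.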
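
To illustrate the third argument of CALL SYMPUTX, here is a small sketch that creates one global and one local macro variable from inside a macro. The macro make_vars, its parameter year, and the variable names are hypothetical; the parameter also ensures that the local symbol table is not empty.

```sas
%macro make_vars(year);
  data _null_;
    call symputx('kept_global', &year, 'G');  /* global symbol table */
    call symputx('kept_local',  &year, 'L');  /* most local table    */
  run;
  %put Inside the macro: &kept_global &kept_local;
%mend make_vars;

%make_vars(1978)

/* The global variable is still available in open code */
%put Outside the macro: &kept_global;

/* The local one is gone, so %SYMEXIST is expected to return 0 */
%put kept_local exists: %symexist(kept_local);
```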


Here is a simple way to understand the difference between CALL SYMPUT and CALL SYMPUTX.
Retrieved from the SAScommunity.org page Using_the_SAS_V9_CALL_SYMPUTX_Routine.
Submitted by Michael A. Raithel.

The SAS V9 CALL SYMPUTX routine can save you keystrokes and lead to leaner, cleaner SAS programs.
Instead of using:
call symput('MACROVAR',trim(left(charvar)));

to load a SAS macro variable with a character string that might contain blanks, you could use SYMPUTX instead:
call symputX('MACROVAR',charvar);

SYSDATE and TODAY()

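&sysdate is the date on which the SAS session or job started. It is set once at startup and stored as text such as 31MAR11.
today() returns the current date from the system clock each time it executes, as a numeric SAS date value.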
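
A short sketch comparing the two. It uses &sysdate9, the four-digit-year counterpart of &sysdate, to build a date literal; the variable names are made up.

```sas
data _null_;
  /* Date the session started, fixed for the life of the session */
  session_start = "&sysdate9"d;

  /* Date from the system clock when this statement executes */
  current_date = today();

  /* The two differ only if the session has run past midnight */
  days_running = current_date - session_start;

  put session_start= date9. current_date= date9. days_running=;
run;
```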