Sunday, April 3, 2011

MISSING STATEMENT

You have been asked to generate a report on all subjects in a study who have a missing birth date, so you write a DATA step to subset the data:

data MissingBirth;
set Raw;
if birthdate = . then output;
run;


The step is simple and seemingly effective. However, the person who created Raw used special missing values to record why a value was missing (e.g., .A for adopted with no original record, .Q for the interviewer forgot to ask). A special missing value such as .A is not equal to the ordinary missing value (.), so those subjects are silently left out of your output data set. Rewriting the IF statement as

if birthdate <= .Z then output;

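would solve this problem, because .Z is the largest of the numeric missing values (._ < . < .A < .B < ... < .Z), so every missing value, ordinary or special, satisfies the comparison. Then another research site sends you a new set of Raw data, and you run your subsetting program. Of course, it doesn’t work: this time the researchers stored the dates as character strings. So you could now write the IF statement as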

if birthdate = ' ' then output;

But you have no way of knowing what kind of data you will get in the future. To cover all bases, you could write the IF statement as

if missing(birthdate) then output;
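The MISSING function returns true for the ordinary missing value, for every special missing value (._ and .A through .Z), and for a blank character value, which is why this last form works no matter how birthdate is stored. The sketch below compares the IF statements side by side. It is only an illustration: the data sets Raw, RawChar and Missing1 to Missing4 and their values are made up, and the special missing values are assigned directly with the literals .A and .Q instead of being read from a raw file.

```sas
/* Hypothetical test data: numeric birth dates with ordinary and special missing values */
data Raw;
   format birthdate date9.;
   id = 1; birthdate = '15MAR1980'd; output;
   id = 2; birthdate = .;            output;  /* ordinary missing */
   id = 3; birthdate = .A;           output;  /* adopted, no original record */
   id = 4; birthdate = .Q;           output;  /* interviewer forgot to ask */
run;

/* Original test: .A and .Q are not equal to ., so only id 2 is kept */
data Missing1;
   set Raw;
   if birthdate = . then output;
run;

/* .Z is the largest missing value, so ids 2, 3 and 4 are kept */
data Missing2;
   set Raw;
   if birthdate <= .Z then output;
run;

/* MISSING catches ordinary and special numeric missing values: ids 2, 3 and 4 */
data Missing3;
   set Raw;
   if missing(birthdate) then output;
run;

/* Hypothetical test data: birth dates stored as character strings */
data RawChar;
   length birthdate $ 9;
   id = 1; birthdate = '15MAR1980'; output;
   id = 2; birthdate = ' ';         output;
run;

/* The same MISSING call also treats a blank character value as missing: id 2 */
data Missing4;
   set RawChar;
   if missing(birthdate) then output;
run;
```

Running it, Missing1 keeps only subject 2, while Missing2 and Missing3 keep subjects 2, 3 and 4, and Missing4 keeps the one subject with a blank character birth date. Only the MISSING version works unchanged for both kinds of data.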
