I have spent a significant part of my career as a data visualization specialist. I am very picky about formatting and reporting. Then, when i started using SAS, faced some challenges when changing number and character formats, especially when it came to dates. Not surprisingly, both Kunal and I receive a good number of inquiries on this topic..
And SAS, there are several options to improve report layouts. In this article, we will particularly discuss about the methods to play with the format of the data values. It should be noted that these changes are only applied while the results are displayed. Changing the output format does not change the shape, data is stored on the back end.
By default, SAS provides several built-in formats to handle various formats, but they are not enough to meet the personalized requirements that your data may have. For instance, we may have coded Male and Female as M and F (O 0 Y 1), but while printing we would like to show the field as MALE and FEMALE only. Another common example is displaying area codes on telephone numbers from 10 digits (for instance, 123-3456-789).
PD: This is a long article compared to what I usually write, so feel free to digest it to pieces.
The article is divided into 2 general parts: First, we understand predefined formats and user-defined formats in SAS. Then, we analyze various applications and examples of these concepts.
Integrated SAS formats
Then, sample data is displayed containing details on agent performance. We will use this dataset in examples and discussion in this article. Here DOJ is in numeric format, as defined by SAS by default, from 1 of January of 1960.
Let's first look at the syntax of the FORMAT statement: –
Syntax: – FORMAT Variable format (s);
Let's do some exercises with the format declaration:
Example 1: Show DOJ in DDMMYYYY format:
Code:
Process Print Data = Sales;
Format DOJ Date 9. ;
To run;
Production:
Here is the list of predefined date formats available to change the output format of variables:
Example 2: display dollar sales amount with one decimal place
Code:
Process Print Data = Sales;
Salesamount Dollar9.1 format;
To run;
Production:
List of predefined date formats available to change the output format of variables:
Example 3: Display Salesamount in dollars with two decimal places with commas and DOJ in Weekdate format.
Code:
Process Print Data = Sales;
Format Sales Amount Dollar12.2 DOJ WEEKDATE .;
To run;
Production:
Need for user-defined formats:
So far we have seen how to change the format of numbers and dates with built-in SAS formats. But there can be many times when SAS's built-in formats don't meet our needs.. As in the current dataset, we want:
- Shows “MASCULINE” Y “FEMININE” instead of “M” Y “F”
- Redefine categories A, B, C, D and E as Ultra, Super, Average, Low y Poor
- Shows the Salesamount frequency in three categories ” = 12000″
All of these things can be done with data passing (o PROC SQL) with the If-Else statement (or change case), but it will require a new variable that will store these formatted values. While, if we only want to change the display (and not the values in the dataset), so creating a user defined format using PROC FORMAT is a more efficient way to make these changes.
User-defined formats using Proc FORMAT
PROC FORMAT is a procedure that creates a mapping of data values to data labels. User-defined FORMAT mapping is independent of a SAS DATA SET and variables and must be explicitly assigned in a DATASTEP and / the subsequent PROC.
Syntax of PROC FORMAT: –
Rules for defining FORMAT NAME:
- For character values, the first character must be a dollar sign ($) and a letter or underscore as the second character. For numerical values, the name must have a letter or an underscore as the first character
- The name cannot end with a number
- Cannot be the name of an existing SAS format
- It must not end with a period in the VALUE statement
Let's solve the problems discussed above using the PROC format:
Trouble 1 (Y 2): Show M and F as Male and Female.
Code:
Process format;
Value $ Genderfmt ‘M’ = ‘Male’
‘F’ = ‘Woman’;
To run;
Process Print Data = Sales;
Genre Format $ Genderfmt. ;
To run;
Production:
Similarly, we can solve it for the problem 2. Define Category A, B, C, D and E as Ultra, Super, Average, Low y Poor.
Trouble 3: you want to show Salesamount frequency in three categories ” = 12000″
Code:
Proc format;
Valor Salegrp low-8000 = ‘<8000'
8000 – <12000 = '8000-12000'
12000-Alto = ‘> = 12000’;
To run;
Proc Freq Data = Ventas;
Salesamount Salegrp format .;
Mesa Salesamount;
To run;
Production:
Above, I have used ranges to define the format. Can be used for numeric and character values.
The special keywords used to define the ranges are: –
a) Ranges can be multiple values separated by commas.
i) ‘A’, ‘B’, ‘C’
ii) 22, 44, 67
b) Ranges can include or exclude bounding values, based on the use of various keywords like:
i) 24 – 45: Includes values of 24 a 45 including 24 Y 45.
ii) 24 <- 45: Includes values between 24 a 45 including 45 and excluding 24.
iii) 24 – <45: Includes values between 24 a 45 including 24 and excluding 45.
iv) 24 <- <45: Includes values between 24 a 45 excluding 24 Y 45.
c) LOW refers to the minimum number available, HIGH refers to the highest number available and OTHERS includes all unspecified numbers. Others also include missing values if not specified.
Some important points / applications to consider about PROC FORMAT
-
PROC Format returns the original value if we have not included all the data values when defining the format.
-
We can create multiple formats in a single PROC FORMAT statement, specifying multiple values in PROC Format
Code:
Process format;
Valor Salegrp low-8000 = ‘<8000'
8000 – <12000 = '8000-12000'
12000-Alto = ‘> = 12000’;
Value $ Genderfmt ‘M’ = ‘Male’
‘F’ = ‘Woman’;
To run;
Process Print Data = Sales;
Salesamount Salegrp format. Gender $ Genderfmt .;
To run;
Production:
-
PAGROC FORMAT can be used to create efficient data joins
In PROC FORMAT, we have a choice CNTLIN, which allows us to create a format from a data set instead of a VALUE statement. Before using this option, we first look at the guidelines below: –
a) The input data set must contain three variables required by PROC FORMAT: START, LABEL y FMTNAME.
b) Here START is the key field between these two tables and must be unique in the input dataset (data set, we are using in CNTLIN option).
c) The LABEL variable is the value that we want to assign to another data set.
d) The variable FMTNAME is the name of the format and must be passed between single quotes.
e) After defining the format, we can use the put function to create a variable in the dataset based on the key field and the format we have defined.
Example:
Let's say we want to map agent DOB from dataset Agent_DOB (Sample data) to the previous data set
Here we want to create a format based on Agent_DOB and apply it to Sales. And Agent_DOB, we would consider AGT_ID (key field, between both data sets) como START, DOB as Label and FMTNAME = '$ AGENT'. Now look at the statements (then).
Data Agent_DOB_Fmt;
Rename AGT_ID = START;
Set Agent_DOB;
Label = DOB;
FMTNAME = ‘$ AGENT’;
To run;
Process format CNTLIN = Agent_DOB_Fmt;
To run;
The above code generated a format using the Agent_DOB_Fmt dataset and then, to merge it with the SALES dataset, data steps were written (then).
Data Sales_DOB;
Set sales;
Date_birth = Put (Eight_ID, $ Agent.);
To run;
Limitation of PROC FORMAT as a merger
This is the best method when we want to merge a variable from another dataset, but if we want to add five or multiple variables, then we have to repeat the PROC FORMAT statement so many times together with the multiple PUT function in the data pass. Then, I prefer to use MERGE or PROC SQL in such cases.
-
We can save the user defined SAS format for future use.
All SAS formats are stored in a catalog (format collection). When we create a format, is stored in the catalog. If we don't specify the catalog, SAS stores the formats in the WORK library in a catalog called FORMATS. Like other data sets in the WORK library, are also removed at the end of the session.
Now, to save user-defined formats, we need to specify where to store the catalog and how to call it. This can be achieved by storing formats in a library other than WORK.
Paso 1 First, we have to define a library (here i am using SAS University edition)
Syntax: – LIBNAME Library_Name “Path”
LIBNAME STATDATA “/ carpetas / miscarpetas / ECSTAT0”;
Step-2 Use the library option in PROC format and provide a library name with the name of the format file. The file name must be a valid SAS data set name.
Syntax: – PROC FORMAT LIBRARY = Library_Name.MYFILENAME;
Process Format Library = STATDATA.Gender_Fmt;
Value $ Genderfmt ‘M’ = ‘Male’
‘F’ = ‘Woman’;
To run;
Above program has created a file called Gender_Fmt.sas7bcat in your directory location.
Now, whenever we want to use a stored format, we have to tell SAS to look for formats in that catalog file. This is done with the fmtsearch option. Then, before using it, we need to write a statement.
Syntax: – Options fmtsearch = (Library_Name.MYFILENAME)
Options fmtsearch = (STATDATA.Gender_Fmt); / * After this we can use Gender_Fmt * /
Proc Print data = Ventas;
Genre Format $ Genderfmt .;
To run;
The SAS image format creates templates in which we define how the numbers are displayed. With the use of PICTURE FORMAT, we can overcome multiple number display problems like: –
1. Placing decimals and commas
2. Embed characters with numbers
3. Prefixes
Example 1: – I want to show the sales amount that ends with a% sign and preceded by a sign $.
Process format;
Imagen New_fmt low-High = ‘000000%’ (Prefix = ‘$’);
To run;
Process Print Data = Sales;
Salesamount New_fmt format .;
To run;
Production:
Example 2: – Let's say, I have a phone number for 10 digits and I want to display it as 123-3456-789.
Data phone;
Input contact;
data lines;
1111111111
222222222
3333333333
;
To run;
Process format;
Tele low-high image = ‘000-0000-000 ′;
To run;
Proc print data = Phone;
Tele Contact Format .;
To run;
Final notes: –
In this article, we discussed various methods to display the format of data values using built-in and user-defined formats. We have also analyzed various techniques to define formats as ranges, images, handle missing values and mismatched values using OTHERS. We have also discussed the efficient fusion technique using PROC FORMAT. This should be all you need to be a professional with SAS formats.
We have not covered In-format when reading a non-standard dataset. We will talk about that in one of our future posts..
I hope this article has been helpful to you? We have simplified this topic and we have tried to present it in a very simple and lucid way. If you need more help with the SAS format, feel free to ask your questions through the comments below.