SAS PROC format | Predefined and user-defined format in SAS

Contents

I have spent a significant part of my career as a data visualization specialist. I am very picky about formatting and reporting. Then, when i started using SAS, faced some challenges when changing number and character formats, especially when it came to dates. Not surprisingly, both Kunal and I receive a good number of inquiries on this topic..

And SAS, there are several options to improve report layouts. In this article, we will particularly discuss about the methods to play with the format of the data values. It should be noted that these changes are only applied while the results are displayed. Changing the output format does not change the shape, data is stored on the back end.

By default, SAS provides several built-in formats to handle various formats, but they are not enough to meet the personalized requirements that your data may have. For instance, we may have coded Male and Female as M and F (O 0 Y 1), but while printing we would like to show the field as MALE and FEMALE only. Another common example is displaying area codes on telephone numbers from 10 digits (for instance, 123-3456-789).

PD: This is a long article compared to what I usually write, so feel free to digest it to pieces.

The article is divided into 2 general parts: First, we understand predefined formats and user-defined formats in SAS. Then, we analyze various applications and examples of these concepts.

1_proc_format_sas-6665279

Integrated SAS formats

Then, sample data is displayed containing details on agent performance. We will use this dataset in examples and discussion in this article. Here DOJ is in numeric format, as defined by SAS by default, from 1 of January of 1960.

2_proc_format_data-2295727Let's first look at the syntax of the FORMAT statement: –

Syntax: – FORMAT Variable format (s);

Let's do some exercises with the format declaration:

Example 1: Show DOJ in DDMMYYYY format:

Code:

Process Print Data = Sales;
Format DOJ Date 9. ;
To run;

Production:

3_proc_format-6128743

Here is the list of predefined date formats available to change the output format of variables:

4_proc_format-5452091

Example 2: display dollar sales amount with one decimal place

Code:

Process Print Data = Sales;
Salesamount Dollar9.1 format;
To run;

Production:

5_proc_format_sas-1402548

List of predefined date formats available to change the output format of variables:

6_proc_format_sas1-3504838

Example 3: Display Salesamount in dollars with two decimal places with commas and DOJ in Weekdate format.

Code:

Process Print Data = Sales;
Format Sales Amount Dollar12.2 DOJ WEEKDATE .;
To run;

Production:

7_proc_format_sas-2681156

Need for user-defined formats:

So far we have seen how to change the format of numbers and dates with built-in SAS formats. But there can be many times when SAS's built-in formats don't meet our needs.. As in the current dataset, we want:

  1. Shows “MASCULINE” Y “FEMININE” instead of “M” Y “F”
  2. Redefine categories A, B, C, D and E as Ultra, Super, Average, Low y Poor
  3. Shows the Salesamount frequency in three categories ” = 12000″

All of these things can be done with data passing (o PROC SQL) with the If-Else statement (or change case), but it will require a new variable that will store these formatted values. While, if we only want to change the display (and not the values ​​in the dataset), so creating a user defined format using PROC FORMAT is a more efficient way to make these changes.

User-defined formats using Proc FORMAT

PROC FORMAT is a procedure that creates a mapping of data values ​​to data labels. User-defined FORMAT mapping is independent of a SAS DATA SET and variables and must be explicitly assigned in a DATASTEP and / the subsequent PROC.

Syntax of PROC FORMAT: –

8_proc_format_sas-8064107

Rules for defining FORMAT NAME:
  1. For character values, the first character must be a dollar sign ($) and a letter or underscore as the second character. For numerical values, the name must have a letter or an underscore as the first character
  2. The name cannot end with a number
  3. Cannot be the name of an existing SAS format
  4. It must not end with a period in the VALUE statement

Let's solve the problems discussed above using the PROC format:

Trouble 1 (Y 2): Show M and F as Male and Female.

Code:

Process format;
Value $ Genderfmt ‘M’ = ‘Male’
‘F’ = ‘Woman’;
To run;
Process Print Data = Sales;
Genre Format $ Genderfmt. ;
To run;

Production:

9_proc_format_sas-4009602Similarly, we can solve it for the problem 2. Define Category A, B, C, D and E as Ultra, Super, Average, Low y Poor.

Trouble 3: you want to show Salesamount frequency in three categories ” = 12000″

Code:

Proc format;
Valor Salegrp low-8000 = ‘<8000'
8000 – <12000 = '8000-12000'
12000-Alto = ‘> = 12000’;
To run;
Proc Freq Data = Ventas;
Salesamount Salegrp format .;
Mesa Salesamount;
To run;

Production:

10_proc_format_sas-6560974

Above, I have used ranges to define the format. Can be used for numeric and character values.

The special keywords used to define the ranges are: –

a) Ranges can be multiple values ​​separated by commas.

i) ‘A’, ‘B’, ‘C’
ii) 22, 44, 67

b) Ranges can include or exclude bounding values, based on the use of various keywords like:

i) 24 – 45: Includes values ​​of 24 a 45 including 24 Y 45.
ii) 24 <- 45: Includes values ​​between 24 a 45 including 45 and excluding 24.
iii) 24 – <45: Includes values ​​between 24 a 45 including 24 and excluding 45.
iv) 24 <- <45: Includes values ​​between 24 a 45 excluding 24 Y 45.

c) LOW refers to the minimum number available, HIGH refers to the highest number available and OTHERS includes all unspecified numbers. Others also include missing values ​​if not specified.

Some important points / applications to consider about PROC FORMAT

  • PROC Format returns the original value if we have not included all the data values ​​when defining the format.

11_proc_format_sas-6398919

  • We can create multiple formats in a single PROC FORMAT statement, specifying multiple values ​​in PROC Format

Code:

Process format;
Valor Salegrp low-8000 = ‘<8000'
8000 – <12000 = '8000-12000'
12000-Alto = ‘> = 12000’;
Value $ Genderfmt ‘M’ = ‘Male’
‘F’ = ‘Woman’;
To run;
Process Print Data = Sales;
Salesamount Salegrp format. Gender $ Genderfmt .;
To run;

Production:

12_proc_format_sas-4219573

  • PAGROC FORMAT can be used to create efficient data joins

In PROC FORMAT, we have a choice CNTLIN, which allows us to create a format from a data set instead of a VALUE statement. Before using this option, we first look at the guidelines below: –

a) The input data set must contain three variables required by PROC FORMAT: START, LABEL y FMTNAME.

b) Here START is the key field between these two tables and must be unique in the input dataset (data set, we are using in CNTLIN option).

c) The LABEL variable is the value that we want to assign to another data set.

d) The variable FMTNAME is the name of the format and must be passed between single quotes.

e) After defining the format, we can use the put function to create a variable in the dataset based on the key field and the format we have defined.

Example:

Let's say we want to map agent DOB from dataset Agent_DOB (Sample data) to the previous data set13_proc_format_sas-7383015

Here we want to create a format based on Agent_DOB and apply it to Sales. And Agent_DOB, we would consider AGT_ID (key field, between both data sets) como START, DOB as Label and FMTNAME = '$ AGENT'. Now look at the statements (then).

Data Agent_DOB_Fmt;
Rename AGT_ID = START;
Set Agent_DOB;
Label = DOB;
FMTNAME = ‘$ AGENT’;
To run;
Process format CNTLIN = Agent_DOB_Fmt;
To run;

14_proc_format_sas-9892599

The above code generated a format using the Agent_DOB_Fmt dataset and then, to merge it with the SALES dataset, data steps were written (then).

Data Sales_DOB;
Set sales;
Date_birth = Put (Eight_ID, $ Agent.);
To run;

15_proc_format_sas-4787747

Limitation of PROC FORMAT as a merger

This is the best method when we want to merge a variable from another dataset, but if we want to add five or multiple variables, then we have to repeat the PROC FORMAT statement so many times together with the multiple PUT function in the data pass. Then, I prefer to use MERGE or PROC SQL in such cases.

  • We can save the user defined SAS format for future use.

All SAS formats are stored in a catalog (format collection). When we create a format, is stored in the catalog. If we don't specify the catalog, SAS stores the formats in the WORK library in a catalog called FORMATS. Like other data sets in the WORK library, are also removed at the end of the session.

Now, to save user-defined formats, we need to specify where to store the catalog and how to call it. This can be achieved by storing formats in a library other than WORK.

Paso 1 First, we have to define a library (here i am using SAS University edition)

Syntax: – LIBNAME Library_Name “Path”
LIBNAME STATDATA “/ carpetas / miscarpetas / ECSTAT0”;

Step-2 Use the library option in PROC format and provide a library name with the name of the format file. The file name must be a valid SAS data set name.

Syntax: – PROC FORMAT LIBRARY = Library_Name.MYFILENAME;
Process Format Library = STATDATA.Gender_Fmt;
Value $ Genderfmt ‘M’ = ‘Male’
‘F’ = ‘Woman’;
To run;

Above program has created a file called Gender_Fmt.sas7bcat in your directory location.

Now, whenever we want to use a stored format, we have to tell SAS to look for formats in that catalog file. This is done with the fmtsearch option. Then, before using it, we need to write a statement.

Syntax: – Options fmtsearch = (Library_Name.MYFILENAME)
Options fmtsearch = (STATDATA.Gender_Fmt); / * After this we can use Gender_Fmt * /
Proc Print data = Ventas;
Genre Format $ Genderfmt .;
To run;

The SAS image format creates templates in which we define how the numbers are displayed. With the use of PICTURE FORMAT, we can overcome multiple number display problems like: –

1. Placing decimals and commas

2. Embed characters with numbers

3. Prefixes

Example 1: – I want to show the sales amount that ends with a% sign and preceded by a sign $.

Process format;
Imagen New_fmt low-High = ‘000000%’ (Prefix = ‘$’);

To run;
Process Print Data = Sales;
Salesamount New_fmt format .;
To run;

Production:

16_proc_format_sas-4904837

Example 2: – Let's say, I have a phone number for 10 digits and I want to display it as 123-3456-789.

Data phone;
Input contact;
data lines;
1111111111
222222222
3333333333
;
To run;
Process format;
Tele low-high image = ‘000-0000-000 ′;
To run;
Proc print data = Phone;
Tele Contact Format .;
To run;

17_proc_format_sas-2596746

Final notes: –

In this article, we discussed various methods to display the format of data values ​​using built-in and user-defined formats. We have also analyzed various techniques to define formats as ranges, images, handle missing values ​​and mismatched values ​​using OTHERS. We have also discussed the efficient fusion technique using PROC FORMAT. This should be all you need to be a professional with SAS formats.

We have not covered In-format when reading a non-standard dataset. We will talk about that in one of our future posts..

I hope this article has been helpful to you? We have simplified this topic and we have tried to present it in a very simple and lucid way. If you need more help with the SAS format, feel free to ask your questions through the comments below.

If you like what you have just read and want to continue learning about analytics, subscribe to our emails, Follow us on twitter or like ours page the Facebook.

Subscribe to our Newsletter

We will not send you SPAM mail. We hate it as much as you.