ETL processes: Transformation. What does it consist of?

Contents

In the ETL processesAfter the extraction of the data from the source or source sources comes the second stage: The transformation.The transformation stage of an ETL procedure is the Applying a series of functions or business rules on the extracted data to convert it into data that will then be uploaded to the new source. ETL_Pentaho.jpg

Why is a transformation procedure necessary??

To understand the need for a transformation procedure, we must pay attention that in an ETL procedure several sources are handled, some of them outside the organization itself: Stock market information from a website outside the company, any type of download from the Internet, an Office package, etc. This variety of databases, sometimes from several countries, con diferentes idiomas y diferentes unidades de measure, makes it impossible or difficult to make comparisons if you don't do conversions and formatting beforehand. Hence the need for transformation processes.

Transformation actions

The most common actions or processes are:

  • Data Reformatting.
  • Unit Conversion. As an example, Convert miles to kilometers per hour or vice versa. This is very common when extracting data from countries with different metric units. Another case would be the conversion of different currencies (Pounds, euros …) in a single standard value.
  • Selecting columns for post-loading. As an example, Cause columns with null values not to load.
  • Column Aggregation. Adding a column with the origin of certain cars would be an example.
  • Divide a column into several. This action is very useful for, as an example, Separate into three columns, one for the first name and two for the surname, the identification of a person who was previously in a single field.
  • Translate codes. As an example, if the source sources store an "H" for men and an "M" for women, Give the essential instructions for the destination to store a "1" for men and a "2" for women.
  • Get new calculated values.
  • Unite data from multiple sources.
  • Searches. This is when data is taken and compared with other types of data, Cross-referencing information. As an example, capturar un código de cliente de una database y cruzarlo con otra base de préstamos concedidos para saber si dicho cliente disfruta o no de ese préstamo.
  • Pivoting. A procedure similar to searches but with a higher degree of complexity, since data from different sources are cross-referenced.

Who is in charge of carrying out these transformations??

This role corresponds to the developer or analyst of the ETL procedure in question. The definition of the transformations to be carried out is based on a previous analysis and the cleaning stage that, as we have already pointed out and will deepen later, It is a separate procedure but closely linked to the transformation procedure.

Transformation: A vital procedure for subsequent analysis and comparisons

Data transformation, after their extraction and as a previous step to loading, it cannot be considered a secondary or dispensable stage. Without a good job of data transformation, it would not be feasible to carry out comparisons and analyses. In other words, one of the great benefits for the institutions of implementing an ETL procedure would be lost.

Subscribe to our Newsletter

We will not send you SPAM mail. We hate it as much as you.