
Why is a transformation procedure necessary??
To understand the need for a transformation procedure, we must pay attention that in an ETL procedure several sources are handled, some of them outside the organization itself: Stock market information from a website outside the company, any type of download from the Internet, an Office package, etc. This variety of databases, sometimes from several countries, con diferentes idiomas y diferentes unidades de measureThe "measure" it is a fundamental concept in various disciplines, which refers to the process of quantifying characteristics or magnitudes of objects, phenomena or situations. In mathematics, Used to determine lengths, Areas and volumes, while in social sciences it can refer to the evaluation of qualitative and quantitative variables. Measurement accuracy is crucial to obtain reliable and valid results in any research or practical application...., makes it impossible or difficult to make comparisons if you don't do conversions and formatting beforehand. Hence the need for transformation processes.
Transformation actions
The most common actions or processes are:
- Data Reformatting.
- Unit Conversion. As an example, Convert miles to kilometers per hour or vice versa. This is very common when extracting data from countries with different metric units. Another case would be the conversion of different currencies (Pounds, euros …) in a single standard value.
- Selecting columns for post-loading. As an example, Cause columns with null values not to load.
- Column Aggregation. Adding a column with the origin of certain cars would be an example.
- Divide a column into several. This action is very useful for, as an example, Separate into three columns, one for the first name and two for the surname, the identification of a person who was previously in a single field.
- Translate codes. As an example, if the source sources store an "H" for men and an "M" for women, Give the essential instructions for the destination to store a "1" for men and a "2" for women.
- Get new calculated values.
- Unite data from multiple sources.
- Searches. This is when data is taken and compared with other types of data, Cross-referencing information. As an example, capturar un código de cliente de una databaseA database is an organized set of information that allows you to store, Manage and retrieve data efficiently. Used in various applications, from enterprise systems to online platforms, Databases can be relational or non-relational. Proper design is critical to optimizing performance and ensuring information integrity, thus facilitating informed decision-making in different contexts.... y cruzarlo con otra base de préstamos concedidos para saber si dicho cliente disfruta o no de ese préstamo.
- Pivoting. A procedure similar to searches but with a higher degree of complexity, since data from different sources are cross-referenced.
Who is in charge of carrying out these transformations??
This role corresponds to the developer or analyst of the ETL procedure in question. The definition of the transformations to be carried out is based on a previous analysis and the cleaning stage that, as we have already pointed out and will deepen later, It is a separate procedure but closely linked to the transformation procedure.
Transformation: A vital procedure for subsequent analysis and comparisons
Data transformation, after their extraction and as a previous step to loading, it cannot be considered a secondary or dispensable stage. Without a good job of data transformation, it would not be feasible to carry out comparisons and analyses. In other words, one of the great benefits for the institutions of implementing an ETL procedure would be lost.