ETL processes is the standard term for the movement and transformation of data. ETL is the procedure that enables institutions to move data from multiple sources, reformat it and load it into another database (called a data mart or data warehouse) for analysis. The data can also be sent to another operational system to support a business process.
In summary, the main objective of this procedure is to facilitate data movement and transformation, integrating the different systems and sources of the modern organization.
The term ETL stands for:
Extract: to extract.
Transform: to transform.
Load: to load.
Phases of an ETL procedure
The different phases or sequences of an ETL procedure are as follows:
Extraction of data from one or more source systems.
Transformation of that data, that is, the opportunity to reformat and clean it when necessary.
Loading of that data into another place or database, a data mart or a data warehouse, in order to analyze it or to support a business process.
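The three phases above can be sketched in a few lines of Python. This is only an illustrative sketch, not a reference implementation: the sample rows, the customers table and the function names are assumptions made for the example, and an in-memory SQLite database stands in for the data mart or data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real source would be a file or another database.
SOURCE_CSV = """id,name,city
1, alice ,madrid
2,BOB,Lima
3, Carol,bogota
"""


def extract(text):
    """Extraction: read the raw rows from the source system (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transformation: reformat and clean every row."""
    return [
        (int(row["id"]), row["name"].strip().title(), row["city"].strip().title())
        for row in rows
    ]


def load(records, connection):
    """Load: write the transformed records into the target database."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    connection.executemany("INSERT INTO customers VALUES (?, ?, ?)", records)
    connection.commit()


def run_etl(text, connection):
    """Run the three phases in order."""
    load(transform(extract(text)), connection)


# In-memory SQLite database used as a stand-in for a data warehouse.
warehouse = sqlite3.connect(":memory:")
run_etl(SOURCE_CSV, warehouse)
loaded_rows = warehouse.execute(
    "SELECT id, name, city FROM customers ORDER BY id"
).fetchall()
print(loaded_rows)
```

Keeping each phase in its own function means that a different source or target can be swapped in without touching the other two phases.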
Data cleansing as a separate stage from ETL processes
Although data cleansing could be understood as part of the data transformation stage, the current trend is to treat it as a separate stage of the ETL procedure.
This view corresponds to a more modern and practical conception of the procedure. To save time and be more efficient, it is advisable to unify criteria BEFORE starting the ETL procedure itself, for example by using a single agreed abbreviation in place of “avenue” in every record of a postal address database.
Having consolidated information is as important as ensuring that all the data is correct and that every user sees the same version of it. Only in this way is it possible to obtain efficient workflows and reliable analysis of that data.
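As a small illustration of unifying criteria before the ETL procedure starts, the sketch below rewrites every spelling of a street type to one canonical form. The canonical form "Ave.", the list of variants and the function name standardize_address are assumptions chosen for the example; a real project would define its own conventions.

```python
import re

# Assumed convention for this example: every variant becomes "Ave.".
STREET_TYPE_VARIANTS = {"avenue": "Ave.", "ave": "Ave.", "av": "Ave."}

# Match a whole-word variant, optionally followed by a period.
_STREET_TYPE_PATTERN = re.compile(r"\b(avenue|ave|av)\b\.?", re.IGNORECASE)


def standardize_address(address):
    """Collapse repeated whitespace and unify the street-type spelling."""
    collapsed = " ".join(address.split())
    return _STREET_TYPE_PATTERN.sub(
        lambda match: STREET_TYPE_VARIANTS[match.group(1).lower()], collapsed
    )


raw_addresses = ["123 Main Avenue", "45 ocean ave", "7   Park Av.", "9 Avery Street"]
clean_addresses = [standardize_address(address) for address in raw_addresses]
print(clean_addresses)
```

The word boundaries in the pattern keep names such as "Avery" untouched, so only the street type itself is rewritten.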
What systems can be integrated into an ETL procedure?
ETL processes can include:
Legacy systems, that is, inherited or older systems.
New systems: Windows-based or Linux-based systems, and also modern social networks such as Facebook, Twitter, LinkedIn, etc.
Legacy or inherited systems are generally characterized by being closed, not allowing changes and being difficult to access (in general, some kind of special driver is needed). They are inward-facing systems and, therefore, do not allow the addition of a computer that works in parallel.
In contrast, new or modern systems (based on Windows or Linux) are open, complete and interconnected. An example would be a Linux server farm, which enables the different nodes to interconnect with each other.
Benefits of ETL processes
Any company or organization benefits from implementing an ETL procedure to move and transform the data it handles for the following reasons:
To be able to create master data management, that is, a standardized central repository of all the organization's data. For example, if we have a customer object in a credit database and another customer object in the credit card database, master data management would establish, concretely and unequivocally, a unique customer record with the customer's first and last name for the organization.
To enable managers to make strategic decisions based on the analysis of data loaded into new and updated databases: the data mart or data warehouse.
To integrate systems. Institutions grow organically, and more and more data sources are added. This causes new needs to emerge, such as integrating the data of an online bank with old data from a legacy system.
To have a global view of all the data consolidated in a data warehouse, for example to create a marketing strategy based on the analysis of that data.
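The master data example mentioned above (one customer present in both a credit database and a credit card database) could be sketched as follows. This is a simplified illustration under stated assumptions: the national_id matching key, the field names and the sample rows are all hypothetical, and real master data management usually needs far more careful matching rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MasterCustomer:
    national_id: str  # assumed matching key shared by both source systems
    first_name: str
    last_name: str
    sources: tuple  # names of the source systems that hold this customer


def build_master_records(credit_rows, card_rows):
    """Merge customer rows from both sources into one record per customer."""
    master = {}
    for source_name, rows in (("credit", credit_rows), ("credit_card", card_rows)):
        for row in rows:
            key = row["national_id"].strip()
            existing = master.get(key)
            if existing is None:
                master[key] = MasterCustomer(
                    key,
                    row["first_name"].strip().title(),
                    row["last_name"].strip().title(),
                    (source_name,),
                )
            elif source_name not in existing.sources:
                # Keep the first-seen names and note the additional source.
                master[key] = MasterCustomer(
                    key,
                    existing.first_name,
                    existing.last_name,
                    existing.sources + (source_name,),
                )
    return master


# Hypothetical sample rows from the two source databases.
credit_rows = [{"national_id": "X123", "first_name": "ana", "last_name": "GARCIA"}]
card_rows = [
    {"national_id": "X123 ", "first_name": "Ana", "last_name": "Garcia"},
    {"national_id": "Y456", "first_name": "luis", "last_name": "perez"},
]
master = build_master_records(credit_rows, card_rows)
for customer in master.values():
    print(customer)
```

Here the first record seen for a customer wins; choosing which source is authoritative for each field is a design decision that each organization has to make for itself.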
ETL procedure: an effective system, but with challenges and problems to solve
As we have seen, ETL processes are very useful and beneficial for institutions due to their ability to integrate large databases, thus achieving a single global vision that allows analysts and managers to make the appropriate strategic decisions.
Implementing a well-defined ETL system is challenging given that, to be truly effective, it must allow the integration of legacy systems (some already very obsolete) with the most modern ones. At the same time, access to all these systems must be possible not only in read mode, but also in write mode.