Big data presents new data management challenges that go beyond the sheer volume of the data. One challenge that is often overlooked is the life cycle and normalization of a database in this big data context.
Data governance, both of the data sources and of their results, presents great challenges in this type of database. Comparing the life cycle and normalization of a traditional database with big data helps in understanding one of the most complex data governance challenges in this new world of data.
The stages of the data life cycle
A typical data life cycle consists of four stages (a minimal code sketch of the full pipeline follows the list):
- Ingestion. You cannot think about the life cycle and normalization of a database without starting at the beginning, the moment when the different data sources are incorporated into the data platform. Basic data verification and validation processes are also common at this stage, although the main goal at this point is to land all the available data in a central location (which can be a data warehouse or a data lake).
- Identification / cleaning / enrichment. The data types and the names under which they appear in the columns are recognized. At this stage, the data can also be enriched and cleaned.
- Standardization. This step involves transforming the data into an agreed, business-neutral data model. Here, relationships are established between the different data entities, essentially encoding internal knowledge of the data structure. This stage is also known as the data integration stage, and it usually coincides with the introduction of business rules and domain checks, as well as validation against master or reference data.
- Presentation. This is the final step of the process, in which the business-neutral model created in the previous step is transformed into one or more company-specific data representations; this model is often called a dimensional model. It is common for additional business rules to be applied at this point, along with aggregations and the creation of derived data.
- Schema on read / schema on write. You cannot discuss the full data life cycle without mentioning the point at which the user consumes the data: one of the main differences between traditional data warehousing and big data warehousing is when the end user interacts with the information. While in a traditional data warehousing environment the typical consumer would work against a well-defined schema on write, in big data environments BI platforms and advanced analytics solutions can consume data well before the presentation layer to provide reports, dashboards, and predictive analytics, allowing the data consumer to access the data much earlier.
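As a rough illustration of these four stages, here is a minimal Python (pandas) sketch. The file names, column names, and transformations are hypothetical, not part of the original article; a real pipeline would run on a data platform rather than on in-memory DataFrames.

```python
import pandas as pd

# 1. Ingestion: land the raw sources in a central location (here, plain
#    DataFrames; in practice a data warehouse or data lake).
raw_orders = pd.read_csv("orders_export.csv")       # operational system dump
raw_customers = pd.read_json("crm_customers.json")  # CRM extract

# 2. Identification / cleaning / enrichment: recognize column types and
#    names, then fix obvious issues.
raw_orders["order_date"] = pd.to_datetime(raw_orders["order_date"], errors="coerce")
raw_orders = raw_orders.dropna(subset=["order_id", "cust_id"])

# 3. Standardization: map the sources onto an agreed, business-neutral model
#    and establish the relationships between entities (orders -> customers).
orders = raw_orders.rename(columns={"cust_id": "customer_id"})
customers = raw_customers.rename(columns={"id": "customer_id"})
integrated = orders.merge(customers, on="customer_id", how="left")

# 4. Presentation: derive a company-specific (dimensional) view, applying
#    aggregations and creating derived data for reporting.
revenue_by_month = (
    integrated.assign(month=integrated["order_date"].dt.to_period("M"))
    .groupby("month")["amount"]
    .sum()
)
print(revenue_by_month)
```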
Life cycle and normalization of a database in Big Data environments
When considering the life cycle and normalization of a database, everything related to how the information is used is decisive, both in terms of processing and in terms of the cost of the data life cycle. Specifically:
- In big data, the first two stages are high volume but low cost and effort. Data is abundant and cheap, and ingestion, identification, and cleaning are relatively simple. The challenge lies in what follows: the difficulty of the last two stages of the life cycle and normalization of a database has to do with creating meaning from such a large and largely unorganized dataset (schema on read).
- In a traditional data warehousing setting, conversely, considerable effort is required to ensure the quality of the ingested data and to transform it into suitable data models (schema on write), something that extends to the consistent application of business rules. In return, since all consumers share the same view of the data universe, query performance is high and users' query capabilities benefit. The value density of the data is also much higher than in big data environments: here, each row has intrinsic value. The sketch below contrasts the two approaches.
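As a loose illustration of this trade-off, the following Python sketch contrasts schema on write (structure enforced at ingestion) with schema on read (structure applied at consumption). The table, fields, and records are hypothetical.

```python
import json
import sqlite3

# Schema on write: structure is enforced up front, at ingestion time, so
# every consumer later shares the same validated view of the data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER NOT NULL, action TEXT NOT NULL)")
conn.execute("INSERT INTO events VALUES (?, ?)", (42, "login"))
# A malformed row is rejected immediately; the next line would raise
# sqlite3.IntegrityError:
# conn.execute("INSERT INTO events VALUES (?, ?)", (None, "login"))

# Schema on read: raw records are landed as-is, cheaply; structure and
# validation are applied only when a consumer interprets the data.
raw_lines = ['{"user_id": 42, "action": "login"}', '{"action": "logout"}']
for line in raw_lines:
    record = json.loads(line)                # interpretation happens here
    user = record.get("user_id", "unknown")  # each consumer decides how to
    print(user, record["action"])            # handle missing or messy fields
```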
Finally, in matters associated with the life cycle and normalization of a database, you need to pay attention to agility, and that is something inherent to big data. While data warehouses are notoriously difficult, time-consuming, and expensive to modify, data consumers set their own criteria and timelines in a world of big data.