Life cycle and normalization of a database in the context of big data


Big data presents new data management challenges that go beyond handling sheer volume. An often overlooked challenge is the life cycle and normalization of a database in this big data context.


Data governance, from the data source through to its outputs, presents great challenges in this type of database. Comparing the life cycle and normalization of a traditional database with those of a big data one helps in understanding one of the more complex data governance challenges in this new world of data.

The stages of the data life cycle

A typical data life cycle consists of the following stages:

  1. Ingestion. One cannot think about the life cycle and normalization of a database without starting at the beginning: the moment when the different data sources are incorporated into the data platform. At this stage it is also common to find basic data verification and validation processes, even though the main goal at this point is to land all available data in a central location (which can be a data warehouse or a data lake). Stages 1 and 2 are illustrated in the first sketch after this list.
  2. Identification / cleaning / enrichment. The data types, and the names under which they appear in the columns, are recognized. At this stage the data can also be enriched and cleaned.
  3. Standardization. This step involves transforming the data into a business-agreed, neutral data model. Here, relationships are established between the different data entities, essentially encoding internal knowledge of the data structure. This stage is also known as the data integration stage, and it coincides with the point at which business rules and domain checks are usually introduced, along with validation against master or reference data.
  4. Presentation. This is the final step of the process, when the neutral business model created in the previous step is transformed into one or more company-specific data representations. This model is often called a dimensional model. It is common for additional business rules to be applied at this point, as well as aggregations and the creation of derived data. Stages 3 and 4 are illustrated in the second sketch after this list.
  5. Schema-on-read / schema-on-write. You cannot talk about the entire data life cycle without mentioning the point at which the user consumes the data. One of the main differences between traditional data warehousing and big data warehousing is the point at which the end user interacts with the information. Thus, while in a traditional data warehousing environment the general consumer would use a well-defined schema-on-write, BI platforms and advanced analytics solutions can consume data before it reaches the presentation layer to provide reports, dashboards and predictive analytics, allowing the data consumer to access the data much earlier.
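To make stages 1 and 2 concrete, here is a minimal sketch in Python using pandas. The zone paths, file names, the "customer_id" key column and the country-code lookup are all hypothetical, assumptions made purely for illustration; this shows the pattern, not a prescribed implementation.

```python
from pathlib import Path

import pandas as pd

RAW_ZONE = Path("datalake/raw")      # central landing location (stage 1)
CLEAN_ZONE = Path("datalake/clean")  # output of stage 2


def ingest(source_csv: str) -> Path:
    """Stage 1 (ingestion): land the source file in the raw zone.

    Only basic validation happens here (the file must parse as CSV);
    the main goal is simply to centralize all available data.
    """
    df = pd.read_csv(source_csv)
    RAW_ZONE.mkdir(parents=True, exist_ok=True)
    target = RAW_ZONE / Path(source_csv).name
    df.to_csv(target, index=False)
    return target


def identify_and_clean(raw_file: Path) -> pd.DataFrame:
    """Stage 2 (identification / cleaning / enrichment)."""
    df = pd.read_csv(raw_file)
    # Identification: normalize column names and infer richer dtypes.
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    df = df.convert_dtypes()
    # Cleaning: remove exact duplicates and rows missing the key field
    # (the "customer_id" column is an assumption of this example).
    df = df.drop_duplicates().dropna(subset=["customer_id"])
    # Enrichment: join an assumed reference table of country names.
    countries = pd.read_csv("reference/country_codes.csv")
    df = df.merge(countries, on="country_code", how="left")
    CLEAN_ZONE.mkdir(parents=True, exist_ok=True)
    df.to_parquet(CLEAN_ZONE / f"{raw_file.stem}.parquet", index=False)
    return df
```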
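Continuing the illustration, this second sketch covers stages 3 and 4: mapping the cleaned data onto a neutral model with a business rule and a master-data check, then deriving a small dimensional view. Again, the entity names, columns and rules are assumptions made for the example.

```python
import pandas as pd


def standardize(orders: pd.DataFrame, customers: pd.DataFrame) -> pd.DataFrame:
    """Stage 3 (standardization): map source columns onto a neutral,
    business-agreed model and relate the two entities."""
    orders = orders.rename(columns={"ord_id": "order_id", "cust": "customer_id"})
    # Business rule / domain check: order amounts must be non-negative.
    orders = orders[orders["amount"] >= 0]
    # Master-data validation: keep only orders with a known customer.
    return orders.merge(customers[["customer_id", "region"]],
                        on="customer_id", how="inner")


def present(neutral: pd.DataFrame) -> pd.DataFrame:
    """Stage 4 (presentation): derive a small dimensional view with
    aggregations and derived data for a specific audience."""
    monthly = neutral.assign(
        order_month=pd.to_datetime(neutral["order_date"]).dt.strftime("%Y-%m")
    )
    return (monthly.groupby(["region", "order_month"], as_index=False)
                   .agg(total_amount=("amount", "sum"),
                        order_count=("order_id", "count")))
```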

Life cycle and normalization of a database in Big Data environments

When considering the life cycle and normalization of a database, everything related to the use of the information is decisive, both in terms of processing and in terms of the cost of the data life cycle. Specifically:

  1. In big data, the first two stages involve high volume but low cost and effort. Data is abundant and cheap, and ingestion, data identification and cleaning are relatively simple. The challenge, however, lies in the last two processes of the life cycle and normalization of a database: creating meaning out of such a large and largely disorganized data set (schema-on-read).
  2. In a traditional setting, conversely, data warehousing requires a considerable amount of effort to ensure the quality of the ingested data and to transform it into suitable data models (schema-on-write), and the same applies to the consistent application of business rules. On the other hand, since all consumers share the same view of the data universe, query performance is quite high and users' querying capability benefits. The value density of the data is much higher than in big data environments: here, each row has an intrinsic value. The sketch below illustrates the contrast between the two schema approaches.
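As an illustration of this contrast, the following sketch shows both approaches side by side: a schema-on-write path that validates and types the data before storing it, and a schema-on-read path that lands raw events as-is and lets each consumer apply a schema at query time. The schema and field names are hypothetical.

```python
import json

import pandas as pd

# The target structure; field names and types are assumptions.
SCHEMA = {"event_id": "Int64", "user": "string", "amount": "Float64"}


def write_validated(records: list, path: str) -> None:
    """Schema-on-write: the structure is enforced before storage,
    so every consumer later sees the same validated view."""
    df = pd.DataFrame(records)
    missing = set(SCHEMA) - set(df.columns)
    if missing:
        raise ValueError(f"schema violation, missing fields: {missing}")
    df.astype(SCHEMA).to_parquet(path, index=False)


def read_with_schema(raw_jsonl: str) -> pd.DataFrame:
    """Schema-on-read: raw events were landed as-is; the consumer
    applies its own interpretation at query time."""
    with open(raw_jsonl) as fh:
        df = pd.DataFrame([json.loads(line) for line in fh])
    # Decide, only now, which fields matter and how to type them;
    # malformed amounts become NaN instead of load-time failures.
    df = df.reindex(columns=list(SCHEMA))
    df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
    return df
```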

Finally, in matters associated with the life cycle and normalization of a database, attention must be paid to agility, something inherent to big data. While data warehouses are notoriously difficult, time-consuming and expensive to modify, data consumers set their own criteria and timelines within a world of big data.

