Data profiling, the first step in quality processes


Data quality projects need an action framework that follows a strategy oriented towards a set of objectives, which can only be achieved through a specific action plan. In this context, data profiling is part of a fundamental procedure carried out before the quality rules are designed. That procedure, known as Data Discovery, also includes the identification of inefficiencies and redundancies.

Applying Data Discovery, a complex procedure of key importance for exploring undocumented models and data sources, allows us to identify and measure them. Specifically, profiling performs a data quality audit to find the root of errors, as a first step towards solving the company's data quality problems. These problems can have many causes, such as migrations, data entry, data growth, diversity of sources or loading errors, among many others.

Profiling, key to controlling the data quality life cycle

Quality processes, however, are continuous once implemented, since they focus on controlling the data quality life cycle. In general, that control is exercised through profiling, covering both structure and content, and through subsequent cleansing. These actions follow a logical order: discovery, analysis, definition, development, review and follow-up.
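As an illustration of what structure and content profiling can look like in practice, here is a minimal Python sketch. It is not taken from any particular tool: the function names (`infer_type`, `profile_column`) and the sample postal codes are assumptions made for this example. Structure is approximated by inferred types and value lengths, content by distinct values and the most frequent value.

```python
from collections import Counter


def infer_type(value):
    """Classify a raw value as 'empty', 'integer', 'decimal' or 'text'."""
    text = value.strip() if value is not None else ""
    if not text:
        return "empty"
    try:
        int(text)
        return "integer"
    except ValueError:
        pass
    try:
        float(text)
        return "decimal"
    except ValueError:
        return "text"


def profile_column(values):
    """Collect structure facts (types, lengths) and content facts
    (distinct values, most frequent value) for one column."""
    types = Counter(infer_type(v) for v in values)
    filled = [v.strip() for v in values if v is not None and v.strip()]
    lengths = [len(v) for v in filled]
    frequencies = Counter(filled)
    return {
        "count": len(values),
        "types": dict(types),
        "min_length": min(lengths) if lengths else 0,
        "max_length": max(lengths) if lengths else 0,
        "distinct": len(frequencies),
        "most_common": frequencies.most_common(1)[0] if frequencies else None,
    }


# Hypothetical sample column: postal codes from an undocumented source.
postal_codes = ["28001", "08002", "", "2800A", "28001", None]
report = profile_column(postal_codes)
print(report)
```

In this sample, the single value classified as text in a column that is otherwise numeric is the kind of anomaly a profile is meant to surface before any quality rule is written.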

Data cleansing then follows, based on the information that the profile reveals. It is at this point that the rules are defined and the objectives are set according to the needs of the company. Beyond a minimum level, data quality is a flexible concept that has to be adapted to the organization's requirements, seeking a balance between cost and functionality.

Although it is best to carry out the procedure globally rather than department by department, it is common to implement the processes progressively. Therefore, if governance and data quality solutions lack a global approach, they should at least be scalable. This means carrying out the procedure as ongoing maintenance and extension, in which case profiling must keep identifying, classifying and quantifying quality problems across all sources.

As a data quality audit, profiling takes the form of a scorecard that reports concretely, at both a qualitative and a statistical level, on the quality of the data (errors, percentages of duplicate, redundant or incomplete data, etc.). It does so at the outset, before remediation initiatives are determined within the data quality project.
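A scorecard of this kind can be sketched in a few lines of Python. The example below is only an assumption about how such percentages might be computed: the function `build_scorecard`, the field names and the customer records are all hypothetical, and duplicates are detected with a deliberately simple rule (same values after trimming and lowercasing).

```python
def build_scorecard(records, required_fields):
    """Return the percentage of duplicate and incomplete records."""
    total = len(records)
    if total == 0:
        return {"total": 0, "duplicate_pct": 0.0, "incomplete_pct": 0.0}

    seen = set()
    duplicates = 0
    incomplete = 0
    for record in records:
        # Normalize each field so trivial differences do not hide duplicates.
        key = tuple(
            str(record.get(field) or "").strip().lower()
            for field in required_fields
        )
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
        # A record is incomplete if any required field is empty.
        if any(not part for part in key):
            incomplete += 1

    return {
        "total": total,
        "duplicate_pct": round(100 * duplicates / total, 1),
        "incomplete_pct": round(100 * incomplete / total, 1),
    }


# Hypothetical customer records used only for illustration.
customers = [
    {"name": "Ana Ruiz", "email": "ana@example.com"},
    {"name": "ana ruiz ", "email": "ANA@example.com"},
    {"name": "Luis Vega", "email": ""},
    {"name": "Marta Gil", "email": "marta@example.com"},
]
scorecard = build_scorecard(customers, ["name", "email"])
print(scorecard)
```

A real scorecard would normally track more dimensions and use fuzzier duplicate detection, but even these two figures give a baseline against which later remediation can be measured.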

Profiling, which is part of Data Discovery, is followed by Data Quality itself: a series of quality control activities, such as data assurance, data cleansing or data profiling, that follow clear procedures from the start. According to the Firstlogic methodology, a data quality procedure covers a series of phases that run from evaluation (profiling) to the final report, which presents the results of the data quality procedure implemented. In between come other, no less important processes: analysis, categorization, standardization, correction, enhancement, data matching and unification, in this order. Throughout, it should be kept in mind that this is a continuous improvement procedure.
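To show why the order of the phases matters, the following Python sketch chains three of the later phases: standardization, correction, and a combined step that cross-matches and unifies records. It is not the Firstlogic implementation; the function names, the misspelling table and the sample records are assumptions made for illustration only.

```python
def standardize(records):
    """Standardization: trim whitespace and normalize letter case."""
    return [
        {"name": r["name"].strip().title(), "city": r["city"].strip().title()}
        for r in records
    ]


def correct(records):
    """Correction: replace known misspellings (hypothetical lookup table)."""
    fixes = {"Madird": "Madrid"}
    return [{**r, "city": fixes.get(r["city"], r["city"])} for r in records]


def match_and_unify(records):
    """Matching and unification: keep one record per (name, city) pair."""
    unified = {}
    for r in records:
        unified.setdefault((r["name"], r["city"]), r)
    return list(unified.values())


# Phases are applied strictly in this order.
PHASES = [standardize, correct, match_and_unify]


def run_pipeline(records):
    """Run every phase in sequence, feeding each output to the next phase."""
    for phase in PHASES:
        records = phase(records)
    return records


# Hypothetical raw records used only for illustration.
raw = [
    {"name": " ana ruiz", "city": "madird"},
    {"name": "Ana Ruiz", "city": "Madrid "},
    {"name": "luis vega", "city": "sevilla"},
]
clean = run_pipeline(raw)
print(clean)
```

If the matching step ran before standardization and correction, the two records for the same person would not be recognized as one, which is the practical reason for keeping the phases in sequence.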

Image source: Stuart Hundred /
