Introduction to data quality: definition, control and benefits


Lack of data quality is one of the main problems faced by those responsible for information systems, and by companies in general, since it is clearly one of the most serious and persistent “hidden” problems in any organization.


In fact, good data quality is one of the most powerful corporate assets, as it enables you to accelerate growth and to better manage costs and initiatives for better returns.

How we define data quality

According to the ISO 9000:2000 standard, quality can be defined as “the degree to which a set of inherent characteristics meets the requirements”, that is, the stated need or expectation, generally implicit or mandatory.

In the words of David Loshin, President of Knowledge Integrity, Inc.: “To be able to relate data quality problems to their impact on the business, we have to be able to categorize both our expectations of data quality and the criteria of impact on the company”.

Dr. Kaoru Ishikawa (1988), for his part, considers that: “In its narrowest interpretation, quality means product quality, but in its broadest interpretation it means quality of work, quality of service, quality of information, quality of process, quality of management and quality of the company”.

How we control it

Good quality control requires following a complete procedure toward our objective, which is to improve quality for the greater satisfaction of both the client and the company or industry itself.


Working through the stages of this procedure lets us detect any anomaly that may occur in any of our processes before we reach our objective. It is therefore essential to carry out adequate monitoring and continuous improvement.

The benefits

Companies that give importance to the quality of their data obtain key benefits that add value to the business and differentiate them from their competitors, allowing them to:

  • Minimize risks in their projects, especially those related to information technology.

  • Save time and resources, making better use of infrastructure and technological systems to exploit their information.

  • Make timely business decisions based on reliable, validated and clean information.

  • Adapt to international standards or regulations on information management, which makes them easier to implement.

  • Improve trust, good relations and the company's image with its clients compared to the competition.

What is the relevance of big data quality and its challenges?

To understand the relevance of data quality in big data, we must note that it is a precondition for analyzing and using big data and for guaranteeing the value of that data. The development of technologies such as cloud computing, the Internet of Things and social media has caused the amount of data to increase and accumulate continuously at an unprecedented rate.

By obtaining and analyzing big data from various sources and for different uses, researchers and decision makers in companies have realized that this massive amount of information can offer many advantages: understanding customer needs, improving service quality, and predicting and preventing risks. However, the use and analysis of big data must be based on accurate data. This is what makes data quality so relevant: it is a necessary condition for generating value from big data.

Features of Big Data

As big data introduces new features, the quality of its data also faces many challenges. The characteristics of big data are summarized as the 4 Vs: volume, velocity, variety and value:

  • Volume refers to the tremendous amount of data. We usually use terabytes (TB) or larger units to measure it.
  • Velocity means that data is being generated at unprecedented speed and needs to be dealt with in a timely manner.
  • Variety indicates that big data includes all kinds of data types, a diversity that divides the data into structured and unstructured data. These varied data types require greater data processing capabilities.
  • Finally, value refers to low value density. Value density is inversely proportional to the total size of the data: the larger the scale of big data, the lower the relative value of each piece of data.

The challenges of big data quality

Because big data has these 4 V characteristics, extracting real, high-quality data from massive, variable and complicated data sets becomes an urgent problem when companies use and process big data. Today, big data quality faces the following challenges:

  • The diversity of data sources brings rich data types and complex data structures, and increases the difficulty of data integration.
  • The volume of data is tremendous, and it is difficult to judge the quality of the data in a reasonable time.
  • Data changes very fast and its “timeliness” is very short, which places higher demands on processing technology.
  • There are few unified and approved data quality standards, and little research on big data quality.
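
As an illustration of the volume challenge, one common workaround (not something this post prescribes) is to estimate quality from a random sample instead of checking every record. The sketch below assumes a hypothetical data set of ages and a hypothetical validity rule. The function name, the sample size and the seed are all illustrative choices.

```python
import random


def estimate_invalid_rate(records, is_valid, sample_size, seed=0):
    """Estimate the share of invalid records from a simple random sample."""
    rng = random.Random(seed)              # fixed seed so the sketch is repeatable
    size = min(sample_size, len(records))  # never ask for more than we have
    if size == 0:
        return 0.0
    sample = rng.sample(records, size)
    invalid = sum(1 for record in sample if not is_valid(record))
    return invalid / size


# Hypothetical data set: 100,000 ages, where every 20th value is out of range.
ages = [(-1 if i % 20 == 0 else 30) for i in range(100_000)]
rate = estimate_invalid_rate(ages, lambda age: 0 <= age <= 120, sample_size=2_000)
print(f"estimated invalid rate: {rate:.3f}")
```

The estimate is only approximate: a larger sample narrows the error at the cost of more checking time, which is exactly the trade-off that large volumes force.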

Big Data quality criteria

Big data is relatively new, and there is no uniform definition of its data quality or of the quality criteria to use. But one thing is certain: the quality of the data depends not only on its own characteristics, but also on the business environment that uses the data, including users and business processes. Only data that fits the relevant uses and meets the requirements can be considered qualified (or good quality) data.

Typically, data quality standards are developed from the perspective of data producers. In the past, data consumers were direct or indirect data producers, which guaranteed the quality of the data. However, in the age of big data, with its diversity of data sources, data users are not necessarily data producers. It is therefore very difficult to measure the quality of the data.

We take the commonly accepted and widely used data quality dimensions as big data quality standards and redefine their core concepts based on actual business needs. Each dimension can in turn be divided into several typical elements, and each element has its own corresponding quality indicators. The result is the following hierarchy of quality standards for big data:

  1. Availability:
    • Accessibility:
      • Whether a data access interface is provided
      • Whether the data can be made public or easily acquired
    • Timeliness:
      • Whether the data arrives on time, within a specified period
      • Whether the data is updated regularly
      • Whether the time interval from data collection and processing to publication meets the requirements
  2. Usability:
    • Credibility:
      • The data comes from specialized institutions in a country, field or industry
      • Experts or specialists regularly audit and verify the accuracy of the data content
      • The data falls within the range of known or acceptable values
  3. Reliability:
    • Accuracy:
      • The data provided is accurate
      • The representation (the value) of the data faithfully reflects the actual state of the source information
      • The representation of the information (the data) does not cause ambiguity
    • Consistency:
      • After processing, the data's concepts, value domains and formats still match those before processing
      • Over a given period, the data remains consistent and verifiable
      • All data is consistent or verifiable
    • Integrity:
      • The data format is clear and meets the criteria
      • The data has structural integrity
      • The data has content integrity
    • Completeness:
      • For multi-component data, whether a missing component will affect data usage
      • Whether a missing component will affect the accuracy and integrity of the data
  4. Relevance:
    • Suitability:
      • The data collected does not fully match the topic, but it does cover one aspect of it
      • Most of the retrieved data sets fall within the retrieval topic that users need
      • The information topic matches the user's retrieval topic
  5. Presentation quality:
    • Readability:
      • The data (content, format, etc.) is clear and understandable
      • It is easy to judge whether the data provided meets the needs
      • The description, classification and coding of the data meet specifications and are easy to understand
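
A few of the indicators above can be turned into automated checks. The following is a minimal sketch, not a standard implementation: it assumes hypothetical sensor records, and the field names, the acceptable temperature range and the one-hour publication limit are invented for illustration. It checks whether components are missing, whether a value falls within the acceptable range, and whether the interval between collection and publication meets the requirement.

```python
from datetime import datetime, timedelta

# Hypothetical settings: field names, range and deadline are illustrative only.
REQUIRED_FIELDS = ("sensor_id", "temperature_c", "collected_at", "published_at")
VALID_RANGE = (-50.0, 60.0)        # assumed range of acceptable values
MAX_DELAY = timedelta(hours=1)     # assumed collection-to-publication limit


def is_complete(record):
    """Completeness: every required component is present and not None."""
    return all(record.get(field) is not None for field in REQUIRED_FIELDS)


def is_in_range(record):
    """Credibility: the value lies within the known or acceptable range."""
    value = record.get("temperature_c")
    return value is not None and VALID_RANGE[0] <= value <= VALID_RANGE[1]


def is_timely(record):
    """Timeliness: the collection-to-publication interval meets the limit."""
    collected = record.get("collected_at")
    published = record.get("published_at")
    if collected is None or published is None:
        return False
    return timedelta(0) <= published - collected <= MAX_DELAY


def quality_report(records):
    """Return the share of records (0.0 to 1.0) that pass each check."""
    checks = {
        "completeness": is_complete,
        "credibility": is_in_range,
        "timeliness": is_timely,
    }
    total = len(records)
    if total == 0:
        return {name: 0.0 for name in checks}
    return {
        name: sum(check(record) for record in records) / total
        for name, check in checks.items()
    }


t0 = datetime(2024, 1, 1, 12, 0)
records = [
    {"sensor_id": "a", "temperature_c": 21.5, "collected_at": t0,
     "published_at": t0 + timedelta(minutes=10)},
    {"sensor_id": "b", "temperature_c": 999.0, "collected_at": t0,
     "published_at": t0 + timedelta(minutes=5)},   # out of range
    {"sensor_id": "c", "temperature_c": 18.0, "collected_at": t0,
     "published_at": t0 + timedelta(hours=3)},     # published too late
    {"sensor_id": "d", "temperature_c": None, "collected_at": t0,
     "published_at": t0 + timedelta(minutes=1)},   # missing component
]
report = quality_report(records)
print(report)
```

Each indicator becomes a small yes/no function, so new indicators can be added to the report without changing the rest of the code. Which thresholds count as acceptable remains a business decision, as the criteria above make clear.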


With the advent of the big data era, data from various industries and fields has grown explosively. How to ensure the quality of big data, and how to analyze it and extract the information and insights hidden behind it, have become major problems for companies. Poor data quality can lead to inefficient use of data and even to serious errors in decision-making.
