Data validation and verification

Introduction

Data verification and data validation are often used interchangeably when we talk about data quality. However, the two terms are distinct. In this article, we will look at the difference in four contexts:

  1. Dictionary meaning of verification and validation
  2. Difference between data verification and data validation in general
  3. Difference between verification and validation from a software development perspective
  4. Difference between data verification and data validation from a machine learning perspective

1) Dictionary meaning of verification and validation

Table 1 explains the dictionary meaning of the words verification and validation with a few examples.

Table 1: Dictionary meaning of verification and validation

In summary, verification is about truth and accuracy, whereas validation is about supporting the soundness of a point of view or the correctness of a claim. Validation checks the correctness of a methodology, while verification checks the accuracy of the results.

2) Difference between data verification and data validation in general

Now that we understand the literal meaning of the two words, let's explore the difference between “data verification” and “data validation”.

Data verification: to make sure the data is accurate.

Data validation: to make sure the data is correct.

Let's elaborate with the examples in Table 2.

Table 2: “Data verification” and “data validation” examples

3) Difference between verification and validation from a software development perspective

From a software development perspective,

  • Verification is done to ensure the software is of high quality, well engineered, robust and error-free, without getting into its usability.
  • Validation is performed to ensure usability and the ability of the software to meet customer needs.

As Fig. 1 shows, proof of correctness, robustness analysis, unit tests, integration tests and others are all verification steps, where the tasks are oriented toward verifying specifics. The software output is verified against the desired output. On the other hand, model inspection, black box testing and usability testing are all validation steps, where the tasks are oriented toward understanding whether the software meets the requirements and expectations.
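As a rough illustration of a verification step, the sketch below uses Python's `unittest` module to compare a function's output against the desired output. The function `apply_discount` and its behavior are hypothetical, invented only for this example; a validation step such as usability testing would instead ask whether a discount feature like this serves what customers actually need, which a unit test cannot answer.

```python
import unittest


def apply_discount(price, percent):
    """Hypothetical function under test: reduce a price by a percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


class ApplyDiscountTest(unittest.TestCase):
    def test_output_matches_desired_output(self):
        # Verification: the actual output is compared with the desired output.
        self.assertEqual(apply_discount(200.0, 25), 150.0)

    def test_rejects_invalid_percent(self):
        # Robustness: invalid input must raise instead of returning a value.
        with self.assertRaises(ValueError):
            apply_discount(200.0, 120)


# Run the test case programmatically and keep the result object.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ApplyDiscountTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```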

Fig 1: Differences between verification and validation in software development

4) Difference between data verification and data validation from a machine learning perspective

The role of data verification in the machine learning pipeline is that of a gatekeeper: it ensures that the data stays accurate and up to date over time. Data verification is mainly done at the new data acquisition stage, namely step 8 of the ML pipeline, as shown in Fig. 2. Examples of this step are identifying duplicate records and performing deduplication, and cleaning up mismatches in customer information in fields like address or phone number.
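The deduplication example above might look like the following minimal sketch. The record fields (`name`, `phone`), the normalization rules and the sample customers are assumptions made for illustration; real customer data would usually need more careful matching.

```python
import re


def normalize_phone(raw):
    """Keep digits only, so differently formatted numbers compare equal."""
    return re.sub(r"\D", "", raw)


def record_key(record):
    """Build a canonical key from the (hypothetical) name and phone fields."""
    name = " ".join(record["name"].lower().split())
    return (name, normalize_phone(record["phone"]))


def deduplicate(records):
    """Keep the first record seen for each canonical key, preserving order."""
    seen = set()
    unique = []
    for record in records:
        key = record_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Made-up sample data: the first two rows describe the same customer.
customers = [
    {"name": "Ana Diaz", "phone": "(312) 555-0100"},
    {"name": "ana  diaz", "phone": "312.555.0100"},
    {"name": "Raj Patel", "phone": "312-555-0199"},
]
unique_customers = deduplicate(customers)
print(len(unique_customers))  # 2
```

Normalizing before comparing is the design choice that matters here: the two "Ana Diaz" rows differ only in letter case, spacing and phone formatting, so an exact string comparison would miss the duplicate.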

Data validation (step 3 of the ML pipeline), on the other hand, ensures that the incremental data from step 8 that is added to the training data is of good quality and similar, in terms of statistical properties, to the existing training data. For instance, this includes finding anomalies in the data or detecting differences between the existing training data and the new data to be added to it. Otherwise, data quality issues or statistical differences in the incremental data can go unnoticed, and training errors can accumulate over time and degrade the accuracy of the model. Data validation therefore detects significant changes (if any) in the incremental training data at an early stage, which helps with root cause analysis.
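One simple way to compare incremental data with existing training data is to measure how far the mean of a numeric feature has moved, in units of the training standard deviation. The sketch below is an illustration only, not the method used in any particular pipeline; the feature values and the `max_shift` threshold are made-up assumptions, and production checks typically compare richer statistics across many features.

```python
import statistics


def mean_shift_in_std(train_values, new_values):
    """Distance between the two means, in training standard deviations."""
    train_mean = statistics.mean(train_values)
    train_std = statistics.stdev(train_values)
    if train_std == 0:
        raise ValueError("training values have zero variance")
    return abs(statistics.mean(new_values) - train_mean) / train_std


def validate_increment(train_values, new_values, max_shift=1.0):
    """Return (is_ok, shift). max_shift is an illustrative threshold."""
    shift = mean_shift_in_std(train_values, new_values)
    return shift <= max_shift, shift


# Made-up values for a single numeric feature (for example, customer age).
train_ages = [34, 41, 29, 38, 45, 31, 36, 40]
similar_batch = [33, 39, 42, 30]
shifted_batch = [71, 68, 75, 80]

ok_similar, shift_similar = validate_increment(train_ages, similar_batch)
ok_shifted, shift_shifted = validate_increment(train_ages, shifted_batch)
print(ok_similar, ok_shifted)  # True False
```

A batch that fails the check would be held back and inspected before being merged into the training data, which is where the early root cause analysis comes in.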

Fig 2: Components of the Machine Learning Pipeline

Authors:

1. Aditya Aggarwal: Aditya Aggarwal is Data Science – Practice Leader at Abzooba Inc. With more than 12 years of experience in driving business goals through data-driven solutions, Aditya specializes in predictive analytics, machine learning, business intelligence and business strategy across a variety of industries. As Advanced Analytics Practice Leader at Abzooba, Aditya leads a team of more than 50 energetic data science professionals who are solving interesting business problems using machine learning, deep learning, natural language processing and computer vision. He provides AI thought leadership to clients to translate their business goals into analytical problems and data-driven solutions. Under his leadership, various organizations have automated routine tasks, reduced operating costs, increased team productivity and improved top-line and bottom-line revenue. He has built solutions such as the surrogacy engine, the price recommendation engine, predictive IoT sensor maintenance and more. Aditya has a Bachelor of Technology and a Bachelor of Business Administration from the Indian Institute of Technology (IIT), Delhi.

2. Dr. Arnab Bose: Dr. Arnab Bose is Chief Scientific Officer of Abzooba, a data analysis company, and an adjunct professor at the University of Chicago, where he teaches Machine Learning and Predictive Analytics, Machine Learning Operations, Time Series Analysis and Forecasting, and Health Analytics in the Master of Science in Analytics program. He is a 20-year veteran of the predictive analytics industry who enjoys using structured and unstructured data to predict and influence behavioral outcomes in healthcare, retail, finance and transportation. His current focus areas include health risk stratification and chronic disease management using machine learning, and production deployment and monitoring of machine learning models. Arnab has published book chapters and refereed papers in numerous conferences and journals of the Institute of Electrical and Electronics Engineers (IEEE). He has received the Best Presentation award at the American Control Conference and has given talks on data analysis at universities and companies in the US, Australia and India. Arnab has M.S. and Ph.D. degrees in electrical engineering from the University of Southern California, and a B.Tech. in electrical engineering from the Indian Institute of Technology at Kharagpur, India.

The media shown in this article is not the property of DataPeaker and is used at the author's discretion.
