The first pillar of a data quality solution: Architecture-Technology


From a business point of view, a data quality solution is built on four pillars: technology, know-how, processes and methodology. Let's take a closer look at the first one.

Technology is certainly essential: its built-in functionality, the continuous addition of new functions, the support offered, and so on generate efficiency in development times and a very significant cost reduction.

The architecture of a data quality solution is made up of several components, each of which specializes in solving a specific problem efficiently.

The first thing we need is a module that gives us a complete view of our database with respect to the main attributes of the data.

Then we need a module that allows us to define business rules over the defects found in our database. This module must be completed with two more sub-modules: the duplicate identifier and the phonetic identifier.
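As an illustration of how a phonetic sub-module can feed the duplicate identifier, here is a minimal sketch in Python using the classic American Soundex code. The names and grouping logic are invented for the example; a real product would combine phonetic keys with many more matching rules.

```python
from collections import defaultdict

def soundex(name: str) -> str:
    """Classic American Soundex: first letter plus three digits."""
    codes = {}
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in letters:
            codes[ch] = digit
    name = "".join(c for c in name.lower() if c.isalpha())
    if not name:
        return ""
    result = name[0].upper()
    prev = codes.get(name[0], "")
    for ch in name[1:]:
        if ch in "hw":          # h and w do not separate equal consonant codes
            continue
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        prev = code             # vowels reset the previous code
    return (result + "000")[:4]

# Group names by phonetic key; groups with more than one entry are
# candidate duplicates for rule-based or manual review.
names = ["Smith", "Smyth", "Schmidt", "Johnson", "Jonson", "Garcia"]
groups = defaultdict(list)
for n in names:
    groups[soundex(n)].append(n)
candidates = {key: members for key, members in groups.items() if len(members) > 1}
```

Running this groups Smith, Smyth and Schmidt under one key and Johnson and Jonson under another, which is exactly the kind of candidate list a duplicate identifier hands on to the business-rule module.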

At the same time, the architecture must also provide reference dictionaries, used to identify variants of a name and automatically replace them with its canonical form.
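A dictionary of this kind can be pictured as a simple lookup table. The variants and canonical names below are invented for the example:

```python
# Hypothetical dictionary mapping known variants to their canonical name.
CANONICAL = {
    "bob": "Robert",
    "robt": "Robert",
    "liz": "Elizabeth",
    "beth": "Elizabeth",
}

def standardize_name(raw: str) -> str:
    """Replace a known variant with its canonical form; otherwise
    just normalize the capitalization of the input."""
    key = raw.strip().lower()
    return CANONICAL.get(key, raw.strip().title())
```

With this sketch, standardize_name("BOB") yields "Robert", while an unknown name such as "maria" is merely title-cased and passed through.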

Finally, there is perhaps the most important module: the firewall that prevents new erroneous data from entering the systems again. Without it, a data quality project would not make sense.
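One minimal way to picture such a firewall, with invented field names and rules, is a set of validation checks that every incoming record must pass before it reaches the system:

```python
import re

# Hypothetical entry rules: each field name maps to a predicate that
# must hold for the record to be admitted.
RULES = {
    "name": lambda v: bool((v or "").strip()),
    "email": lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v or "") is not None,
    "postal_code": lambda v: (v or "").strip().isdigit(),
}

def admit(record: dict) -> dict:
    """Let the record through only if every rule passes; otherwise reject it."""
    failed = [field for field, rule in RULES.items() if not rule(record.get(field))]
    if failed:
        raise ValueError(f"record rejected, failed fields: {failed}")
    return record
```

In a real deployment these checks would sit in front of every entry point (web forms, batch loads, service calls), so that bad data is rejected or routed for correction instead of landing in the database.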

DQ techniques

DQ is a family of eight or more related techniques. Data standardization is the most widely used, followed by verification, validation, monitoring, profiling, matching, etc.

Institutions generally apply only one technique, sometimes a couple, and usually to a single data type. The most common is name and address cleansing applied to direct contact data sets, and it is hardly ever applied to data sets outside of direct marketing campaigns.

Similarly, deduplication, unification and enrichment techniques are rarely applied outside that same context.

Many DQ initiatives focus solely on the customer data domain. In reality, other domains could also be improved, such as products, financial data, partners, employees and locations.

Current DQ projects act as a kind of quality hub that supports the exchange of data across various applications, and must therefore provide basic aggregation, standardization, identity resolution and similar functions in real time.

DQ in real time

Gradual migration to real-time operation is the current trend in data management. This includes the disciplines of data quality management, data integration, master data management and complex event processing.

Among these, real-time quality management ranks second in growth, after MDM and just ahead of integration.

Accelerated business processes require data to be cleansed and completed as soon as it is created or modified, in order to support customer service, next-day delivery, operational BI, financial transactions, cross-sell and up-sell, and marketing campaigns.

Similarly, these same processes require real-time data exchange between multiple applications with overlapping responsibilities (for example, a customer record shared between enterprise resource planning and CRM applications).

For these and other situations, real-time data quality reduces business risk and corrects or improves information while it is in motion within a business process.

Profiling

Continuous improvement of data quality is a challenge when you are unaware of the current state of your data and how it is used. Understanding business data through profiling is therefore the starting point for choosing which data needs special attention.

What is profiling? It is a series of techniques for identifying erroneous data, null data, incomplete data, data without referential integrity, and data that does not conform to the required format, as well as business information patterns, trends, means, standard deviations, etc.
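The kinds of statistics a profiler gathers can be sketched for a single column as follows (the sample values in the test data are invented):

```python
import math
from collections import Counter

def profile_column(values):
    """Basic profile of one column: counts, nulls, distinct values,
    most frequent values, and numeric mean / standard deviation."""
    non_null = [v for v in values if v is not None]
    profile = {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "top_values": Counter(non_null).most_common(3),
    }
    numeric = [v for v in non_null if isinstance(v, (int, float))]
    if numeric:
        mean = sum(numeric) / len(numeric)
        profile["mean"] = mean
        profile["stddev"] = math.sqrt(
            sum((x - mean) ** 2 for x in numeric) / len(numeric))
    return profile
```

A commercial profiler adds format-conformance checks, referential-integrity scans and cross-table dependencies on top of per-column statistics like these, but the output is conceptually the same: a measurable snapshot of the data's current state.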

A good profile is essential for two reasons: 1) it makes the project analysis realistic and reliable, and 2) from the second iteration onward it allows us to measure and compare the evolution of the project's governance quality indicators.

For profiling to become a must-have DQ technique, it must meet certain requirements:

It must be reusable

Profiling is generally focused on generating statistics about the data types and values of a single column of a table in a database.

Even so, a good tool should also reveal dependencies across multiple tables, databases and systems.

Data monitoring

Data monitoring is a form of profiling: each time it runs, it measures the degree of progress in quality. This is key to verifying the continuous improvement of the data.

Supervision of the data quality procedure

This function compares the source and destination to verify that the data is loaded correctly, which is essential in any data quality procedure.
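A simplified way to perform this source-to-destination check, assuming both sides can be read as row sequences, is to compare row counts plus an order-independent fingerprint of each side:

```python
import hashlib

def fingerprint(rows):
    """Row count plus an order-independent XOR of per-row hashes."""
    rows = list(rows)
    acc = 0
    for row in rows:
        digest = hashlib.sha256(repr(tuple(row)).encode("utf-8")).digest()
        acc ^= int.from_bytes(digest[:8], "big")
    return len(rows), acc

def verify_load(source_rows, target_rows) -> bool:
    """True when the destination holds exactly the rows of the source."""
    return fingerprint(source_rows) == fingerprint(target_rows)
```

A checksum like this is only a first screen; production tools typically also compare per-key counts and column-level aggregates between source and destination to pinpoint where a load went wrong.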

Architecture components

The architecture is made up of several items. Let's analyze them:

Data quality web services

This is a function for developing web services that are called from the PowerCenter Web Services Hub, in order to invoke mappings containing Informatica Data Quality transformations, or other processes and applications that call these web services. The fundamental advantage is that they make it possible to control the information entering the systems, avoiding manual data entry.

Identity resolution

Provides a dictionary of the words most commonly used in the country, used to identify and relate colloquial and variant forms.

AddressDoctor software library

Provides parsing, cleansing, address validation and standardization functions, as well as assignment of geographic coordinates. It is the essential reference dictionary for avoiding hundreds of street-name variants in the system.

Data explorer

Provides a client-server environment for three-dimensional profiling (column, table, cross-table), orphan-record scanning, key validation, and the identification and labeling of quality problems.

Data analyzer

Designed to analyze, standardize, enrich, deduplicate, correct and report on all types of master data, including customer, product, inventory, asset and financial data. It enables the development of custom quality rules according to the specific needs of each client.

Data quality identity match option

Provides search and matching functions, and displays duplicates of data stored in relational databases and flat files.
