October 29, 2010

Data Quality as the Biggest Issue of Data Integration

Filed under: Data Quality — Katherine Vasilega @ 2:18 am

Data quality is a big but often neglected issue in data integration. To understand how to address it, we should first define what data quality means and what it consists of. Data quality is the practice of maintaining information so that individual records are accurate, up to date, and correctly represented. In other words, good data quality means that a company’s data is accurate, complete, consistent, timely, unique, and valid.
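
To make these dimensions more concrete, here is a minimal sketch (my own illustration, not something from the original post) of how some of them can be expressed as record-level checks in Python. The field names, the sample record, and the 90-day freshness window are all assumptions; accuracy and consistency usually require reference data or cross-record comparison, so they are left out of this sketch.

    from datetime import datetime, timedelta

    # Hypothetical customer record; field names are made up for illustration.
    record = {
        "customer_id": "C-1001",
        "email": "jane.doe@example.com",
        "country": "US",
        "last_updated": datetime(2010, 10, 1),
    }

    # Uniqueness needs knowledge of the rest of the dataset, so it is
    # checked against a set of already-seen keys.
    seen_ids = {"C-0999", "C-1000"}

    checks = {
        "complete": all(record.get(f) for f in ("customer_id", "email", "country")),
        "valid":    "@" in record["email"] and len(record["country"]) == 2,
        "timely":   datetime(2010, 10, 29) - record["last_updated"] < timedelta(days=90),
        "unique":   record["customer_id"] not in seen_ids,
    }

    failed = [dim for dim, ok in checks.items() if not ok]
    print("quality issues:", failed or "none")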

Poor data quality has two critical consequences:

    • It reduces the number of problems you can solve with a data integration solution in a given period of time
    • It increases the effort required to solve each individual problem

In data integration, the ETL process has the biggest impact on data quality. When you design ETL processes, the typical focus is on collecting and merging information. Once you add data quality rules, the ETL design becomes more complex.
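
As a rough illustration of that added complexity (an assumption about how such rules might be wired in, not the author's design), a transform step can gain a validation layer that routes rule-breaking rows to a reject pile instead of loading them. The rule names and fields below are made up for the example.

    # Minimal sketch: an ETL transform step extended with data quality rules.
    # All names (rules, fields, the reject pile) are illustrative assumptions.

    rules = [
        ("missing email",   lambda row: bool(row.get("email"))),
        ("negative amount", lambda row: row.get("amount", 0) >= 0),
    ]

    def transform(rows):
        clean, rejected = [], []
        for row in rows:
            broken = [name for name, rule in rules if not rule(row)]
            if broken:
                rejected.append((row, broken))   # quarantined for review
            else:
                clean.append(row)
        return clean, rejected

    rows = [
        {"email": "a@example.com", "amount": 10},
        {"email": "",              "amount": -5},
    ]
    clean, rejected = transform(rows)
    print(len(clean), "clean,", len(rejected), "rejected")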

There are a number of approaches to data quality in data integration, but whichever you choose, the data has to meet a few objectives concerning its correctness, consistency, completeness, and validity. To achieve that, the data must pass through a process that involves extracting it from its sources, cleansing it of errors and unnecessary information, conforming it to shared standards, and delivering high-quality data to the target systems.
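
Sketched as code, and purely as an assumption about how such a pipeline might be organized, the four stages could look something like this; every function, field name, and mapping here is illustrative.

    # Skeleton of the extract -> cleanse -> conform -> deliver flow.
    # Each stage is a placeholder; a real implementation would read from
    # source systems, apply business rules, and write to a target store.

    def extract(source):
        """Pull raw records from a source system."""
        return list(source)

    def cleanse(rows):
        """Strip noise: trim whitespace and drop empty fields."""
        return [{k: v.strip() for k, v in row.items() if v and v.strip()} for row in rows]

    def conform(rows):
        """Map values onto shared standards, e.g. one country-code scheme."""
        country_map = {"USA": "US", "U.S.": "US"}
        for row in rows:
            if "country" in row:
                row["country"] = country_map.get(row["country"], row["country"])
        return rows

    def deliver(rows, target):
        """Hand the cleaned, conformed records to the target store."""
        target.extend(rows)

    warehouse = []
    deliver(conform(cleanse(extract([{"name": " Jane ", "country": "USA"}]))), warehouse)
    print(warehouse)  # [{'name': 'Jane', 'country': 'US'}]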

Data quality procedures must be carried out periodically to maintain the desired level of quality in a data integration solution. Records must not be duplicated, out of date, or unsynchronized. That is why it is crucial for an organization that performs data integration to appoint data stewards who will be in charge of sustaining data quality.
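
As one example of such a periodic procedure (a sketch under my own assumptions, not anything prescribed by the post), a data steward might run an audit job that flags duplicate and stale records; the field names and the 180-day threshold are illustrative.

    from datetime import datetime, timedelta

    # Illustrative periodic audit: flag duplicate and out-of-date records.

    records = [
        {"id": 1, "email": "jane@example.com", "last_updated": datetime(2010, 9, 1)},
        {"id": 2, "email": "jane@example.com", "last_updated": datetime(2009, 1, 1)},
    ]

    def audit(rows, as_of, max_age_days=180):
        seen, duplicates, stale = {}, [], []
        for row in rows:
            key = row["email"].lower()
            if key in seen:
                duplicates.append((seen[key], row["id"]))
            else:
                seen[key] = row["id"]
            if as_of - row["last_updated"] > timedelta(days=max_age_days):
                stale.append(row["id"])
        return duplicates, stale

    dups, stale = audit(records, as_of=datetime(2010, 10, 29))
    print("duplicates:", dups, "stale:", stale)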
