
March 5, 2011

How Can Data Governance Serve Data Integration Projects?

Filed under: Data Integration, Data Quality — Katherine Vasilega @ 3:56 am

Data governance initiatives in an organization are intended to cover data quality, data management, and data policy issues. These activities are carried out by data stewards and a team that develops and implements business rules for administering the use of data.

A focus on data governance is essential when a company implements a data integration strategy and uses the integrated data for analysis, reporting, and decision-making. Here are some ways data governance makes data integration projects more efficient:

    • It brings IT and business teams together. Data governance identifies what is really important to the business and helps establish business rules that are crucial for data integration.

    • A data governance program can help your company define and measure the potential ROI you get from maintaining data. You can use this information to calculate the ROI for data integration projects.

    • It helps you learn who is responsible for data quality. Data governance provides valuable information that enables you to appoint data stewards and decision makers for data integration projects. Since data governance tells you who is responsible for the data, you know where to go to resolve data quality issues.

    • Data governance can save you money, because it helps establish best practices and select cost-effective data integration and data quality tools.

Data governance and data integration are tightly connected with each other. You are not likely to enjoy data integration benefits without a strong governance program. On the other hand, data governance is only possible if your data is stored in an integrated system. My advice: make sensible use of both.

December 15, 2010

The Role of Data Stewards in Data Integration

Filed under: Data Quality — Katherine Vasilega @ 3:52 am

The amount of data gathered from different sources can very quickly become overwhelming. For effective data integration, all this data must be maintained and managed. This is where data stewards come into play.

Data stewards do not own the data and do not have complete control over it. Their main role is to ensure that the data is accurate and passes the quality standards agreed upon by the company. They perform their duties before, during, and after data integration, which helps maintain the information in the long run.

To be effective, data stewards need to work together with the database administrators, data architects, and anyone who is also involved in data management and data integration in the organization. Aside from technical skills, a data steward should have a clear way of communicating issues and ideas during the data integration process.

Responsibilities of a data steward include but are not limited to:

    • Ensuring that new data doesn’t overlap or contradict existing data.
    • Looking for possible errors in the data structure.
    • Ensuring that the data is error-free.
    • Performing data warehousing.
    • Approving the consistency of data.

Data stewards are accountable for enhancing data quality, especially during data integration activities. Their primary role is to ensure that the data governance goals of the company are met.

December 13, 2010

Data Integration Best Practices: Using Taxonomies

Filed under: Data Integration, Data Quality — Katherine Vasilega @ 8:20 am

Data taxonomies are tree-structured classification systems, which provide increasing refinement of classes as you go deeper into the tree. Here are some tips for working with taxonomies when building a data integration solution.

    1. If the data is rich enough, you might not need taxonomies at all, as you may be able to find what you need using a keyword search. Taxonomies are only needed when there is no other data available to assist classification.

    2. Your taxonomy is never going to go away once you have it. Nodes are only going to be added to it, not removed. So keep it as small and simple as you can, and try to minimize the addition of new nodes.

    3. You have to understand how the taxonomy is going to be used in the data integration solution. Most taxonomies are designed with human browsing in mind. Others are built with the intent to reduce the search space for an item when the data set is large. There may also be a need to automatically classify a data item into the taxonomy. The features that make a taxonomy discoverable to business users are not necessarily the same ones that make it easy for electronic systems to process.

    4. If you need a taxonomy for electronic systems, try to keep it small. This makes classifiers much easier to build.

    5. Have a precise data-labeling policy: never label a data point with both a parent and a child class from the taxonomy.

Keep in mind that sometimes the need will arise to ingest a new data source into the existing system. This data source will have its own classification that will not be quite compatible with the existing one. This is why, in general, you should avoid deep and highly refined taxonomies in your data integration solution.
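
To make the labeling policy concrete, here is a minimal sketch of a two-level taxonomy with a check that no data point carries both a parent and a child class. The class names and tree are hypothetical examples, not a real standard.

```python
# A tiny, hypothetical two-level taxonomy: parent class -> child classes
TAXONOMY = {
    "electronics": ["phones", "laptops"],
    "clothing": ["shoes", "jackets"],
}

def parent_of(node):
    """Return the parent class of a node, or None if it is a root class."""
    for parent, children in TAXONOMY.items():
        if node in children:
            return parent
    return None

def validate_labels(labels):
    """Enforce the labeling policy: a data point must never be labeled
    with both a parent class and one of its child classes."""
    labels = set(labels)
    for label in labels:
        if parent_of(label) in labels:
            raise ValueError(
                f"'{label}' and its parent '{parent_of(label)}' both present"
            )
    return labels

validate_labels({"phones"})                    # passes: a single leaf class
# validate_labels({"electronics", "phones"})   # would raise ValueError
```

Keeping the check this simple is only possible because the tree is shallow, which is exactly the argument for small, simple taxonomies.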

November 2, 2010

When To Use Integration Appliances For Data Integration?

Filed under: Data Integration, Data Quality — Katherine Vasilega @ 3:15 am

An integration appliance is a computer system specifically designed to lower the cost of integrating computer systems. It enables you to integrate all types of applications and data directly from Web and on-premises resources, without installing any additional software. Very few vendors offer these solutions at the moment.

The concept of integration appliances is compelling and cost-effective. They offer a simplified environment and often include connectors and adapters that are ready for installation. Appliances are certainly appealing, but are they really sufficient for your data integration needs? Do they provide data quality?

If you need a simple method of data integration, these solutions will probably fit. But what if you need to integrate more complex data? With any enterprise data warehouse, there is a strong need to analyze and interpret data. Integrating the data and presenting it to the business users in a suitable manner is crucial, if you want to get the real value of data integration. With integration appliances, as soon as you start to add business rules, the process slows down. The bottleneck tends to happen around the integration layer component where data integration, separation, and transformation are handled.

Integration appliances are supposed to eliminate excessive complexity for users, but data integration experts agree that these solutions are not always a good fit for an organization. Until integration appliance vendors can offer solutions that integrate information with very complex transformations and business rules, the appliances will continue to fit the simple data integration needs and will not suit enterprise-level data management requirements.

October 29, 2010

Data Quality as the Biggest Issue of Data Integration

Filed under: Data Quality — Katherine Vasilega @ 2:18 am

Data quality is a big but often neglected issue in data integration. To better understand the ways of solving it, we should first define the subject and elements of data quality. Data quality is the process of arranging information so that individual records are accurate, up to date, and correctly represented. In other words, good data quality means that the company’s data is accurate, complete, consistent, timely, unique, and valid.

Poor data quality has two critical consequences:

    • It reduces the number of problems you can solve with a data integration solution in a given period of time
    • It increases the effort to solve a single problem

ETL in data integration has the biggest impact on data quality. When you design ETL processes, the typical focus is on collecting and merging information. If you add data quality rules, the ETL design becomes more complex.

There are a number of approaches to data quality in data integration, but regardless of the approach, data has to meet a few objectives concerning its correctness, consistency, completeness, and validity. To do that, we have to put our data through a process that involves extracting the core of the data, cleansing it of any unnecessary information, conforming it to requirements, and delivering high-quality data.
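
A minimal sketch of how that extract-cleanse-conform-deliver sequence might look inside a single ETL step. The field names and validation rules here are hypothetical illustrations, not part of any particular tool:

```python
def clean_record(raw):
    """Extract the core fields, conform them to requirements,
    and return a cleaned record, or None if validation fails."""
    # Extract: keep only the fields the target repository needs
    record = {
        "customer_id": str(raw.get("customer_id", "")).strip(),
        "email": str(raw.get("email", "")).strip().lower(),
        "country": str(raw.get("country", "")).strip().upper(),
    }
    # Validate: completeness and basic correctness checks
    if not record["customer_id"]:
        return None                      # incomplete -> reject
    if "@" not in record["email"]:
        return None                      # invalid -> reject
    return record

rows = [
    {"customer_id": " 42 ", "email": "Ann@Example.COM", "country": "us"},
    {"customer_id": "", "email": "broken"},
]
# Deliver: only records that passed every rule reach the target
cleaned = [r for r in (clean_record(row) for row in rows) if r]
# cleaned == [{"customer_id": "42", "email": "ann@example.com", "country": "US"}]
```

Adding each new rule makes the ETL design a little more complex, which is the trade-off discussed above.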

Data quality procedures must be practiced periodically to ensure the desired level of data integration quality standards. Records must not be duplicated, out of date, or unsynchronized. That is why it is crucial for an organization that performs data integration to appoint data stewards, who will be in charge of sustaining data quality.

October 18, 2010

Gartner names approaches to successful data integration projects

Filed under: Data Integration, Data Quality — Katherine Vasilega @ 6:31 am

It is pretty clear today that there is no “one size fits all” approach to data integration. With all the tools, techniques, and methodologies available, data integration is still a great challenge that organizations face. According to the global research firm Gartner, companies worldwide have spent more than $1.5 billion on integration. Gartner forecasts that companies will purchase and consume much more integration services in the next five years.

Last week, Ted Friedman, a vice president at Gartner, named five key factors to make data integration projects successful. They are based on the information elicited from surveys and conversations with Gartner clients. These factors include:

Standardization means that organizations should focus on repeatable processes and approaches for dealing with data integration issues.

Diversification is about employing a wider variety of tools that meet the needs of the particular business.

Unification stands for determining how to best link the combinations of available tools and architectures in a synergistic way.

Leveraging data-integration technology to its fullest implies that when data integration has a positive impact on the business, organizations still need to focus on ways to increase the breadth of business impact.

Governance is seen as an insurance policy to get the optimal value out of all data integration investments.

Though challenging, a successful data integration project enables an organization to save on expensive upgrade costs by achieving the full potential of its current systems. That is why it is so important to carry out thorough research and analysis before actually introducing a data integration solution.

October 15, 2010

Data Integration Quality Techniques

Filed under: Data Integration, Data Quality — Katherine Vasilega @ 6:56 am

Data integration experts consider information quality the main attribute for business users. Not only does the user need information to be delivered on time, but s/he also wants this information to be of a certain quality. Thus, data integration quality criteria are required.

Data integration technical quality criteria, such as metrics and thresholds, should be defined first. These data quality criteria are business-independent; contemporary data integration technologies can evaluate them automatically. These criteria are:

  • Data types
  • Data domain compliance (domains refer to a set of allowable values. For structured data, this can be a list of values, such as postal codes, a range of values between 1 and 100, etc.)
  • Statistical features of the data set (maximum value, minimum value, population distributions)
  • Referential relationships
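
As an illustration, the first three criteria in this list can be evaluated with a few lines of code. The column of postal-code values and the allowed set below are hypothetical examples:

```python
from collections import Counter

# Hypothetical column of postal-code values to profile
values = ["10001", "94105", "10001", "99999", "60601"]

# Data type check: every value should be a string of five digits
type_ok = all(isinstance(v, str) and v.isdigit() and len(v) == 5 for v in values)

# Domain compliance: values must belong to an allowed set
ALLOWED = {"10001", "94105", "60601"}
violations = [v for v in values if v not in ALLOWED]
# violations == ["99999"]

# Statistical features: minimum, maximum, and population distribution
stats = {
    "min": min(values),
    "max": max(values),
    "distribution": Counter(values),   # e.g. "10001" appears twice
}
```

Referential relationships, the fourth criterion, would additionally require checking each value against a key column in another data set.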

Other measures of data quality involve business rules compliance. For example, a mobile operator may establish as a business rule the number of months an account has a positive credit balance. All accounts with more than two months of negative balance are considered invalid. These criteria cannot be automatically evaluated by a data integration system, although data integration technologies allow business rules to be programmed.
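
The mobile-operator rule above might be programmed along these lines. The account representation, a simple list of monthly balances, is a hypothetical sketch:

```python
def is_valid_account(monthly_balances):
    """Business rule: an account with more than two months of
    negative balance is considered invalid."""
    negative_months = sum(1 for balance in monthly_balances if balance < 0)
    return negative_months <= 2

is_valid_account([50, -10, 30, -5])    # valid: two negative months
is_valid_account([-10, -20, -5, 40])   # invalid: three negative months
```

The rule itself comes from the business side; the data integration technology only provides the place to program and run it.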

Business rules can automate the decisions that the company makes in its day-to-day operations. This type of data integration rules can be used to audit data for compliance with both external and internal regulations and policies.

As I have mentioned in my previous posts, before acquiring a data integration solution, an organization should establish a set of business rules that later have to be translated into the data integration tool. Now you can see why: business rules are an effective means of enhancing data quality.

October 8, 2010

Data Integration in Social Networks: Is It Real?

Filed under: Data Integration, Data Quality — Katherine Vasilega @ 8:07 am

Social networking is growing rapidly. The large number of social networks has resulted in vast—but diverse—information about individuals. In order to put this data to commercial use, we need a smart solution to integrate all information available among different social networks. This is quite a challenging task for any data integration software, and let me explain why.

First, there are no restrictions on the amount of data which a user can publish in social networks. In addition, this massive amount of data is not necessarily structured. Therefore, data integration amongst social networks may become a headache.

Second, there are plenty of privacy and security concerns in social networks. A forged identity is extremely difficult to track, and preventing one is practically impossible. There are no means for proper monitoring of unauthorized access to data in social networks. Anyone can create a profile for Bill Gates, Barack Obama, or Charlie Chaplin. Misrepresentation of information in a social network may lead to incorrect data mapping, which creates obstacles to developing a consistent, single view of data.

Data integration in social networks is now generating a lot of interest and is definitely a future trend in data integration development. However, the potentially high commercial value attached to the development of social networks is hard to realize in full. There are still too many privacy and consistency issues related to data integration in social networking. At the moment, they slow down the development of a comprehensive data integration solution.

Though some attempts have been made to integrate data from social networks, these solutions cannot yet be applied to commercial use.

October 6, 2010

Data Integration: Useful Reading about Lean Integration

Filed under: Data Integration, Data Quality — Katherine Vasilega @ 3:11 am

The other day, I came across a book on data integration—“Lean Integration: An Integration Factory Approach to Business Agility” by David Lyle and John G. Schmidt that inspired me to devote today’s post to this topic. Originally developed for the Toyota Production System, lean management is now applied in a number of industries, including IT. Lean Integration is a management system that emphasizes creating value for customers, continuous improvement, and eliminating waste as substantial parts of data integration policy.

Here are Lean Data Integration Principles, as described in the book:

Focus on the customer and eliminate waste by deleting the data that is not needed, preventing redundant efforts and inefficient processes.

Automate processes with a well-organized approach taking advantage of reusable templates, components, and business rules.

Continuously improve by managing data with graphic tools, constantly checking data quality, and providing more business collaboration within the data integration solution.

Empower the team with tools that can be used by non-technical professionals to analyze and manage data.

Build in quality of the data integration solution. The solution should make it possible to identify problems at early stages.

Plan for change and mass-customize with data virtualization and logical data objects that can be reused without damaging other systems.

Optimize the whole with a single data integration platform designed to enable lean integration.

The book leads you through these principles and tells you how to apply them in practice. It features dozens of data integration case studies and real-life examples. I think this is a pretty useful book on quality data integration that may be worth your attention.

October 4, 2010

Data Integration Categories

Filed under: Data Integration, Data Quality — Katherine Vasilega @ 6:45 am

There are three major data categories to consider when carrying out data integration initiatives. They require a clear understanding to help find a proper data integration solution. Here is a brief description of each category.

1. Master Data. Also called reference data, master data is any information that is considered to play a key role in the business. Master data may include information about customers, products, employees, locations, inventory, suppliers, and more. Master data is stored in the Data Warehouse.

2. Operational Transaction Data. This data includes the information about the activities, such as purchases, call details, claims, transactions, and so on. This data is stored in the Operational Data Store and is considered low-level data with limited history that is captured “real time” or “near real time” as opposed to the much greater volumes of master data.

3. Decision Support Data. This data category includes historic data used in strategic and tactical analyses. Trends, patterns, data mining, and multi-dimensional analytics can then be used in Decision Support systems that are able to provide predicted outcomes from different scenarios and strategies, thus answering “what if?” questions.

All three data types require similar processes, as data must be collected, cleaned, integrated, and populated into the repository. In addition, the three forms of data share many of the same data integration technologies: ETL, hardware, software, applications.

Whether you create a distinct data integration solution for each data type, or a single data integration solution for all three types, you have to study what data integration vendors are offering and choose the best technology to fit your needs.
