December 13, 2010

Data Integration Best Practices: Using Taxonomies

Filed under: Data Integration, Data Quality — Katherine Vasilega @ 8:20 am

Data taxonomies are tree-structured classification systems, which provide increasing refinement of classes as you go deeper into the tree. Here are some tips for working with taxonomies when building a data integration solution.

    1. If the data is rich enough, you might not need taxonomies at all, as you may be able to find what you need using a keyword search. Taxonomies are only needed when there is no other data available to assist classification.

    2. Your taxonomy is never going to go away once you have it. Nodes are only going to be added to it, not removed. So keep it as small and simple as you can, and try to minimize the addition of new nodes.

    3. You have to understand what kind of taxonomy is going to be used in the data integration solution. Most taxonomies are designed with human browsing in mind. On the other hand, they can be built with the intent to reduce the search space for an item when the data set is large. There may also be a need to automatically classify a data item into the taxonomy. The features that make a taxonomy useful to business users are not the same ones that make it easy for electronic systems to process.

    4. If you need a taxonomy for electronic systems, try to keep it small. This makes classifiers much easier to build.

    5. Have a precise data-labeling policy: never label a data point with both a parent and a child class from the taxonomy.
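The labeling rule in tip 5 can be checked automatically. Below is a minimal Python sketch, assuming the taxonomy is stored as a child-to-parent mapping; the category names and the function names are hypothetical and serve only as an illustration. The check is slightly stricter than the tip, since it flags any ancestor and descendant pair, not only a direct parent and child.

```python
# Hypothetical taxonomy, stored as child -> parent (None marks the root).
PARENT = {
    "products": None,
    "electronics": "products",
    "phones": "electronics",
    "laptops": "electronics",
    "furniture": "products",
}


def ancestors(node):
    """Return the set of all ancestors of a node, walking up to the root."""
    found = set()
    current = PARENT[node]
    while current is not None:
        found.add(current)
        current = PARENT[current]
    return found


def conflicting_labels(labels):
    """Return (ancestor, descendant) pairs that appear together in labels."""
    label_set = set(labels)
    conflicts = []
    for label in sorted(label_set):
        for anc in sorted(ancestors(label) & label_set):
            conflicts.append((anc, label))
    return conflicts
```

A data point labeled with both "electronics" and "phones" would be reported as a conflict, while one labeled "phones" and "furniture" would pass.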

Keep in mind that sometimes the need will arise to ingest a new data source into the existing system. This data source will have its own classification, which will not be quite compatible with the existing one. This is why you should generally avoid deep and highly refined taxonomies in your data integration solution.

December 10, 2010

Data Integration Predictions for 2011

Filed under: Data Integration — Katherine Vasilega @ 7:59 am

It’s time to make predictions for the upcoming year. Here are some trends in data integration that are likely to develop further in 2011.

Enhanced data availability

Data is not going to be locked up in corporate warehouses anymore. Many businesses are moving to the Cloud, and so will their master data. Organizations are starting to see the benefits of sharing information and making their data more open.

Business and IT will converge more

The difference between IT staff and marketing teams gets less obvious. Business people get more and more involved in using data integration techniques in their everyday activities. Business people have to be more educated about information technologies. On the other hand, IT specialists need marketing skills to promote their projects and tools. Successful data integration initiatives are impossible without involving both IT and business users.

Data integration tools will improve further

Data integration and migration tools will become more user-friendly as business users need to access them to manage data. Future tools will focus on workflow features, reporting, and better graphical user interfaces to provide business users with more opportunities.

In 2011, businesses will rely on data more than they ever did before. Today, digital data is a huge part of our lives. Whether the economy turns up or down, the data integration industry will continue to deliver sophisticated solutions that provide top-quality data.

December 9, 2010

Quality of Transformed Data in Data Integration

Filed under: Data Integration — Katherine Vasilega @ 7:41 am

Ensuring data quality after transformation is the most difficult part of data integration procedures. Data transformation algorithms often rely on the theoretical data definitions and data models, rather than on actual information about data content. Since this information is usually incomplete, outdated, and incorrect, the converted data looks nothing like what was expected before the data integration project started.

Every system consists of three layers: database, business rules, and user interface. As a result, what users see is not what is actually stored in the database. This is especially true for legacy systems, which are notorious for elaborate hidden business rules. Even if the data is transformed with accuracy, the information that comes out of the new system will be totally incorrect, if you are not aware of those rules.

Moreover, the source data itself can be an issue in data integration. Inaccurate data tends to spread like a virus during the transformation process. A data cleansing initiative is typically necessary and must be performed before, rather than after, transformation.

To achieve data quality, you have to precede the transformation stage with extensive data profiling and analysis. In fact, data quality after the transformation is directly related to the amount of knowledge you possess about the actual data. Lack of an in-depth analysis will guarantee a significant loss of data quality in data integration. In an ideal data integration project, 80 percent of the time should be spent on data analysis, and 20 percent on designing transformation rules. In practice, however, this rarely occurs. Therefore, the initial stage of the data integration process needs the full attention of your team.
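As an illustration of what profiling the actual data can look like, here is a minimal Python sketch that summarizes a single column before any transformation rules are written. The function name, the field names, and the sample records are assumptions made for this example; a real profiling effort would cover far more than this.

```python
from collections import Counter


def profile_column(rows, column):
    """Summarize one column: row count, missing values, distinct values, top values."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "most_common": counts.most_common(3),
    }


# Made-up sample records for illustration only.
sample = [
    {"id": 1, "country": "US"},
    {"id": 2, "country": "us"},
    {"id": 3, "country": ""},
    {"id": 4, "country": "US"},
    {"id": 5},
]
report = profile_column(sample, "country")
```

Even this small report surfaces two problems that a theoretical data definition would hide: missing values and inconsistent casing of the same country code.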

December 6, 2010

Data Integration Models

Filed under: Data Integration — Katherine Vasilega @ 8:37 am

To better understand the process of data integration, it’s helpful to consider integration models. Identifying the data integration model that suits your company enables you to match up your requirements with the data integration tools and technologies you need.

Simple information transformation: transforming one schema to another, without the ability to leverage logical operators, just moving and changing the data.

Transformation with logical operators (e.g., “If—then”): these data integration solutions deal with transformations in your data based upon content, lookup, or external information, such as time and date.

Complex transformation: data integration solutions that deal with complex schemas and semantic management. The software may include nested transformations and complex logic, like entire programs that are attached to a transformation.

Schemas with transformation bound to processes: the data integration solution with the ability to bind information flow, transformation, and logic to a process.

Transformations with information bound to services: this model includes integration with Web services. This data integration model also includes the solutions that can abstract services and data in many physical databases.
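To make the difference between the first two models concrete, here is a minimal Python sketch. The field names, the mapping, and the lookup table are hypothetical; the point is only that the first function moves and renames data, while the second one branches on content and on a lookup.

```python
# Hypothetical field mapping from a source schema to a target schema.
FIELD_MAP = {"cust_name": "name", "cust_mail": "email"}


def simple_transform(record):
    """Simple transformation: rename fields, no conditional logic."""
    return {target: record.get(source) for source, target in FIELD_MAP.items()}


def rule_based_transform(record, lookup):
    """Transformation with logical operators: output depends on content and a lookup."""
    result = simple_transform(record)
    code = record.get("country_code")
    # If the code is known, use the lookup value; otherwise flag the record.
    if code in lookup:
        result["country"] = lookup[code]
    else:
        result["country"] = "UNKNOWN"
    return result
```

The more complex models build on the same idea by attaching larger pieces of logic, processes, or service calls to the transformation step.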

Your data integration requirements may not be limited to these models. That is why you have to carefully select a data integration technology that can take you from a simple data integration solution to more sophisticated concepts.

December 3, 2010

External Data in Business Data Integration

Filed under: Data Integration — Tags: — Katherine Vasilega @ 7:59 am

One of the great opportunities for business data integration today is the potential for buying and sharing data via the Cloud. Adding data that originates outside of your organization can be a strategic advantage, as it greatly increases the value of your data integration efforts.

The majority of organizations implement business intelligence systems that are internally focused. They show data from within the company: products, sales, contact information of customers, etc. These systems might be able to present you only the opportunities embedded in the data you already have.

But how much do you know about “good” and “bad” customers apart from their names and addresses? Do you know their income level, likes and dislikes, social habits? This information can be crucial for your marketing policy. It gives you grounds for sound decision-making. It’s a good idea to plan your business data integration initiatives so that they support data from external sources sold as commercial Web data services.

A data integration solution can include connecting with Web services that provide demographics and social information about your existing and potential customers. You can receive up-to-date information based on phone verification services, demographics services, social research studies, etc.

Income level, household size, number of cars owned, level of crime in the local area, and average level of local Facebook usage are all examples of external data that can be used to evaluate your customers. Tying this information to your existing customer records will give you a wider perspective of your target audience.
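Tying external data to existing customer records is essentially a join on a shared key. The following Python sketch assumes that the external service returns attributes keyed by postal code; the key, the field names, and the function name are hypothetical.

```python
def enrich_customers(customers, external, key="postal_code"):
    """Attach external attributes to each customer by a shared key."""
    enriched = []
    for customer in customers:
        extra = external.get(customer.get(key), {})
        # Internal fields win if both sources define the same name.
        enriched.append({**extra, **customer})
    return enriched
```

Customers whose key has no match in the external data are returned unchanged, so the enrichment never drops a record.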

And the final remark: before subscribing to the external Web data services, you have to evaluate your data quality and the ability of your business data integration solution to handle large volumes of data.

December 2, 2010

Data Integration Approaches For Effective Decision-Making

Filed under: Data Integration — Katherine Vasilega @ 8:14 am

To understand the complicated world of data integration, it makes sense to learn what technologies and approaches adjoin this discipline. Today I’d like to talk about three important aspects of data integration: mashups, complex event processing, and change data capture.

A mashup is a Web page or an application that integrates two or more elements from different data sources to provide unique information. An example of a mashup is the integration of business addresses and online maps to quickly see where you have to deliver your goods or services. For business users, mashups allow integrating sales data with up-to-date prices and then displaying the real-time results within a single page.

Complex event processing (CEP) is a data integration approach that allows you to follow different events across all the layers of an organization. It enables you to identify the most meaningful events, analyze their impact, and make decisions. CEP enables organizations to quickly engage with continuously changing information. Business users can monitor, analyze, and act on data streams.
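One simple pattern in this spirit is to flag when a certain kind of event occurs too often within a time window. The Python sketch below is only an illustration of that single pattern, not of a full CEP engine; the class name, the threshold, and the window length are assumptions.

```python
from collections import deque


class ThresholdDetector:
    """Flag when at least `threshold` matching events occur within `window` seconds."""

    def __init__(self, threshold, window):
        self.threshold = threshold
        self.window = window
        self.timestamps = deque()

    def observe(self, timestamp, matches):
        """Feed one event (timestamps must be non-decreasing); return True if the pattern fires."""
        if matches:
            self.timestamps.append(timestamp)
        # Drop events that have fallen out of the time window.
        while self.timestamps and timestamp - self.timestamps[0] > self.window:
            self.timestamps.popleft()
        return bool(matches) and len(self.timestamps) >= self.threshold
```

For example, a detector created with a threshold of 3 and a window of 60 seconds could watch a stream of payment events and fire on the third failure within a minute.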

Change data capture (CDC) is the process of identifying important changes made to the information in data sources. You can then apply the changes throughout an enterprise to ensure that data in different systems remains synchronized. CDC technology minimizes the IT resources required for ETL processes because it only deals with updates and other data changes.
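The idea can be sketched in Python by comparing two snapshots of a table keyed by primary key. This is a deliberate simplification chosen for the example: CDC tools commonly read changes from database logs or triggers rather than comparing full snapshots. The function names and the record layout are hypothetical.

```python
def capture_changes(previous, current):
    """Compare two snapshots keyed by primary key and return the changes."""
    inserts = {k: current[k] for k in current.keys() - previous.keys()}
    deletes = sorted(previous.keys() - current.keys())
    updates = {
        k: current[k]
        for k in current.keys() & previous.keys()
        if current[k] != previous[k]
    }
    return {"insert": inserts, "update": updates, "delete": deletes}


def apply_changes(target, changes):
    """Apply captured changes to another copy so it stays synchronized."""
    synced = dict(target)
    synced.update(changes["insert"])
    synced.update(changes["update"])
    for key in changes["delete"]:
        synced.pop(key, None)
    return synced
```

Only the changed rows travel between systems, which is where the savings in ETL resources come from.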

Access to the important data for effective decision-making is the main goal of data integration. The above-mentioned technologies and approaches help avoid costly mistakes that occur when you don’t react to market events as quickly as you can.

November 26, 2010

Data Virtualization in Data Integration

Filed under: Data Integration — Katherine Vasilega @ 8:27 am

Data virtualization is a method of data integration that enables you to gather data contained within a variety of databases in a single virtual warehouse. The process of data virtualization includes four major steps:

    1. Organizing software interfaces to understand the structure of data sources and their level of security.

    2. Bringing these data structures to a single data integration solution for viewing and administration.

    3. Establishing a true metadata abstraction layer, which can be used for data organization, data management, data quality control, etc.

    4. Synchronizing the data across the various sources.
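The first three steps can be sketched in Python with the standard `sqlite3` module. Two in-memory databases stand in for separate physical sources, and a small class exposes them through one common shape, fetching rows only when asked. The class name, table names, and column names are assumptions for illustration, and the sketch leaves out security and synchronization entirely.

```python
import sqlite3


class VirtualCustomerView:
    """Query several databases at read time without copying rows into a new store."""

    def __init__(self):
        self.sources = []  # (name, connection, SQL that maps the source to a common shape)

    def register(self, name, connection, query):
        self.sources.append((name, connection, query))

    def rows(self):
        """Yield (source, id, name) tuples fetched live from every registered source."""
        for name, connection, query in self.sources:
            for row in connection.execute(query):
                yield (name, *row)


# Two made-up sources with different schemas.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE clients (client_id INTEGER, full_name TEXT)")
crm.execute("INSERT INTO clients VALUES (1, 'Ann Lee')")

erp = sqlite3.connect(":memory:")
erp.execute("CREATE TABLE accounts (acct_no INTEGER, holder TEXT)")
erp.execute("INSERT INTO accounts VALUES (7, 'Bo Chen')")

view = VirtualCustomerView()
view.register("crm", crm, "SELECT client_id, full_name FROM clients")
view.register("erp", erp, "SELECT acct_no, holder FROM accounts")
```

Because `rows()` queries the sources each time it is called, a row added to either database shows up in the view immediately, without any data having been migrated.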

Data virtualization combines various data warehouses into a single and uniform data source without actually migrating the physical data. This data integration technology has many other business benefits, including:

    • Lower costs for physical and virtual data integration
    • Maximized agility by avoiding data movement, promoting reuse and ensuring data quality
    • Improved security by utilizing an abstraction layer to minimize the impact of change
    • Making the data available to various consuming applications: CRM, ERP, Cloud computing platforms, etc.

This positions data virtualization as a powerful data integration technology. It has the required functionality to seamlessly blend various Cloud architectures and on-premises applications. This tremendously simplifies the issues associated with data integration and ongoing data management.

November 25, 2010

Two Major Data Integration Trends of 2010

Filed under: Data Integration — Katherine Vasilega @ 7:00 am

Data integration and ETL are constantly growing technologies. By the end of the year, we can clearly see the directions in which they are going to develop in the future. Here are the top data integration trends of 2010 that make data integration professionals happy.

Open APIs provide organizations with a new way to access online business capabilities. To connect to Amazon, Google, Facebook, and thousands of other companies, you don’t have to contact their staff. You can just use their open APIs for both data integration and application integration. This will enable you to

    • Provide a simple way to integrate your business with capabilities offered by other businesses.
    • Streamline the integration with other online service providers.
    • Easily access their data.
    • Provide a wider access to your services or products by enabling the presence of your product in hundreds and thousands of other places, where vendors sell related products or services.

Data integration in the Cloud was positioned as one of the top reasons companies were uncomfortable with moving their data and applications off-premises. Many businesses were concerned about security issues and could not bear the thought of storing their data with third parties. This year, however, we see that the situation has changed. As major IT players move their integration solutions to the Cloud, many organizations have overcome their fears and are willing to join the current trend.

These two integration trends are pure bliss for data integration professionals. They make small and large businesses realize that data integration is vital. They might be a reason for the growing demand for data integration specialists. More importantly, they certainly inspire developers to create more sophisticated data integration solutions and ETL tools.

November 24, 2010

Customer Data Integration: Tips and Tricks

Filed under: Data Integration — Katherine Vasilega @ 8:02 am

Customer data integration is one of the most challenging tasks in the integration field. As you know, customer data can be very complex. For example, there can be a dozen fields to represent information about the customer in the source system and they can all have a different structure.
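To illustrate the structural differences mentioned above, here is a minimal Python sketch that brings customer records from two hypothetical source layouts into one common shape. The layouts, field names, and function names are invented for this example.

```python
def from_crm(record):
    """Hypothetical CRM layout: separate first and last name fields."""
    return {
        "name": f"{record['first_name']} {record['last_name']}".strip(),
        "email": record["email"].strip().lower(),
    }


def from_billing(record):
    """Hypothetical billing layout: 'Last, First' in one field."""
    last, _, first = record["customer"].partition(",")
    return {
        "name": f"{first.strip()} {last.strip()}".strip(),
        "email": record["contact_email"].strip().lower(),
    }
```

Once both sources produce the same shape, records can be compared and matched, for instance on the normalized email address.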

Here are the most common customer data integration issues and tips on how to solve them.

1. Explain the need for customer data integration to your employees.
Explain to them that no matter how good the current data warehouse/CRM system is, it is not complete enough to provide the relevant information. Make sure that every person is ready to collaborate.

2. Formulate more than one business problem that customer data integration can solve. Your CDI efforts should be positioned as an ongoing program that can fit various business needs.

3. Set the functional requirements. Don’t rush to make a list of vendors and their solutions. You have to decide what functionality you need first. Data management requires a strong focus on functional requirements. Until you have your list of features in hand, you won’t be able to pick a data integration solution.

4. Hire qualified IT personnel to help you with customer data integration. Many companies are so accustomed to buying out-of-the-box applications that they don’t think it is necessary to have an in-house IT specialist. Once again, customer data integration is a complicated task and you are sure to need that specialist on your team.

Customer data integration ensures that all relevant departments in the company have constant access to the most current and complete view of customer information. Properly conducted, CDI is the most valuable tool for decision-making.

November 23, 2010

Data Integration in the Cloud Tips

Filed under: Data Integration — Katherine Vasilega @ 6:33 am

More and more businesses are moving to the Cloud, integrating their existing IT systems, applications, and data. Here are some recommendations on successful data integration in the Cloud.

1. Create a strategy. You should have a plan and develop a long-term Cloud strategy closely tied to the overall business process. Your data integration project should have a set of goals and priorities, a budget, and a deadline.

2. Use an integrated approach. A standalone approach to Cloud computing delivers only short-term value. It will require future re-implementation of data integration solutions or a full data migration procedure. You have to leverage data both on- and off-premises; therefore, only an integrated approach to the Cloud infrastructure will deliver long-term results.

3. Get business users engaged. SaaS applications are designed to be managed by business users. Business users are data experts who understand the meaning of the data in the warehouse. Cloud data integration should minimize development, implementation, and maintenance resources, allowing business users to focus on their core activities.

4. Keep security in mind. Data integration in the Cloud involves moving sensitive data between the Cloud and local networks; therefore, security issues are vital. When selecting a data integration solution, pay special attention to the standards supported for securing the data in transit.

5. Maximize connectivity options. Cloud computing has become a vast definition for services on the Web—it includes everything from SaaS, PaaS, to Web-based utilities, social networks, and so on. Connectivity requirements will continue to grow beyond standard enterprise applications, legacy systems and databases, to various Web services yet to come.

To avoid data integration headaches, you have to be consistent, meaning that every developer and business user has to know what to do and follow a clear strategy and a set of requirements. If you take the right approach to data integration in the Cloud, you will then utilize the solution without the need for additional staff to set up and maintain your data warehouse.
