December 13, 2010

Data Integration Best Practices: Using Taxonomies

Filed under: Data Integration, Data Quality — Katherine Vasilega @ 8:20 am

Data taxonomies are tree-structured classification systems in which classes become increasingly refined as you go deeper into the tree. Here are some tips for working with taxonomies when building a data integration solution.

    1. If the data is rich enough, you might not need taxonomies at all, as you may be able to find what you need using a keyword search. Taxonomies are only needed when there is no other data available to assist classification.

    2. Your taxonomy is never going to go away once you have it. Nodes are only going to be added to it, not removed. So keep it as small and simple as you can, and try to minimize the addition of new nodes.

    3. You have to understand what kind of taxonomy is going to be used in the data integration solution. Most taxonomies are designed with human browsing in mind. Others are built to reduce the search space for an item when the data set is large. There may also be a need to automatically classify data items into the taxonomy. The features that make a taxonomy easy for business users to navigate are not necessarily the ones that make it easy for electronic systems to process.

    4. If you need a taxonomy for electronic systems, try to keep it small. This makes classifiers much easier to build.

    5. Have a precise data-labeling policy: never label a data point with both a parent class and its child class from the taxonomy.
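The labeling policy in tip 5 is easy to enforce automatically. The sketch below is only an illustration: it assumes the taxonomy is stored as a child-to-parent map, and the node names in `TAXONOMY` are hypothetical. It treats any ancestor/descendant pair as a conflict, which includes the direct parent/child case the tip describes.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical example taxonomy, stored as child -> parent.
# The root node has parent None.
TAXONOMY: Dict[str, Optional[str]] = {
    "Product": None,
    "Electronics": "Product",
    "Laptops": "Electronics",
    "Phones": "Electronics",
    "Furniture": "Product",
    "Chairs": "Furniture",
}


def ancestors(node: str, taxonomy: Dict[str, Optional[str]]) -> Set[str]:
    """Return every ancestor of `node`, not including the node itself.

    Raises KeyError if `node` (or a parent) is not in the taxonomy.
    """
    result: Set[str] = set()
    parent = taxonomy[node]
    while parent is not None:
        result.add(parent)
        parent = taxonomy[parent]
    return result


def conflicting_labels(
    labels: Iterable[str], taxonomy: Dict[str, Optional[str]]
) -> List[Tuple[str, str]]:
    """Return (ancestor, descendant) pairs that both appear in `labels`.

    An empty list means the data point is never tagged with both a class
    and one of that class's ancestors.
    """
    label_set = set(labels)
    conflicts: List[Tuple[str, str]] = []
    for label in sorted(label_set):
        # Any ancestor of this label that is also among the labels violates the policy.
        for anc in sorted(ancestors(label, taxonomy) & label_set):
            conflicts.append((anc, label))
    return conflicts
```

For example, `conflicting_labels(["Laptops", "Chairs"], TAXONOMY)` returns an empty list, while `conflicting_labels(["Electronics", "Laptops"], TAXONOMY)` returns `[("Electronics", "Laptops")]`, flagging the record for relabeling.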

Keep in mind that sometimes you will need to ingest a new data source into the existing system. This data source will have its own classification, which will not be quite compatible with the existing one. This is why, in general, you should avoid deep, highly refined taxonomies in your data integration solution.
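One common way to bring a new source's classification into an existing taxonomy is a crosswalk table that maps each source category onto an existing node. The sketch below assumes that approach; it is not described in the post, and `CROSSWALK` and `map_source_records` are hypothetical names. With a shallow target taxonomy, several finer source categories can simply roll up to one node, and anything without a mapping is set aside for review instead of becoming a new node (in line with tip 2).

```python
from typing import Dict, List, Tuple

# Hypothetical crosswalk from the new source's category names to nodes
# of the existing taxonomy. Categories without an entry are not guessed at.
CROSSWALK: Dict[str, str] = {
    "Notebook Computers": "Laptops",
    "Ultrabooks": "Laptops",  # finer source class rolled up to one existing node
    "Mobile Phones": "Phones",
    "Office Seating": "Chairs",
}


def map_source_records(
    records: List[Tuple[str, str]], crosswalk: Dict[str, str]
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Split (record_id, source_category) pairs into mapped and unmapped lists.

    Mapped records carry the existing taxonomy node; unmapped records keep
    their original source category so a person can decide how to handle them.
    """
    mapped: List[Tuple[str, str]] = []
    unmapped: List[Tuple[str, str]] = []
    for record_id, category in records:
        node = crosswalk.get(category)
        if node is None:
            unmapped.append((record_id, category))
        else:
            mapped.append((record_id, node))
    return mapped, unmapped
```

Given `[("r1", "Ultrabooks"), ("r2", "Mobile Phones"), ("r3", "Smart Watches")]`, the function maps `r1` to `Laptops` and `r2` to `Phones`, and returns `r3` unmapped with its source category `Smart Watches`.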

December 10, 2010

Data Integration Predictions for 2011

Filed under: Data Integration — Katherine Vasilega @ 7:59 am

December 9, 2010

Quality of Transformed Data in Data Integration

Filed under: Data Integration — Katherine Vasilega @ 7:41 am

December 6, 2010

Data Integration Models

Filed under: Data Integration — Katherine Vasilega @ 8:37 am

December 3, 2010

External Data in Business Data Integration

Filed under: Data Integration — Katherine Vasilega @ 7:59 am

December 2, 2010

Data Integration Approaches For Effective Decision-Making

Filed under: Data Integration — Katherine Vasilega @ 8:14 am

November 26, 2010

Data Virtualization in Data Integration

Filed under: Data Integration — Katherine Vasilega @ 8:27 am

November 25, 2010

Two Major Data Integration Trends of 2010

Filed under: Data Integration — Katherine Vasilega @ 7:00 am

November 24, 2010

Customer Data Integration: Tips and Tricks

Filed under: Data Integration — Katherine Vasilega @ 8:02 am

November 23, 2010

Data Integration in the Cloud Tips

Filed under: Data Integration — Katherine Vasilega @ 6:33 am