
October 20, 2008

The Dos and Don’ts of Data Integration

Filed under: Data Cleansing, Data Integration, Data Migration, Data Quality, Data Warehousing, EAI, ETL — Alena Semeshko @ 2:25 am

Don’t waste time and resources on creating what’s already there.
Extracting and normalizing customer data from multiple sources is the biggest challenge in client data management, according to the Aberdeen Group. OK, true, a lot of companies consider linking and migrating customer information between databases, files and applications a sticky, if not risky, process. Gartner says corporate developers spend approximately 65 percent of their effort building bridges between applications. That much! No wonder they risk losing lots and lots of data, not to mention the time and effort involved. Why spend time creating what’s already there?
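To see why that extraction-and-normalization step eats so much effort, here is a minimal sketch of pulling customer records from two sources into one shape. All source names, field names and values are invented for illustration:

```python
# Minimal sketch: normalizing customer records from two hypothetical sources
# into a common shape so they can be linked. Field names are illustrative.

def normalize_crm(record):
    """The CRM export uses 'Name' and 'E-Mail'; trim and case-fold for matching."""
    return {
        "name": record["Name"].strip().title(),
        "email": record["E-Mail"].strip().lower(),
    }

def normalize_billing(record):
    """The billing system splits the name and calls the address 'email_addr'."""
    return {
        "name": f'{record["first"].strip()} {record["last"].strip()}'.title(),
        "email": record["email_addr"].strip().lower(),
    }

crm = [{"Name": "ada lovelace ", "E-Mail": "ADA@Example.com"}]
billing = [{"first": "Ada", "last": "Lovelace", "email_addr": "ada@example.com "}]

unified = [normalize_crm(r) for r in crm] + [normalize_billing(r) for r in billing]
# Both rows now reduce to the same name/email pair, so the duplicate is linkable.
```

Multiply this by dozens of systems and hundreds of fields and the 65-percent figure stops looking surprising.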

Find an integration provider that suits you. There are plenty of vendors. Of course, there isn’t a universal integrator that would suit everyone, as each tries to cover a certain area and solve a particular problem. So you just need to spend a bit of time looking for the right vendor.

Don’t let expenses frighten you.
In today’s enterprises, most data integration projects never get built. Why? Because most companies consider the ROI (Return on Investment) on small projects simply too low to justify bringing in expensive middleware. Yeah, so you have your customer data in two sources and want to integrate (or synchronize) them. But then you think, “Hey, it costs too much, I might as well leave everything as it is. It worked up till now, it’ll work just as well in the future.” Then after a while you find yourself lost between the systems and the data they contain, trying to figure out which information is more up-to-date and accurate. Guess what? You’re losing again.

If ROI is an issue, consider open source software. With open source data integration tools you can have your cake and eat it too. Open source can offer cost-effective visual data integration solutions to companies that previously found proprietary data integration, ETL, and EAI tools expensive and complicated.

Not having to pay license fees for BI and data integration software should bring companies previously scared off by insufficient ROI back to the data integration market.

September 2, 2008

Data Quality Metrics in Data Warehousing

Filed under: Data Quality, Data Warehousing — Alena Semeshko @ 3:26 am

A question was posed to an expert as to what metrics should be used for a data warehousing project.

The expert (William McKnight from Lucidity Consulting) recommended the following three as most valuable:

1. Business return on investment (ROI) - Are you getting bottom-line success with your project?
2. Data usage - Is your data used as intended by the users?
3. Data gathering and availability - Is your data available to the extent it should be?

He also mentioned up time, cycle end times, successful loads and clean data levels as secondary technical metrics to pay attention to.

In short, you want to eliminate intolerable defects – as defined by the data stewards. These defects come in 10 categories: referential integrity, uniqueness/deduplication, cardinality, subtype/supertype constructs, value domains/bounds, formatting errors, contingency conditions, calculations, correctness and conformance to a “clean” set of values.
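Several of those defect categories can be checked mechanically. A rough sketch of three of them – referential integrity, uniqueness/deduplication and value bounds – on invented sample data:

```python
# Rough sketch of three of the defect categories named above, on invented
# data: referential integrity, uniqueness/deduplication, value domains/bounds.

customers = [{"id": 1}, {"id": 2}]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 99.0},
    {"order_id": 11, "customer_id": 3, "amount": -5.0},  # orphan + bad amount
    {"order_id": 10, "customer_id": 2, "amount": 40.0},  # duplicate order_id
]

customer_ids = {c["id"] for c in customers}

# Referential integrity: every order must point at an existing customer.
orphans = [o for o in orders if o["customer_id"] not in customer_ids]

# Uniqueness/deduplication: each order_id should appear exactly once.
seen, duplicates = set(), []
for o in orders:
    if o["order_id"] in seen:
        duplicates.append(o["order_id"])
    seen.add(o["order_id"])

# Value domains/bounds: amounts must be non-negative.
out_of_bounds = [o for o in orders if o["amount"] < 0]
```

The categories the stewards care most about (correctness, contingency conditions) are of course the ones that need business rules, not one-liners – but checks like these catch the mechanical defects before anyone has to look.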

August 28, 2008

B-eye Network DWA Survey

Filed under: Data Warehousing — Alena Semeshko @ 5:50 am

Business Intelligence Network is holding an online survey about Data Warehouse Appliances (DWA). I wouldn’t mind seeing the results of this. You can get a copy of the DWA research report firsthand, by the way, by participating in the survey (and it won’t take more than 5 minutes).

Over here.

July 28, 2008

Maximizing Data Warehouse ROI

Filed under: Data Warehousing — Alena Semeshko @ 3:29 am

A post on Beye Blogs gets to the core of data warehousing and explores what you need to enhance your ROI. Here are some extracts:

Having the most granular (or detailed) transaction-level data is core to broad-basing the Data Warehouse applications.

The reasons for using the Data Warehouse as a single reference information source are:

1. Maintain consistency
2. If your production data needs an offline fix (like standardizing customer and product IDs), it’s better to do that data fix in one place. If you have separate enterprise reporting and analysis platforms, you will need to do that data transformation in two places instead of one.
3. Data Auditability: A single information reference point having detailed data will provide a good audit-trail of your summary transactions/analysis.
4. ETL synergy: If you have diverse systems and want some level of information integration, it’s better to do it in one place. Doing ETL for both a summary data warehouse and a detailed reporting database will almost double your efforts.
5. Overall platform ease: You maintain only one information infrastructure (administration, scheduling, publishing, performance tuning…).
6. Ease of Change Management: Any change in your information requirements, or in your source systems, will be managed in one place.

So then, with all these benefits, why is there so much fuss about putting granular data in the data warehouse? Because it comes at a cost:

1. It brings out the real issues with transactional data: in summary data warehouses, you can ignore some transaction-level data issues and do some patchwork to ensure the aggregated data has an acceptable level of quality. Bringing in granular data will require more incisive surgery on your data issues, which will extend the implementation time.
2. ETL efforts go up: related to the first point, your key plumbing task in the DW will become larger and more complex.
3. Existing robust and stable reporting and querying platforms: why fix what ain’t broken?

June 4, 2008

International Data Warehousing and Business Intelligence Summit 2008

Filed under: Data Warehousing — Alena Semeshko @ 1:01 am

Making plans to visit the upcoming International Data Warehousing and Business Intelligence Summit 2008? I’m sure it’s worth attending.

The event will be held at Residenza di Ripetta, Via di Ripetta 231, 00193 Roma (RM).

The main topics this summit will focus on include:

  • Beyond the Data Warehouse: New Approaches to Business Intelligence
  • Data Streams, Complex Events, and BI
  • BI at Your Service: A Look at the Role of SaaS
  • The Data Appliance Explosion: Using Appliances for Data Warehousing and Analytical Processing
  • Are Low-Cost and Open Source BI Solutions Right for You?
  • Voice of the Customer: Better CRM Through Text and BI Integration
  • Extending the Reach of Data Mining
  • Where are the BI and Data Warehousing Industries Heading?
  • An Evolutionary Approach to Master Data Management
  • Achieving RFID Information Excellence
  • Mining and Delivering Web Content for Business Benefit
  • Exploiting the Power of Search in BI

May 26, 2008

Data Integration in a Nutshell

Filed under: Data Integration, Data Warehousing — Alena Semeshko @ 11:09 pm

I just came across a superb article that breaks data integration down for you in the best possible way! For business users who do not write code, it contains a comprehensive description of data integration approaches to keep in mind when selecting an integration provider. I personally consider manual integration, where you do all the work and write the code yourself, quite limited compared to the “application approach”, where a ready-made application does everything for you.

The applications, which are specialized computer programs, would locate, retrieve and integrate the information for you. During the integration process, the applications must manipulate the data so that the information from one source is compatible with the information from the other source.
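As a sketch of that “application approach”, here is a small program that reconciles two hypothetical sources with incompatible date and price formats into one integrated view. All source names and fields are made up for illustration:

```python
# Sketch: an integration application reconciling two hypothetical sources
# whose formats are incompatible (dates, currency). Names are illustrative.

from datetime import datetime

def from_web_store(row):
    # The web store reports dates as ISO strings and prices in cents.
    return {
        "sku": row["sku"],
        "sold_on": datetime.strptime(row["date"], "%Y-%m-%d").date(),
        "price_usd": row["cents"] / 100,
    }

def from_pos_export(row):
    # The point-of-sale export uses US-style dates and dollar amounts.
    return {
        "sku": row["SKU"],
        "sold_on": datetime.strptime(row["Date"], "%m/%d/%Y").date(),
        "price_usd": row["Price"],
    }

sales = [from_web_store({"sku": "A1", "date": "2008-05-26", "cents": 1999}),
         from_pos_export({"SKU": "A1", "Date": "05/26/2008", "Price": 19.99})]

# Once both sources share one schema, downstream code no longer cares
# which system a record came from.
```

This per-source adapter pattern is exactly the manipulation step described above: each source keeps its own quirks, and the application absorbs them at the boundary.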

“Most data integration system designers assume that the end goal is to create as little work for the end user as possible, so they tend to focus on applications and data warehousing techniques.” That’s the idea: the easier for the end user (I particularly mean corporate users who can’t afford to lose time integrating their customer lists and data manually), the better!

May 19, 2008

ILM howtos

Filed under: Data Quality, Data Warehousing — Alena Semeshko @ 11:41 pm

There’s an insightful article by Mike Karp on ILM (information lifecycle management) and the six steps of implementing a successful and efficient policy on data storage, verification, classification and management. Mike identifies the following steps to follow to ensure your ILM efficiency:
Stage 1. Preliminary
1) Determine whether your company’s data is answerable to regulatory demands.
2) Determine whether your company uses its storage in an optimal manner.

Stage 2. Identifying file type, users accessing the data and key words used.
1) Make a list of regulatory requirements that may apply. Get this from your legal department or compliance office.
2) Define stakeholder needs. You must understand what users need and what they consider to be nonnegotiable.
3) Verify the data life cycles. Check the value change for each life cycle with at least two other sources: a second source within the department that owns the data (if that is politically impossible, raise the issue through management), and someone familiar with the potential legal issues.
4) Define success criteria and get them widely accepted.

Stage 3. Classification (aligning your stakeholders’ business requirements to the IT infrastructure).
1) Identify the business value of each type of data object, i.e. understand three things: what kind of data you are dealing with, who will be using it and what its keywords are.
2) Create classification rules.
3) Build retention policies.
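Stage 3 lends itself to being expressed as data. A hedged sketch – rule contents and retention periods entirely invented – of classification rules feeding retention policies:

```python
# Sketch of Stage 3: classification rules and retention policies as data.
# The rules, departments and retention periods below are invented examples.

CLASSIFICATION_RULES = [
    # (predicate on file metadata, assigned class)
    (lambda f: f["keywords"] & {"invoice", "payment"}, "financial"),
    (lambda f: f["owner_dept"] == "legal",             "legal"),
]

RETENTION_YEARS = {"financial": 7, "legal": 10, "general": 2}

def classify(file_meta):
    """First matching rule wins; unmatched files fall back to 'general'."""
    for predicate, label in CLASSIFICATION_RULES:
        if predicate(file_meta):
            return label
    return "general"

f = {"keywords": {"invoice"}, "owner_dept": "sales"}
label = classify(f)                 # the invoice keyword marks it financial
retention = RETENTION_YEARS[label]  # and the policy table gives its lifetime
```

Keeping the rules in a table rather than scattered through code is what makes the later vendor conversation tractable: you can hand the table to whoever owns the compliance list.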

When you engage with the vendors, make sure to understand their products’ capabilities in each of the following areas:
* Ability to tag files as compliant for each required regulation.
* Data classification.
* Data deduplication.
* Disaster recovery and business continuity.
* Discovery of compliance-answerable files across Windows, Linux, Unix and any other operating systems you may have.
* Fully automated file migration based on locally set migration policies.
* Integration with backup, recovery and archiving solutions already on-site.
* Searching (both tag-based and other metadata-based).
* Security (access control, identity management and encryption).
* Security (antivirus).
* Set policies to move files to appropriate storage devices (content-addressed storage, WORM tape).
* Finding and tagging outdated, unused and unwanted files for demotion to a lower storage tier.
* Tracking access to and lineage of objects through their life cycle.

Finally, when you know your vendor, you can look for solutions to automate the needed processes and phase them in.

See full article for more details.

March 20, 2008

5 things to Watch out for in Data Warehousing

Filed under: Data Cleansing, Data Integration, Data Quality, Data Warehousing — Tags: — Alena Semeshko @ 7:45 am

There’s been talk of the concept of data warehousing being misleading, failing to deliver efficient solutions at the enterprise level and frequently causing problems upon implementation. Problems like that don’t come out of nowhere; there are usually good reasons behind them. In this post I’ll try to sum up a few things you should definitely watch out for when tackling your data warehouses:

1) First and foremost – Data Quality. When your data is dirty, outdated and/or inconsistent upon entering the warehouse, the results you get won’t be any better, really. Data warehousing is not supposed to deal with your erroneous data; it’s not supposed to perform data cleansing. These processes need to take place BEFORE your data gets anywhere near the warehouse. That is, your data integration strategy needs to address the low-quality-data problem.

2) Come to think of it, Data Integration is the second thing to watch out for. Do your integration tools live up to your requirements? Can your software handle the data volumes you have? Will it cope with source systems and subject areas newly added to your warehouse? How high is the level of automation of your integration system? Can you avoid manual intervention? You gotta ask yourself all of these questions before you complain that your warehouse isn’t providing the quality of information you expected.

3) Next, dreaming too big. When you build sand castles, you gotta realize they’ll disappear in a matter of days, even hours. You can’t have it all at once, and you can’t have your cake and eat it too. Breaking the project into small segments, giving them enough time to deliver and having patience is the key to a pleasant experience with your data warehousing solution. What? Did you think you could fix all the mess in your data in a matter of days? =)

4) Then, don’t go rushing into solutions. Don’t panic. Yes, warehouse projects require time and effort on your part. Yes, it’s gonna be complicated at first. But that’s not the reason to stop with one project and rush into another. Stick with your first choice, fix it, work on it. Multiple projects will waste your resources and end up as another silo aimlessly taking up your corporate resources.

5) Finally, make sure you have a scalable architecture that you can redesign according to your increasing needs. Your business grows, sometimes grows quicker than you think (the number of customers increases, they have more information, more data to be processed) and you want your solution to continue to perform on the same level and live up to your expectations.
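Coming back to point 1: cleansing belongs in the integration layer, before anything is loaded. A tiny sketch of such a pre-load gate, with made-up validation rules:

```python
# Sketch of point 1: cleansing in the integration layer, BEFORE the load.
# The validation rules and sample rows below are made up for illustration.

def is_clean(row):
    """A row is loadable only if it has a customer id and a plausible email."""
    return bool(row.get("customer_id")) and "@" in row.get("email", "")

incoming = [
    {"customer_id": "C1", "email": "a@example.com"},
    {"customer_id": "",   "email": "broken"},        # rejected, never loaded
]

to_load  = [r for r in incoming if is_clean(r)]
rejected = [r for r in incoming if not is_clean(r)]  # routed back for repair
```

The warehouse only ever sees `to_load`; the rejects go back to whoever owns the source system instead of quietly polluting your reports.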

The list goes on actually, as there are more things to watch out for… but these are the first that come to mind. =)

March 5, 2008

How Data Warehousing Rules

Filed under: Data Warehousing — Tags: — Alena Semeshko @ 2:24 am

Back to yesterday’s post on BI. With data warehousing being an indispensable attribute of BI, I’d say it’s also one of the key components in making a company’s decision-making lifecycle more efficient and productive.

March 4, 2008

Why Data Warehousing? Why Business Intelligence?

Filed under: Data Warehousing — Tags: , — Alena Semeshko @ 8:33 am

In the world of Business Intelligence there’s no place for people who manage by gut. Ouch, that hurts, huh? But it’s true. People who use intuition or gut feeling to make major business decisions usually lose to those using BI in support of management decisions. It’s like with cars: your mechanic knows exactly what that noise under your hood means and what has to be repaired or replaced, while you might only suspect that something’s wrong with the engine or brakes or gearbox, and if you were to repair the car yourself, you’d be more likely to break something else than fix what’s broken. Employing BI strategies and techniques, like, for instance, data warehousing, provides the security and assurance you need to keep your business up and be sure of your decisions.

When success depends on how quickly a company responds to rapidly changing market conditions, BI is where you turn for help. It fast-forwards the decision-making processes and provides you with the insight necessary to make the right decisions faster.

With the modern technologies of data integration, warehousing and analysis, you get a single complete view of your organization’s past, present and potential future with the major problematic areas already figured out for you. All that is left is for this perspective to be put into action.

*get going*
