
April 5, 2010

ETL Faces New Challenges

Filed under: Data Integration, ETL - Olga Belokurskaya @ 3:30 am

Today, a well-established data integration process is necessary for a sound business. Business information is a valuable asset; companies’ decision-makers depend greatly on the data they receive and on its quality, value, and timeliness. As the amounts of data companies work with grow exponentially, the requirements for ETL systems get more complex. Today, ETL providers face some new challenges along with traditional data integration issues:

Scalability. ETL systems need to be able to process large volumes of data that are bound to keep growing. Moreover, today’s business reality requires getting more data in less time. So, scalable ETL is a must.

Operability. A large company’s IT system comprises multiple disparate sources of business-critical data, such as databases, CRM systems, etc. These days, an ETL tool should have connectivity to all those systems. Moreover, data integration between all data sources often requires complex transformations to make the data fit the formats each system expects.

Real-time data integration. This requirement is being heard more and more often. The need for real-time data requires ETL systems to run extract-transform-load operations and gather all the data in a standard, homogeneous environment within a very short period of time.

Finally, the Cloud. As cloud offerings mature and provide beneficial solutions (especially for small and mid-sized businesses), companies choose to move parts of their applications to the cloud. Providing connectivity to cloud systems is one of today’s ETL challenges as well.
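To make the "Operability" point a bit more concrete, here is a minimal sketch in Python of transformations that bring records from two disparate sources into one common format. Everything in it is an assumption made for illustration: the two sources (a CRM export and a billing database), their field names, the date formats, and the target layout do not come from any particular ETL tool.

```python
from datetime import datetime

# Hypothetical records from two disparate sources (illustrative only).
crm_rows = [{"full_name": "Ann Lee", "signup": "05/04/2010"}]
billing_rows = [{"name": "ann lee", "created": "2010-04-05"}]


def from_crm(row):
    """Map a CRM-style record (DD/MM/YYYY dates) onto the common format."""
    day = datetime.strptime(row["signup"], "%d/%m/%Y").date()
    return {"name": row["full_name"].strip().title(), "since": day.isoformat()}


def from_billing(row):
    """Map a billing-style record (ISO dates) onto the common format."""
    day = datetime.strptime(row["created"], "%Y-%m-%d").date()
    return {"name": row["name"].strip().title(), "since": day.isoformat()}


def run_etl(sources):
    """Extract rows from each source, transform them, and load them into one list."""
    target = []
    for rows, transform in sources:
        for row in rows:
            record = transform(row)
            if record not in target:  # skip exact duplicates across sources
                target.append(record)
    return target


unified = run_etl([(crm_rows, from_crm), (billing_rows, from_billing)])
```

Each source gets its own small transform function, so connecting one more system means adding one more function rather than touching the load step. A real ETL tool would of course add scheduling, error handling, and proper connectors on top of this idea.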

December 18, 2009

On a Couple of Misconceptions About ETL Tools

Filed under: Data Integration, ETL, Open Source - Olga Belokurskaya @ 8:20 am

When deciding to start a data integration process, many companies consider using ETL tools instead of hand-coding. Such a decision is justified by the fact (and many data integration experts agree with it) that hand-coding is error-prone, takes time and additional resources, etc. However, according to TDWI, it’s also wrong to assume that an ETL tool will help to finish a data integration project sooner or will result in substantial cost savings.

Their point is that though ETL tools definitely accelerate the process of data integration at some level, one should not overlook the time that has to be spent on evaluating, selecting, and implementing ETL tools.

Another misconception concerns cost savings. The acquisition cost of ETL tools is quite significant, and the annual support cost is often overlooked when a decision is being made on the selection and implementation of an ETL tool. Thus, companies have a somewhat inaccurate idea of the savings they might achieve.

The misconceptions described above are a source of inappropriate expectations and, as a result, of a wrong assessment of data integration expenses and, at worst, a failed data integration initiative.

I think the situation is a bit different when we speak about open source ETL tools. First, there’s no such thing as an annual support cost. Huge developer and user communities make it possible to receive support from other users without paying for it. Then, license costs of open source ETL solutions are really low, which frees up budget that can be redirected to wherever additional funding is needed. So, here I see a real possibility to reduce the cost of data integration with the help of ETL tools.

What I agree with is that the selection process will take time, as will deployment (including user training), though open source solutions are typically easier to deploy than proprietary ETL tools. Companies should take time to evaluate ETL tools properly, whether open source or proprietary, and I do agree that the decision should be based on whether an ETL tool best fits the particular company’s business needs and can help the company achieve its goals.

December 7, 2009

Choosing ETL That Fits Your Business Requirements II: Consider Open Source

Filed under: Data Integration, ETL, Open Source - Olga Belokurskaya @ 1:49 am

This posting is a kind of continuation of the previous one. Here I’m again writing about selecting ETL and data integration solutions, but I’d like to concentrate on open source ETL. It’s no discovery if I repeat that today’s open source solutions are good enough for ETL operations, and that data integration and BI experts expect them to develop into solutions for master data management.

But today, open source ETL provides an alternative to proprietary solutions, which are usually costly and intended for more complex data integration processes than mere ETL. For mid-sized and small businesses, which as a rule have smaller budgets, open source ETL solutions are a means to address their data integration needs.

But back to business requirements, or rather to how open source ETL tools may address a company’s business requirements for data. Here I see several ways:

First, if a company has a couple of its own developers, they can make the necessary customizations to the company’s ETL, thanks to the availability of the code.

Then, as a rule, open source solutions are supported by developer communities, some of which are really powerful. So, the community behind the open source ETL that a company uses may help to add needed functionality or to customize existing features to meet the company’s business requirements.

And don’t forget about the vendor itself. A company may turn directly to the vendor of its open source ETL and request additional functionality that meets the company’s particular needs.

And, as a rule, any of the actions described above will cost less, and the result will take less time to deliver, than with proprietary data integration tools.

Well, though this posting sounds rosy, there may still be issues with open source, such as vendors that stop supporting their solutions. However, with the communities behind them and thanks to the openness of the code, the chances of overcoming those issues seem higher to me than with proprietary solutions.

December 3, 2009

Choosing ETL That Fits Your Business Requirements

Filed under: Data Integration, ETL - Olga Belokurskaya @ 3:07 am

In my previous posting, I touched upon the importance of defining business requirements before starting a data integration initiative. Without those requirements, the process will amount to blindly gathering all kinds of data available at the enterprise with no clear purpose. In other words, a data integration initiative may turn into some kind of monkey business. Okay, that’s clear.

Well, selecting data integration tools, including ETL (extract, transform, and load) solutions, is a job that requires effort, but if done right, it’s worth it. What do I mean by “done right”? The message is simple. When choosing an ETL tool, a company should bear in mind its business requirements for data, and make its choice based on whether an ETL solution has the functionality that meets those requirements, or whether the vendor can add the needed functionality to the solution.

Look. You’ve defined your data integration strategy, and business users have created the list of requirements for the data they need to work with. So now it is clear what data the future ETL tool should gather, and what operations it should perform on that data. Now, having all the necessary criteria, you won’t be wandering blindly among multiple vendors, but will concentrate on those whose ETL solutions meet your criteria.
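As a rough illustration of narrowing down vendors by criteria, here is a minimal sketch in Python. The requirement names, the tool names ("Tool A" and so on), and their feature sets are all hypothetical placeholders, not real products or a real evaluation.

```python
# Hypothetical requirements gathered from business users.
required = {"crm_connector", "scheduled_loads", "data_cleansing"}

# Hypothetical candidate tools and the features each one offers.
candidates = {
    "Tool A": {"crm_connector", "scheduled_loads", "data_cleansing", "cloud"},
    "Tool B": {"crm_connector", "scheduled_loads"},
    "Tool C": {"data_cleansing"},
}


def shortlist(candidates, required):
    """Return the tools meeting every requirement, plus the gaps of the others."""
    meets, gaps = [], {}
    for name, features in candidates.items():
        missing = required - features  # requirements this tool does not cover
        if missing:
            gaps[name] = sorted(missing)
        else:
            meets.append(name)
    return meets, gaps


meets, gaps = shortlist(candidates, required)
```

Here `meets` holds the tools worth a closer look, while `gaps` lists what each remaining tool lacks, which is exactly the functionality you would ask that vendor whether they could add.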