John Morris’s latest post, where he highlights the difference between data migration and ETL is, in fact, very much to the point. ETL is a much larger concept that needs more effort and time than data migration. Data migration, if properly defined, is a one-way long-term trip of data from one system to another. While ETL not only can be a repeated procedure, but is also much larger in scope, as it includes all sorts of data transformations (from data cleansing and data quality checking to certain elements of business intelligence).
So, if you look closely, the two are considerably different, not even parts of the same process.
ETL tools. Make one of your own or buy one from a trusted vendor? What’s best for your company?
Before jumping into anything or rushing to a popular vendor, analyze the degree to which an ETL tool will benefit your company and wether it might be wiser to build one of your own.
Building a customized solution of your own implies hiring technical staff for that purpose, but you have a greater chance of your final solution to be simple and match just your business needs.
Purchasing an ETL tool you are more limited by the market offerings in terms of customization. To add to that, you are likely to face the complexity of educating your in-house staff, which might take a load of your company’s time and resources. Regardless of this, however, most vendors still resort to purchasing ready-made solutions. How come? Well, try asking yourself the following questions and you might actually come to the same decision.
1. What are your goals, why do you need an ETL solution? Is this a one-time procedure with limited conditions or are you planning to make it a part of your organization’s structure and strategy? What do you want from your ETL tool? Specify priorities.
2. How much can you spend on your solution? How much do you want to spend?
3. How many data sources are you working with, and what kind are they? What functionality do they already have that might be helpful at the extraction/transformation.
4. How much time can you painlessly allow for the transformation process?, for your entire ETL process?
5. How much human resources and time can you dedicate to this project? (don’t forget about education)
6. If you decide to build your own solution, who is going to educate your staff? Are you competent enough in the process (etl, warehousing)?
7. Just how many ETL experts do you have in your company and how do you estimate their potential and skill? Are they replaceable?
Just these 7 for starters…
Savio Rodrigues has posted a brief review of Apatar in his InfoWorld Open Source weblog.
The really cool thing is that Apatar provides a visual designer and mapping tool in order to hide the complexity of ETL and data integration from the typical (business) user.
Yep, that’s the idea. Apatar was primarily created and designed as a business user-oriented tool. On top of visualization that Apatar provides, most of the things the user has to do manually is drag-and-drop icons to the visual integrator. That is, no coding whatsoever. And yet another of Apatar’s corporate user-oriented features is the connectivity itself. Apatar provides connectivity to corporate aplications like SalesForce, SugarCRM, GoldMine, etc. and can be used by pretty much any business user not familiat with development at all.
Making the most out of your customer database and relations management solution is what every company wants. No doubt about that. Nonetheless, a huge number of CRM approaches prove insufficient and inefficient.
Here are the six aspects of CRM deployment that Richard Boardman in his recent article calls essential:
1. Poorly defined requirements
2. The availability of internal staff
3. Sign offs
4. Data Good systems require good data, and, if the new system is to be populated with existing data, it’s important that the quality of that data is high. Many organisations are surprised at how many data sources they possess and how poor the data quality is. The cleansing of data and reconciliation of different versions of the same record in multiple data sources can be very time consuming. While there are tools that can help, this process tends to be very manual, and is not something that can be fully outsourced as it requires considerable input from the data owners.
5. User acceptance testing
6. User adoption
I still think data is the key element in this. It’s how you approach, structure and work with your data that makes a difference in your company’s progress. I’d break number four into more precise items like
1. Well-defined data requirements
2. Customer Data Integration & Data Quality (including ETL, data cleansing and everything related to it)
3. Data management, that among other includes following through with your requirements and cleansing procedures rather than adopting a once-in-a-lifetime/lifecycle (whatever you wanna call it) scheme.
But I agree with Richard, you still need to be “realistic about the demands these projects will place on the organisation and manage expectations accordingly. Too often CRM projects are deemed failures because they failed to meet impossibly demanding and often self-inflicted deadlines. A better review of what’s involved and a more analytical appraisal of the availability of resources to meet those demands will go a long way to ensure project success.“
ETL (data Extraction, Transformation and Loading) is an integral part of data warehousing.
In brief, ETL processes are used to extract data from various sources, transform it by cleaning and integrating, and finally load it into data warehouses. The data you get in the end is clean, well-structured, systematic, and ready to use. It sounds quite easy, but in reality there’s much more to it. There’s a catch to each step.
During the first step, data extraction, the challenge you are dealing with is the bulk of data from different sources. It might be different departments, databases, formats, reporting systems, etc. This scattered data needs to somehow be captured and moved into the staging database.
Next — data transformation. This step is the most complicated. It can be broken into four separate steps under the umbrella of transformation:
- data verification – comparing the extracted data with the DW quality standards. In case data doesn’t meet the outlined standards, it either gets rejected, or held to be reviewed by the administrator.
- data cleaning – the data left from the previous step is made more precise. (The techniques this stage includes are so many that they deserve a different post.)
- data mapping – merging data from different sources into a single interface, structuring it into columns and tying it together logically.
- data consolidation (or aggregation) – summarizing data from the previous step and performing overall calculations to provide the user with a more complete picture.
Finally, loading. This step simply uploads the data organized during transformation into a warehouse.
During this whole process, the one thing to be careful of is losing data. ETL process does not presuppose changing the initial data, it should only make it better, cleaner, more correct and organized.