Restricted access
 
 

October 19, 2010

ETL: ”T” For ”Transformation”

Filed under: ETL — Tags: — Katherine Vasilega @ 7:07 am

One of the main functions of an Extract, Transform, and Load (ETL) tool is to transform data. The transformation step is the most vital stage of building a structured data warehouse. With this post, I’d like to help you get a better understanding of the major transformation types in ETL. Here they are:

    • Format revision
    • Decoding of fields
    • Calculated and derived values
    • Splitting of single fields
    • Merging of information
    • Character set conversion
    • Unit of measurement conversion
    • Date/Time conversion
    • Summarization
    • Key restructuring
    • De-duplication

Today I will tell you about the following ETL transformation types: format revision, decoding of fields, calculated and derived values, splitting of single fields, and merging of information.

Format revision. Fields can contain numeric and text data types. If they do, you need to standardize and change the data type to text to provide values that could be correctly perceived by the users. The length of fields can also be different and you can standardize it.

Decoding of fields.
In multiple source systems, the same data items are described by a variety of field values. Also, many legacy systems are notorious for using cryptic codes to represent business values. This ETL transformation type changes codes into values that make sense to the end-users.

Calculated and derived values. Sometimes you have to calculate the total cost and the profit margin before data can be stored in the data warehouse, which is an example of the calculated value. You may also want to store customer’s age separately—that would be an example of the derived value.

Splitting single fields. The first name, middle name, and last name, as well as some other values, were stored as a large text in a single field in the earlier legacy systems. You need to store individual components of names and addresses in separate fields in your data repository to improve the operating performance by indexing and analyzing individual components.

Merging of information. This type of data transformation in ETL does not literally mean the merging of several fields to create a single field. In this case, merging of information stands for establishing the connection between different fields, such as product price, description, package types, and viewing these fields as a single entity.

I will continue to overview ETL transformation types in my next post. See you soon!

No Comments »

No comments yet.

RSS feed for comments on this post. TrackBack URL

Leave a comment