For several months, Pentaho ETL developers have been juggling three distinct tools: Pentaho DI, SSIS, and PowerShell. The experience has given them a new perspective on the pros and cons of each tool. In this post, they share that experience and explain the significance of each tool. Before getting started, they would like to briefly explain ETL and how these tools fit into the ETL landscape.
ETL stands for Extract, Transform, Load. Informatica was one of the earliest ETL tools, introduced in the mid-nineties, and it remains one of the leading tools on the market. Why were these tools created?
ETL tools were created because extracting data from multiple source systems, merging it, and loading it into a data warehouse for analytics and reporting were traditionally time-consuming and error-prone tasks.
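The three stages can be sketched in a few lines of hand-written Python. This is only an illustration of the extract, merge, and load steps described above, not how Pentaho DI or SSIS work internally. The two CSV sources, the `customer_revenue` table, and all function names are assumptions made up for the example, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Two hypothetical source systems, represented here as in-memory CSV text.
CRM_CSV = "customer_id,name\n1,Alice\n2,Bob\n"
BILLING_CSV = "customer_id,amount\n1,10.50\n1,4.50\n2,7.00\n"


def extract(csv_text):
    """Extract: read the rows of one source as a list of dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(customers, invoices):
    """Transform: merge both sources into one row per customer."""
    totals = {}
    for inv in invoices:
        cid = int(inv["customer_id"])
        totals[cid] = totals.get(cid, 0.0) + float(inv["amount"])
    return [
        (int(c["customer_id"]), c["name"], totals.get(int(c["customer_id"]), 0.0))
        for c in customers
    ]


def load(conn, rows):
    """Load: write the merged rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_revenue "
        "(customer_id INTEGER PRIMARY KEY, name TEXT, total REAL)"
    )
    conn.executemany("INSERT INTO customer_revenue VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl():
    """Run all three stages and return the loaded rows."""
    conn = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
    rows = transform(extract(CRM_CSV), extract(BILLING_CSV))
    load(conn, rows)
    return conn.execute(
        "SELECT customer_id, name, total FROM customer_revenue ORDER BY customer_id"
    ).fetchall()


if __name__ == "__main__":
    print(run_etl())
```

Even in this tiny form, every stage needs its own type conversions and error handling, and that is the kind of repetitive work an ETL tool is meant to take over.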
Even today, many people are unsure what exactly an ETL tool is and which problems it solves. Many experienced developers have achieved their goals with a combination of SQL scripts and shell scripts. Less experienced developers often prefer to stay in their ‘comfort zone’ and use whichever programming language they already know, along with a bit of software maintenance and support.
ETL tools are designed to provide the following functions:
- Code readability
This is the most significant difference between an ETL tool and a general-purpose programming language: an ETL tool presents the job visually. While many developers scoff at this visual approach, managers love it because they can inspect the job and at least partially understand how it works. With an ETL tool, the user can see the data flowing from one step to the next.
- Data element mapping
Much of the work in data warehousing involves mapping source data elements to the target schema. This is tedious and error-prone, especially when you are lining up the columns of an INSERT statement with those of a SELECT statement. With ETL tools, the task becomes easier and safer.
- Impact analysis
- Incremental loading
- Parallelism
- Job recovery and checkpointing
- Logging and monitoring
- Slowly changing dimensions
- Centralized error handling
- Pivoting and more
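To make the data element mapping item from the list above concrete, here is a minimal Python sketch of what an ETL tool's mapping step does for you. Both column lists of the INSERT ... SELECT statement are derived from a single mapping, so they cannot drift out of alignment. The table names, column names, and the `build_insert_select` helper are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical mapping: target column -> source column.
COLUMN_MAP = {
    "customer_key": "cust_id",
    "full_name": "cust_nm",
    "signup_date": "created_dt",
}


def build_insert_select(target, source, column_map):
    """Build an INSERT ... SELECT with both column lists taken from one mapping.

    Identifiers are interpolated directly, so they must be trusted constants,
    never user input.
    """
    target_cols = ", ".join(column_map.keys())
    source_cols = ", ".join(column_map.values())
    return f"INSERT INTO {target} ({target_cols}) SELECT {source_cols} FROM {source}"


def demo():
    """Copy one row from a source table into a dimension table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE src_customer (cust_id INTEGER, cust_nm TEXT, created_dt TEXT)"
    )
    conn.execute(
        "CREATE TABLE dim_customer "
        "(customer_key INTEGER, full_name TEXT, signup_date TEXT)"
    )
    conn.execute("INSERT INTO src_customer VALUES (1, 'Alice', '2020-01-15')")
    conn.execute(build_insert_select("dim_customer", "src_customer", COLUMN_MAP))
    return conn.execute(
        "SELECT customer_key, full_name, signup_date FROM dim_customer"
    ).fetchall()


if __name__ == "__main__":
    print(build_insert_select("dim_customer", "src_customer", COLUMN_MAP))
    print(demo())
```

Keeping the mapping in one place is the design choice that matters here: adding or renaming a column means changing one entry rather than editing two parallel column lists by hand.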
Fig: How ETL works
Most professional Pentaho ETL developers are now making an effort to understand ETL tools and learn how to use them. You can do the same by following online updates about ETL tools. Plenty of informative articles are available online; you can search for them and collect as many as you need.