Your mind goes blank. Your appendages go numb. You feel yourself plummeting into a state of crippling despair. You’ve been asked, for the umpteenth time, to do an in-depth comparison of the best ETL tools as research for your company’s next IT integration investment. Not only is the market for these tools heavily commoditized, but every vendor seems to tout different individual capabilities without offering a holistic, comprehensive solution. Finding one that is the right fit for your company on all fronts can feel like an almost hopeless task.
Before you wander out of your office mumbling incoherently to yourself, your eyes blank and unrecognizable, let us help you. Here are the ten most essential guiding questions to help frame your in-depth comparison of ETL tools and find the comprehensive solution that meets all your needs:
1. Does it allow non-technical users to design and configure without custom code?
Typically, ETL tools require developers to build solutions, which lengthens development time and requires a dedicated team of programmers for ongoing maintenance.
2. What is the learning curve in terms of using the product?
It’s important that business user training is efficient, and a graphical, browser-based user interface will help streamline this process.
3. Does it support recoverable orchestrations (or pipelines)?
The ETL tool needs to provide a recovery option if the server is shut down in the middle of an ETL flow execution, restarting from the point of the last successfully executed activity. Many ETL tools lack the architecture to support recovery and persistence of a process flow’s “state” at run time. A state-based orchestration engine that uses checkpoints to track the run-time execution of a process flow helps recover flows that are interrupted by system failures.
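As a rough illustration (not any particular vendor’s engine), the Java sketch below persists a checkpoint after each successfully executed activity and resumes from that point on restart. The step names and the file-based checkpoint store are placeholders.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Minimal sketch of checkpoint-based recovery for an orchestration.
// Step names and the file-based checkpoint store are illustrative only.
public class RecoverableOrchestration {
    private static final Path CHECKPOINT = Path.of("orchestration.checkpoint");
    private static final List<String> STEPS = List.of("extract", "transform", "load");

    public static void main(String[] args) throws IOException {
        int start = lastCompletedStep() + 1;                      // resume after the last persisted step
        for (int i = start; i < STEPS.size(); i++) {
            runStep(STEPS.get(i));                                // execute the activity
            Files.writeString(CHECKPOINT, Integer.toString(i));   // persist state after success
        }
        Files.deleteIfExists(CHECKPOINT);                         // flow finished; clear the checkpoint
    }

    private static int lastCompletedStep() throws IOException {
        return Files.exists(CHECKPOINT)
                ? Integer.parseInt(Files.readString(CHECKPOINT).trim())
                : -1;
    }

    private static void runStep(String name) {
        System.out.println("Running step: " + name);              // placeholder for the real activity
    }
}
```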
4. Does it support splitting large data into multiple chunks?
The tool needs a mapper that supports processing multiple chunks of data in parallel through concurrent threads. This is crucial for processing large or bulk data files.
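Here is a minimal Java sketch of the idea: a record list is split into chunks that a thread pool maps in parallel. The chunk size and the trivial uppercase “mapping” are stand-ins for a real transformation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Minimal sketch of splitting a large record set into chunks that are
// mapped in parallel by a thread pool.
public class ChunkedMapper {
    public static void main(String[] args) throws Exception {
        List<String> records = List.of("a", "b", "c", "d", "e", "f");
        int chunkSize = 2;
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<List<String>>> results = new ArrayList<>();

        for (int i = 0; i < records.size(); i += chunkSize) {
            List<String> chunk = records.subList(i, Math.min(i + chunkSize, records.size()));
            results.add(pool.submit(() -> chunk.stream().map(String::toUpperCase).toList()));
        }
        for (Future<List<String>> f : results) {
            System.out.println(f.get());               // collect each mapped chunk
        }
        pool.shutdown();
    }
}
```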
5. Does the workflow handle dynamic binding?
This is important for content-based routing scenarios where multiple types of data need to be processed and the behavior of the flow depends on the type of data being received. It’s better to have one “template” process that handles all data types than to create a separate flow for each individual data type.
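A minimal Java sketch of that template approach follows; the record types and handlers shown are illustrative assumptions, not a specific product’s routing model.

```java
import java.util.Map;
import java.util.function.Consumer;

// Minimal sketch of a single "template" flow that binds each record to the
// right handler based on its type at run time.
public class ContentBasedRouter {
    private static final Map<String, Consumer<String>> HANDLERS = Map.of(
            "ORDER",   payload -> System.out.println("Order flow: " + payload),
            "INVOICE", payload -> System.out.println("Invoice flow: " + payload));

    public static void route(String type, String payload) {
        HANDLERS.getOrDefault(type,
                p -> System.out.println("No binding for type " + type))   // default branch
                .accept(payload);
    }

    public static void main(String[] args) {
        route("ORDER", "PO-1001");
        route("INVOICE", "INV-2002");
        route("SHIPMENT", "SHP-3003");                 // unmapped type hits the default handler
    }
}
```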
6. Does it provide the extensibility to add custom plugins into the SOA framework?
This is a very important feature that allows developers to leverage existing programs from within the ETL tool. Being able to plug in executables such as Java classes, JARs, SQL queries, and stored procedures can greatly increase the adaptability and efficiency of your ETL solution.
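To make the idea concrete, here is a hypothetical plugin contract (not any vendor’s API) showing how custom Java steps could be chained inside a flow.

```java
import java.util.List;

// Minimal sketch of a plugin contract that lets custom Java code run as an
// orchestration step. The interface name and implementations are assumptions.
public class PluginDemo {
    // Any custom executable (Java class, wrapper around a stored procedure, etc.)
    // implements this single method.
    interface EtlPlugin {
        String execute(String input);
    }

    public static void main(String[] args) {
        List<EtlPlugin> pipeline = List.of(
                input -> input.trim(),                        // custom cleansing step
                input -> "ENRICHED[" + input + "]");          // custom enrichment step

        String data = "  raw record  ";
        for (EtlPlugin plugin : pipeline) {
            data = plugin.execute(data);                      // chain plugins in order
        }
        System.out.println(data);
    }
}
```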
7. Does it have an API development framework for publishing and consuming SOAP and REST web services?
Some of the factors to consider are how orchestrations are published as web services and how sessions and contextual data are persisted within the run-time of the process flow. How easy is it to connect to OAuth-based APIs? How does the user handle errors and issues related to connectivity, how are they reported, and what auto-recovery mechanisms exist to reinstate the session?
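As a simple illustration of the consuming side, the Java sketch below calls a hypothetical OAuth-protected REST endpoint and retries with back-off on connectivity errors; the URL and bearer token are placeholders.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Minimal sketch of consuming an OAuth-protected REST API with a simple
// retry on connectivity errors.
public class RestConsumer {
    public static void main(String[] args) throws InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://api.example.com/v1/customers"))   // hypothetical endpoint
                .header("Authorization", "Bearer <access-token>")          // token from an OAuth flow
                .GET()
                .build();

        for (int attempt = 1; attempt <= 3; attempt++) {
            try {
                HttpResponse<String> response =
                        client.send(request, HttpResponse.BodyHandlers.ofString());
                System.out.println("Status: " + response.statusCode());
                return;                                    // success, stop retrying
            } catch (Exception e) {
                System.err.println("Attempt " + attempt + " failed: " + e.getMessage());
                Thread.sleep(1000L * attempt);             // back off before reinstating the call
            }
        }
        System.err.println("Giving up after 3 attempts");  // surface the error for monitoring
    }
}
```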
8. Does it allow looping through large data sets when processing bulk files with millions of data records?
With the target application putting limits on the amount of data it can accept at any given time, the ETL orchestration needs a looping mechanism to send the data in chunks. Equally important are related features such as support for resubmitting a specific data set, handling errors when data is rejected by the target application, data validation during mapping, and notifications in case of errors.
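A minimal Java sketch of that looping mechanism, batching records for a rate-limited target and keeping rejected batches for resubmission (the batch size and the sendBatch stub are assumptions):

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of looping over a large record set and sending it to a
// rate-limited target in fixed-size batches.
public class BatchLoader {
    static final int BATCH_SIZE = 500;

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 2_000; i++) records.add("record-" + i);

        List<List<String>> rejected = new ArrayList<>();
        for (int i = 0; i < records.size(); i += BATCH_SIZE) {
            List<String> batch = records.subList(i, Math.min(i + BATCH_SIZE, records.size()));
            if (!sendBatch(batch)) {
                rejected.add(batch);                     // keep failed batches for resubmission
            }
        }
        System.out.println("Batches to resubmit: " + rejected.size());
    }

    // Placeholder for the call to the target application's bulk API.
    static boolean sendBatch(List<String> batch) {
        System.out.println("Sending " + batch.size() + " records");
        return true;
    }
}
```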
9. How easy is it to monitor and track run-time workflows?
It should have monitoring dashboards that provide details on the status of each transaction and any exceptions, along with information on how to correct run-time process and data exceptions.
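For illustration only, here is the kind of per-transaction status record such a dashboard could be built on; the status values and fields are assumptions, not a specific product’s data model.

```java
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of per-transaction status records that a monitoring
// dashboard could query.
public class TransactionMonitor {
    enum Status { RUNNING, SUCCEEDED, FAILED }

    record TransactionStatus(String id, Status status, String error, Instant updated) {}

    private static final Map<String, TransactionStatus> STATUSES = new ConcurrentHashMap<>();

    static void update(String id, Status status, String error) {
        STATUSES.put(id, new TransactionStatus(id, status, error, Instant.now()));
    }

    public static void main(String[] args) {
        update("txn-42", Status.RUNNING, null);
        update("txn-42", Status.FAILED, "Target rejected record 17: missing customer id");
        STATUSES.values().forEach(System.out::println);   // what a dashboard would display
    }
}
```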
10. Does it have features needed to build an ETL data flow?
It should have features such as secure FTP or file triggers to pull all the files at once and process them in parallel. A lot of ETL tools don’t have these triggers built in and require third-party components to add this service to their software. Even when they do provide this feature, they often process the files sequentially rather than concurrently, which impacts real-time data synchronization with distributed systems.
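As a rough sketch of the concurrent alternative, the Java example below picks up every file in a hypothetical drop directory and processes them in parallel rather than one at a time; the directory name and the processing step are placeholders.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.stream.Stream;

// Minimal sketch of a file trigger that picks up all files in a drop
// directory and processes them concurrently.
public class FileTrigger {
    public static void main(String[] args) throws IOException, InterruptedException {
        Path inbox = Path.of("inbox");                  // hypothetical drop directory
        ExecutorService pool = Executors.newFixedThreadPool(4);

        try (Stream<Path> files = Files.list(inbox)) {
            files.filter(Files::isRegularFile)
                 .forEach(file -> pool.submit(() -> process(file)));   // one task per file
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }

    private static void process(Path file) {
        System.out.println(Thread.currentThread().getName() + " processing " + file);
    }
}
```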
Don’t let yourself sink into despair. With these questions guiding you, you will be able to narrow down your comparison and find the ETL software with the most comprehensive capabilities. Take a look at how we compare and contrast the ETL tools in the market today. For more information, please don’t hesitate to reach out to me at: raman.singh@adeptia.com.