FREE DM Review Site Registration!
Sign-up today and access DM Review on the Web!

Your FREE registration entitles you to:

FREE email newsletters

FREE access to all DM Review content

FREE access to web seminars, resource portals, our white paper library and more!

   

What are your views on the advantages and/or disadvantages ETL tools and data modeling versus code?

Question: My question is twofold, but related to the use of tools as opposed to code. We have a small business intelligence (BI) team, and the members are very technical and very good analysts and developers. They do not believe that an extract, transform and load (ETL) or a data modeling tool makes sense in our environment. Could you please provide your views on the advantages and/or disadvantages of the use of a tool for ETL and data modeling versus code to provide the same functional capabilities?

 

Tom Haughey’s Answer: This is really three entirely different questions, so I will address it as such.

 

1. Do you need an ETL tool or can you write your own ETL?

 

If an organization already has a large body of transformation programs written in COBOL or some other language, then I would stick with that strategy for now. If you have such a body of code, you probably have either procedures in place or a code generator that simplifies code creation. Many organizations successfully use what I call “roll your own” (RYO) code to do ETL. This can be a very effective solution in many situations. The advantages of ETL tools are:

  • Retention of metadata. This is a major point because analytical applications are highly dependent on proper understanding of metadata.
  • Ease of use. Because most ETL tools are GUI based and have repositories, they have increased ease of use and ease of modification.
  • Built-ins. They have built in objects to handle recurring tasks such as aggregation. So, these do not need to be coded and recoded.
  • Skill. Because of the above factors, the skill level requirements for ETL tools are less than with SQL.
  • Support. There is a large experience base to fall back on. This includes customer experiences and vendor support.
  • Auditing. ETL tools assist with auditing because of their repositories and their ability to preserve versions.

If you do not have an existing body of ETL code and other templates in place, I would recommend going to the ETL tool. However, in my opinion, as long as you handle metadata efficiently in your current environment, I would consider sticking with your current RYO solution. This point needs emphasis. Good metadata (in whatever form it takes) is essential to the usability of a data warehouse. As long as this is provided, I see no immediate benefit in going to an ETL tool. However, consider also your future plans and the growth plans for the DW. If you anticipate a large growth in BI applications and in data sources, then conversion to an ETL too would be a sound investment and should be added to your DW strategy.

 

2. Do you need a data modeling tool in general?

  • There is no question about it. If you intend to be a serious data management organization, or especially a world-class data management organization, there is no other way. I’m not saying which one, or even which type of one, or which features you need, but you absolutely need one or you cannot maturely deal with management of data and with database design. A smaller organization may be able to survive well with a simpler tool like Visio; a larger more mature organization needs a tool like PowerDesigner, ERwin, ERStudio, Visible or others.

3. Do you need a data modeling tool for ETL?

  • It depends on how you plan to do staging and transformation in your ETL. It is always useful to model any data structure, whether it is a permanent data store or a temporary data structure. Some ETL staging structures have special purposes in that they are not used for transaction processing or reporting but for the transformation and preparation of data. It is often acceptable to use various file structures rather than relational tables. There may be a good deal of inevitable redundancy and variability to these file structures, for example when the data is being sorted and aggregated. Consequently, data modeling may not seem like a natural choice. But I stick to my earlier point, it is always useful to model any data structure.
  • However, some structures really should be modeled, such as, those that are used in RI checking, shaping history, generating aggregates and managing surrogate keys. These should be modeled because there are significant relationships within these structures and between them and the output tables.

Sid Adelman’s Answer: Data modeling tools are inexpensive and writing code to provide the same function is pointless. You have too much work to do to try to duplicate this capability.

 

With ETL tools, it depends on the complexity of your cleansing and transformations and the volatility of expected changes in the source and the target. If your ETL process is, as Larissa Moss puts it, just suck and plunk, you don’t need an ETL tool. If you plan to make significant transformations, aggregations and cleansing, or if your sources are constantly changing and you are making serious changes to your data warehouse (DW) database, you will want an ETL tool and some ETL tools can help with performance when you are loading very large tables.

 

Chuck Kelley’s Answer: Most likely they are young and/or do not do much documentation. Almost every site I go into (or have been to) suffers from the same view (ETL more than data modeling). Ask them why they are using SQL or C or Java or any other language for that matter. Why should they not just write it in machine code? The answer is that while in the short term it might take longer to develop, maintenance and documentation will be easier moving forward. Unless they want to be maintenance programmers the rest of their lives, I would think you should get them to start using a tool.

 


Tom Haughey is the president of InfoModel LLC, a training and consulting company specializing in data warehousing and data management. He has worked on dozens of database and data warehouse projects for more than two decades. Haughey was former CTO for Pepsi Bottling Group and director of enterprise data warehousing for PepsiCo. He may be reached at (201) 337-9094 or via e-mail at tom.haughey@InfoModelUSA.com.

Sid Adelman is a principal in Sid Adelman & Associates, an organization specializing in planning and implementing data warehouses, in data warehouse and BI assessments, and in establishing effective data architectures and strategies. He is a regular speaker at DW conferences. Adelman chairs the "Ask the Experts" column on www.dmreview.com. He is a frequent contributor to journals that focus on data warehousing. He co-authored Data Warehouse Project Management and is the principal author on Impossible Data Warehouse Situations with Solutions from the Experts and Data Strategy. He can be reached at (818) 783-9634 or sid@sidadelman.com. Visit his Web site at www.sidadelman.com.

Chuck Kelley is an internationally known expert in database and data warehousing technology. He has 30 years of experience in designing and implementing operational/production systems and data warehouses. Kelley has worked in some facet of the design and implementation phase of more than 50 data warehouses and data marts. He also teaches seminars, co-authored four books on data warehousing and has been published in many trade magazines on database technology, data warehousing and enterprise data strategies. He can be contacted at chuckkelley@usa.net.

For more information on related topics, visit the following channels:



Industry Vendors