Archive

Archive for February, 2010

ETL Pattern: Staged Refresh

February 12th, 2010

Here is a fairly common challenge: perform a complete refresh of the data in a target table. It seems pretty basic and easy. All of the existing data is replaced with the new load. There is no change detection required; just a simple load.

Yet, more often than not, when I go into a client that is having ETL issues I find that they have tackled this challenge with a poor design. A poor design that ultimately results in errors and ongoing headaches for their support staff. This is so common that I sometimes wonder if I have been following the same shortsighted developer from client to client.

This post is my attempt to jump ahead of this mystery developer and hopefully prevent some future headaches. Essentially, the poorly designed ETL process follows this pattern:

etl1

So, what is wrong with this? First, by dropping the existing table it effectively makes the table unavailable until the load completes. This can be particularly troublesome when the load process is lengthy. If the load is scheduled during off hours this might not be a problem. Yet, this downtime can be avoided.

Second, if the load fails you are left with an empty table. In my mind this is the most egregious issue with this pattern. An error leaves you with nothing. This can easy be the difference between an emergency and a simple annoyance. If the process had left the original data in place it would be old and stale but at least it is still accessible.

Here is a modified pattern that helps reduce the issues with the original:

ETL2

This is a classic pattern for handling a complete refresh. The load itself is completed against a staging table. This leaves the existing target in place and doesn’t interrupt use while the load is running. The target table is only ever dropped after a quality check has been performed. This helps ensure that the load was successful and should prevent the empty table issue that the original process had.

There is nothing complicated about this pattern. It is still simple and straight forward. But, unlike the original, some forethought is put into dealing with failure. I hope this encourages you to take a look at how your ETL deals with problems. Does it handle it gracefully, or does it leave you with an emergency?

Looking To Learn PowerShell?

February 10th, 2010

Microsoft recently released an updated version of the PowerShell Quick Reference Guide. Essentially, it is a printable booklet that has an overview of commonly used PowerShell commands. So, if you’re learning PowerShell or just need a helpful reminder, check it out.

By the way, I cannot stress enough how useful PowerShell can be when working in a Windows Server environment. It really is worth learning even if it isn’t something you plan on using on a daily basis. If you don’t know it, now is a good time to start!