Archive

Archive for the ‘Blog’ Category

ETL Pattern: Staged Refresh

February 12th, 2010
Comments Off

Here is a fairly common challenge: perform a complete refresh of the data in a target table. It seems pretty basic and easy. All of the existing data is replaced with the new load. There is no change detection required; just a simple load.

Yet, more often than not, when I go into a client that is having ETL issues I find that they have tackled this challenge with a poor design. A poor design that ultimately results in errors and ongoing headaches for their support staff. This is so common that I sometimes wonder if I have been following the same shortsighted developer from client to client.

This post is my attempt to jump ahead of this mystery developer and hopefully prevent some future headaches. Essentially, the poorly designed ETL process follows this pattern:

etl1

So, what is wrong with this? First, by dropping the existing table it effectively makes the table unavailable until the load completes. This can be particularly troublesome when the load process is lengthy. If the load is scheduled during off hours this might not be a problem. Yet, this downtime can be avoided.

Second, if the load fails you are left with an empty table. In my mind this is the most egregious issue with this pattern. An error leaves you with nothing. This can easy be the difference between an emergency and a simple annoyance. If the process had left the original data in place it would be old and stale but at least it is still accessible.

Here is a modified pattern that helps reduce the issues with the original:

ETL2

This is a classic pattern for handling a complete refresh. The load itself is completed against a staging table. This leaves the existing target in place and doesn’t interrupt use while the load is running. The target table is only ever dropped after a quality check has been performed. This helps ensure that the load was successful and should prevent the empty table issue that the original process had.

There is nothing complicated about this pattern. It is still simple and straight forward. But, unlike the original, some forethought is put into dealing with failure. I hope this encourages you to take a look at how your ETL deals with problems. Does it handle it gracefully, or does it leave you with an emergency?

Author: Pierre LaFromboise Categories: Blog Tags: ,

Looking To Learn PowerShell?

February 10th, 2010
Comments Off

Microsoft recently released an updated version of the PowerShell Quick Reference Guide. Essentially, it is a printable booklet that has an overview of commonly used PowerShell commands. So, if you’re learning PowerShell or just need a helpful reminder, check it out.

By the way, I cannot stress enough how useful PowerShell can be when working in a Windows Server environment. It really is worth learning even if it isn’t something you plan on using on a daily basis. If you don’t know it, now is a good time to start!

BI Style Guides

January 21st, 2010
Comments Off

While catching up on blog posts I came across a post by Patrick Husting from last month. The post titled “Better looking charts in Excel 2007/2010” provides a before and after example of a chart in Excel. The simple example is meant to show the importance of adding a little style to your reports.

I think this is an often overlooked aspect of a complete BI initiative. Too often the focus is only on the technical aspects and does not ultimately address user adoption. While there are several things that can contribute to low user adoption having ugly and unintuitive reports doesn’t help.

With this in mind, I would argue that taking the time and effort to design a comprehensive style guide is well worth the investment. If done properly it should help a project realize ROI. So, when you’re planning your next project consider budgeting time for style planning.

Anyhow, check out Patrick’s blog and let me know what you think.

SQL Server 2008 R2 Official Release Date

January 20th, 2010
Comments Off

Just a heads up for those who didn’t notice: yesterday Microsoft released the official release date for the next version of SQL Server. It will be generally available May 2010 and should appear on the May pricelist.

I have been looking forward to some of the improvements in the upcoming release and am particularly excited about Master Data Services. Here is a quick overview of the key features:

If you’d like to give the last CTP a try you can download it here.

Author: Pierre LaFromboise Categories: Blog Tags:

T-SQL Snippet: Strip Time from Datetime

January 15th, 2010
Comments Off

I would have thought that this would be fairly common knowledge but I am often asked for a quick example of how to strip off the time element of a SQL datetime type. So, here is an example followed by an explanation of how and why it works:

SELECT DATEADD(d,DATEDIFF(d,0,GETDATE()),0)

Technically, this doesn’t strip the time but instead sets it to midnight of current date. So, what is this snippet doing? To start we need to know a little bit about how datetime is stored in SQL Server. The datetime type is an 8 byte type. Essentially, it is stored as two distinct 4 byte integers. The first 4 byte integer represents the number of days since the base date for SQL Server. The base date for the datetime type is 01/01/1900. The second 4 byte integer that represents the milliseconds since midnight.

With that in mind, the following snippet should return this base date:

SELECT CAST(0 as datetime)

Now, back to dissecting the original snippet, I am using the built-in DATEDIFF function to return an integer. The integer is the number of days that have elapsed since the base date of 01/01/1900. With the knowledge of how a datetime is stored you might be tempted to just CAST directly from the datetime to an int. However, depending on the time of day, it might cause the int to round up. So, it is better to just rely on the DATEDIFF.

Finally, to convert this integer back to a datetime type I am using the DATEADD function. Once again, we are using the base date and adding the integer to it to give us today’s date with a midnight time component. Pretty simple, right?

Author: Pierre LaFromboise Categories: Blog Tags:

WP-simplesyntaxhighlighter Update

October 29th, 2009
Comments Off

A new version of the core SyntaxHighlighter that wp-simplesyntaxhighlighter wraps has been released. As a result, I have released a new version of the WordPress plug-in that reflects this latest revision.

You can find a quick list of the changes here.

BI Revision/Version/Source Control

October 28th, 2009
Comments Off

This is a continuation on my previous rants on borrowing traditional software development concepts for a BI project. As the title suggests, the topic today is source control.

No matter what you want to call it, a cornerstone of any good software project should be source control. This is pretty fundamental for software projects but it hasn’t been a priority for many in BI. I am not entirely sure why this is. Is it just an oversight? Are BI project managers oblivious to the existence of good source control tools? I’m not sure what the answer is but I am sure that BI teams should be using it.

The benefits of source control should be well known. So, I am not going to rehash them here. However, I would like to go over a few things to keep in mind:

Full Coverage

Source control can be applied to most parts of a BI project. Source control concepts can apply to data models, cube structures, ETL, and even the documentation for the project. Pretty much anything that changes over time can benefit. So, try to think beyond just source code.

Version Labels

Having the ability to identify something makes it infinitely easier to discus it. This same concept applies to components of a project. It doesn’t really matter whether it is some sort of name or just a revision number. The benefit is that it allows you to have a common term to refer to a snapshot of a project.

I strongly suggest that the labeling is universal and span all of the projects components. In other words, version 2.1 could universally refer to a snapshot of the ETL as it does to the data model. And, when you start introducing formal issue tracking and end-user team portals it becomes even more important (more on these topics at a later date).

Ultimately what I am getting here is that checking in revisions isn’t enough. Develop a consistent labeling system for releases. And, use the commenting features of your source control system to identify changes; all changes!

On Cost

There are many source control systems out there. And, many of these are free. So, cost should not be an excuse. Additionally, many of the design tools on the market today have some flavor integrated into the development environment. Don’t make excuses, find something that works for your team and use it!

PowerShell Snippet: Installed Applications

October 23rd, 2009
Comments Off

I am finally getting around to installing Windows 7 on my laptop. Before I wipe everything I wanted to have an idea of what I currently have installed. So, I put together a little PowerShell script:

Get-WMIObject Win32_product | sort-object vendor | format-table name, version, vendor, caption -autosize | out-file installed.txt -width 250

It is a pretty simple example. It uses the Win32_product WMI class to get a list of installed application. I then sort by vendor, format as a table, and output to a text file. As I said, pretty simple.

Author: Pierre LaFromboise Categories: Blog Tags:

SharePoint 2010

October 22nd, 2009
Comments Off

The 2009 SharePoint conference wrapped up earlier today. Unfortunately, I was unable to attend. However, I did try to keep up by watching all of the tweets and blog posts that were being generated by the attendees. Here is a quick list of some of the more interesting SharePoint 2010 news:

  • SharePoint 2010 beta should be available in November (sign-up here)
  • Project Gemini gets a product name: PowerPivot
  • SharePoint 2010 will run locally for developers under Vista and Windows 7 x64
  • No more 32-bit, 64bit only
  • IE6 will no longer be supported
  • New PowerShell CmdLets for managing SharePoint
  • New Document Center template optimized to supports hundreds of thousands of documents
  • Windows SharePoint Services rebranded as Microsoft SharePoint Foundation
  • Enterprise Metadata – includes tagging and ratings
  • Integrated support for Silverlight
  • Groove reworked as SharePoint Workspace
  • Improved Mobile Web experience
  • Sandbox isolation for custom code
  • Business Data Catalog becomes Business Connectivity Services – no longer read-only
  • Improved APIs – including list access through REST, LINQ, ATOM, and JSON

You can watch some of the highlights of the conference here.

Author: Pierre LaFromboise Categories: Blog Tags:

BI Release Management

October 22nd, 2009
Comments Off

Previously, I had discussed treating Business Intelligence projects in a similar manner to software development. Continuing on this subject another often overlooked item in BI projects is release management.

So, what do I mean by release management? Release management is a large and ever evolving topic in software development and often speaks to the entirety of an IT organization as opposed to a single project. However, for the purpose of this post I am really only interested in one thing: protecting the live production environment.

I have had numerous clients who seem to believe it is alright to develop directly against a live production environment. Yet, more often than not, if you walk down the hall in these same organizations you’ll see that their custom development teams follow strict guidelines for deploying new code to production. Why is this?

Often times these organizations look at BI as being non-critical. If a new release results in unexpected downtime it is viewed to be not as important as the uptime of a transactional business system. And, in an immature BI environment this is probably true. However, if an organization wants to realize the ROI needed to justify continuation of a BI program they are going to want to reach maturity.

Holding back on a sound release management plan or treating a BI project like a second-class citizen is only going to work to slow this maturity. It will hurt the overall perception of the program and will result in poor user adoption. If a user cannot count on a system to be fully tested and available why would they trust it? Ultimately, I think you will find it is far cheaper to start these processes as early as possible.

Fortunately, it is pretty easy to get going. Start with the environments. Have a formal development, testing, and production environment. This doesn’t need to be difficult or costly. The development and test environments do not need to be as robust as production. They just need to be able to isolate the changes from production. In most cases you’ll be able to use a subset of the overall data for development and testing purposes.

Next, establish some guidelines for promoting changes to the individual environments. If you’re using one of the agile methodologies for BI development it most likely already has an established process for this that you can follow. Overall, the governance behind this process can evolve and expand with the BI program.

Finally, the results should be a release process that allows developers to continue enhancements of the system, testers to verify the quality of the changes, and the end users rely on the availability and accuracy of production. I firmly believe that this is an essential part of any successful BI project. Can you proceed without it? Yes, but you’re going to pay for it in the long run.