Archive

Author Archive

Friday Bullets

March 5th, 2010

Lately, I have been busier than not. Normally this is a good thing but it has resulted in me neglecting this blog more than I would like. I have several topics in the works but the free time just hasn’t been available to wrap them up.

So, in the meantime, I’d like to start a new (hopefully) weekly feature. The idea is quite simple: a list of news and other things that I have encountered over the week that I feel might be of interest to readers. Here it is:

ETL Pattern: Staged Refresh

February 12th, 2010

Here is a fairly common challenge: perform a complete refresh of the data in a target table. It seems pretty basic and easy. All of the existing data is replaced with the new load. There is no change detection required; just a simple load.

Yet, more often than not, when I go into a client that is having ETL issues I find that they have tackled this challenge with a poor design. A poor design that ultimately results in errors and ongoing headaches for their support staff. This is so common that I sometimes wonder if I have been following the same shortsighted developer from client to client.

This post is my attempt to jump ahead of this mystery developer and hopefully prevent some future headaches. Essentially, the poorly designed ETL process follows this pattern:

etl1

So, what is wrong with this? First, by dropping the existing table it effectively makes the table unavailable until the load completes. This can be particularly troublesome when the load process is lengthy. If the load is scheduled during off hours this might not be a problem. Yet, this downtime can be avoided.

Second, if the load fails you are left with an empty table. In my mind this is the most egregious issue with this pattern. An error leaves you with nothing. This can easy be the difference between an emergency and a simple annoyance. If the process had left the original data in place it would be old and stale but at least it is still accessible.

Here is a modified pattern that helps reduce the issues with the original:

ETL2

This is a classic pattern for handling a complete refresh. The load itself is completed against a staging table. This leaves the existing target in place and doesn’t interrupt use while the load is running. The target table is only ever dropped after a quality check has been performed. This helps ensure that the load was successful and should prevent the empty table issue that the original process had.

There is nothing complicated about this pattern. It is still simple and straight forward. But, unlike the original, some forethought is put into dealing with failure. I hope this encourages you to take a look at how your ETL deals with problems. Does it handle it gracefully, or does it leave you with an emergency?

Looking To Learn PowerShell?

February 10th, 2010

Microsoft recently released an updated version of the PowerShell Quick Reference Guide. Essentially, it is a printable booklet that has an overview of commonly used PowerShell commands. So, if you’re learning PowerShell or just need a helpful reminder, check it out.

By the way, I cannot stress enough how useful PowerShell can be when working in a Windows Server environment. It really is worth learning even if it isn’t something you plan on using on a daily basis. If you don’t know it, now is a good time to start!

BI Style Guides

January 21st, 2010

While catching up on blog posts I came across a post by Patrick Husting from last month. The post titled “Better looking charts in Excel 2007/2010” provides a before and after example of a chart in Excel. The simple example is meant to show the importance of adding a little style to your reports.

I think this is an often overlooked aspect of a complete BI initiative. Too often the focus is only on the technical aspects and does not ultimately address user adoption. While there are several things that can contribute to low user adoption having ugly and unintuitive reports doesn’t help.

With this in mind, I would argue that taking the time and effort to design a comprehensive style guide is well worth the investment. If done properly it should help a project realize ROI. So, when you’re planning your next project consider budgeting time for style planning.

Anyhow, check out Patrick’s blog and let me know what you think.

SQL Server 2008 R2 Official Release Date

January 20th, 2010

Just a heads up for those who didn’t notice: yesterday Microsoft released the official release date for the next version of SQL Server. It will be generally available May 2010 and should appear on the May pricelist.

I have been looking forward to some of the improvements in the upcoming release and am particularly excited about Master Data Services. Here is a quick overview of the key features:

If you’d like to give the last CTP a try you can download it here.

T-SQL Snippet: Strip Time from Datetime

January 15th, 2010

I would have thought that this would be fairly common knowledge but I am often asked for a quick example of how to strip off the time element of a SQL datetime type. So, here is an example followed by an explanation of how and why it works:

SELECT DATEADD(d,DATEDIFF(d,0,GETDATE()),0)

Technically, this doesn’t strip the time but instead sets it to midnight of current date. So, what is this snippet doing? To start we need to know a little bit about how datetime is stored in SQL Server. The datetime type is an 8 byte type. Essentially, it is stored as two distinct 4 byte integers. The first 4 byte integer represents the number of days since the base date for SQL Server. The base date for the datetime type is 01/01/1900. The second 4 byte integer that represents the milliseconds since midnight.

With that in mind, the following snippet should return this base date:

SELECT CAST(0 as datetime)

Now, back to dissecting the original snippet, I am using the built-in DATEDIFF function to return an integer. The integer is the number of days that have elapsed since the base date of 01/01/1900. With the knowledge of how a datetime is stored you might be tempted to just CAST directly from the datetime to an int. However, depending on the time of day, it might cause the int to round up. So, it is better to just rely on the DATEDIFF.

Finally, to convert this integer back to a datetime type I am using the DATEADD function. Once again, we are using the base date and adding the integer to it to give us today’s date with a midnight time component. Pretty simple, right?

Author: Pierre LaFromboise Tags:

WP-simplesyntaxhighlighter Update

October 29th, 2009

A new version of the core SyntaxHighlighter that wp-simplesyntaxhighlighter wraps has been released. As a result, I have released a new version of the WordPress plug-in that reflects this latest revision.

You can find a quick list of the changes here.

BI Revision/Version/Source Control

October 28th, 2009

This is a continuation on my previous rants on borrowing traditional software development concepts for a BI project. As the title suggests, the topic today is source control.

No matter what you want to call it, a cornerstone of any good software project should be source control. This is pretty fundamental for software projects but it hasn’t been a priority for many in BI. I am not entirely sure why this is. Is it just an oversight? Are BI project managers oblivious to the existence of good source control tools? I’m not sure what the answer is but I am sure that BI teams should be using it.

The benefits of source control should be well known. So, I am not going to rehash them here. However, I would like to go over a few things to keep in mind:

Full Coverage

Source control can be applied to most parts of a BI project. Source control concepts can apply to data models, cube structures, ETL, and even the documentation for the project. Pretty much anything that changes over time can benefit. So, try to think beyond just source code.

Version Labels

Having the ability to identify something makes it infinitely easier to discus it. This same concept applies to components of a project. It doesn’t really matter whether it is some sort of name or just a revision number. The benefit is that it allows you to have a common term to refer to a snapshot of a project.

I strongly suggest that the labeling is universal and span all of the projects components. In other words, version 2.1 could universally refer to a snapshot of the ETL as it does to the data model. And, when you start introducing formal issue tracking and end-user team portals it becomes even more important (more on these topics at a later date).

Ultimately what I am getting here is that checking in revisions isn’t enough. Develop a consistent labeling system for releases. And, use the commenting features of your source control system to identify changes; all changes!

On Cost

There are many source control systems out there. And, many of these are free. So, cost should not be an excuse. Additionally, many of the design tools on the market today have some flavor integrated into the development environment. Don’t make excuses, find something that works for your team and use it!

PowerShell Snippet: Installed Applications

October 23rd, 2009

I am finally getting around to installing Windows 7 on my laptop. Before I wipe everything I wanted to have an idea of what I currently have installed. So, I put together a little PowerShell script:

Get-WMIObject Win32_product | sort-object vendor | format-table name, version, vendor, caption -autosize | out-file installed.txt -width 250

It is a pretty simple example. It uses the Win32_product WMI class to get a list of installed application. I then sort by vendor, format as a table, and output to a text file. As I said, pretty simple.

SharePoint 2010

October 22nd, 2009

The 2009 SharePoint conference wrapped up earlier today. Unfortunately, I was unable to attend. However, I did try to keep up by watching all of the tweets and blog posts that were being generated by the attendees. Here is a quick list of some of the more interesting SharePoint 2010 news:

  • SharePoint 2010 beta should be available in November (sign-up here)
  • Project Gemini gets a product name: PowerPivot
  • SharePoint 2010 will run locally for developers under Vista and Windows 7 x64
  • No more 32-bit, 64bit only
  • IE6 will no longer be supported
  • New PowerShell CmdLets for managing SharePoint
  • New Document Center template optimized to supports hundreds of thousands of documents
  • Windows SharePoint Services rebranded as Microsoft SharePoint Foundation
  • Enterprise Metadata – includes tagging and ratings
  • Integrated support for Silverlight
  • Groove reworked as SharePoint Workspace
  • Improved Mobile Web experience
  • Sandbox isolation for custom code
  • Business Data Catalog becomes Business Connectivity Services – no longer read-only
  • Improved APIs – including list access through REST, LINQ, ATOM, and JSON

You can watch some of the highlights of the conference here.