Friday Bullets
Lately, I have been busier than not. Normally this is a good thing but it has resulted in me neglecting this blog more than I would like. I have several topics in the works but the free time just hasn’t been available to wrap them up.
So, in the meantime, I’d like to start a new (hopefully) weekly feature. The idea is quite simple: a list of news and other things that I have encountered over the week that I feel might be of interest to readers. Here it is:
- 4 Free WPF Utilities I Use Every Day, that come with source!
“Here are 4 WPF utilities I use almost daily, that also have the excellent property of coming with source-code included.” – [Learn WPF] - Visual Studio 2010 Reviewer’s Guide
“Visual Studio 2010 Reviewer’s Guide is a detail document highlighting the new features and capabilities of the Visual Studio 2010 product.” – [Microsoft] - Microsoft’s Courier ‘digital journal’
“…Courier will function as a ‘digital journal,’ and it’s designed to be seriously portable: it’s under an inch thick, weighs a little over a pound, and isn’t much bigger than a 5×7 photo when closed” – [engadget] - Break with the Past, Bright New Future
“Windows Phone Application Development Platform build on XNA and Silverlight” – [Ozymandias] - Microsoft open-sources clever U-Prove identity framework
“U-Prove allows the creation of secure ID tokens, which are pieces of data that incorporate whatever information I need for a given task—but no more—along with cryptographic protection to ensure that they can’t be forged, reused, traced back to me, or linked to other tokens that I have issued.” – [ars technica] - SQL Server 2008 R2 UNICODE Compression – what happens in the background?
“The current implementation of ROW compression is very simple to understand; all columns are implicitly converted to variable length columns under the covers. I assumed that Unicode compression would be equally intuitive.” – [SQL CAT]
ETL Pattern: Staged Refresh
Here is a fairly common challenge: perform a complete refresh of the data in a target table. It seems pretty basic and easy. All of the existing data is replaced with the new load. There is no change detection required; just a simple load.
Yet, more often than not, when I go into a client that is having ETL issues I find that they have tackled this challenge with a poor design. A poor design that ultimately results in errors and ongoing headaches for their support staff. This is so common that I sometimes wonder if I have been following the same shortsighted developer from client to client.
This post is my attempt to jump ahead of this mystery developer and hopefully prevent some future headaches. Essentially, the poorly designed ETL process follows this pattern:
So, what is wrong with this? First, by dropping the existing table it effectively makes the table unavailable until the load completes. This can be particularly troublesome when the load process is lengthy. If the load is scheduled during off hours this might not be a problem. Yet, this downtime can be avoided.
Second, if the load fails you are left with an empty table. In my mind this is the most egregious issue with this pattern. An error leaves you with nothing. This can easy be the difference between an emergency and a simple annoyance. If the process had left the original data in place it would be old and stale but at least it is still accessible.
Here is a modified pattern that helps reduce the issues with the original:
This is a classic pattern for handling a complete refresh. The load itself is completed against a staging table. This leaves the existing target in place and doesn’t interrupt use while the load is running. The target table is only ever dropped after a quality check has been performed. This helps ensure that the load was successful and should prevent the empty table issue that the original process had.
There is nothing complicated about this pattern. It is still simple and straight forward. But, unlike the original, some forethought is put into dealing with failure. I hope this encourages you to take a look at how your ETL deals with problems. Does it handle it gracefully, or does it leave you with an emergency?
Looking To Learn PowerShell?
Microsoft recently released an updated version of the PowerShell Quick Reference Guide. Essentially, it is a printable booklet that has an overview of commonly used PowerShell commands. So, if you’re learning PowerShell or just need a helpful reminder, check it out.
By the way, I cannot stress enough how useful PowerShell can be when working in a Windows Server environment. It really is worth learning even if it isn’t something you plan on using on a daily basis. If you don’t know it, now is a good time to start!
BI Style Guides
While catching up on blog posts I came across a post by Patrick Husting from last month. The post titled “Better looking charts in Excel 2007/2010” provides a before and after example of a chart in Excel. The simple example is meant to show the importance of adding a little style to your reports.
I think this is an often overlooked aspect of a complete BI initiative. Too often the focus is only on the technical aspects and does not ultimately address user adoption. While there are several things that can contribute to low user adoption having ugly and unintuitive reports doesn’t help.
With this in mind, I would argue that taking the time and effort to design a comprehensive style guide is well worth the investment. If done properly it should help a project realize ROI. So, when you’re planning your next project consider budgeting time for style planning.
Anyhow, check out Patrick’s blog and let me know what you think.
SQL Server 2008 R2 Official Release Date
Just a heads up for those who didn’t notice: yesterday Microsoft released the official release date for the next version of SQL Server. It will be generally available May 2010 and should appear on the May pricelist.
I have been looking forward to some of the improvements in the upcoming release and am particularly excited about Master Data Services. Here is a quick overview of the key features:
- Master Data Services
- PowerPivot
- Application & Multi-Server Management
- Improved Hyper-V Support – Live migration between hosts
- StreamInsight
- New Data Center Edition – Increased capacity (256 logical processors, max OS RAM, etc.)
- New Parellel Data Warehouse Edition
If you’d like to give the last CTP a try you can download it here.
T-SQL Snippet: Strip Time from Datetime
I would have thought that this would be fairly common knowledge but I am often asked for a quick example of how to strip off the time element of a SQL datetime type. So, here is an example followed by an explanation of how and why it works:
SELECT DATEADD(d,DATEDIFF(d,0,GETDATE()),0)
Technically, this doesn’t strip the time but instead sets it to midnight of current date. So, what is this snippet doing? To start we need to know a little bit about how datetime is stored in SQL Server. The datetime type is an 8 byte type. Essentially, it is stored as two distinct 4 byte integers. The first 4 byte integer represents the number of days since the base date for SQL Server. The base date for the datetime type is 01/01/1900. The second 4 byte integer that represents the milliseconds since midnight.
With that in mind, the following snippet should return this base date:
SELECT CAST(0 as datetime)
Now, back to dissecting the original snippet, I am using the built-in DATEDIFF function to return an integer. The integer is the number of days that have elapsed since the base date of 01/01/1900. With the knowledge of how a datetime is stored you might be tempted to just CAST directly from the datetime to an int. However, depending on the time of day, it might cause the int to round up. So, it is better to just rely on the DATEDIFF.
Finally, to convert this integer back to a datetime type I am using the DATEADD function. Once again, we are using the base date and adding the integer to it to give us today’s date with a midnight time component. Pretty simple, right?
WP-simplesyntaxhighlighter Update
A new version of the core SyntaxHighlighter that wp-simplesyntaxhighlighter wraps has been released. As a result, I have released a new version of the WordPress plug-in that reflects this latest revision.
You can find a quick list of the changes here.
BI Revision/Version/Source Control
This is a continuation on my previous rants on borrowing traditional software development concepts for a BI project. As the title suggests, the topic today is source control.
No matter what you want to call it, a cornerstone of any good software project should be source control. This is pretty fundamental for software projects but it hasn’t been a priority for many in BI. I am not entirely sure why this is. Is it just an oversight? Are BI project managers oblivious to the existence of good source control tools? I’m not sure what the answer is but I am sure that BI teams should be using it.
The benefits of source control should be well known. So, I am not going to rehash them here. However, I would like to go over a few things to keep in mind:
Full Coverage
Source control can be applied to most parts of a BI project. Source control concepts can apply to data models, cube structures, ETL, and even the documentation for the project. Pretty much anything that changes over time can benefit. So, try to think beyond just source code.
Version Labels
Having the ability to identify something makes it infinitely easier to discus it. This same concept applies to components of a project. It doesn’t really matter whether it is some sort of name or just a revision number. The benefit is that it allows you to have a common term to refer to a snapshot of a project.
I strongly suggest that the labeling is universal and span all of the projects components. In other words, version 2.1 could universally refer to a snapshot of the ETL as it does to the data model. And, when you start introducing formal issue tracking and end-user team portals it becomes even more important (more on these topics at a later date).
Ultimately what I am getting here is that checking in revisions isn’t enough. Develop a consistent labeling system for releases. And, use the commenting features of your source control system to identify changes; all changes!
On Cost
There are many source control systems out there. And, many of these are free. So, cost should not be an excuse. Additionally, many of the design tools on the market today have some flavor integrated into the development environment. Don’t make excuses, find something that works for your team and use it!
PowerShell Snippet: Installed Applications
I am finally getting around to installing Windows 7 on my laptop. Before I wipe everything I wanted to have an idea of what I currently have installed. So, I put together a little PowerShell script:
Get-WMIObject Win32_product | sort-object vendor | format-table name, version, vendor, caption -autosize | out-file installed.txt -width 250
It is a pretty simple example. It uses the Win32_product WMI class to get a list of installed application. I then sort by vendor, format as a table, and output to a text file. As I said, pretty simple.
SharePoint 2010
The 2009 SharePoint conference wrapped up earlier today. Unfortunately, I was unable to attend. However, I did try to keep up by watching all of the tweets and blog posts that were being generated by the attendees. Here is a quick list of some of the more interesting SharePoint 2010 news:
- SharePoint 2010 beta should be available in November (sign-up here)
- Project Gemini gets a product name: PowerPivot
- SharePoint 2010 will run locally for developers under Vista and Windows 7 x64
- No more 32-bit, 64bit only
- IE6 will no longer be supported
- New PowerShell CmdLets for managing SharePoint
- New Document Center template optimized to supports hundreds of thousands of documents
- Windows SharePoint Services rebranded as Microsoft SharePoint Foundation
- Enterprise Metadata – includes tagging and ratings
- Integrated support for Silverlight
- Groove reworked as SharePoint Workspace
- Improved Mobile Web experience
- Sandbox isolation for custom code
- Business Data Catalog becomes Business Connectivity Services – no longer read-only
- Improved APIs – including list access through REST, LINQ, ATOM, and JSON
You can watch some of the highlights of the conference here.