What Is Business Intelligence?

Rolf Tesmer digs into the concept of BI:

Hunting the web for the general definition pulls up many one liners – and yes I guess everyone who is anyone will have a way to define it, and that definition is (or should) be based on their own experiences with building, deploying or supporting BI solutions.

If you are looking for a nice short collection of some of those definitions – and a further explanation of why you need BI – then this is a great post (http://www.jamesserra.com/archive/2013/03/why-you-need-business-intelligence/)

Rolf unpacks the definition and gives us some insight into the nature of Business Intelligence.

Options To Capture Changed Data

Koen Verbeeck looks at various ways of capturing changed data:

  • In some very rare cases, you can actually use change data capture or change tracking on the source system. If you get one of those features implemented, you’re golden. But most of the time you’re not, as a lot of administrators don’t like them because of potential performance impact.

Koen lists several options.  One additional option is to use triggers to capture changes in a queue table.  If you are dealing with SCD-1 changes (in which you do not need a full reckoning of history) or periodic SCD-2 (in which you keep history but are okay with smashing some changes together if they’re within a time period between ETL loads), loading IDs of changed records into a queue table is reasonably efficient and gets around trying to make sure everybody updates the modified date.  It has its own drawbacks, though, starting with it using triggers…

Case Statement Short-Circuiting

Richie Lee talks about using the CASE statement to short-circuit a logical expression:

The issue here is that SQL is a declarative language: unlike procedural languages, there is no guarantee on the ordering of the operations, because optimizers. And SQL Server decides to do something other than what we’d expect: it tries to evaluate the value “Apu” as a date. But by using a CASE expression we can force the optimizer to take the input and match it to the expression (in this case, when a value is a date then convert it to a date) before checking if the value is older than 7 days.

This does work most of the time, but there are exceptions, so as always, test your code.

Incremental Loading Using Datetime Columns

Reza Rad shows a pattern for implementing incremental loads using modified dates:

The idea behind this method is to store the latest ETL run time in a config or log table, and then in the next ETL run just load records from the source table that have modified (with their modified date greater than or equal to) after the latest ETL run datetime. This will create the change set for the data table. The change set might contains inserted, updated, or deleted records. to identify which change happened on the record you need to compare the change set with existing records and separate inserted, updated, and deleted records. This change set with the action on each record can be inserted into staging tables, and then be used to apply on the fact table based on appropriate action.

In my experience, the hardest part about this is making sure people update ModifiedTime when they update rows in the table.

Powershell Hashtables

Steve Jones shows us how to implement a hashtable in Powershell:

With that code, I could easily solve the puzzle. However I was struck by the various ways I work with the hash tables. I use braces, {}, to declare the table. I use brackets, [], to access elements and then parenthesis, (), when calling methods. All of that makes programming sense, but it’s something to keep in mind, especially as those three marks mean different things in Python.

Hashtables are easy to implement in Powershell and are extremely useful.

UPDATE FROM Clause Usage

Rob Farley shows us UPDATE FROM:

1. It can let you access data in the columns of those tables, to use in predicates or expressions.

2. It can let you filter the data in the base table, by only allowing rows which match, such as when using an inner join or right outer join.

3. It can cause rows in the base table to be returned multiple times, if multiple rows in the joined table match a single row in the base table.

4. It can introduce NULL rows, if a full or right outer join is being done (or a left outer join with the base table second) and there are rows in the joined table that don’t match any rows in the base table.

This is a useful bit of T-SQL-specific syntax, but it’s a sharper edge than most UPDATE statements.  For a look back in history, Hugo Kornelis wanted to deprecate this syntax with the release of SQL Server 2008 (though MERGE has its own bugs and “Won’t Fix” problems, so in retrospect, perhaps it’s best that we still have UPDATE FROM).


January 2016
« Dec Feb »