Press "Enter" to skip to content

Month: January 2016

Shredding XML

Tim Peters introduces us to shredding multi-level XML:

The XML below has data nested at different levels, which requires the nodes method to join them together. The nodes method accepts an XML string and returns a rowset. That rowset can then be used with CROSS APPLY to effectively link your way down.

nodes (XQuery) as Table(Column)

The tabular format I need requires data from 3 different levels of this XML gob and I need to wade through 5 “tables” to get there.

Shredding XML is something you occasionally need to do.
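
If you just want the shape of the technique, here is a minimal sketch (using a made-up XML document and element names, not Tim's) of nodes() plus CROSS APPLY walking down two levels:

DECLARE @x xml = N'
<orders>
  <order id="1">
    <line sku="A" qty="2" />
    <line sku="B" qty="1" />
  </order>
  <order id="2">
    <line sku="C" qty="5" />
  </order>
</orders>';

-- The first nodes() call shreds the <order> elements into a rowset;
-- CROSS APPLY on a second nodes() call walks down to each <line>.
SELECT
    o.ord.value('@id', 'int')          AS OrderId,
    l.lin.value('@sku', 'varchar(10)') AS Sku,
    l.lin.value('@qty', 'int')         AS Quantity
FROM @x.nodes('/orders/order') AS o(ord)
CROSS APPLY o.ord.nodes('line') AS l(lin);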

What Is Business Intelligence?

Rolf Tesmer digs into the concept of BI:

Hunting the web for the general definition pulls up many one-liners – and yes, I guess everyone who is anyone will have a way to define it, and that definition is (or should be) based on their own experiences with building, deploying or supporting BI solutions.

If you are looking for a nice short collection of some of those definitions – and a further explanation of why you need BI – then this is a great post (http://www.jamesserra.com/archive/2013/03/why-you-need-business-intelligence/).

Rolf unpacks the definition and gives us some insight into the nature of Business Intelligence.

Options To Capture Changed Data

Koen Verbeeck looks at various ways of capturing changed data:

  • In some very rare cases, you can actually use change data capture or change tracking on the source system. If you get one of those features implemented, you’re golden. But most of the time you’re not, as a lot of administrators don’t like them because of potential performance impact.

Koen lists several options.  One additional option is to use triggers to capture changes in a queue table.  If you are dealing with SCD-1 changes (in which you do not need a full reckoning of history) or periodic SCD-2 (in which you keep history but are okay with smashing some changes together if they’re within a time period between ETL loads), loading IDs of changed records into a queue table is reasonably efficient and gets around trying to make sure everybody updates the modified date.  It has its own drawbacks, though, starting with it using triggers…
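
As a rough illustration of that queue-table idea (the table and column names here are hypothetical), the trigger only needs to record which keys changed; the ETL process reads and clears the queue on its own schedule:

-- Hypothetical queue table: just the key and when the change was captured.
CREATE TABLE dbo.CustomerChangeQueue
(
    CustomerID int NOT NULL,
    CapturedAt datetime2(3) NOT NULL
        CONSTRAINT DF_CustomerChangeQueue_CapturedAt DEFAULT (SYSUTCDATETIME())
);
GO
CREATE TRIGGER dbo.tr_Customer_Queue
ON dbo.Customer
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Capture keys from both inserted and deleted so inserts, updates, and deletes are all covered.
    INSERT INTO dbo.CustomerChangeQueue (CustomerID)
    SELECT CustomerID FROM inserted
    UNION
    SELECT CustomerID FROM deleted;
END;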

Case Statement Short-Circuiting

Richie Lee talks about using a CASE expression to short-circuit a logical expression:

The issue here is that SQL is a declarative language: unlike procedural languages, there is no guarantee on the ordering of the operations, because optimizers. And SQL Server decides to do something other than what we’d expect: it tries to evaluate the value “Apu” as a date. But by using a CASE expression we can force the optimizer to take the input and match it to the expression (in this case, when a value is a date then convert it to a date) before checking if the value is older than 7 days.
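
A stripped-down version of the trick looks something like this (the table and the seven-day check are stand-ins, not Richie's exact example):

-- Without the CASE, SQL Server is free to evaluate the CONVERT before the ISDATE filter,
-- and a non-date value such as 'Apu' blows up the conversion.
SELECT v.StringValue
FROM dbo.SomeTable AS v
WHERE CASE WHEN ISDATE(v.StringValue) = 1
           THEN CONVERT(date, v.StringValue)
      END < DATEADD(DAY, -7, GETDATE());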

This does work most of the time, but there are exceptions, so as always, test your code.

Incremental Loading Using Datetime Columns

Reza Rad shows a pattern for implementing incremental loads using modified dates:

The idea behind this method is to store the latest ETL run time in a config or log table, and then in the next ETL run just load records from the source table that have been modified (their modified date greater than or equal to the latest ETL run datetime). This will create the change set for the data table. The change set might contain inserted, updated, or deleted records. To identify which change happened to each record, you need to compare the change set with existing records and separate inserted, updated, and deleted records. This change set, with the action on each record, can be inserted into staging tables and then used to apply the appropriate action to the fact table.
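
In outline, with a hypothetical ETL log table and source table, the watermark part of the pattern boils down to something like this:

-- Grab the high-water mark from the last successful run.
DECLARE @LastRunTime datetime2(3) =
(
    SELECT MAX(RunEndTime)
    FROM etl.LoadLog
    WHERE Succeeded = 1
);

-- Pull only the rows touched since then into staging.
-- Note: deletes only show up this way if the source soft-deletes rows and stamps ModifiedTime.
INSERT INTO staging.SalesChanges (SaleID, Amount, ModifiedTime)
SELECT s.SaleID, s.Amount, s.ModifiedTime
FROM src.Sales AS s
WHERE s.ModifiedTime >= @LastRunTime;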

In my experience, the hardest part about this is making sure people update ModifiedTime when they update rows in the table.

Powershell Hashtables

Steve Jones shows us how to implement a hashtable in Powershell:

With that code, I could easily solve the puzzle. However, I was struck by the various ways I work with the hash tables. I use braces, {}, to declare the table. I use brackets, [], to access elements, and then parentheses, (), when calling methods. All of that makes programming sense, but it’s something to keep in mind, especially as those three marks mean different things in Python.

Hashtables are easy to implement in Powershell and are extremely useful.

UPDATE FROM Clause Usage

Rob Farley shows us UPDATE FROM:

1. It can let you access data in the columns of those tables, to use in predicates or expressions.

2. It can let you filter the data in the base table, by only allowing rows which match, such as when using an inner join or right outer join.

3. It can cause rows in the base table to be returned multiple times, if multiple rows in the joined table match a single row in the base table.

4. It can introduce NULL rows, if a full or right outer join is being done (or a left outer join with the base table second) and there are rows in the joined table that don’t match any rows in the base table.

This is a useful bit of T-SQL-specific syntax, but it’s a sharper edge than most UPDATE statements.  For a look back in history, Hugo Kornelis wanted to deprecate this syntax with the release of SQL Server 2008 (though MERGE has its own bugs and “Won’t Fix” problems, so in retrospect, perhaps it’s best that we still have UPDATE FROM).
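
A small, hypothetical example of where the sharp edge bites: if the join below matches several staging rows to one base row, the base row still gets exactly one of those values, and which one wins is not deterministic.

-- T-SQL-specific UPDATE ... FROM: the joined table both filters the base table and supplies values.
UPDATE p
SET    p.Price = s.Price
FROM   dbo.Products AS p
JOIN   dbo.StagingPrices AS s
       ON s.ProductID = p.ProductID;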

Migrating Always Encrypted Data

Nitish Upreti shows us how to migrate data encrypted using Always Encrypted:

Our customers use the SQL platform to store volumes of high-valued data assets for their organization. With Always Encrypted, we want to deliver additional security while ensuring complete integrity of stored user data. To achieve this, in a regular workflow SQL Server/Azure SQL Database disallows an application to insert data directly into an encrypted column. If the application has not enabled Always Encrypted in the connection string, an insert statement targeting an encrypted column will fail. To insert a value into a column protected with Always Encrypted, the application must connect to the database with Always Encrypted enabled in the connection string and pass a plaintext value of the datatype configured for the target column. Subsequently, the SQL client driver encrypts the value and sends the ciphertext to the database. This ensures plaintext data is encrypted and stored appropriately.
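
For context, a column set up for Always Encrypted looks roughly like this (the table and the column encryption key name are made up); a plain INSERT against it from a connection without column encryption enabled will fail, which is exactly the behavior Nitish describes:

-- Hypothetical table; CEK_Auto1 stands in for a real column encryption key.
-- Client applications opt in via Column Encryption Setting=Enabled in the connection string.
CREATE TABLE dbo.Patients
(
    PatientID int IDENTITY(1,1) PRIMARY KEY,
    SSN char(11) COLLATE Latin1_General_BIN2
        ENCRYPTED WITH (COLUMN_ENCRYPTION_KEY = CEK_Auto1,
                        ENCRYPTION_TYPE = DETERMINISTIC,
                        ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256')
        NOT NULL
);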

Read the whole thing.

Restoring An Azure SQL Database

Grant Fritchey shows us how to restore a database hosted in Azure SQL Database:

The first, and most important thing to notice here is that it’s supplying me with a new name. I can change that to anything I want as long as it’s not the name of a database already in existence on my Azure SQL Database Server. You read that correctly, you can restore a database to Azure SQL Database, but there is no WITH REPLACE option. The restore creates a new database. This is important. In a recovery scenario, you need to be sure that you’re prepared to deal with this fact. How could you replace the existing database? Immediately run a pair of ALTER DATABASE commands to change the name of the existing database to something else and then change the name of your newly created database to the old name. That’s your choice.
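
The rename dance Grant describes is just a pair of statements (the database names here are placeholders), run while connected to the master database:

-- Move the old database out of the way, then give the restored copy the original name.
ALTER DATABASE MyDatabase MODIFY NAME = MyDatabase_Old;
ALTER DATABASE MyDatabase_Restored MODIFY NAME = MyDatabase;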

There are a couple of gotchas, so if you are administering Azure SQL Database instances, be aware of these.

ODBC Driver 13 In Preview

The Microsoft ODBC Driver 13 is now available (in preview form):

The preview ODBC drivers for Linux now support Ubuntu, RedHat and SUSE. This is Microsoft’s first ODBC Driver for SQL Server release supporting Ubuntu. You can now enjoy enterprise-level support while connecting to SQL Server from Ubuntu. It also updates the drivers to support the unixODBC driver manager 2.3.1.

Full interoperability with distributions of Linux is something I’ve waited a long, long time for.  This is one tiny step closer.
