Incremental Loading Using Datetime Columns

Reza Rad shows a pattern for implementing incremental loads using modified dates:

The idea behind this method is to store the latest ETL run time in a config or log table, and then in the next ETL run just load records from the source table that have modified (with their modified date greater than or equal to) after the latest ETL run datetime. This will create the change set for the data table. The change set might contains inserted, updated, or deleted records. to identify which change happened on the record you need to compare the change set with existing records and separate inserted, updated, and deleted records. This change set with the action on each record can be inserted into staging tables, and then be used to apply on the fact table based on appropriate action.

In my experience, the hardest part about this is making sure people update ModifiedTime when they update rows in the table.

Powershell Hashtables

Steve Jones shows us how to implement a hashtable in Powershell:

With that code, I could easily solve the puzzle. However I was struck by the various ways I work with the hash tables. I use braces, {}, to declare the table. I use brackets, [], to access elements and then parenthesis, (), when calling methods. All of that makes programming sense, but it’s something to keep in mind, especially as those three marks mean different things in Python.

Hashtables are easy to implement in Powershell and are extremely useful.

UPDATE FROM Clause Usage

Rob Farley shows us UPDATE FROM:

1. It can let you access data in the columns of those tables, to use in predicates or expressions.

2. It can let you filter the data in the base table, by only allowing rows which match, such as when using an inner join or right outer join.

3. It can cause rows in the base table to be returned multiple times, if multiple rows in the joined table match a single row in the base table.

4. It can introduce NULL rows, if a full or right outer join is being done (or a left outer join with the base table second) and there are rows in the joined table that don’t match any rows in the base table.

This is a useful bit of T-SQL-specific syntax, but it’s a sharper edge than most UPDATE statements.  For a look back in history, Hugo Kornelis wanted to deprecate this syntax with the release of SQL Server 2008 (though MERGE has its own bugs and “Won’t Fix” problems, so in retrospect, perhaps it’s best that we still have UPDATE FROM).

Migrating Always Encrypted Data

Nitish Upreti shows us how to migrate data encrypted using Always Encrypted:

Our customers use the SQL platform to store volumes of high-valued data assets for their organization. With Always Encrypted, we want to deliver additional security while ensuring complete integrity of stored user data. To achieve this, in a regular workflow SQL Server/Azure SQL Database disallows an application to insert data directly into an encrypted column. If the application has not enabled Always Encrypted in the connection string, an insert statement targeting an encrypted column will fail. To insert a value into a column protected with Always Encrypted, the application must connect to the database with Always Encrypted enabled in the connection string and pass a plaintext value of the datatype configured for the target column. Subsequently, the SQL client driver encrypts the value and sends the ciphertext to the database. This ensures plaintext data is encrypted and stored appropriately.

Read the whole thing.

Restoring An Azure SQL Database

Grant Fritchey shows us how to restore a database hosted in Azure SQL Database:

The first, and most important thing to notice here is that it’s supplying me with a new name. I can change that to anything I want as long as it’s not the name of a database already in existence on my Azure SQL Database Server. You read that correctly, you can restore a database to Azure SQL Database, but there is no WITH REPLACE option. The restore creates a new database. This is important. In a recovery scenario, you need to be sure that you’re prepared to deal with this fact. How could you replace the existing database? Immediately run a pair of ALTER DATABASE commands to change the name of the existing database to something else and then change the name of your newly created database to the old name. That’s your choice.

There are a couple of gotchas, so if you are administering Azure SQL Database instances, be aware of these.

ODBC Driver 13 In Preview

The Microsoft ODBC Driver 13 is now available (in preview form):

The preview ODBC drivers for Linux now supports Ubuntu, RedHat and SUSE. This is Microsoft’s first ODBC Driver for SQL Server release supporting Ubuntu. You can now enjoy enterprise level support while connecting to SQL Server from Ubuntu. It also updates the drivers to unixODBC driver manager 2.3.1 support.

Full interoperability with distributions of Linux is something I’ve waited a long, long time for.  This is one tiny step closer.

Filtered Indexes Are Tricky

Kevin Eckart investigates filtered indexes not being used:

This warning is telling me that Parameterization is to blame for the filtered index not being used. From here, I see 3 options.

  1. Remove the parameters and use literals. (not practical)

  2. Use Dynamic SQL

  3. Use OPTION(RECOMPILE) at the bottom of the query.

This is the classic issue with filtered indexes:  you expect them to be used, but when you check the plan, they aren’t.

Installing TFS

Sifiso Ndlovu shares some tips regarding installing Team Foundation Server:

Although, you can get away with using convenience names (i.e. a dot, (local), or locahost) as SQL Server identifier name during the configuration of a SharePoint server farm as shown in Figures 2 & 3, such a practice is not allowed during configuration of TFS (as shown in Figure 1).

Check this out before installing your own TFS server.  Or use Visual Studio Online or GitHub or BitBucket or …

Optimizing Clustered Columnstore Inserts

Sunil Agarwal talks about clustered columnstore index logging:

There you have it; our recommendation is to choose a batchsize of > 102400 to get benefits of minimal logging with clustered columnstore index. In the next blog, I will discuss parallel bulk import and locking optimizations.

My experience is that you really want to insert in large batches.

Power BI Matrices

Meagan Longoria wants to stack groups of measures in Power BI:

To my surprise, Power BI only lets you put multiple values on columns in a matrix. You can’t stack metrics vertically. Note: this is true as of 8 Jan 2016 but may change in the future. If you agree that this should be a feature in Power BI, please make your voice heard and vote for this idea on the Power BI forum and encourage others to vote for it as well.

The answer is a little complex.  Considering how frequently Power BI gets updated, hopefully they’ll make this a bit easier in the near future.


May 2017
« Apr