Press "Enter" to skip to content

Author: Kevin Feasel

TDE With Database Mirroring

I have a post on setting up database mirroring when the underlying database uses Transparent Data Encryption:

 Now it’s time to take some backups. First, let’s back up the various keys and certificates:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
USE [master]
GO
--Back up the service master key
--Note that the password here is the FILE password and not the KEY password!
BACKUP SERVICE MASTER KEY TO FILE = 'C:\Temp\ServiceMasterKey.key' ENCRYPTION BY PASSWORD = 'Service Master Key Password';
GO
--Back up the database master key
--Again, the password here is the FILE password and not the KEY password.
BACKUP MASTER KEY TO FILE = 'C:\Temp\DatabaseMasterKey.key' ENCRYPTION BY PASSWORD = 'Database Master Key Password';
GO
--Back up the TDE certificate we created.
--We could create a private key with password here as well.
BACKUP CERTIFICATE [TDECertificate] TO FILE = 'C:\Temp\TDECertificate.cert'
    WITH PRIVATE KEY (FILE = 'C:\Temp\TDECertificatePrivateKey.key', ENCRYPTION BY PASSWORD = 'Some Private Key Password');
GO

Click through for the details.

Comments closed

SSAS And Power BI Performance Issue

Chris Webb describes an issue with SSAS Multidimensional and Power BI-generated DAX causing a performance problem:

This query has something in it – I don’t know what – that means that it cannot make use of the Analysis Services Storage Engine cache. Every time you run it SSAS will go to disk, read the data that it needs and then aggregate it, which means you’ll get cold-cache performance all the time. On a big cube this can be a big problem. This is very similar to problems I’ve seen with MDX queries on Multidimensional and which I blogged about here; it’s the first time I’ve seen this happen with a DAX query though. I suspect a lot of people using Power BI on SSAS Multidimensional will have this problem without realising it.

This problem does not occur for all tables – as far as I can see it only happens with tables that have a large number of rows and two or more hierarchies in. The easy way to check whether you have this problem is to refresh your report, run a Profiler trace that includes the Progress Report Begin/End and Query Subcube Verbose events (and any others you find useful) and then refresh the report again by pressing the Refresh button in Power BI Desktop without changing it at all. In your trace, if you see any of the Progress Report events appear when that second refresh happens, as well as Query Subcube Verbose events with an Event Subclass of Non-cache data, then you know that the Storage Engine cache is not being used.

This doesn’t look to be a quick fix, so do read the whole thing to help figure out how to avoid this issue.

Comments closed

HBase Transactions

George Leopold describes Omid:

The transaction manager utilizes a lock-free approach to support multiple clients and relies on a centralized conflict detection component to resolve write-set collisions among concurrent transactions. Developers added that Omid requires no modifications to the underlying HBase key-value data store.

It also features a simplified API that mimics transaction manager APIs in relational databases. Client and server configuration processes also were simplified to help both application developers and system administrators.

Filing this one under the “What’s old is new again” category.

Comments closed

Query Store Isn’t A Forensics Engine

Grant Fritchey shows that Query Store has a limited capability of finding “ill-behaving” queries at a point in time:

Here’s a great question I received: We had a problem at 9:02 AM this morning, but we’re not sure what happened. Can Query Store tell us?

My first blush response is, no. Not really. Query Store keeps aggregate performance metrics about the queries on the database where Query Store is enabled. Aggregation means that we can’t tell you what happened with an individual call at 9:02 AM…

Well, not entirely true.

Query Store isn’t a total solution for “Why was the system slow at XX:XX?” types of questions.  This does not diminish its value as long as you do not try to treat it as your only monitoring solution.

Comments closed

Slicer Filter Workaround

Reza Rad has a workaround for cases in which you want to filter a Power BI slicer:

The idea of this blog post came from a question that one of students in my Power BI course asked to me, and I’ve found this as a high demand in internet as well. So I’ve decided to write about it.

You might have too many items to show in a slicer. a slicer for customer name when you have 10,000 customers isn’t meaningful! You might be only interested in top 20 customers. Or you might want to pick few items to show in the slicer. With all other visual types (Such as Bar chart, Column chart, line chart….) you can simply define a visual level filter on the chart itself. Unfortunately this feature isn’t supported at the time of writing this post for Slicers. However the demand for this feature is already high! you can see the idea published here in Power BI user voice, so feel free to vote for such feature :)

As Reza notes, this might get resolved fairly soon.  Until then, check out his solution.

Comments closed

Don’t Use Double Dot

Chris Bell warns against using double dot syntax:

I am finding more and more cases where SQL code is being created using the double dot or period for the 2 part naming convention.

For example, instead of using dbo.table1 I am seeing ..table1.

I don’t know who suggested this in the first place, but it is not a good idea. Sure it works and does what you expect, but there is a HUGE risk with doing this. When you use the .. syntax, you are telling the code to use whatever the default schema is for the user that is running the query. By default that is the dbo schema, but there is no guarantee that all systems are going to be that way.

Read on to understand why this is a big deal.

Comments closed

Transaction Names Are Case Sensitive

Clive Strong notes that transaction names in SQL Server are case sensitive:

I had an issue today running a colleague’s code (the rollback and commit were commented out, but that is another story). The code failed and I tried to rollback the transaction but received this error message;

Msg 6401, Level 16, State 1, Line 5
Cannot roll back t1. No transaction or savepoint of that name was found.

I can’t remember the last time I named a transaction, but if you are in that habit, it’s important to remember.

Comments closed

SSRS Express And Azure Limitations

William Assaf points out that SQL Server Reporting Services Express Edition cannot connect to Azure SQL Database:

Express editions of SQL Server Reporting Service, from SQL 2016 on down, cannot connect to Azure SQL Databases. Turns out, getting something for free does have some significant limitations.

For example, you’ll see an error message “The Report Server has encountered a configuration error” on a data source page, when creating a new SSRS data source in the Report Manager website. What you may have not noticed on this page was the possible values in the Data Source Type drop down list.

This is an important limitation if you were thinking of living on the free tier of SSRS.

Comments closed

Against Visual Programming Languages

Ian Hellstrom has a critique of visual programming languages for data engineers:

Anyone with a software development background who has ever dealt with visual ETL tools may have marvelled at the lack of proper version control and diff tools that go with it. Some tools come with their own built-in VCS, while others allow you to use any or no VCS at all. The difficulty lies in the fact that the visual representation is often stored as an XML (or JSON) file. So, if a box is moved by 1 pixel, the file is different. You could argue that it’s indeed different because the layout is different, but you could equally make the case that the logic has not changed. This argument is moot though: it is technically possible to ensure that the tool auto-aligns blocks and routes/colours arrows, very much like yEd does (via menu items). Some users may not be happy with the reduced control over the way the flow looks, but others may rejoice that version control has become usable.

ETL (and ORM) tools often auto-generate code that is not particularly tuned for the data source in question. I have encountered many odd nested loops where simple hash joins would have been more appropriate if only the predicates had been pushed down properly (and if only the tool had evaluated blocks lazily). Aggregations and timestamp-based filters are also often a cause for performance issues. Again, performance is technically solvable, so this may be a valid argument against visual tools in data engineering now but perhaps not tomorrow.

This is a good argument against VPLs, although there are a couple of good arguments for VPLs, including how it’s easier to see if the overall architecture of a flow looks correct.  In the end, I like the compromise that Biml offers Integration Services developers:  write code but visualize results.

Comments closed

Comparing Impala To Redshift

Mostafa Mokhtar, et al, have a comparison of Apache Impala to Amazon Redshift:

For this analysis, we used TPC-DS on a 3TB dataset and selected 70 out of 99 the queries that run without any modifications or uses variants on both Redshift and Impala. We wanted to use a larger dataset (similar to what we’ve used in previous benchmarks), but due to Redshift’s data load times, we had to reduce the data size. (Note: This benchmark is derived from the TPC-DS benchmark and, as such, is not directly comparable to published TPC-DS results.)

This is coming from one of the two vendors, so take it with however many grains of salt you’d like.

Comments closed