Press "Enter" to skip to content

Author: Kevin Feasel

Populating a Data Vault Model with Azure Data Factory

Rayis Imayev gives us an example of ELT into a Data Vault model using Azure Data Factory:

To make a full transition from the existing DW model to an alternative Data Vault, I removed all Surrogate Keys and other attributes that are only necessary to support the Kimball data warehouse methodology. Also, I needed to add the necessary Hash keys to all my Hub, Link, and Satellite tables. The target environment for my Data Vault would be an Azure SQL database, and I decided to use a built-in crc32 function of the Mapping Data Flow to calculate hash keys (HK) of my business data sourcing keys and composite hash keys of satellite table attributes (HDIFF).
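
If it helps to see that idea in something runnable, here is a minimal Python sketch of the HK/HDIFF pattern described above, using CRC32 the same way the Mapping Data Flow function is used in the post. The column names and delimiter are made up for illustration; they are not from Rayis's article.

```python
import zlib

def crc32_hex(value: str) -> str:
    """CRC32 of a string, rendered as hex; stands in for a Data Vault hash key."""
    return format(zlib.crc32(value.encode("utf-8")), "08x")

# Hypothetical source row for a Customer hub and its satellite
row = {"CustomerCode": "C-1001", "Name": "Contoso", "City": "Oslo"}

# Hub hash key (HK): hash of the business key only
hk_customer = crc32_hex(row["CustomerCode"])

# Satellite hash diff (HDIFF): hash of the concatenated descriptive attributes,
# used to detect whether anything changed since the last load
hdiff_customer = crc32_hex("|".join([row["Name"], row["City"]]))

print(hk_customer, hdiff_customer)
```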

Data Vault is somewhere on my list of things to learn. It’s not at the top of the list, but that’s not a slight against it.


Moving a Power BI Data Model to Tabular

Ginger Grant provides some tips on migrating from a Power BI data model to an Analysis Services Tabular model:

Unless you are upgrading to Analysis Services on SQL Server 2019, chances are you are going to have to review your DAX code and make some modifications, as DAX on the other versions of SQL Server is not the same as in Power BI. I was upgrading to AS on SQL Server 2016, and there were some commands that I had to manually edit out of the JSON file. If you have any new DAX commands, take them out of your Power BI model, which means you will not have to manually edit the JSON file to remove them when the new commands are flagged as errors. Make sure your Power BI model does not include commands such as SELECTEDVALUE or GENERATESERIES, as well as any of the automatically generated date hierarchies. After your Power BI Desktop file is clean, leave it running, as you are going to need it for the next step.
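
If you want a quick way to spot those newer functions before editing the JSON by hand, here is a rough Python sketch that scans an exported model file for them. The file name, the function list, and the assumption that the model JSON follows the usual layout of a model object containing tables and measures are all mine, not Ginger's.

```python
import json

# Functions newer than the SQL Server 2016 DAX surface area (illustrative, not exhaustive)
NEWER_DAX_FUNCTIONS = {"SELECTEDVALUE", "GENERATESERIES"}

def flag_unsupported_measures(bim_path: str) -> None:
    """Report measures whose expressions mention DAX functions the target server may not support."""
    with open(bim_path, encoding="utf-8") as f:
        model = json.load(f)

    for table in model.get("model", {}).get("tables", []):
        for measure in table.get("measures", []):
            expression = measure.get("expression", "")
            if isinstance(expression, list):  # expressions can be stored as a list of lines
                expression = "\n".join(expression)
            hits = [fn for fn in NEWER_DAX_FUNCTIONS if fn in expression.upper()]
            if hits:
                print(f"{table['name']}.{measure['name']} uses: {', '.join(hits)}")

flag_unsupported_measures("Model.bim")
```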

Click through for more details.


Auditing Azure Analysis Services

Kasper de Jonge shows how you can audit an Azure Analysis Services cube:

So the question was: how can I see who connected to my Azure AS database and what queries were sent? Initially I thought of ways I used to do this in the on-premises world: capture profiler traces or XEvents by writing code and then store them somewhere for processing. It looks like I was not alone in this; even the AS team itself had ways to capture XEvents and store them: https://azure.microsoft.com/en-us/blog/using-xevents-with-azure-analysis-services/

But it turns out it is much smoother, simpler, and more elegant to leverage Azure's own products. In this case we will be using Azure Log Analytics. It is already documented in the official documentation here.
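
To give a rough idea of what pulling those logs back out could look like once diagnostics are routed to a Log Analytics workspace, here is a hedged Python sketch using the azure-monitor-query package. The KQL table, column names, and operation filter are assumptions on my part; check the documentation Kasper links for the actual schema that the Engine category emits.

```python
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())

# KQL against the workspace receiving Azure AS diagnostics.
# The table and column names below are assumptions, not the documented schema.
query = """
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.ANALYSISSERVICES"
| where OperationName == "QueryEnd"
| project TimeGenerated, EffectiveUsername_s, TextData_s, Duration_s
| order by TimeGenerated desc
"""

response = client.query_workspace(
    workspace_id="<log-analytics-workspace-id>",
    query=query,
    timespan=timedelta(days=1),
)

for table in response.tables:
    for row in table.rows:
        print(row)
```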

Click through for a demo.


Using Java in SQL Server 2019

Niels Berglund has an update on writing Java code in SQL Server 2019:

In CTP 2.5 and onwards, when you write Java code for SQL Server, you implement your code using the Microsoft Extensibility SDK for Java (SDK). The SDK acts sort of like an interface, as it exposes abstract classes that your code needs to extend/target (more about that later).

The SDK comes in the form of a .jar file, and you download the SDK from here.

Niels dives deep into the topic, so set aside a bit of time to read through this one.


Amazon Redshift ETL Tips

The Blendo team shares a few tips around ETL’ing data to Amazon Redshift:

2. The WLM Method
Use Amazon Redshift’s WLM (workload management) for defining a dedicated queue for the ETL process. Configuring the ETL queue with a small number of slots will help in avoiding excessive COMMITs. Also, avoid COMMITing separately for each transaction since commits are expensive.
Instead, surround multiple steps of the ETL process by a BEGIN…END statement. You can perform COMMIT only after all transformation logic is executed.
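
As one way to picture that advice, here is a hedged Python sketch using psycopg2, where several ETL steps run inside a single transaction and COMMIT happens exactly once at the end (psycopg2 opens the transaction implicitly, which plays the role of the BEGIN…END block in the tip). The table names, S3 path, IAM role, and connection details are placeholders.

```python
import psycopg2

# Placeholder connection details for a Redshift cluster
conn = psycopg2.connect(
    host="my-cluster.abc123.eu-west-1.redshift.amazonaws.com",
    port=5439, dbname="analytics", user="etl_user", password="<password>",
)
conn.autocommit = False  # keep one open transaction across all ETL steps

try:
    with conn.cursor() as cur:
        # Multiple transformation steps, one transaction, one COMMIT at the end
        cur.execute("DELETE FROM staging.orders;")
        cur.execute(
            "COPY staging.orders FROM 's3://my-bucket/orders/' "
            "IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy' FORMAT AS CSV;"
        )
        cur.execute("INSERT INTO dw.fact_orders SELECT * FROM staging.orders;")
    conn.commit()   # single, final COMMIT instead of one per statement
except Exception:
    conn.rollback()
    raise
finally:
    conn.close()
```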

Click through for the set of tips.


Message Buses on the Market

Alex Woodie walks us through some of the options available for message buses:

Apache Kafka

Apache Kafka is a distributed open source messaging bus that was written in Java and Scala. The software implements a publish and subscribe messaging system that’s capable of moving large amounts of event data from sources to sinks, in a high-throughput manner with minimal latency and strong consistency guarantees. The software relies on Apache Zookeeper for management of the underlying cluster.

Kafka is based on the concept of producers and consumers. Event data originating from producers is stored in timestamped partitions that are housed within Kafka topics. Meanwhile, consumer processes can read the data stored in Kafka partitions. Kafka automatically replicates partitions across multiple brokers (or nodes in the cluster), which allows Kafka to scale its message streaming service in a fault-tolerant manner.
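
For a feel of the producer/consumer model described above, here is a small Python sketch using the kafka-python client; the topic name, group id, and payload are made up for illustration.

```python
from kafka import KafkaConsumer, KafkaProducer  # kafka-python client

# Producer: append events to a topic; Kafka assigns them to partitions and timestamps them
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("page-views", key=b"user-42", value=b'{"url": "/pricing"}')
producer.flush()

# Consumer: read the events back, tracking position via a consumer group
consumer = KafkaConsumer(
    "page-views",
    bootstrap_servers="localhost:9092",
    group_id="analytics-loader",
    auto_offset_reset="earliest",
)
for message in consumer:
    print(message.partition, message.offset, message.timestamp, message.value)
```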

Click through for descriptions of several good options. And if you want a big list, queues.io has one for you.


TDE + Copy-Only Backups

Jovan Popovic shows how you can take a copy-only backup of a TDE-protected database in Azure SQL Managed Instance:

We recommend relying on automatic backups only, with the built-in restore functionality to restore a database from a point in time, restore a database to another instance (for instance, from production to dev), or geo-restore functionality to move your database. These automatic backups can be kept for up to 35 days. These built-in automatic backups are secure and enable you to be fully compliant. In this scenario, COPY_ONLY backups are only needed in some specific cases.

Strict TDE protection doesn't allow you to take your own custom backups. If you need a backup of a TDE-protected database, you would need to temporarily disable TDE, take a backup, and then enable TDE again.

It’s not really a Managed Instance-specific thing, but rather TDE: if you want to take a non-encrypted backup of an encrypted database, you’ve got to kill encryption first.
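
For what that dance looks like in practice, here is a hedged sketch driven from Python via pyodbc, for consistency with the other examples, though you would normally just run the T-SQL directly. The database name, paths, and connection string are placeholders, and on Managed Instance the backup target would be a URL with a SAS credential rather than a local disk.

```python
import time
import pyodbc

# Placeholder connection string; autocommit so ALTER/BACKUP run outside a transaction
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;DATABASE=master;UID=admin;PWD=<password>",
    autocommit=True,
)
cur = conn.cursor()

# 1. Turn TDE off and wait for the background decryption scan to finish
cur.execute("ALTER DATABASE MyDb SET ENCRYPTION OFF;")
while True:
    state = cur.execute(
        "SELECT encryption_state FROM sys.dm_database_encryption_keys "
        "WHERE database_id = DB_ID('MyDb');"
    ).fetchval()
    if state in (None, 1):  # 1 = unencrypted, None = no encryption key row returned
        break
    time.sleep(10)

# 2. Take the copy-only backup of the now-unencrypted database
cur.execute("BACKUP DATABASE MyDb TO DISK = N'/var/opt/mssql/backup/MyDb.bak' WITH COPY_ONLY, INIT;")
while cur.nextset():  # drain the backup's result sets so the command completes
    pass

# 3. Re-enable TDE
cur.execute("ALTER DATABASE MyDb SET ENCRYPTION ON;")
```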


Creating R Visuals in Power BI

Dave Mason takes us through showing an R-based visual in Power BI:

The R engine isn’t included with the installation of Power BI desktop. I won’t go into detail on this, so just know you’d need to install that separately. I had already installed the R component as part of Machine Learning Services for SQL Server 2017. I also had RStudio installed. Within Power BI desktop, take a moment to click File | Options and settings | Options to open the Options page. Then click R scripting in the list of Global Options. Here you’ll see options to set the R home directory and the desired R IDE.

Click through for the demo.
