Press "Enter" to skip to content

Curated SQL Posts

Connecting to Azure Databricks from Power BI

Gerhard Brueckl walks us through the Power BI connector to Azure Databricks:

I work a lot with Azure Databricks and a topic that always comes up is reporting on top of the data that is processed with Databricks. Even though notebooks offer some great ways to visualize data for analysts and power users, it is usually not the kind of report the top-management would expect. For those scenarios, you still need to use a proper reporting tool, which usually is Power BI when you are already using Azure and other Microsoft tools.

So, I am very happy that there is finally an official connector in PowerBI to access data from Azure Databricks! Previously you had to use the generic Spark connector (docs) which was rather difficult to configure and did only support authentication using a Databricks Personal Access Token.

Click through to see how it works.

Comments closed

Azure Data Studio September 2020 Release

Alan Yu announces the September 2020 release of Azure Data Studio:

When trying out notebooks for the first time, many users were not familiar with Markdown, or users would always have to look up the syntax. Over time, we added a Markdown toolbar to help make it easier to remember Markdown syntax, which made many users happy, but we thought we could do even better. We wanted to make it as easy to write in notebook text cells as you would in an email or typing a document.

Through embracing hackathons and open source, and driven by the passion to do more for our users, we are excited to announce Rich Text Mode, also known as WYSIWYG Mode (what-you-see-is-what-you-get).

There’s a lot in this release.

Comments closed

Using Synonyms in SQL Server

Greg Larsen takes us through the ins and outs of synonyms in SQL Server:

Once a database object has been created, and lots of application code has been written that references the object, it becomes a nightmare to rename the object. The nightmare comes from the amount of effort and coordination work required to make the name change without the application failing. If just one place is missed when coordinating the rename, the outcome could be disastrous. This is where a synonym can help minimize the risk associated with renaming a base object.

I’ll admit that I don’t really think about synonyms much and have used them at most a couple of times in my career. I can see where they’d be useful, but that comes at the risk of something going wrong and people not even realizing they exist.

Comments closed

Thoughts on Using Source Control

Kevin Chant shares some thoughts:

In this post I want to cover more thoughts about SQL Server professionals using version control. Because I have had some interesting conversations since my last post about it.

In a previous post I covered how SQL Server professionals can benefit from using version control. Which you can read in detail here.

Now I want to clarify a few things relating to it as well.

Read on for those thoughts.

Comments closed

An Introduction to Data Vault

Tino Zishiri walks us through the basics of the Data Vault modeling technique:

The Data Vault methodology also addresses a common limitation that relates to the dimensional model approach. There are many good things to say about dimensional modelling, it’s a perfect fit for doing analytics, it’s easy for business analysts to understand, it’s performant over large sets of data, the list goes on.

That said, the data vault methodology addresses the limitations of having a “fixed” model. Dimensional modelling’s resilience to change or “graceful extensibility”, as some would say, is well documented. It’s capable of handling changing data relationships which can be implemented without affecting existing BI apps or query results. For example, facts consistent with the grain of an existing fact table can be added by creating new columns. Moreover, dimensions can be added to an existing fact table by creating new foreign key columns, presuming they don’t alter the fact table’s grain.

The most interesting thing to me about Data Vault is that it’s very popular in Europe and almost unheard-of in North America. That’s the impression I get, at least.

Comments closed

Refreshing a Power BI Dataflow without Refreshing Downstream Dataflows

Matthew Roche wants to limit the refresh zone of influence:

The email included a screen shot from the lineage view of a Power BI workspace, some context about working to troubleshoot a problem, and the question “We want to refresh this dataflow, and not have it refresh the downstream dataflows. Is this possible?”

I almost said no, and then I remembered this post and realized the answer was “yes, sort of.”

Click through to see how it all fits together. And I’m in favor of buying Matthew a sword—can’t have too many of those.

Comments closed

The Session Window in Flink

Kundan Kumarr continues a series on windows in Apache Flink:

In the real world, all the work that we do online- Visiting a website, Clicking around the website, do online transactions, and so on are in sessions. We might just go to an e-commerce website like amazon, looking for products, clicking around for a bit, and then stop. All is done within a session. There is a use case where these websites may want to track pages that we visited in a single session. For that, it needs to group all clicks together which are streaming in, based on a session. These streaming use cases can be implemented easily by Flink Session window.

The Session windows assigner groups elements by sessions of activity. Session windows do not overlap and do not have a fixed start and end time. The number of entities within a session window is not fixed. Because it is a user who defines typically how long the session would be. A session window closes when it does not receive elements for a certain period of time, i.e., when a gap of inactivity occurred. For example, once we have been idle on the amazon website let say for 1 minute that is the end of the previous session and if go back to the site after 1 sec it will start a new session. The way it would determine the session is the pause between one click and another click.

Click through for a depiction and an example.

Comments closed

The Problem with VM Backups of SQL Server

Sean Gallardy turns a problem on its head:

Now let’s get to the main point, which is how long the VM stays paused or stunned – remember, this is a “small” or “short” amount of time, one might even say “trivial”. When it is kept this short to where it’s “trivial” as in less than a second then all is good and you most likely won’t notice it except in very high workloads… but we should be running with VSS integration and not VM level so it’s still incorrect, but hey. When this time is not short of trivial then GOOD things start to happen, most notably that high availability kicks in.

I appreciate the framing of this post, as the failover wasn’t a problem; it merely exposes the actual problem.

Comments closed

Space Savings from Separate Date and Time Columns in Power BI

Shabnam Watson runs an experiment:

As you may have already heard, one of the easiest ways to reduce a Power BI model (dataset) size is by splitting DateTime columns into separate Date and Time columns but the question is how much space reduction can you achieve by doing so. As I show in this blog post, the reduction can be significant and up to % 80 or % 90 depending on the number and cardinality of the datetime columns.

That’s a lot of savings.

Comments closed