Curated SQL Posts

Fixing Formula.Firewall Issues in Power Query

Imke Feldmann shorts out a firewall issue:

Formula.Firewall issues can hit you when designing your queries or even “out of the blue” when refreshes in the service suddenly start failing due to changes in the query evaluation.
You will find a lot of methods published on the internet which are good and cover different scenarios. But there is also a very quick fix method that I learned from Miguel Escobar that I want to demonstrate in this post. This will basically circumvent the data privacy level, so make sure that you understand the implications (risk of data leakage from one source to another). If not, please read Miguel’s article first!

After reading Miguel’s post, read on for a fix.

Certifying Content in Power BI

Soheil Bakhshi certifies the quality of this Power BI content:

In the previous post, we discussed that a Power BI administrator must enable certification and grant sufficient rights to the security groups. Therefore, all members of the specified security group are authorised to certify the content. If you are a Power BI administrator, follow these steps to do so:

This post is a step-by-step guide to enabling content certification, as well as how to certify specific types of content.

File Seeding with dbt

Ust Oldfield sneaks in a file:

File seeding is a useful way of maintaining and deploying static, or very slowly changing, reference data for a data model to use repeatably and reliably across environments, whilst benefitting from source control.

If you aren’t in the Databricks world, this also feels like a job for DVC.

Sharing Results between Notebooks with MSSparkUtils

Liliam Leme provides an answer to a common Synapse Spark pool question:

I’ve been reviewing customer questions centered around “Have I tried using MSSparkUtils to solve the problem?”

One of the questions asked was how to share results between notebooks. Every time you hit “run” in a notebook, it starts a new Spark cluster, which means that each notebook would be using a different session. This makes it impossible to share results between executions of notebooks. MSSparkUtils offers a solution to handle this exact scenario.

Read on to see what MSSparkUtils is and how it helps in this case.

Becoming Familiar with MLflow

Tomaz Kastrun continues an advent series on Azure ML:

MLflow is an open-source framework for registering, managing and tracking machine learning models. It is multiplatform, bringing consistent model training and model consumption across different platforms. This means that you can train a model locally and upload it to Azure, or train a model on remote compute instances and download it, which is a great feature of MLflow.

You can use MLflow with Azure CLI, Azure Python SDK or in the studio and it will deliver a consistent experience (note, some functionalities are limited to the language).

Click through for a quick overview of MLflow.

Running Power BI Report Server

Reza Rad stays on-premises:

Power BI is not only a cloud-based reporting technology. Due to the demand for some businesses to have their data and reporting solutions on-premises, Power BI also has the option to be deployed fully on-premises. Power BI on-premises hosting is called Power BI Report Server. This post concerns using Power BI in a fully on-premises solution with Power BI Report Server.

This post will teach you everything you need to know about the on-premises world of Power BI. You will learn how to install Power BI Report Server, learn all requirements and configurations for the Power BI Report Server to work correctly, and see all the pros and cons of this solution. At the end of this post, you will be able to decide if Power BI on-premises is the right choice for you, and if it is, then you will be able to get a Power BI on-premises solution up and running easily.

I used Power BI Report Server for a few years. My short version is that it’s really useful if you aren’t allowed to use Power BI Online (as was my case) but if you know what’s in the Online version, you’ll see just how much you’re missing out on.

Checking if Cross-Database Ownership Chaining is On

Tom Collins performs a check:

Cross-database ownership chaining is a SQL Server security feature allowing database users access to objects in other databases hosted on the same SQL Server instance, in cases where those users have not been granted access explicitly.

Tom shows us whether it is on as well as how to enable it. I’d recommend not enabling it at all and using module signing instead.

Fixing VertiPaq Analyzer Dictionary Size Errors

Marco Russo troubleshoots an issue:

There are cases where the dictionary size reported by VertiPaq Analyzer (used by DAX Studio, Bravo for Power BI, and Tabular Editor 3) does not correspond to the actual memory required by the dictionary. However, the number reported is technically correct because it represents the memory currently allocated for the dictionary. The issue is that – after a refresh – this memory amount is larger than the actual memory required for hash-encoded columns.

Read on to learn what the consequences are and how you can resolve this in Power BI Desktop as well as in Analysis Services.

Synapse Runtime for Spark 3.3 Now in Public Preview

Estera Kot has an announcement:

We are excited to announce the preview availability of Apache Spark™ 3.3 on Synapse Analytics. The essential changes include features which come from upgrading Apache Spark to version 3.3.1 and upgrading Delta Lake to version 2.1.0.

Check out the official release notes for Apache Spark 3.3.0 and Apache Spark 3.3.1 for the complete list of fixes and features. In addition, review the migration guidelines between Spark 3.2 and 3.3 to assess potential changes to your applications, jobs and notebooks.

There’s a lot in there. I did snicker a bit at log4j 2 being described as more secure than log4j v1 given what we saw last year, though that gaping hole has since been fixed.

Bringing Order to a Columnstore Index

Tibor Karaszi puts columnstore ducks in a row:

Data for a columnstore index is divided into groups of approximately 1 million rows, called rowgroups. Each rowgroup has a set of pages for each column. The set of pages for a column in a rowgroup is called a segment. SQL Server has meta-data for the lowest and highest value for a segment. There are no SEEKs in a columnstore index. But, SQL Server can use this meta-data to skip reading segments, with the knowledge that “this segment cannot contain any data that I need based on my predicates in my WHERE clause”.

Also, you might want to do these operations using MAXDOP 1, so we don’t have several threads muddling our neat segment alignment.
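
To make the segment-skipping idea concrete, here is a toy simulation in Python. It is only a sketch of the concept, not how SQL Server implements it: the names Segment, build_segments and scan_range are hypothetical, and the rowgroup size of 10 is an assumption chosen to keep the example small (real rowgroups hold roughly 1 million rows). It loads the same 100 values twice, once shuffled and once sorted, and counts how many segments a range predicate has to read in each case.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    """Toy stand-in for one column's values in a rowgroup (hypothetical)."""
    values: List[int]

    @property
    def min_value(self) -> int:
        # Plays the role of the "lowest value" metadata for the segment.
        return min(self.values)

    @property
    def max_value(self) -> int:
        # Plays the role of the "highest value" metadata for the segment.
        return max(self.values)


def build_segments(values: List[int], rows_per_group: int) -> List[Segment]:
    """Split values into fixed-size groups, in the order they were loaded."""
    return [
        Segment(values[i:i + rows_per_group])
        for i in range(0, len(values), rows_per_group)
    ]


def scan_range(segments: List[Segment], low: int, high: int) -> Tuple[List[int], int]:
    """Return values in [low, high] and the number of segments actually read."""
    matches: List[int] = []
    segments_read = 0
    for segment in segments:
        # Skip the segment if its min/max range cannot overlap the predicate.
        if segment.max_value < low or segment.min_value > high:
            continue
        segments_read += 1
        matches.extend(v for v in segment.values if low <= v <= high)
    return matches, segments_read


# 37 is coprime with 100, so this is a shuffled permutation of 0..99.
unordered_values = [(i * 37) % 100 for i in range(100)]
ordered_values = sorted(unordered_values)

unordered_segments = build_segments(unordered_values, rows_per_group=10)
ordered_segments = build_segments(ordered_values, rows_per_group=10)

# Equivalent in spirit to: WHERE col BETWEEN 20 AND 29
unordered_matches, unordered_read = scan_range(unordered_segments, 20, 29)
ordered_matches, ordered_read = scan_range(ordered_segments, 20, 29)

print(f"unordered load: read {unordered_read} of {len(unordered_segments)} segments")
print(f"ordered load:   read {ordered_read} of {len(ordered_segments)} segments")
```

Both scans return the same ten values, but the sorted load reads 1 of 10 segments while the shuffled load reads all 10, because every shuffled segment has a min/max range that overlaps the predicate. That is the "neat segment alignment" the quote wants to protect.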

I’m not sure I actually set the ORDER BY clause on columnstore indexes all that often—a quick mental survey says maybe once, though that could be my own failing rather than a statement on the utility of ordered columnstore indexes.
