Curated SQL Posts

Parallelization in DirectQuery

Chris Webb shares some insight:

Recently we announced an important new optimisation for DirectQuery datasets: the ability to run (some) of the queries generated by a single DAX query in parallel. You can read the blog post here:

https://powerbi.microsoft.com/en-za/blog/query-parallelization-helps-to-boost-power-bi-dataset-performance-in-directquery-mode/

A few of us on the Power BI CAT team have tested this out with customers and seen some great results, so I thought I’d write a post illustrating the effect this optimisation can have and explaining when it can and can’t help.

Chris has examples of great success, as well as not-so-great success and utter failure, and explains the why behind each outcome.

Comments closed

Using Security Groups with Power BI Row-Level Security

Soheil Bakhshi has a recommendation for us:

However, managing RLS roles can be challenging if you have a large number of users or if your user base changes frequently. You need to manually assign each user account to one or more roles, which can be time-consuming and error-prone. Moreover, if a user changes their position or leaves the organisation, you must update their role membership accordingly.

This is where Security Groups become handy. 

Soheil explains why and then gives us a step-by-step guide on what we can do to use security groups instead.

Customizing Shiny Apps with shinydashboard

Mandy Norrbo isn’t satisfied with the defaults:

Using {shinydashboard} is great for creating dashboard prototypes with a header-sidebar-body layout. You can quickly mock up a professional looking dashboard containing a variety of outputs, including plots and tables.

However, after a while, you’ll probably have had enough of the “50 shades of blue” default theme. Or, you might have been asked to follow company branding guidelines, so you need to replace the default colours with custom ones.

Click through for a walkthrough of what is available for customization and how to do it.

Startup Pains with Large Memory-Optimized Tables

Brent Ozar takes us through a problem:

Well, here it is in 2023, and recently I’ve talked to a couple of architects who wish they could go back in time and watch that video. In both cases, they suffered from the same issue.

The short story is that the more data you put into durable In-Memory OLTP tables – and even just 5GB of data can hit this issue – the more your startups, failovers, and restores turn into long stories, to the point where other databases on your SQL Server are practically unusable.

Click through for the scenario. In-Memory OLTP is one of those features which frustrates me to no end. It had the potential to be outstanding, but due to the difficulty of further development (e.g., getting cross-database queries to work when you have a mix of memory-optimized and non-optimized databases and tables) and the limitations of what it actually made faster (mostly inserts, not selects), the actual number of great use cases for the product is a lot lower than I think it could have been.

Unmasking Dynamic Data Masking via Powershell

Jana Sattainathan needs to see all the details:

Today, I had to unmask all the columns I had helped mask using Dynamic Data Masking. This simple post assumes that you are a privileged user with the ability to drop “Column Masking”!

In other words, this isn’t exploiting the mechanics of Dynamic Data Masking to view data you shouldn’t be able to; it’s about removing Dynamic Data Masking from columns with it enabled.
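For a sense of what removing a mask involves, here is a minimal sketch in Python rather than the PowerShell used in the linked post; it is not Jana's script. It only builds the T-SQL `ALTER TABLE ... ALTER COLUMN ... DROP MASKED` statements. The schema, table, and column names are hypothetical, and in practice the list of masked columns would come from a catalog view such as `sys.masked_columns`.

```python
def quote_ident(name: str) -> str:
    """Bracket-quote a SQL Server identifier, escaping any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def drop_mask_statements(masked_columns):
    """Build one DROP MASKED statement per (schema, table, column) tuple."""
    return [
        f"ALTER TABLE {quote_ident(schema)}.{quote_ident(table)} "
        f"ALTER COLUMN {quote_ident(column)} DROP MASKED;"
        for schema, table, column in masked_columns
    ]


# Hypothetical masked columns, used purely for illustration.
masked = [
    ("dbo", "Customer", "Email"),
    ("dbo", "Customer", "Phone"),
]

statements = drop_mask_statements(masked)
for statement in statements:
    print(statement)
```

The sketch only generates text; running the statements still requires a connection and a login with enough privilege to alter the tables, which matches the assumption in the quoted post.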

Performance Tuning a Dedicated SQL Pool

Sarath Sasidharan has some guidance for us:

Synapse Dedicated pools have been battle tested at enterprise customers across the globe. We deal with data in the magnitude of petabytes. Synapse can provide you with the scale of the cloud and the high performance required for your enterprise-grade requirements. The key to maximizing your performance is to follow best practices; check out best practices for dedicated SQL pools in Azure Synapse Analytics.

Failure to do so causes performance issues. In such scenarios, it is important to understand where the bottlenecks are. This blog focuses on the different steps a query goes through, from the time the query is fired from the client until it returns. Delay caused in any of the steps would impact the overall run-time of the query and hence indicate degraded performance.

Click through for a walkthrough of each step along the way, potential problems you could run into, and remediations for those problems. Much of the advice is similar to what you’d get with SQL Server, though there are differences interspersed throughout each level.

Working with Remote Jupyter Books in Azure Data Studio

Steve Hughes reaches across the internet:

When working with Azure Data Studio and its support of Jupyter books, you will find there is an option for remote Jupyter books. As shown in the image below, you can open that Jupyter book and follow through the dialogue for a couple of Microsoft books that are readily available.

Click through to see how this option differs from standard Jupyter books (which are themselves different from Jupyter notebooks) and how you can create one.

Data Pipelines and Data Mesh

Jean-Georges Perrin answers a burning question:

I keep having questions about data pipelines. Data pipelines in Data Mesh is a topic I should tackle. So… Is the data pipeline the root of all evil?

Jean-Georges’s answer is quite in line with one of my favorite phrases: “Short answer: no, with an ‘if’; long answer: yes, with a ‘but.'” Read on for some thoughts on data pipelines and what the data mesh concept does to minimize harm.

Creating an Elasticsearch Pipeline

The Big Data in Real World team builds a pipeline:

A pipeline is a definition of a series of processors that are to be executed in the same order as they are declared. 

Think of a processor as a series of instructions that will be executed.

In this post we are going to create a pipeline to add a field named doc_timestamp to all the documents that are added to the index.

Click through for the process. In Elasticsearch, ingest pipelines aren’t for moving data but rather for performing some common operations or tasks prior to indexing the data.
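As a rough illustration of the kind of pipeline described, the sketch below builds an ingest pipeline definition with a single `set` processor that writes the ingest timestamp into `doc_timestamp`. It is not the linked post's exact definition: the pipeline name `add-doc-timestamp` and the description text are assumptions, and the code only prepares the request rather than sending it to a cluster.

```python
import json

# Hypothetical pipeline name, chosen for illustration only.
PIPELINE_NAME = "add-doc-timestamp"

# Processors run in the order they are declared; this pipeline has one.
pipeline_body = {
    "description": "Add doc_timestamp to each document at ingest time",
    "processors": [
        {
            "set": {
                "field": "doc_timestamp",
                # Ingest metadata exposed to processors as a template value.
                "value": "{{_ingest.timestamp}}",
            }
        }
    ],
}


def pipeline_request(name, body):
    """Return the HTTP method, path, and JSON payload that define the pipeline."""
    return "PUT", f"/_ingest/pipeline/{name}", json.dumps(body, indent=2)


method, path, payload = pipeline_request(PIPELINE_NAME, pipeline_body)
print(method, path)
print(payload)
```

To apply a pipeline like this to every document added to an index, one option is the `index.default_pipeline` index setting; passing `?pipeline=add-doc-timestamp` on individual index requests is another.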

Role-Based Access Controls in Amazon OpenSearch

Scott Chang and Muthu Pitchaimani show how to assign rights in Amazon OpenSearch to IAM groups:

Amazon OpenSearch Service is a managed service that makes it simple to secure, deploy, and operate OpenSearch clusters at scale in the AWS Cloud. AWS IAM Identity Center (successor to AWS Single Sign-On) helps you securely create or connect your workforce identities and manage their access centrally across AWS accounts and applications. To build a strong least-privilege security posture, customers also wanted fine-grained access control to manage dashboard permission by user role. In this post, we demonstrate a step-by-step procedure to implement IAM Identity Center to OpenSearch Service via native SAML integration, and configure role-based access control in OpenSearch Dashboards by using group attributes in IAM Identity Center. You can follow the steps in this post to achieve both authentication and authorization for OpenSearch Service based on the groups configured in IAM Identity Center.

Click through for the process.
