Press "Enter" to skip to content

Curated SQL Posts

Conditional Replacement in Power Query

Soheil Bakhshi shows us how to do conditional replacement based on the values of other columns using Power Query:

Power Query (M) made a lot of data transformation activities much easier and value replacement is one of them. You can easily right click on any desired value in Power Query, either in Excel or Power BI, or other components of Power Platform in general, and simply replace that value with any desired alternative. Replacing values based on certain conditions however, may not seem that easy at first. I’ve seen a lot of Power Query (M) developers adding new columns to accomplish that. But adding a new column is not always a good idea, especially when you can do it in a simple single step in Power Query. In this post I show you a quick and easy way to that can help you handling many different value replacement scenarios.

Imagine you have a table like below and you have a requirement to replace the values column [B] with the values of column [C] if the [A] = [B].

Click through for the solution.

Comments closed

Querying Apache Druid

Manish Mishra takes us through the basics of querying from Apache Druid:

I would not mind quoting the Druid documentation for this purpose:  “Druid is a data store designed for high-performance slice-and-dice analytics (“OLAP“-style) on large data sets. Druid is most often used as a data store for powering GUI analytical applications, or as a backend for highly-concurrent APIs that need fast aggregations.”

You might be wondering where is “SQL” in that? Actually, the fact is Druid is designed for special kind of SQL workloads which we can relate with powering the GUI analytical applications which require low latency query response. But in this post, we will only look in the “how part” of it using Druid to quickly run queries.

Click through to see how.

Comments closed

Building the Right Architecture for the Job

Gogula Aryalingam takes us through an example where the newest and most expensive tools aren’t the best for the job at hand:

When Azure SQL Data Warehouse was chosen to implement a multi-dimensional data warehouse, it may have seemed like the ideal choice. Why? because it was plain to see: keywords: “SQL”, “Warehouse”. However, no, SQL Data Warehouse is ideal only when you have data loads that are quite high, not when it is only several 100GBs. Armed with a few more reasons as to why not (A good reference for choosing Azure SQL Data Warehouse), I had confronted them. But the rebuke then was that they did get good enough performance, and that cost wasn’t a problem. Until of course a few months later when complex queries started hitting the system, and despite being able to afford that cost, the value of paying that amount did not seem worth it.

Having a good architectural understanding of the Azure or AWS platform—even if you aren’t deeply familiar with all of the tools—can help avoid these types of problems.

Comments closed

Persistent Disk Storage in Docker

Max Trinidad takes us through some of the basics of persistent disk storage in Docker:

Containers are perfectly suited for testing, meant to fast deployment of a solution, and can be easily deployed to the cloud. It’s cost effective!

Very important to understand! Containers disk data only exist while the container is running. If the container is removed, that data is gone.

So, you got to find the way to properly configure your container environment to make the data persist on disk.

Click through for an example.

Comments closed

Included Columns on Filtered Indexes

Rob Farley take a look at included columns on filtered indexes:

First let’s think a little about indexes in general.

An index provides an ordered structure to a set of data. (I could be pedantic and point out that reading through the data in an index from start to end might jump you from page to page in a seeming haphazard way, but still as you’re reading through pages, following the pointers from one page to the next you can be confident the data is ordered. Within each page you might even jump around to read the data in order, but there is a list showing you which parts (slots) of the page should be read in which order. There really is no point in my pedantry except to answer those equally pedantic who will comment if I don’t.)

And this order is according to the key columns – that’s the easy bit that everyone gets. It’s useful not only for being able to avoid re-ordering the data later, but also for being able to quickly locate any particular row or range of rows by those columns.

Rob does a great job of covering some of the nuances of filtered indexes.

Comments closed

Deploying Azure Databricks in a Custom VNET

Abhinav Garg and Anna Shrestinian explain how you can use VNET injection with Azure Databricks:

To make the above possible, we provide a Bring Your Own VNET (also called VNET Injection) feature, which allows customers to deploy the Azure Databricks clusters (data plane) in their own-managed VNETs. Such workspaces could be deployed using Azure Portal, or in an automated fashion using ARM Templates, which could be run using Azure CLI, Azure Powershell, Azure Python SDK, etc.

With this capability, the Databricks workspace NSG is also managed by the customer. We manage a set of inbound and outbound NSG rules using a Network Intent Policy, as those are required for secure, bidirectional communication with the control/management plane. 

This is a good article if the defaults won’t get past corporate security.

Comments closed

SQL Server Numerology

Denis Gobo has some fun with SQL Server numbers of importance:

I was troubleshooting a deadlock the other day and it got me thinking…. I know the number 1205 by heart and know it is associated to a deadlock.  What other numbers are there that you can associate to an event or object or limitation. For example 32767 will be known by a lot of people as the database id of the ResourceDb, master is 1, msdb is 4 etc etc.

So below is a list of numbers I thought of

Leave me a comment with any numbers that you know by heart

BTW I didn’t do the limits for int, smallint etc etc, those are the same in all programming languages…so not unique to SQL Server

Read on for Denis’s list.

Comments closed

Ways to Use Power BI Dataflows

Melissa Coates gives us three use patterns for Power BI Dataflows:

In this first option, Power BI handles everything. We use the web-based Power Query Online tool for structuring the data. Power BI handles scheduling the data refresh.

The underlying data behind the dataflow is stored in a data lake. However, since it’s fully managed, this data lake is not directly accessible or visible to the customer. As with most cloud-based implementations, the infrastructure is hidden under the covers. This is what is happening if your users are utilizing dataflows currently but you haven’t specified a data lake account in the Power BI admin center.

Melissa gives us a great summary of the three patterns, so read the whole thing.

Comments closed

Comparing Oracle RAC to SQL Server Availability Groups

Kellyn Pot’vin-Gorman explains the difference between Oracle RAC and SQL Server Availability Groups:

There is a constant rumble among Oracle DBAs- either all-in for Oracle Real Application Cluster, (RAC) or a desire to use it for the tool it was technically intended for. Oracle RAC can be very enticing- complex and feature rich, its the standard for engineered systems, such as Oracle Exadata and even the Oracle Data Appliance, (ODA). Newer implementation features, such as Oracle RAC One-Node offered even greater flexibility in the design of Oracle environments, but we need to also discuss what it isn’t- Oracle RAC is not a Disaster Recovery solution.

Click through for a good high-level contrast, as these are quite different products.

Comments closed