Press "Enter" to skip to content

Curated SQL Posts

SQL Server on a Windows Container

Kevin Chant lives dangerously:

In this post I want to cover an interesting experiment that I did: installing SQL Server in a Windows Container. It was fairly involved, and it took a while.

In fact, this is the experiment I was talking about in my recent post on the latest Azure Data Studio updates, which you can read about in detail here.

My general philosophy is to avoid Windows containers at all costs, though I’m glad that there are some more adventurous than I.


Adaptive Query Execution with Spark SQL

Wenchen Fan, Herman van Hövell, and MaryAnn Xue announce Adaptive Query Execution for Apache Spark 3.0:

Over the years, there’s been an extensive and continuous effort to improve Spark SQL’s query optimizer and planner in order to generate high-quality query execution plans. One of the biggest improvements is the cost-based optimization framework that collects and leverages a variety of data statistics (e.g., row count, number of distinct values, NULL values, max/min values, etc.) to help Spark choose better plans. Examples of these cost-based optimization techniques include choosing the right join type (broadcast hash join vs. sort merge join), selecting the correct build side in a hash-join, or adjusting the join order in a multi-way join. However, outdated statistics and imperfect cardinality estimates can lead to suboptimal query plans. Adaptive Query Execution, new in the upcoming Apache Spark™ 3.0 release and available in the Databricks Runtime 7.0 beta, now looks to tackle such issues by reoptimizing and adjusting query plans based on runtime statistics collected in the process of query execution.

One of the biggest advantages of SQL as a fourth-generation language is that the database engine (whether that be SQL Server, Oracle, or Spark) gets the opportunity to write and re-write the set of operations needed to solve a query to try to find the best path which returns the same result set. These optimizations aren’t perfect, as any query tuner can tell you, but they can go a long way.
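
If you want to try it, AQE is off by default in Spark 3.0. Here is a minimal PySpark sketch for enabling it; the configuration keys are standard Spark 3.0 settings, and the app name is arbitrary:

```python
# Minimal sketch: turn on Adaptive Query Execution in Spark 3.0.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("aqe-demo")
    .config("spark.sql.adaptive.enabled", "true")                     # master switch for AQE
    .config("spark.sql.adaptive.coalescePartitions.enabled", "true")  # merge small shuffle partitions
    .config("spark.sql.adaptive.skewJoin.enabled", "true")            # split skewed join partitions
    .getOrCreate()
)
```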


How Writing to Parquet Works

Dmitry Tolpeko walks us through the algorithm for writing to Parquet format:

After writing the first 100 values for a column (that is, for 100 rows), the Parquet writer checks whether the content of those 100 column values exceeds the specified page size (the default is 1 MB).

If the raw data size for the column does not exceed the page size threshold, then the point of the next page size check is adjusted based on the actual column size, so the size is checked neither after every column value nor after every 100 values. Thus the page size is not a strict limit.

If the raw data size exceeds the page size, the column content is compressed (if compression is specified for the Parquet file) and flushed into the Page store for the column.

This is a nice explanation of the process.
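
As a rough Python sketch of that check-interval logic (the names, the flush callback, and the estimation formula are illustrative, not the actual Parquet writer internals):

```python
# Rough sketch of the adaptive page-size check described above; names and the
# estimation formula are illustrative, not real Parquet writer internals.
PAGE_SIZE = 1024 * 1024   # default page size threshold: 1 MB
FIRST_CHECK = 100         # first size check happens after 100 values

def write_column(values, flush_page):
    """values: encoded column values as bytes; flush_page: compress-and-flush hook."""
    buffered, size, next_check = [], 0, FIRST_CHECK
    for v in values:
        buffered.append(v)
        size += len(v)
        if len(buffered) < next_check:
            continue
        if size >= PAGE_SIZE:
            flush_page(buffered)  # compress (if configured) and flush to the page store
            buffered, size, next_check = [], 0, FIRST_CHECK
        else:
            # Estimate how many more values will fit and defer the next check
            # until then; this deferral is why a page can overshoot the threshold.
            avg = size / len(buffered)
            next_check = len(buffered) + max(1, int((PAGE_SIZE - size) / avg))
```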


Lock Waits are Really Blocking Waits

Erik Darling shows that the meter’s only running on lock waits when there’s a blocking condition:

My workload is Highly Contrived™ so the avg wait and query duration line up. In real life, you probably won’t have queries that were only ever waiting on locks to be released.

But it’s worth making some comparisons like this when you’re having blocking problems, especially when you tie wait times and durations in with CPU time.

If CPU is very low but duration is high, generally, you’ve found blocked queries.

Read on for the demo, as well as an expired coupon.
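
If you want to apply that low-CPU, high-duration heuristic to what is running right now, here is a sketch against sys.dm_exec_requests (the pyodbc connection string is a placeholder):

```python
# Sketch: find currently blocked requests and compare CPU time to elapsed time.
# The connection string is a placeholder; adjust driver and server as needed.
import pyodbc

QUERY = """
SELECT session_id, blocking_session_id, wait_type, wait_time,
       cpu_time, total_elapsed_time
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0;
"""

conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};SERVER=.;Trusted_Connection=yes")
for row in conn.execute(QUERY):
    # A large gap between elapsed time and CPU time suggests time spent waiting on locks.
    print(row.session_id, "blocked by", row.blocking_session_id,
          "wait:", row.wait_type, "cpu(ms):", row.cpu_time,
          "elapsed(ms):", row.total_elapsed_time)
```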


Surviving an Audit: Backup History

Dave Bland talks about surviving an audit:

Don’t volunteer information

Only provide the evidence they are asking for, nothing more. You want to avoid giving them things that they do not ask for; by doing so, you might expose something that you don’t want them to see. The last thing you want to do is give them a reason for documenting a finding. Another thing to remember is that audits have a defined scope.

This is some of the best advice about audits out there, particularly when combined with the other points Dave makes about being honest and doing the right thing. Dave also ties in some information about backup history as a particular auditing example.
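
Since backup history is the example here, a minimal sketch of pulling that evidence from msdb (the connection string is a placeholder; msdb.dbo.backupset is the standard backup history table):

```python
# Sketch: pull the last 30 days of backup history from msdb as audit evidence.
# The connection string is a placeholder.
import pyodbc

QUERY = """
SELECT database_name, type, backup_start_date, backup_finish_date
FROM msdb.dbo.backupset
WHERE backup_start_date >= DATEADD(DAY, -30, GETDATE())
ORDER BY backup_start_date DESC;
"""

conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};SERVER=.;Trusted_Connection=yes")
for row in conn.execute(QUERY):
    # type: D = full, I = differential, L = log
    print(row.database_name, row.type, row.backup_start_date, row.backup_finish_date)
```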


Replicating Permissions with dbatools

Claudio Silva shows how you can take one user’s permission set and apply it to another user with dbatools:

What are the steps involved in a process like this?

Generally speaking, the quickest way I know is:
1. Get the permissions of the source login/user (for each database, because we don’t want to miss any permissions)
2. Save them to a .sql file
3. Open the file and replace ‘srcUser’ with ‘newUser’
4. Execute

Claudio shows how you can accomplish this quickly.
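
As a bare-bones illustration of steps 2 through 4 (the dbatools cmdlets Claudio uses handle step 1 and are far more robust; the file name, user names, and connection string below are placeholders):

```python
# Bare-bones sketch of steps 2-4: rewrite a scripted permission set for a new
# user and execute it. All names here are placeholders.
import pyodbc

with open("permissions_srcUser.sql", encoding="utf-8") as f:
    script = f.read().replace("srcUser", "newUser")   # step 3: swap the user name

conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};SERVER=.;Trusted_Connection=yes",
                      autocommit=True)
for batch in script.split("\nGO\n"):                  # naive split on batch separators
    if batch.strip():
        conn.execute(batch)                           # step 4: execute each batch
```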


Finding the Power BI Local Tabular Instance

Emanuele Meazzo shows how to find which port your local SSAS Tabular instance is running on as you work in Power BI:

I’m sure you know that Power BI tabular models are 100% the same as SSAS tabular models, hence you can use SSMS to connect to a Power BI tabular model as if it were SSAS (because it is), to do whatever you want; for example, scripting out the objects that you’ve created in the Power BI Desktop GUI or extracting the model to deploy it in SSAS.

The only issue is that Power BI Desktop creates the local SSAS Tabular instance on a random port each time you open a report, so it’s not like you can save a connection string and be done with it.

But Emanuele has a quick PowerShell script to find the port for you.
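
If you'd rather not use PowerShell, a rough equivalent in Python with psutil (an assumption on my part, not Emanuele's script) looks for the msmdsrv.exe process Power BI Desktop spawns and reports its listening port:

```python
# Rough sketch: find the random port of the local SSAS Tabular instance
# (msmdsrv.exe) that Power BI Desktop spins up. May require elevation.
import psutil

for proc in psutil.process_iter(["name"]):
    if (proc.info["name"] or "").lower() == "msmdsrv.exe":
        for conn in proc.connections(kind="tcp"):
            if conn.status == psutil.CONN_LISTEN:
                print(f"Tabular instance listening on localhost:{conn.laddr.port}")
```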


Preventing Copy and Paste of Text in the Messages Tab

Solomon Rutzky discovered something kind of funny about SQL Server Management Studio:

While researching some odd behavior in SQL Server Management Studio (SSMS) I ran across something even stranger. I was testing the effect of character 0 (sometimes referred to as “NUL”) in the “Messages” tab. For those who have not worked with the C programming language (or similar languages), character 0 (usually indicated by the \0 escape sequence) is the “null terminator” for strings. Meaning, this character marks the end of this string and anything after it should not be displayed. 

In Solomon’s usual fashion, you get a thorough testing of the scenario.
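
To see the null-terminator behavior in miniature, here is a quick Python illustration via ctypes (not Solomon's SSMS scenario, just the underlying C-string concept):

```python
# The same bytes, read two ways: Python keeps everything, while a C-style
# string stops at the first \x00 and hides whatever follows.
import ctypes

raw = b"visible\x00hidden"
print(raw)                                     # b'visible\x00hidden'
print(ctypes.create_string_buffer(raw).value)  # b'visible' - truncated at \0
```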


Sorting a Power BI Table by Multiple Columns

Jon Fletcher shows us how to sort a Power BI table by more than one column:

A common request that is raised by clients is how to sort a table in Power BI by multiple columns, in the same way you can in Excel.
For a while, there was no way (at least no easy way) to do this until the Power BI March 2020 update.

I learnt this tip from the following YouTube video:
https://www.youtube.com/watch?v=ik0K1H9j2Uc
Full credit to Dhruvin Shah; check his video out.

I would call this feature moderately discoverable—once you see how to do it, you can say “Oh, that makes sense.” But it’s not something I would necessarily have thought to do without this prompting.


Securing a Kafka Deployment

Rick Spurgeon and Nikoleta Verbeck give us some tips on securing Apache Kafka:

Confluent Platform provides Role-Based Access Control (RBAC), which addresses the gaps listed above. RBAC and ACLs perform similar functions but approach the problem differently and can be used independently or cooperatively. RBAC is powered by Confluent’s Metadata Service (MDS), which integrates with LDAP and acts as the central authority for authorization and authentication data. RBAC leverages role bindings to determine which users and groups can access specific resources and what actions can be performed within those resources (roles). RBAC is enabled on the Kafka cluster by way of Confluent Server, a fully compatible Kafka broker which integrates commercial security features like RBAC.

The list of items is pretty straightforward with no major surprises.
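
For the ACL side of that comparison, here is a sketch using the confluent-kafka Python admin client (the broker address and principal are placeholders, and the broker must have an authorizer configured; RBAC itself is managed through Confluent's commercial tooling rather than this API):

```python
# Sketch: allow a principal to read a topic via a Kafka ACL.
from confluent_kafka.admin import (AclBinding, AclOperation, AclPermissionType,
                                   AdminClient, ResourcePatternType, ResourceType)

admin = AdminClient({"bootstrap.servers": "localhost:9092"})
binding = AclBinding(ResourceType.TOPIC, "orders", ResourcePatternType.LITERAL,
                     "User:alice", "*",
                     AclOperation.READ, AclPermissionType.ALLOW)

for acl, future in admin.create_acls([binding]).items():
    future.result()  # raises if the broker rejected the request
    print("Created ACL:", acl)
```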
