Press "Enter" to skip to content

Day: June 1, 2020

Adaptive Query Execution with Spark SQL

Wenchen Fan, Herman von Hoevell, and MaryAnn Xue announce Adaptive Query Execution for Apache Spark 3.0:

Over the years, there’s been an extensive and continuous effort to improve Spark SQL’s query optimizer and planner in order to generate high-quality query execution plans. One of the biggest improvements is the cost-based optimization framework that collects and leverages a variety of data statistics (e.g., row count, number of distinct values, NULL values, max/min values, etc.) to help Spark choose better plans. Examples of these cost-based optimization techniques include choosing the right join type (broadcast hash join vs. sort merge join), selecting the correct build side in a hash-join, or adjusting the join order in a multi-way join. However, outdated statistics and imperfect cardinality estimates can lead to suboptimal query plans. Adaptive Query Execution, new in the upcoming Apache SparkTM 3.0 release and available in the Databricks Runtime 7.0 beta, now looks to tackle such issues by reoptimizing and adjusting query plans based on runtime statistics collected in the process of query execution.

One of the biggest advantages of SQL as a fourth-generation language is that the database engine (whether that be SQL Server, Oracle, or Spark) gets the opportunity to write and re-write the set of operations needed to solve a query to try to find the best path which returns the same result set. These optimizations aren’t perfect, as any query tuner can tell you, but they can go a long way.

Comments closed

How Writing to Parquet Works

Dmitry Tolpeko walks us through the algorithm for writing to Parquet format:

After writing first 100 values for a column (for 100 rows), the Parquet writer checks if this 100-values column content exceeds the specified page size (default is 1 MB).

If the raw data size for the column does not exceed the page size threshold then the next page size check is constantly adjusted based on the actual column size, so it neither checked after every column value nor after every 100 values. Thus the page size is not the strict limit.

If the raw data size size exceeds the page size, the column content is compressed (if a compression is specified for the Parquet file), and flushed into the Page store for the column.

This is a nice explanation of the process.

Comments closed

Lock Waits are Really Blocking Waits

Erik Darling shows that the meter’s only running on lock waits when there’s a blocking condition:

My workload is Highly Contrived™ so the avg wait and query duration line up. In real life, you probably won’t have queries that were only ever waiting on locks to be released.

But it’s worth making some comparisons like this when you’re having blocking problems, especially when you tie wait times and durations in with cpu time.

If cpu is very low but duration is high, generally, you’ve found blocked queries.

Read on for the demo, as well as an expired coupon.

Comments closed

Surviving an Audit: Backup History

Dave Bland talks about surviving an audit:

Don’t volunteer information

Only provide the evidence they are asking for, nothing more.  You want to avoid giving them things that they do not ask for.  By doing so, you might expose something that you don’t want them to see.  The last thing you want to do, is give them a reason for documenting a finding. Another thing to remember is the audits have a defined scope.

This is some of the best advice about audits out there, particularly when combined with the other points Dave makes about being honest and doing the right thing. Dave also ties in some information about backup history as a particular auditing example.

Comments closed

Replicating Permissions with dbatools

Claudio Silva shows how you can take one user’s permission set and apply it to another user with dbatools:

What are the steps involved in a process like this?

Generally speaking, the quickest way I know is:
1. Get the permissions of the source login/user (For each database because we don’t want to miss any permission)
2. Save them to a .sql file
3. Open the file and replace ‘srcUser’ by ‘newUser’
4. Execute

Claudio shows how you can accomplish this quickly.

Comments closed

Finding the Power BI Local Tabular Instance

Emanuele Meazzo shows which port your local SSAS Tabular instance is running on as you navigate through Power BI:

I’m sure you know that PowerBI tabular models are 100% the same as SSAS tabular models, hence you can use SSMS to connect to a PowerBI Tabular model as if it was SSAS (because it is), to do whatever you want, for example scripting out the objects that you’ve created in the PowerBi Desktop GUI or extract the model to deploy it in SSAS.

The only issue to do so is that PowerBI Desktop each time you open a report creates the local SSAS Tabular instance on a random port, so it’s not like you can save a connection string and that’s it

But Emanuele has a quick Powershell script to find the port for you.

Comments closed

Preventing Copy and Paste of Text in the Messages Tab

Solomon Rutzky discovered something kind of funny about SQL Server Management Studio:

While researching some odd behavior in SQL Server Management Studio (SSMS) I ran across something even stranger. I was testing the effect of character 0 (sometimes referred to as “NUL”) in the “Messages” tab. For those who have not worked with the C programming language (or similar languages), character 0 (usually indicated by the \0 escape sequence) is the “null terminator” for strings. Meaning, this character marks the end of this string and anything after it should not be displayed. 

In Solomon’s usual fashion, you get a thorough testing of the scenario.

Comments closed

Sorting a Power BI Table by Multiple Columns

Jon Fletcher shows us how to sort a Power BI table by more than one column:

A common request that is raised by clients is how to sort a table in Power BI by multiple columns, in the same way you can in Excel.
For a while, there was no way (at least no easy way) to do this until the Power BI March 2020 update.

I learnt this tip from the following YouTube video:
https://www.youtube.com/watch?v=ik0K1H9j2Uc
Full credit to Dhruvin Shah, check his video out.

I would call this feature moderately discoverable—once you see how to do it, you can say “Oh, that makes sense.” But it’s not something I would necessarily have thought to do without this prompting.

Comments closed