2021-06-28 – Curated SQL

Using ggplot2 to Create a Faceted Histogram plus Curve

Published 2021-06-28 by Kevin Feasel

Overlaying a histogram (possibly faceted) with a curve is not something far-fetched when analyzing data. Surprisingly, it appears (to the best of my knowledge) that there’s no comfortable out-of-the-box solution in ggplot2, although it can of course be achieved with some lines of code. Here’s my take.

Click through for Sebastian’s version, as well as information on the ggh4x library.


A GETDATE() Workaround when Rewriting Scalar UDFs

Published 2021-06-28 by Kevin Feasel

Erik Darling finds a workaround for one scalar UDF inlining limitation:

SQL Server 2019 has been a bit of a roller coaster ride. In particular, UDF inlining started as I think the most interesting addition to the product. Big brain stuff, for sure.
It has been nerfed quite a bit, with seemingly more and more restrictions added to every cumulative update. Hopefully some of these can be lifted as the feature matures, but I understand how difficult all this is.
People program absolute bloodbaths into functions.
Today, I want to look at one restriction that has a fairly simple workaround: Calling GETDATE().

Click through to see how you can replace calls to GETDATE() without too much hassle.


Methods for Resolving Last Page Insert Contention

Published 2021-06-28 by Kevin Feasel

Esat Erkec shows us three techniques for resolving last page insert contention:

Primary key constraints uniquely identify each row in the table and automatically create a clustered index on the underlying table. This duo is frequently used in table design by database developers. If the key column is also decorated with the identity property, we obtain a sequential, incrementing index key column. The clustered index keeps the table's data sorted, so each newly inserted row is added to the last page of the clustered index until that page is filled. When only one thread adds data to such a table, we never experience last page insert contention, because the problem arises only with concurrent use of the table. In high-volume insert operations, the last page of the index cannot be accessed by all threads concurrently. The threads queue up waiting for the last page to become accessible because it is held exclusively by another thread. This bottleneck hurts SQL Server performance, and the PAGELATCH_EX wait type starts to show up heavily.
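To make the mechanism in the quote concrete, here is a toy Python model (not SQL Server's actual storage engine) of which leaf page each new row would target. The page capacity, key ranges, and the `page_hits` helper are all hypothetical, and the model ignores page splits and latching entirely. It only shows that ever-increasing keys all aim at the same last page, while random keys (loosely analogous to a uniqueidentifier) spread out.

```python
import bisect
import random
from collections import Counter

PAGE_CAPACITY = 100  # hypothetical number of rows per leaf page


def page_hits(existing_keys, new_keys, page_capacity=PAGE_CAPACITY):
    """Count how many new keys would land on each leaf page of a sorted index.

    Toy model: pages are fixed slices of the sorted existing keys, and
    page splits are ignored.
    """
    existing = sorted(existing_keys)
    # The first key on each page acts as that page's lower boundary.
    boundaries = existing[::page_capacity]
    hits = Counter()
    for key in new_keys:
        # Find the page whose key range contains this key.
        page = max(bisect.bisect_right(boundaries, key) - 1, 0)
        hits[page] += 1
    return hits, len(boundaries)


# Sequential (identity-like) keys: every new key is larger than all existing keys.
seq_hits, seq_pages = page_hits(range(10_000), range(10_000, 10_100))

# Random 64-bit keys: new keys fall anywhere in the existing key range.
rng = random.Random(42)  # fixed seed so the run is repeatable
existing_random = [rng.getrandbits(64) for _ in range(10_000)]
new_random = [rng.getrandbits(64) for _ in range(100)]
rand_hits, rand_pages = page_hits(existing_random, new_random)

print(f"sequential: {len(seq_hits)} of {seq_pages} pages targeted")
print(f"random:     {len(rand_hits)} of {rand_pages} pages targeted")
```

In the sequential case all 100 inserts target the final page, which is exactly where concurrent threads would pile up. The random case trades that hot spot for inserts scattered across the index, which brings its own costs that this sketch does not model.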

Read on for three techniques, though I’d swap out “use a heap” for “use a uniqueidentifier and watch Jeff Moden’s video on the topic.”


Increasing Refresh Parallelism in Power BI Premium

Published 2021-06-28 by Kevin Feasel

Chris Webb pushes the “go faster” button:

In this case I started the refresh from the Power BI portal so the default parallelism settings were used. The y axis on this graph shows there were six processing slots available, which means that six objects could be refreshed in parallel – and because there are nine partitions in the only table in the dataset, this in turn meant that some slots had to refresh two partitions. Overall the dataset took 33 seconds to refresh.
However, if you connect from SQL Server Management Studio to the dataset via the workspace’s XMLA Endpoint (it’s very similar to how you connect Profiler, something I blogged about here) you can construct a TMSL script to refresh these partitions with more parallelism.
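As a rough illustration of the quote, the Python sketch below works through the slot arithmetic and assembles a TMSL `sequence` command with a `maxParallelism` setting. The dataset, table, and partition names are placeholders, the target of nine slots is an assumption for illustration, and the "waves" count assumes every partition takes about the same time. This is not the script from Chris's post.

```python
import json
import math


def refresh_waves(partitions: int, slots: int) -> int:
    """Sequential waves needed if each slot refreshes one partition at a time."""
    return math.ceil(partitions / slots)


def build_refresh_script(database, table, partition_names, max_parallelism):
    """Build a TMSL sequence command that refreshes the given partitions."""
    objects = [
        {"database": database, "table": table, "partition": name}
        for name in partition_names
    ]
    return {
        "sequence": {
            "maxParallelism": max_parallelism,
            "operations": [{"refresh": {"type": "full", "objects": objects}}],
        }
    }


# Nine partitions in one table, as in the quote; the names are hypothetical.
partition_names = [f"Partition{i}" for i in range(1, 10)]

default_waves = refresh_waves(len(partition_names), slots=6)  # default: six slots
raised_waves = refresh_waves(len(partition_names), slots=9)   # assumed target: nine

script = build_refresh_script("MyDataset", "MyTable", partition_names, 9)

print(f"waves with 6 slots: {default_waves}, with 9 slots: {raised_waves}")
print(json.dumps(script, indent=2))
```

With six slots, nine partitions need two waves, which matches the observation that some slots had to refresh two partitions. The generated JSON is the kind of script you would run from a query window connected to the XMLA Endpoint; whether the capacity actually grants that many slots is a separate question.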

Read on to see how you can do this, as well as the net improvement.


Power BI Cleaner Gen2

Published 2021-06-28 by Kevin Feasel

Imke Feldmann introduces a new version of the Power BI Cleaner:

Today I’m very excited to share with you my first version of a complete rework of my Power BI Cleaner tool. It is way faster than the initial version, overcomes some bugs and limitations of the old version and doesn’t require creating additional vpax files.
On top of that, I’ve created an Excel version that adds some very convenient additional features: the option to analyze thin reports and to generate scripts that delete unused measures and hide unused columns automatically.

Click through for instructions on how it all works.


Optimization Parameters in Oracle 19c

Published 2021-06-28 by Kevin Feasel

Kellyn Pot’Vin-Gorman enters a time warp:

As the dedicated CSA and I were working to optimize the ETL load on Oracle in Azure IaaS, I noticed that there wasn’t a significant improvement with physical VM and storage changes as expected. As I dug into the code and database design, I started to document what I’ve summarized above and realized that the database was quite frozen in time. Even though I couldn’t make changes to the code (per the customer’s request), I was quickly understanding why we had such limited success and why I was failing miserably as I attempted to put recommended practices in place at the parameter level for the Oracle 19c database from what they had originally.
As I thought this through, I had an epiphany: this database was doing everything in its power to be a 10g or earlier database, so why shouldn’t I optimize it like one?

Read on to see what this entails.


