Press "Enter" to skip to content

Month: August 2021

Journey before Destination and Disagreeing with Your Heroes

Mark Seemann reasons through a difficult problem:

Perhaps we’re really talking past each other. Perhaps we’re trying to solve different problems, and thereby arrive at different solutions.

I can only guess at the kinds of problems that my heroes think of when they prefer dynamic languages, and I don’t want to misrepresent them. What I can do, however, is outline the kind of problem that I typically have in mind.

I’ve spent much of my career trying to balance sustainability with correctness. I consider correctness as a prerequisite for all code. As Gerald Weinberg implies, if a program doesn’t have to work, anything goes. Thus, sustainability is a major focus for me: how do we develop software that can sustain our organisation now and in the future? How do we structure and organise code so that future change is possible?

Reading Mark’s essay makes me want to break out my copy of Tractatus Logico-Philosophicus but let’s not go crazy here… This is a good reminder, however, that incentives (implicit as well as explicit), experiences, and a host of other factors which make it really difficult to say conclusively “X is a better solution than Y” without laying out the specific premises.

Comments closed

Power BI and Information Protection in the Enterprise

Marc Lelijveld continues a series on taking Power BI to the enterprise level:

On to the next topic, which is super important for global, enterprise grade solutions if you ask me. Security and information protection! The fifth blog in the series of transforming local into global Power BI solutions.

Good to know up front is that information protection is a feature that not specifically related to Power BI, but also applies to other Microsoft products and services. Though, in this blog I will explain the functionality of information protection, also known as sensitivity labels and how this applies to Power BI. Let’s first look at what the functionality is, where you can apply it and why it is important.

Read on for more information concerning sensitivity labels and where they become useful.

Comments closed

When Query Cost Goes Astray

Erik Darling warns that man cannot live on query cost alone:

There are also rather misguided efforts to figure out parallelism settings based on plan costs. The main problem with that being that if you currently have a lot of parallel queries, all that means is that the estimated cost of the serial plan was higher than your current Cost Threshold For Parallelism setting, and the cost of the parallel plan was less than the cost of the serial plan.

If you increase Cost Threshold For Parallelism, you may very well still end up with a parallel plan, because the serial version was still more expensive. If you eventually change Cost Threshold For Parallelism to the point where some queries are no longer eligible for parallelism, you may eventually find yourself unhappy with the performance of the serial version of the query plan.

Read on for Erik’s take on query cost, which is a good one.

Comments closed

Row Level Security in Azure Analysis Services and Power BI PPU

Gilbert Quevauvilliers continues a series on moving from Azure Analysis Services to Power BI Premium Per User:

In this blog post I am going to cover how to implement Row Level Security (RSL) when using AAS and how this can be done on PPU.

In my example below I am going to show creating to simple RLS roles which will limit data for the users who belong to 2 roles.

Despite the simplification, we can see how row-level security applies to both products and how the two differ.

Comments closed

DATE_BUCKET() and an Alternative

Itzik Ben-Gan takes us through the DATE_BUCKET() function:

Bucketizing date and time data involves organizing data in groups representing fixed intervals of time for analytical purposes. Often the input is time series data stored in a table where the rows represent measurements taken at regular time intervals. For example, the measurements could be temperature and humidity readings taken every 5 minutes, and you want to group the data using hourly buckets and compute aggregates like average per hour. Even though time series data is a common source for bucket-based analysis, the concept is just as relevant to any data that involves date and time attributes and associated measures. For example, you might want to organize sales data in fiscal year buckets and compute aggregates like the total sales value per fiscal year. In this article, I cover two methods for bucketizing date and time data. One is using a function called DATE_BUCKET, which at the time of writing is only available in Azure SQL Edge. Another is using a custom calculation that emulates the DATE_BUCKET function, which you can use in any version, edition, and flavor of SQL Server and Azure SQL Database.

DATE_BUCKET() is something I’d like to see in the next version of SQL Server on-premises. There are some peculiarities to how it works and behavior isn’t always exactly what you’d expect, but it does accomplish what it sets out to do.

Comments closed

Diving into Prophet for Time Series Analysis

Dan Lantos continues a series on the Prophet library:

These plots give us a little insight into how the model is formed. The trend plot (top) exhibits a linear, piecewise function, with approximately appropriate values for our dataset throughout the years. This looks to be a baseline for predictions.
The weekly plot (middle) demonstrates some interesting behaviour – weekdays have a small negative impact on the predictions (approximately -50), and we see large spikes for the weekends. This appears peculiar, as we have no weekend data in our dataset, but it is a product of fitting a 7-day periodic function to only 5 days of data. Thankfully, this won’t be an issue as we have no need to forecast weekends.
The yearly plot (bottom) shows a much more volatile impact on predictions (-200 to +180) with frequent changepoints throughout. This points to a more sensitive and complex relationship between the time of year and the FTSE100 index than the day of the week.

If you’re already familiar with techniques like ARMA or ARIMA, this post will let you see immediately what the key differences are.

Comments closed

Logic Branching in Powershell

Robert Cain continues to have fun with Powershell:

In this post, we’ll see how some of PowerShell’s logic branching works. Logic branching is where you execute certain lines of code only when a condition is true.

For all of the examples, we’ll display the code, then under it the result of our code. In this article I’ll be using PowerShell Core, 7.1.3, and VSCode. The examples should work in PowerShell 5.1 in the PowerShell IDE, although they’ve not been tested there.

As always, Robert has a large number of examples here to illuminate the process.

Comments closed

Finding Procedures Using SELECT *

Michael J. Swart hunts for the real performance killer:

I have trouble with procedures that use SELECT *. They are often not “Blue-Green safe“. In other words, if a procedure has a query that uses SELECT * then I can’t change the underlying tables can’t change without causing some tricky deployment issues. (The same is not true for ad hoc queries from the application).

I also have a lot of procedures to look at (about 5000) and I’d like to find the procedures that use SELECT *.
I want to maybe ignore SELECT * when selecting from a subquery with a well-defined column list.
I also want to maybe include related queries like OUTPUT inserted.*.

Read on to see Michael’s strategy for attacking the problem while not including benign instances of it (such as WHERE EXISTS (SELECT * ...), which won’t cause any issues because the database engine doesn’t expand that wildcard).

Comments closed

Scaling ADF and Synapse Analytics Pipelines

Paul Andrew has a process for us:

Back in May 2020 I wrote a blog post about ‘When You Should Use Multiple Azure Data Factory’s‘. Following on from this post with a full year+ now passed and having implemented many more data platform solutions for some crazy massive (technical term) enterprise customers I’ve been reflecting on these scenario’s. Specifically considering:

– The use of having multiple regional Data Factory instances and integration runtime services.

– The decoupling of wider orchestration processes from workers.

Furthermore, to supplement this understanding and for added context, in December 2020 I wrote about Data Factory Activity Concurrency Limits – What Happens Next? and Pipelines – Understanding Internal vs External Activities. Both of which now add to a much clearer picture regarding the ability to scale pipelines for the purposes of large-scale extraction and transformation processes.

Read on for details about the scenario, as well as a design pattern to explain the process. This is a large solution for a large-scale problem.

Comments closed