Press "Enter" to skip to content

Curated SQL Posts

Incremental Refresh of Any Power BI Data Source

Gilbert Quevauvilliers wants to incrementally refresh all the sources:

The pattern that I am talking about is the following, which will be used as my example below.

– Connect to a data source which supports query folding.
> In my example I have installed and configured an Azure SQL Serverless DB
> In this database I have a date table.
– Configure date table to use Incremental refreshing as per the blog post
> Incremental refresh in Power BI
– Create a function which will then use the Date value as part of the parameter
> In my example I have got CSV files which have Exchange rate information from Azure Blob Storage.
> The file names of the CSV files are the dates.
– Invoke the Function within the date table to extract the required information.

I know what you might be thinking: that as soon as I add in the column with the function, it breaks the Query Folding. That is what I thought too.

The great news is that Incremental refreshing DOES STILL WORK!

Read on for the demonstration.
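
To make the pattern concrete, here is a rough sketch of what such a function could look like in Power Query M. The storage account, container, and column names are all hypothetical; this illustrates the shape of the approach rather than Gilbert's exact code.

    // Hypothetical function: fetch one day's exchange rate CSV from Blob Storage,
    // where the file name is that day's date (e.g. 2020-06-01.csv).
    let
        GetRatesForDate = (RateDate as date) as table =>
            Csv.Document(
                Web.Contents(
                    "https://myaccount.blob.core.windows.net/rates/"
                        & Date.ToText(RateDate, "yyyy-MM-dd")
                        & ".csv"
                )
            )
    in
        GetRatesForDate

    // In the date table query (which folds to Azure SQL), invoke it row by row:
    // Table.AddColumn(DateTable, "Rates", each GetRatesForDate([Date]))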


Analysis versus Reporting and Power BI

Rob Collie thinks about the industry's movement between analysis and reporting. Part one gives us some backstory:

Excel was about to make a large investment in BI-related capabilities, and the powers that be had selected me to lead our part in it. I was excited, but now I needed a crash course in “what the hell is BI?” I was given multiple tutors, and they all were quick to introduce the concept of Analysis versus Reporting. The “versus” seemed to be pretty important. It wasn’t an “and” – no, the “versus” was chosen deliberately in these sermons. You see, these were Two Very Different Things.

I struggled mightily to grasp this difference. I was told that interactive things like PivotTables were Analysis tools – NOT Reporting tools! Reports were something completely different. “But,” I pointed out, “they’re called ‘Insert PivotTable Report’ on the Excel menu today!” (This was Excel 2003). “Yeah,” said the mentors, “…we might want to fix that.”

Part two explains why analysis and reporting are both important:

Another “meta characteristic” of paginated reports is that they TEND to display details rather than aggregations. EX: specific transactions rather than emergent trends. In paginated reports, you’re MORE likely (but not guaranteed!) to be looking at “raw” rows of data from the original database, whereas in a Power BI report, you’re more likely (but again, not guaranteed!) to NOT be seeing raw individual rows, but rather intelligent aggregations of MANY rows. But either way, more detail means you’re more likely to need multiple pages.

Rob’s right on the money. And I’m looking forward to part three of the series.


Altering the Database without Rolling Back Users

Kenneth Fisher wants to change a database:

If this strikes a bit too close to home for you then you need to look at the ROLLBACK clause. It’s great for killing and rolling back all of the current connections before making my change.

But this is a pretty sensitive app and if there’s something running I have to let it finish. No ROLLBACK allowed. But I’m also not going to wait forever to see if my alter is going to happen. Turns out there is a nice easy option for this too.

Click through to see the option, as well as the message you get if it can’t work immediately.
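
As a refresher, the termination options look roughly like this in T-SQL (the database name and the option being set are hypothetical):

    -- ROLLBACK IMMEDIATE kills and rolls back every open transaction so the
    -- change happens right away:
    ALTER DATABASE SalesDb SET READ_ONLY WITH ROLLBACK IMMEDIATE;

    -- ROLLBACK AFTER gives running sessions a grace period first:
    ALTER DATABASE SalesDb SET READ_ONLY WITH ROLLBACK AFTER 60 SECONDS;

    -- NO_WAIT is the gentle route: if the change cannot complete immediately,
    -- the statement fails with an error instead of waiting or killing anything:
    ALTER DATABASE SalesDb SET READ_ONLY WITH NO_WAIT;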


One-Column Fusion with DAX

Phil Seamark has a performance tuning tip for DAX:

I wrote an article about an optimisation called DAX Fusion that attempts to fuse similar storage engine (SE) calls when it can. This article highlights an elegant DAX-based trick that works for a specific scenario, reducing the number of SE calls without relying on DAX fusion. The difference between 1-column fusion and engine-level DAX fusion is:

– DAX fusion in the engine works across multiple columns that have the same effective WHERE clause
– 1-Column fusion works by fusing multiple measures that all reference a single column, but with different WHERE clauses

Read on to learn how to switch around a bit of DAX to reduce the number of storage engine calls, as well as an example of one scenario in which it can come in handy.
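
As a rough, hypothetical illustration of the single-column idea (table, column, and measure names invented; this is not Phil's exact code): instead of several measures that each trigger their own storage engine scan over the same column, you can group that column once and slice the result in the formula engine.

    -- Before: each measure is its own storage engine query.
    -- Low Count := CALCULATE ( COUNTROWS ( Sales ), Sales[Status] = "Low" )
    -- Mid Count := CALCULATE ( COUNTROWS ( Sales ), Sales[Status] = "Mid" )

    -- After: one grouped scan of Sales[Status], sliced in the formula engine.
    Low And Mid Count :=
    VAR CountsByStatus =
        ADDCOLUMNS (
            VALUES ( Sales[Status] ),
            "@Rows", CALCULATE ( COUNTROWS ( Sales ) )
        )
    RETURN
        SUMX (
            FILTER ( CountsByStatus, Sales[Status] IN { "Low", "Mid" } ),
            [@Rows]
        )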


How SQL Server Stores Floating Point Types

Randolph West continues a series on SQL Server data type storage:

If an integer or decimal amount is a precise representation of a value, a floating point is the closest approximation of that value in binary. Programming languages and databases use floating point numbers to trade storage (and memory) costs against precision. A floating point value is imprecise, but even that is underselling the problem.

Randolph also breaks all of the rules and writes out the largest FLOAT value you can have.
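
A quick way to see the imprecision for yourself in T-SQL (CONVERT style 3, available on recent versions of SQL Server, prints all 17 significant digits of a float):

    -- 0.1 and 0.2 have no exact binary representation, so their float sum
    -- is only approximately 0.3:
    SELECT CONVERT(varchar(30), CAST(0.1 AS float) + CAST(0.2 AS float), 3);
    -- 3.0000000000000004e-001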


Using Random Cut Forests for Anomaly Detection

Chris Swierczewski and Lai Jiang have an example of using Random Cut Forests to perform anomaly detection against a dataset stored in Amazon Elasticsearch Service:

Based on these constraints and performance results from internal and publicly available benchmarks across many data domains, we chose the RCF algorithm for computing anomaly scores in data streams.

But this begs the question: How large of an anomaly score is large enough to declare the corresponding data point as an anomaly? The anomaly detector uses a thresholding model to answer this question. This thresholding model combines information from the anomaly scores observed thus far and certain mathematical properties of RCFs. This hybrid information approach allows the model to make anomaly predictions with a low false positive rate when relatively little data has been observed, and effectively adapts to the data in the long run. The model constructs an efficient sketch of the anomaly score distribution using the KLL Quantile Sketch algorithm. For more information, see Optimal Quantile Approximation in Streams.

The linked post is more of an explanation of the process than a tutorial, but it’s interesting to see how different approaches can find anomalies at different rates.
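
To make the thresholding idea concrete, here is a toy Python sketch of the general approach; the real detector uses a KLL quantile sketch rather than keeping every score in memory, so treat this purely as illustration.

    # Toy thresholding model: flag a point as anomalous when its anomaly score
    # exceeds a high quantile of the scores seen so far.
    class QuantileThresholder:
        def __init__(self, quantile=0.995, warmup=100):
            self.quantile = quantile
            self.warmup = warmup    # make no predictions until enough data arrives
            self.scores = []

        def update_and_check(self, score):
            self.scores.append(score)
            if len(self.scores) < self.warmup:
                return False        # too little data for a reliable threshold
            ranked = sorted(self.scores)
            threshold = ranked[int(self.quantile * (len(ranked) - 1))]
            return score >= threshold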


Higher-Order Functions in Scala

Rahul Agarwal explains how higher-order functions make your life easier:

As a part of the functional programming paradigm, whatever logic we need to write is to be implemented in terms of pure and immutable functions. Here, functions take arguments from other functions as input and return values/functions which are used by other functions for further processing. Here, pure means that the function does not produce any side effects, like printing to the console, and immutable means that the function takes in and produces immutable data (val) only.

Higher-order functions comply with the above idea. As compared to for loops, we can iterate a data structure using higher-order functions with much less code.

The term “higher-order function” can sound a bit overwhelming if you’re completely unfamiliar, but it’s a pretty simple concept: a function which takes another function as (at least) one of its inputs. As Rahul points out, this is quite the useful concept.
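
A minimal Scala example (names invented for illustration):

    // applyTwice is higher-order: it takes another function, f, as an argument.
    def applyTwice(f: Int => Int, x: Int): Int = f(f(x))
    applyTwice(_ * 2, 5)                       // 20

    // Standard-library higher-order functions replace explicit loops:
    val prices = List(10.0, 25.0, 40.0)
    val discounted = prices.map(p => p * 0.9)  // List(9.0, 22.5, 36.0)
    val expensive  = prices.filter(_ > 20.0)   // List(25.0, 40.0)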


Creating Data-Driven Power BI Report Subscriptions

John White shows how to create a data-driven subscription for a Power BI report:

One of the features that has never made the leap from SQL Server Reporting Services (SSRS) on-premises to the cloud is data-driven subscriptions. Users can subscribe to reports, but a data-driven subscription allows individual subscriptions to be stored in a central location and parameterized, while delivering the reports to multiple locations. This article will describe a pattern for accomplishing this using SharePoint lists as the subscription store, and Power Automate as the automation tool, for a no-code solution to this requirement.

The alternative would be to use Power BI Report Server, but if you’re not using that, this is an interesting approach and solution.


Optimizing Derived Table Expressions

Itzik Ben-Gan continues a series on table expressions:

As mentioned, next month I’ll get to the details of unnesting of derived tables. For now, suffice to say that SQL Server normally does apply an unnesting/inlining process to derived tables, where it substitutes the nested queries with a query against the underlying base tables. Well, I’m oversimplifying a bit. It’s not like SQL Server literally converts the original T-SQL query string with the derived tables to a new query string without those; rather SQL Server applies transformations to an internal logical tree of operators, and the outcome is that effectively the derived tables typically get unnested. When you look at an execution plan for a query involving derived tables, you don’t see any mention of those because for most optimization purposes they don’t exist. You see access to the physical structures that hold the data for the underlying base tables (heaps, B-tree rowstore indexes, and columnstore indexes for disk-based tables, and tree and hash indexes for memory-optimized tables).

This article deserves a careful reading.
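
For instance, in a query like this (table names hypothetical), the derived table D never appears in the execution plan; after unnesting, the plan simply aggregates over the base table’s physical structures:

    SELECT D.custid, D.numorders
    FROM ( SELECT custid, COUNT(*) AS numorders
           FROM Sales.Orders
           GROUP BY custid ) AS D
    WHERE D.numorders > 10;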
