Press "Enter" to skip to content

Day: September 25, 2023

Generating Reproducible Reports with Jupyter and Quarto

Parisa Gregg and Myles Mitchell don’t need to copy and paste for their TPS reports:

Quarto is a free-to-use, open-source software based on Pandoc that enables users to convert plain text files into a range of formats, including PDF, HTML and powerpoint presentations. These documents can contain a mixture of narrative text, Python code, and figures that are dynamically generated by the embedded code.

This has many use-cases:

  • Your company may have a weekly board meeting to go over the latest sales figures. By having a Quarto presentation that pulls in the latest company sales data, you can regenerate the presentation slides each week at the click of a button.
  • As a researcher you may be preparing a report for publication. By having the code that generates data tables and figures embedded within the report, regenerating the draft as the experimental data floods in is a breeze!

Read on for a fun example of how you could automated a research-driven report.

Comments closed

Creating Confidence Intervals on a Linear Model in R

Steven Sanderson goes frequentist on us:

Linear regression is a fundamental statistical technique used to model the relationship between a dependent variable and one or more independent variables. While fitting a linear model is relatively straightforward in R, it’s also essential to understand the uncertainty associated with our model’s predictions. One way to visualize this uncertainty is by creating confidence intervals around the regression line. In this blog post, we’ll walk through how to perform linear regression and plot confidence intervals using base R with the popular Iris dataset.

Click through to see how, even if you’re a Bayesian who considers confidence intervals to overstate precision in reality.

Comments closed

Maintaining Existing Power BI Data while Loading More with Fabric

Chris Webb looks back on an older post:

To be honest I’m slightly ashamed of this fact because, as I say in the post, the solution I describe is a bit of a hack – but at the same time, the post is popular because a lot of people have the problem of needing to add new data to the data that’s already there in their Power BI dataset and there’s no obvious way of doing that. As I also say in that post, the best solution is to stage the data in a relational database or some other store outside Power BI so you have a copy of all the data if you ever need to do a full refresh of your Power BI dataset.

Why revisit this subject? Well, with Fabric it’s now much easier for you as a Power BI developer to build that place to store a full copy of your data outside your Power BI dataset and solve this problem properly.

Read on for an example of the new solution.

Comments closed

Dynamic Highlighting of Data Points Based on Slicer Selection

Nikola Ilic shines a light on the data:

To quickly explain: when a user selects, for example, Contoso in the slicer, the Contoso bar should be highlighted by using a different color. As much as this sounds like a very basic and common business request, there is no straightforward solution in Power BI (or, at least, I’m not aware of it:))

However, the client’s wish is (almost) always our command – so, let’s see how this feature can be implemented with a little bit of data model tweaking and leveraging some DAX code.

Spoilers: there is a solution, though it does involve quite a few steps.

Comments closed

Attacks on Row-Level Security

Ben Johnston continues a series on row-level security in SQL Server:

As mentioned in previous sections, RLS is an addition to security and should not be used as the primary method to limit access to data. It is a supplementary layer, useful in specific scenarios. There are also instances where RLS can be defeated by an unauthorized user. The attacks listed below are broken down into direct attacks, indirect attacks, and side-channel attacks. The categorizations could be changed, but the important part of each is the vulnerability discussed.

The one scenario I’m a bit surprised about is the divide by zero attack, as I had figured the filter predicate would apply before the computation leading to a divide by zero scenario.

Comments closed

Using DAX to Find Products Missing Sales

Marco Russo and Alberto Ferrari observe the dog that didn’t bark:

What products did not sell in a specific area, store, or time period? This may be an important analysis for several businesses. There are multiple ways to obtain the desired result. Some specific implementations might be needed because of the user or model requirements, whereas developers can choose any formula in several cases. Or you might just find a solution on the web and blindly implement it without questioning whether there is a better way to achieve what you want.

It turns out that different formulas perform very differently. Choosing the right one in your scenario can make a slow report fast. This article analyzes the performance of different formulations of one same algorithm. 

It’s interesting to see the performance profile here: most are reasonably close together, although you can still get a 2x gain from using the fastest approach versus the second-slowest. And then there’s the slowpoke.

Comments closed