Press "Enter" to skip to content

Month: July 2023

Modularizing an Existing Shiny App

Peter Baranovskiy breaks it down:

There are multiple tutorials available online on writing modular Shiny apps. So why one more? Well, when I was just getting started with building modular apps myself, these didn’t do much for me. So I really only learned how to write modules when I had an opportunity to team up with an experienced R Shiny developer. The reason, I guess, is that Shiny modules are an advanced topic, and you typically get to writing modules only when you finally need to scale your apps – and keep opportunities for further scaling open. This typically means when your app goes into production. By then you have probably already developed multiple apps, and switching over to the way of thinking required to write modules may be challenging. If you don’t know what modules are, I recommend starting here and then coming back to this post. Otherwise, read on.

So, I decided to try a different approach and, instead of building a simple modular app from scratch, to go in the opposite direction by breaking down a complex real-life app into modules. Here’s the app’s original non-modular code. Note the single app.R file that contains the entire app. static_assets.R includes some object definitions which I moved to a separate file for convenience. calgary_crime_data_prep.R is not part of the app; it is a data retrieval and cleaning script executed once a month with cron. Running the script each time the app launches would have made the app extremely slow and would have used far too much bandwidth, as the script downloads and processes 150+ MB of data on each run.

Read on for the reasoning behind using modules, as well as Peter’s notes on the process.
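
If you haven’t seen the pattern before, here’s a minimal sketch of a Shiny module (hypothetical names, not taken from Peter’s app): each module pairs a UI function that namespaces its IDs with a server function wrapped in moduleServer().

library(shiny)

histogramUI <- function(id) {
  ns <- NS(id)  # namespace every input/output ID inside the module
  tagList(
    sliderInput(ns("bins"), "Number of bins:", min = 5, max = 50, value = 20),
    plotOutput(ns("hist"))
  )
}

histogramServer <- function(id, data) {
  moduleServer(id, function(input, output, session) {
    output$hist <- renderPlot(hist(data, breaks = input$bins))
  })
}

# The app then composes modules instead of defining everything inline.
ui <- fluidPage(histogramUI("eruptions"))
server <- function(input, output, session) {
  histogramServer("eruptions", faithful$eruptions)
}
shinyApp(ui, server)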

Comments closed

Bug in fn_xe_file_target_read_file

Erik Darling notes a bug:

SQL Server has had the fn_xe_file_target_read_file function for a while, but starting with SQL Server 2017, a column called timestamp_utc was added to the output.

Somewhat generally, it would be easier to filter event data out using this column… if it worked correctly. The alternative is to interrogate the underlying extended event XML timestamp data.

That’s… not fun.

Erik shows us the problem and also provides a workaround, as well as the Microsoft Feedback issue you can vote on to get this done sooner.
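
Not Erik’s exact code, but the workaround takes roughly this shape: cast event_data to XML and pull the timestamp from the event payload instead of trusting timestamp_utc (the system_health target here is just for illustration).

-- Read the file target, shred the event XML, and filter on the
-- timestamp attribute rather than the buggy timestamp_utc column.
SELECT
    event_time_utc = xed.event_xml.value('(event/@timestamp)[1]', 'datetime2'),
    xed.event_xml
FROM sys.fn_xe_file_target_read_file(N'system_health*.xel', NULL, NULL, NULL) AS f
CROSS APPLY (SELECT event_xml = CONVERT(xml, f.event_data)) AS xed
WHERE xed.event_xml.value('(event/@timestamp)[1]', 'datetime2') >= '20230701';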

Comments closed

An Overview of Semantic Modeling in Microsoft Fabric

Teo Lachev talks semantic modeling:

In retrospect, I’d say I owe 50% of my BI career to Analysis Services and its flavors: Multidimensional, Tabular, and later Power BI. This is why I closely follow how this technology evolves. Fast-forwarding to Fabric, there are no dramatic changes. Unlike the other two Fabric engines (Lakehouse and Warehouse), Power BI datasets haven’t yet embraced the delta lake file format to store their data. The most significant change is the introduction of a new Direct Lake data access mode alongside the existing Import and DirectQuery.

Read on for Teo’s thoughts. I think there’s a good chance that the Bad/Ugly points will be eliminated by the time Fabric goes GA, though we’ll have to wait and see if that’s the case.

Comments closed

TreeMaps vs Pie Charts

Rita Fainshtein talks treemaps:

The main challenge arises from the difficulty in comparing the areas of the segments within the chart.

Consequently, pie charts and similar graphs become challenging to interpret.

Colors for categories don’t make it easy either, since the brain tries to figure out how the color relates to the category; in the example here, the color assignment just happens to be a random choice.

Let’s see if the treemap helps and makes it easier.

Rita notes that treemaps fit a specific niche: hierarchical, categorical data. But within that niche, they work really well, which is more than you can say about pie charts…

Comments closed

Power BI Treemap Visual Desired Enhancements

Meagan Longoria has an airing of grievances:

I recently created a treemap in Power BI for a Workout Wednesday challenge. Originally, I had set out to make a different treemap, but I ran into some limitations with the visual. I ended up with the treemap below, which isn’t bad, but it made me realize that the treemap is in need of some improvements to make it really useful. So I decided to share my thoughts here.

Read on for Meagan’s thoughts on the existing treemap visual. I agree with all of Meagan’s points and would love to see this visual updated.

Comments closed

Power BI Report Deployment with Connections to Shared Datasets

Rayis Imayev does some large-scale deployment:

Let’s say you have a collection of Power BI .pbix files stored in a git-based source control system (GitHub, Azure DevOps, or any other system). Among these files, one is your data model, while the others are Power BI visual reports and dashboards connected to the published dataset from your data model. Your published dataset is located in a separate workspace dedicated to shared content, and the visualization Power BI reports are placed in another workspace with appropriate permissions to access them.

Now, let’s consider an additional complexity: you have this collection of files not only in one development environment but also in two others. These environments support your Power BI reporting testing (UAT) and the release of your Power BI reports to end-users (Production).

The questions that arise are: How do you deploy your solution, and most importantly, how do you automate it?

Click through for an architectural diagram, as well as the answers to these questions.

Comments closed

Analyzing Power BI Refresh Performance with Log Analytics

Chris Webb starts a new series:

If you’re tuning refresh for a Power BI Import mode dataset, one of the areas you’ll be most interested in is throughput, that is to say how quickly Power BI can read data from the data source. It can be affected by a number of different factors: how quickly the data source can return data; network latency; the efficiency of the connector you’re using; any transformations in Power Query; the number of columns in the data and their data types; the number of other objects in the same Power BI dataset being refreshed in parallel; and so on. How do you know if any or all of these factors are a problem for you? It’s a subject that has always interested me and now that Log Analytics for Power BI datasets is GA we have a powerful tool to analyse the data, so I thought I’d do some testing and write up my findings in a series of blog posts.

In the first post, Chris gives us an overview of the information available and provides one way to query it.
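
As a taste of what you can do once the data is flowing, a query along these lines surfaces the slowest operations during a refresh. PowerBIDatasetsWorkspace is the table Log Analytics for Power BI populates; treat the specific column and operation names here as assumptions to verify against your own workspace.

// Longest-running dataset operations over the past day
PowerBIDatasetsWorkspace
| where TimeGenerated > ago(1d)
| where OperationName == "ProgressReportEnd"  // completed refresh sub-operations
| project TimeGenerated, OperationDetailName, EventText, DurationMs, CpuTimeMs
| order by DurationMs desc
| take 20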

Comments closed

Optimizing for Readability or Performance

Hugo Kornelis talks trade-offs:

But I wanted to contribute anyway. So here is a recent example of code that probably would have made me feel a way if I had been the type of person that gets emotional over code. Or put differently, here is the story of how I gained performance by reducing readability and maintainability.

For the record, and to prevent confusion, I am not going to name actual customers, nor name the ERP system used, and the description I give is highly abstracted away from the original problem, and heavily simplified as well. I describe the basis of what the issue was with the code I encountered and how I fixed it, but without revealing any protected information.

My internal motto is:

  • Start with simple, readable code
  • Move to more complex, faster-performing code in the spots where it’s necessary (see the sketch after this list)
  • Document why the code is more complex with illuminating comments, so that a future developer (including future you) won’t say, “What was this yokel thinking, doing this complicated thing when there’s an easy approach like this?”
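
As a concrete illustration of those last two points, with entirely made-up tables and a made-up function:

-- Readable version: the intent is obvious, but the scalar UDF runs
-- once per row and prevents a parallel plan.
SELECT o.OrderID, o.TotalDue
FROM dbo.Orders AS o
WHERE dbo.fn_IsActiveCustomer(o.CustomerID) = 1;

-- Faster version: the function's logic is inlined as a join.
-- NOTE: deliberately inlined from dbo.fn_IsActiveCustomer, which forced
-- row-by-row execution; measure before "simplifying" this back to the UDF.
SELECT o.OrderID, o.TotalDue
FROM dbo.Orders AS o
JOIN dbo.Customers AS c
    ON c.CustomerID = o.CustomerID
WHERE c.IsActive = 1;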

Comments closed

Using the Azure Data Factory Self-Hosted Integration Runtime

Chen Hirsh hosts a runtime:

In Azure Data Factory (ADF), an integration runtime is a compute resource to run your pipelines on. When you run an application on your computer, it uses the computer’s resources, such as CPU and memory, to run its tasks. When you run activities in a pipeline in ADF, they also need resources to do their job, like copying data or writing a file, and these are provided by the integration runtime.

When you create an instance of ADF, you get a default integration runtime, hosted in the same region that you created ADF in. If you need, you can add your own integration runtimes, either on Azure, or you can download and install a self-hosted integration runtime (SHIR) on your own server.

Read on to understand when you would want to use a self-hosted integration runtime and the process for setting one up. The SHIR also applies to Synapse pipelines and is one of the few ways to move data out of a Synapse workspace with data exfiltration protection enabled.
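
If you manage ADF with scripts, the factory-side piece of that process can look something like this with the Az.DataFactory PowerShell module (the resource names here are hypothetical):

# Create the self-hosted integration runtime definition in the factory.
Set-AzDataFactoryV2IntegrationRuntime `
    -ResourceGroupName 'rg-data' `
    -DataFactoryName 'adf-demo' `
    -Name 'SelfHostedIR' `
    -Type SelfHosted `
    -Description 'Runs on an on-premises VM'

# Fetch an authentication key to paste into the integration runtime
# installer on your own server to complete registration.
Get-AzDataFactoryV2IntegrationRuntimeKey `
    -ResourceGroupName 'rg-data' `
    -DataFactoryName 'adf-demo' `
    -Name 'SelfHostedIR'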

Comments closed