Press "Enter" to skip to content

Day: October 6, 2022

Variable Definition and Programmatic ggplot2

Sebastian Sauer takes us through an interesting scenario.

No lede here because it’s almost 100% code and headers. A quick description of this is that we can see ways to parse columns in an R DataFrame and plot visuals without hard-coding the column name in our plot definition, using a variable instead.

And I had to rewrite the synopsis above because I used the data science term “variable” until hitting a wall when describing the programming term “variable.”

Comments closed

Debugging Code in R

Cosima Meyer explains how debugging works in R with RStudio:

Three basic commands in RStudio let you do the debugging: debug(function_name)browser(), and undebug(function_name).

With debug(function_name) you start the debugging of your function – it’s basically like a mole that digs in. When you’re in debug mode, you can also call the objects in your function.

Read the whole thing to learn the power of debugging beyond the print() statement. H/T R-Bloggers.

Comments closed

Kernel SHAP in R and Python

Michael Mayer and Christian Lorentzen team up:

SHAP is one of the most used model interpretation technique in Machine Learning. It decomposes predictions into additive contributions of the features in a fair way. For tree-based methods, the fast TreeSHAP algorithm exists. For general models, one has to resort to computationally expensive Monte-Carlo sampling or the faster Kernel SHAP algorithm. Kernel SHAP uses a regression trick to get the SHAP values of an observation with a comparably small number of calls to the predict function of the model. Still, it is much slower than TreeSHAP.

Read on to see how to do this in both R and Python. With libraries the way they are, the code is very similar and the results are basically the same.

Comments closed

Debouncing RMarkdown Input

Thomas Williams waits for the keystroke:

This R Markdown snippet demonstrates “debouncing”: waiting until a user stops changing an input, before updating dependent charts and tables. Debouncing is often used in web sites to prevent the user interface “jumping” as data is being entered, especially when the update takes a noticeable amount of time – for instance calling an API or database, or doing a calculation.

Read on to see an abridged example, as well as a link to the full version.

Comments closed

Diagnosing High CPU on SQL Server when It’s SQL Server’s Fault

Ajay Dwivedi continues a series on high CPU utilization:

In the last blog post Live Troubleshooting High CPU On SQL Server – Part 1, we worked on a scenario where we saw a high CPU on SQL Server due to some external OS level task. In this blog, we are going to explore a scenario where a high CPU issue is present because of the workload running on SQL Server.

Just like in the last blog post scenario, when I have to troubleshoot a “slow” SQL Server, if my SQL Server is baselined with the SQLMonitor tool, then I first visit my Monitoring – Live – All Servers dashboard which displays all the metrics of specific SQL Servers that need DBA help.

As normal, we see a scenario with SQLMonitor, as well as other options including sp_BlitzFirst.

Comments closed

Data Virtualization in SQL Server 2022

Hugo Queiroz provides an overview of data virtualization options in SQL Server 2022:

SQL Server 2022 now supports CSV, Parquet, and Deltafiles stored on Azure Storage Account v2, Azure Data Lake Storage Gen2, or any simple storage service (S3)–compliant object storage—the last as an on-premises offering or in the cloud. Finally, SQL Server 2022 can now use Create External Table as Select (CETAS), together with commands like OPENROWSETCreate External Table (CET), and all the new T-SQL enhancements. SQL Server 2022 is a powerful data hub.

The post doesn’t get too deep into the topic, though a search here will find you links to articles with concrete examples.

Comments closed

Lock Escalation Thresholds and Parallelism

Paul White complicates matters:

In part one of this series, I explained how the lock manager maintains a count of held locks per transaction, and individual access methods keep a separate lock count per HoBt for each plan operator performing data access.

Parallelism adds an extra layer of complexity because both lock counts are maintained per thread.

There are two underlying reasons for this:

Paul explains the reasons and shows off plenty of tests along the way.

Comments closed

Tracking Power BI Desktop Activity in SQL Server

Chris Webb looks in on things:

Something I do all the time when performance tuning Power BI is use SQL Server Profiler to monitor query and refresh activity. There’s a handy external tool that lets you open up Profiler with a connection to Power BI Desktop; if you’re using Power BI Premium you can also connect Profiler up to a published dataset using the XMLA Endpoint. Profiler is a bit old-school though and likely to lead to snarky comments from SQL Server DBAs (it’s not deprecated for Analysis Services though!) who’ll say you should be using Extended Events (aka xEvents) instead. And guess what – did you know you can use Analysis Services xEvents to monitor activity in Power BI Desktop?

It’s the power of Extended Events.

Comments closed