Press "Enter" to skip to content

Day: April 25, 2024

Selecting the Top N Values by Group in R

Steven Sanderson searches for subsets:

In data analysis, there often arises a need to extract the top N values within each group of a dataset. Whether you’re dealing with sales data, survey responses, or any other type of grouped data, identifying the top performers or outliers within each group can provide valuable insights. In this tutorial, we’ll explore how to accomplish this task using three popular R packages: dplyr, data.table, and base R. By the end of this guide, you’ll have a solid understanding of various approaches to selecting top N values by group in R.

Read on for the three examples.

Comments closed

Updates in R 4.4.0

Russ Hyde shares items of interest:

R 4.4.0 (“Puppy Cup”) was released on the 24th April 2024 and it is a beauty. In time-honoured tradition, here we summarise some of the changes that caught our eyes. R 4.4.0 introduces some cool features (one of which is experimental) and makes one of our favourite {rlang} operators available in base R. There are a few things you might need to be aware of regarding handling NULL and complex values.

The full changelog can be found at the r-release ‘NEWS’ page and if you want to keep up to date with developments in base R, have a look at the r-devel ‘NEWS’ page.

Read on for a big note on tail-call recursion, an operator for coalescing, and a few more neat features.

Comments closed

IAsyncEnumerable in C# 8

Camilo Reyes shows off an interface:

The IAsyncEnumerable interface was introduced to address the limitations of the IEnumerable interface and the Task class. This way, you can stream asynchronous data and process it efficiently as soon as it becomes available.

In this take, you will learn how to work with IAsyncEnumerable to asynchronously stream a big table and extract the data in a ETL process. You will also learn the difference between IAsyncEnumerable and IEnumerable and how to use IAsyncEnumerable in your everyday work. Then, you will look at comparisons between the two different approaches and why one is better than the other in certain scenarios.

Read on for a demonstration and dive into how IAsyncEnumerable implementations compare to IEnumerable in terms of memory utilization.

Comments closed

Copying Azure SQL Managed Instance Databases

Scott Klein performs a migration:

So, back to our customer. They essentially lifted and shifted their on-premises databases to Azure SQL Managed Instance and have been using it successfully for nearly two years. Again, this is awesome.

Last week they came to us and asked about reporting with Managed Instance. They were looking at data marts and data warehouses, but we needed more information. It turns out they have some people that just want the ability to query the databases, and potentially hook up Excel to these databases for data analysis.

The caveat is that the people I was talking to didn’t want to give the other group direct access to the production environment. Toootally get that. Yeah, like 100% get it. So, what are the options?

Read on for the solution Scott came up with.

Comments closed

Troubleshooting sp_getapplock Blocking

Brent Ozar has an epiphany:

I’ll give you an example. A client came to me because they were struggling with sporadic performance problems in Azure SQL DB, and nothing seemed to make sense:

  • sp_BlitzFirst @SinceStartup = 1 showed very clearly that their top wait, by a long shot, was blocking. Hundreds of hours of it in a week.
  • sp_BlitzIndex showed the “Aggressive Indexes” warning on a single table, but… only tens of minutes of locking, nowhere near the level the database was seeing overall.
  • sp_BlitzCache @SortOrder = ‘duration’ showed a couple queries with the “Long Running, Low CPU” warning, and they did indeed have blocking involved, but … pretty minor stuff. Plus, their plan cache was nearly useless due to a ton of unparameterized queries pouring through constantly, overwhelming Azure SQL DB’s limited plan cache.
  • sp_Blitz wasn’t reporting any deadlocks, either. (sp_BlitzLock doesn’t work in Azure SQL DB at the moment because Microsoft’s no longer running the default system health XE session up there. They turned that off in order to save money on hosting costs, and passed the savings on to… wait… hmm)
  • As a last-ditch hail-Mary, I ran sp_BlitzWho repeatedly, trying to catch the blocking happening in action. No dice – the odds that I’d catch it live weren’t great anyway.

Click through for the story and how all the pieces ultimately fit together.

Comments closed

Exporting and Sharing Power BI Reports in Fabric

Sandeep Pawar distributes PDFs like candy:

With the proposed solution below, you will be able to :

  • Export a Power BI report, or a page of a report or a specific visual from any page as a PDF, PNG, PPTX or other supported file formats
  • Apply report level filters before exporting
  • Automate the extracts on a schedule
  • Save the exported reports to specific folders
  • Grant access to individual folders in the Lakehouse

Click through for the solution.

Comments closed

Logical Reads and Query Tuning

Erik Darling doesn’t focus on logical reads:

To summarize the video a little bit:

  • High average or total logical reads isn’t a guarantee that a query is slow
  • Looking for high average CPU and duration queries is a better metric
  • You may see logical reads go up or down as you make queries faster
  • For I/O bound workloads, you’re better off looking for queries with a lot of physical reads

I agree with Erik. Disk has gotten so much faster, especially if you’re on all-flash arrays or (even better) direct attached nVME storage. Even relatively old SSDs are still a couple orders of magnitude faster than the spinning rust we typically dealt with 15 years ago.

Sometimes, the faster query is one that requires more logical reads. APPLY-based queries typically fall into that category: I might have 10x as many logical reads but the query takes half the time (or less) to finish using a similar percentage reduction of CPU time.

Comments closed