Kevin Feasel – Page 321

Selecting the Top N Values by Group in R

Published 2024-04-25 by Kevin Feasel

In data analysis, there often arises a need to extract the top N values within each group of a dataset. Whether you’re dealing with sales data, survey responses, or any other type of grouped data, identifying the top performers or outliers within each group can provide valuable insights. In this tutorial, we’ll explore how to accomplish this task using three popular R packages: dplyr, data.table, and base R. By the end of this guide, you’ll have a solid understanding of various approaches to selecting top N values by group in R.

Read on for the three examples.

Comments closed

IAsyncEnumerable in C# 8

Published 2024-04-25 by Kevin Feasel

Camilo Reyes shows off an interface:

The IAsyncEnumerable interface was introduced to address the limitations of the IEnumerable interface and the Task class. This way, you can stream asynchronous data and process it efficiently as soon as it becomes available.

In this take, you will learn how to work with IAsyncEnumerable to asynchronously stream a big table and extract the data in a ETL process. You will also learn the difference between IAsyncEnumerable and IEnumerable and how to use IAsyncEnumerable in your everyday work. Then, you will look at comparisons between the two different approaches and why one is better than the other in certain scenarios.

Read on for a demonstration and dive into how IAsyncEnumerable implementations compare to IEnumerable in terms of memory utilization.

Comments closed

Updates in R 4.4.0

Published 2024-04-25 by Kevin Feasel

Russ Hyde shares items of interest:

R 4.4.0 (“Puppy Cup”) was released on the 24th April 2024 and it is a beauty. In time-honoured tradition, here we summarise some of the changes that caught our eyes. R 4.4.0 introduces some cool features (one of which is experimental) and makes one of our favourite {rlang} operators available in base R. There are a few things you might need to be aware of regarding handling NULL and complex values.

The full changelog can be found at the r-release ‘NEWS’ page and if you want to keep up to date with developments in base R, have a look at the r-devel ‘NEWS’ page.

Read on for a big note on tail-call recursion, an operator for coalescing, and a few more neat features.

Comments closed

Troubleshooting sp_getapplock Blocking

Published 2024-04-25 by Kevin Feasel

Brent Ozar has an epiphany:

I’ll give you an example. A client came to me because they were struggling with sporadic performance problems in Azure SQL DB, and nothing seemed to make sense:

sp_BlitzFirst @SinceStartup = 1 showed very clearly that their top wait, by a long shot, was blocking. Hundreds of hours of it in a week.

sp_BlitzIndex showed the “Aggressive Indexes” warning on a single table, but… only tens of minutes of locking, nowhere near the level the database was seeing overall.

sp_BlitzCache @SortOrder = ‘duration’ showed a couple queries with the “Long Running, Low CPU” warning, and they did indeed have blocking involved, but … pretty minor stuff. Plus, their plan cache was nearly useless due to a ton of unparameterized queries pouring through constantly, overwhelming Azure SQL DB’s limited plan cache.

sp_Blitz wasn’t reporting any deadlocks, either. (sp_BlitzLock doesn’t work in Azure SQL DB at the moment because Microsoft’s no longer running the default system health XE session up there. They turned that off in order to save money on hosting costs, and passed the savings on to… wait… hmm)

As a last-ditch hail-Mary, I ran sp_BlitzWho repeatedly, trying to catch the blocking happening in action. No dice – the odds that I’d catch it live weren’t great anyway.

Click through for the story and how all the pieces ultimately fit together.

Comments closed

Copying Azure SQL Managed Instance Databases

Published 2024-04-25 by Kevin Feasel

Scott Klein performs a migration:

So, back to our customer. They essentially lifted and shifted their on-premises databases to Azure SQL Managed Instance and have been using it successfully for nearly two years. Again, this is awesome.

Last week they came to us and asked about reporting with Managed Instance. They were looking at data marts and data warehouses, but we needed more information. It turns out they have some people that just want the ability to query the databases, and potentially hook up Excel to these databases for data analysis.

The caveat is that the people I was talking to didn’t want to give the other group direct access to the production environment. Toootally get that. Yeah, like 100% get it. So, what are the options?

Read on for the solution Scott came up with.

Comments closed

Exporting and Sharing Power BI Reports in Fabric

Published 2024-04-25 by Kevin Feasel

Sandeep Pawar distributes PDFs like candy:

With the proposed solution below, you will be able to :

Export a Power BI report, or a page of a report or a specific visual from any page as a PDF, PNG, PPTX or other supported file formats

Apply report level filters before exporting

Automate the extracts on a schedule

Save the exported reports to specific folders

Grant access to individual folders in the Lakehouse

Click through for the solution.

Comments closed

Logical Reads and Query Tuning

Published 2024-04-25 by Kevin Feasel

Erik Darling doesn’t focus on logical reads:

To summarize the video a little bit:

High average or total logical reads isn’t a guarantee that a query is slow

Looking for high average CPU and duration queries is a better metric

You may see logical reads go up or down as you make queries faster

For I/O bound workloads, you’re better off looking for queries with a lot of physical reads

I agree with Erik. Disk has gotten so much faster, especially if you’re on all-flash arrays or (even better) direct attached nVME storage. Even relatively old SSDs are still a couple orders of magnitude faster than the spinning rust we typically dealt with 15 years ago.

Sometimes, the faster query is one that requires more logical reads. APPLY-based queries typically fall into that category: I might have 10x as many logical reads but the query takes half the time (or less) to finish using a similar percentage reduction of CPU time.

Comments closed

An Overview of GitHub Advanced Security

Published 2024-04-24 by Kevin Feasel

I finish up a series on GitHub:

In this video, I show off some of the GitHub Advanced Security functionality available to open-source public repositories, and describe what you can get with GitHub Enterprise Cloud.

It’s been fun going through various features in GitHub, but next week, I’ll start a brand new series.

Comments closed

Three Layers of Azure Data Factory Framework Components

Published 2024-04-24 by Kevin Feasel

Martin Schoombee continues a series on orchestration in Azure Data Factory:

Before we dive into the details of the Data Factory pipelines, it is worth explaining the conceptual structure of my framework and its components. How it all fits together is important, and after reading the post on the metadata as well the pieces of the puzzle will hopefully start falling into place.

When I started thinking about what I’d like the framework to do, three conceptual layers started to emerge and we’ll review them from the bottom up:

Click through for the description of each layer.

Comments closed

Thoughts on Multi-Platform Visual Studio

Published 2024-04-24 by Kevin Feasel

The NDepend Blog has an interesting post on Visual Studio:

Notice that there are two distinct questions:

Will Visual Studio Be Migrated to .NET Core? (but still remain Windows-centric).

Will Visual Studio Be Multi-Platform?

Disclaimer: For nearly two decades, I’ve been the creator and lead developer of NDepend, a tool that functions both as a Visual Studio extension and a standalone app that can collaborate with VSCode and JetBrains Rider. NDepend also integrates into CI/CD pipelines to deliver interactive web reports on .NET code quality and security. We’ve collaborated with many Visual Studio engineers over the years, who have provided invaluable assistance, though we have no insider information. This post is based purely on observable facts and speculations derived from them.

Read on for those thoughts. My money’s on “No” for the latter, as that’s the purview of Visual Studio Code, and just thinking about the sheer amount of work necessary to make Visual Studio itself cross-platform, I can’t see it happening.

Comments closed

M	T	W	T	F	S	S
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30

Author: Kevin Feasel