Sorting Data In Scala

Randhir Singh walks us through several methods of sorting in Scala:


Here is the signature:

def sortBy[B](f: A => B)(implicit ord: Ordering[B]): Repr 

The sortBy function is used to sort by one or more attributes.
Here is a small example that sorts based on a single attribute of the case class.
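Here is a minimal sketch of that signature in use. It assumes a hypothetical Employee case class; the field names and data are illustrative and not taken from Randhir's post. The sketch shows three cases:

- sorting on one attribute, where the implicit Ordering is supplied by the compiler
- sorting on several attributes via a tuple
- sorting in descending order by passing an explicit Ordering to the second parameter list

```scala
// Hypothetical case class, used only for illustration
final case class Employee(name: String, dept: String, salary: Int)

object SortByExample {
  val staff: List[Employee] = List(
    Employee("Carol", "Sales", 5200),
    Employee("Alice", "IT", 6100),
    Employee("Bob", "Sales", 4800),
    Employee("Dave", "IT", 6100)
  )

  // Single attribute: the implicit Ordering[String] is found by the compiler
  val byName: List[Employee] = staff.sortBy(_.name)

  // Multiple attributes: tuples are compared element by element (dept, then salary)
  val byDeptThenSalary: List[Employee] = staff.sortBy(e => (e.dept, e.salary))

  // Descending order: supply an explicit Ordering in the second parameter list
  val bySalaryDesc: List[Employee] = staff.sortBy(_.salary)(Ordering[Int].reverse)

  def main(args: Array[String]): Unit = {
    println(byName.map(_.name))           // List(Alice, Bob, Carol, Dave)
    println(byDeptThenSalary.map(_.name)) // List(Alice, Dave, Bob, Carol)
    println(bySalaryDesc.map(_.name))     // List(Alice, Dave, Carol, Bob)
  }
}
```

Because sortBy is a stable sort, Alice and Dave tie on both department and salary. They therefore keep their original relative order in the second and third results.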

Click through for several examples.

Overview: U-SQL Database Projects

Zach Stagers gives us an overview of the new U-SQL Database Project structure:

Source Control

The project integrates much more nicely with TFS than the older “U-SQL Project” does.

It gives you the icons (padlock, check mark, etc.) in Solution Explorer, so it actually looks like it’s under source control!

Something I’d really hoped had been fixed, but hasn’t been, is that when you copy and rename an existing item, it doesn’t recognize the rename. You have to undo the checkout of the non-existent object (the copy, before it was renamed).

Read on for more improvements.

The Intersection Of Multiple Averages Is The Empty Set

Adi Gaskell argues that we shouldn’t get too wrapped up in “average” behaviors:

I’ve written extensively about the tremendous potential for big data in healthcare to drive enormous changes in how we keep people healthy for longer. It goes without saying, however, that not all data is created equal, and just having a large sample is not always sufficient to get the best insights.

If we needed a reminder, one comes via a recent study from the University of California, Berkeley. It suggests that emotion, behavior, and physiology vary hugely between individuals, so an average over a large dataset can still produce a ‘norm’ that is wide of the mark for individuals.

“If you want to know what individuals feel or how they become sick, you have to conduct research on individuals, not on groups,” the researchers say. “Diseases, mental disorders, emotions, and behaviors are expressed within individual people, over time. A snapshot of many people at one moment in time can’t capture these phenomena.”

Variance is important.
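Here is a toy Scala sketch that makes the point about averages concrete. It uses made-up scores for two hypothetical individuals; the data is illustrative only and is not from the Berkeley study. Each person is tightly clustered around their own mean. However, the pooled mean of 5.0 matches no observation from either person, and the pooled variance is far larger than either individual's.

```scala
// Toy data for illustration only: hypothetical daily scores (0-10) for two people
object AverageVsIndividual {
  def mean(xs: Seq[Double]): Double = xs.sum / xs.size

  // Population variance: the mean squared distance from the mean
  def variance(xs: Seq[Double]): Double = {
    val m = mean(xs)
    xs.map(x => (x - m) * (x - m)).sum / xs.size
  }

  val personA: Seq[Double] = Seq(1.0, 2.0, 3.0)
  val personB: Seq[Double] = Seq(7.0, 8.0, 9.0)
  val pooled: Seq[Double] = personA ++ personB

  def main(args: Array[String]): Unit = {
    println(s"A: mean=${mean(personA)}, variance=${variance(personA)}")
    println(s"B: mean=${mean(personB)}, variance=${variance(personB)}")
    // The group "norm" sits between the two people and describes neither of them
    println(s"Pooled: mean=${mean(pooled)}, variance=${variance(pooled)}")
  }
}
```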

Building An Extended Events Session

Aamir Syed gives us a simple example of using the Extended Events UI to create a new session:

Many of us have not made the effort to switch from Profiler to Extended Events. It’s 2018; if you haven’t found a few hours to learn about this incredibly powerful tool, I urge you to do so now.

I’m going to provide a quick means of tracking queries with Extended Events. This is not meant to show how comprehensive the tool is, but I hope that it at least spurs some interest.

One of the main reasons we use Profiler is to quickly capture some real-time data. I’m going to show you how to do that with Extended Events, and the same session can also serve as a historical view, since it’s so easy to sift through and filter the data. (No, you don’t have to create a table for the result sets à la Profiler.)

Click through for step-by-step instructions.

Azure SQL Database Service By Purchase Model

Glenn Berry explains the two purchase models available with Azure SQL Database, as well as the various service tiers within each model:

The older pricing option is the DTU-based SQL purchase model, where a fixed set of resources is assigned to the database from three performance tiers, which are Basic, Standard, and Premium.

For Standard and Premium, there are multiple service tiers, which are classified according to how many Database Transaction Units (DTUs) they provide (along with their included storage and maximum available storage). The Premium tier is designed for I/O intensive workloads, and is fault-tolerant.

The Database Transaction Unit (DTU) is based on a blended measure of CPU, memory, and storage reads and writes. The DTU-based performance levels represent preconfigured bundles of compute, memory, and storage resources designed to drive different levels of application performance. If you do not want to worry about the underlying resources and prefer the simplicity of a preconfigured resource bundle while paying a fixed amount each month, you may find the DTU-based model more suitable for your needs and easier to understand.

Glenn does a good job clearing up some of the complications around pricing for Azure SQL Database.

Configuring An Azure Runbook For Index Maintenance

Jim Donahoe explains how to perform index and statistics maintenance for Azure SQL Database, where you don’t have SQL Agent available:

I had a lot of issues when I created my first one, and after discussing it with some folks, I found they had the same issues. I searched for the best blog posts I could find on the subject, and the one I LOVED the most was here: Arctic DBA. He broke it down so simply that I finally created my own pseudo-installer, and I wanted to share it with all of you. Please bear in mind that these code snippets may fail at any time due to changes in Azure.

These next steps assume the following:

You have created/configured your Azure Automation Account and credential to use to execute this runbook.

Read on for a reasonably short PowerShell script and a modified version of Ola Hallengren’s index maintenance procedures.

Using The system_health Extended Event Session

Matthew McGiffen walks us through what the system_health Extended Events session gives us:

When Microsoft introduced Extended Events (XE) in 2008, they also gave us a built-in XE session called system_health.

This is a great little tool. I mainly use it for troubleshooting deadlocks as it logs all the information for any deadlocks that occur. No more having to mess about making sure specific trace flags are enabled to ensure deadlock information is captured in the error log.

It also captures the SQL text and Session Id (along with other relevant data) in a number of other scenarios you may need to troubleshoot:

  • Where an error of severity 20 or higher is encountered

  • Where a session has waited on a latch for over 15 seconds

  • Where a session has waited on a lock for over 30 seconds

  • Sessions that have encountered other long waits (the threshold varies by wait type)

Simply knowing what this session includes can give you a leg up on troubleshooting, especially when it’s a machine you haven’t seen before.

Wanted: Per-Database Wait Stat Collection Built In

Erik Darling wants configurable wait stat collections on a database level built into SQL Server:

I’m hoping that a feature like this could solve some intermediate problems that Query Store doesn’t.

Namely, being lower overhead, not collecting any PII, and not taking up a lot of disk space — after all, we’re not storing any massive stored proc text or query plans, here, just snapshots of wait stats.

This will help even if you’re already logging wait stats on your own. You still don’t have a clear picture of which database the problem is coming from. If you’ve got a server with lots of databases on it, figuring that out can be tough.

Understanding what waits (and perhaps bottlenecks) a single database is experiencing can also help admins figure out what kind of instance size they’d need as part of a migration.

It’s an interesting approach. If you agree with Erik, go vote.

Performance Test: Loading CSV Versus Loading Excel In Power Query

Chris Webb lays out a performance test which shows how quickly Power Query can read data from a CSV versus from an Excel spreadsheet:

The black line in the graph above is the amount of data read (actually the offset values showing where in the file the data is read from, which is the same thing as a running total when Power Query is reading all the data) from the Excel file; the green line is the amount of data read from the CSV file (the same data shown in the first graph above). A few things to mention:

  • Running Process Monitor while this second query was refreshing had a noticeable impact on its performance – in fact it was almost 20 seconds slower

  • The initial values of 80 million bytes seem to be where data is read from the end of the Excel file. Maybe this is Power Query reading some file metadata? Anyway, it seems as though it takes 5 seconds before it starts to read the data needed by the query.

  • There’s a plateau between the 10 and 20 second mark where not much is happening; this didn’t happen consistently and may have been connected to the fact that Process Monitor was running

The results were remarkable; check them out.
