Curated SQL Posts

Highlighting Scatter Charts In Power BI

Jason Thomas has a great post showing how to implement highlighting of scatter/bubble charts in Power BI:

That said, there is one feature from my previous blog that was not implemented in Power BI – highlighting scatter/bubble charts. In Power BI, the scatter charts are not considered as area charts and hence you can only filter them, not highlight them. This feature is useful when you have a lot of data points in your scatter chart and you want to see where a particular data point is with respect to the other data points. That said, you can make use of some nifty DAX and replicate the same behavior.

There are several steps to the process so it’s not point-and-click easy, but Jason has a nice walkthrough showing how to set it up.

Reserved Memory Allocation Waits And Trace Flag 834

Joe Obbish has another post looking at sub-optimal columnstore index performance:

It is possible to see a scalability bottleneck in the form of high average wait time for the RESERVED_MEMORY_ALLOCATION_EXT wait if a highly concurrent workload is run on a server that consumes memory with batch mode operators. I believe that the severity of the bottleneck depends on unknown factors in the server’s initial memory state and the rate of memory actually used by queries to run batch mode operations. This blog post shares a reproduction of the issue along with a call to action.

If you use clustered columnstore indexes, check out Joe’s User Voice entry.
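
If you want to check whether this wait is prominent on one of your own servers, a quick look at the wait stats DMV is a reasonable starting point. The query below is mine, not Joe’s, and simply reports the cumulative numbers since the last restart (or since the stats were cleared):

  SELECT wait_type,
         waiting_tasks_count,
         wait_time_ms,
         wait_time_ms / NULLIF(waiting_tasks_count, 0) AS avg_wait_ms
  FROM sys.dm_os_wait_stats
  WHERE wait_type = N'RESERVED_MEMORY_ALLOCATION_EXT';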

vtreat

John Mount explains the vtreat package that he and Nina Zumel have put together:

When attempting predictive modeling with real-world data you quickly run into difficulties beyond what is typically emphasized in machine learning coursework:

  • Missing, invalid, or out of range values.
  • Categorical variables with large sets of possible levels.
  • Novel categorical levels discovered during test, cross-validation, or model application/deployment.
  • Large numbers of columns to consider as potential modeling variables (both statistically hazardous and time consuming).
  • Nested model bias poisoning results in non-trivial data processing pipelines.

Any one of these issues can add to project time and decrease the predictive power and reliability of a machine learning project. Many real-world projects encounter all of these issues, which are often ignored, leading to degraded performance in production.

vtreat systematically and correctly deals with all of the above issues in a documented, automated, parallel, and statistically sound manner.

That’s immediately going onto my learn-more list.

R 3.4.4 Now Available

David Smith notes that R 3.4.4 is now generally available:

R 3.4.4 has been released, and binaries for Windows, Mac and Linux are now available for download on CRAN. This update (codenamed “Someone to Lean On”, likely a Peanuts reference, though I couldn’t find which one with a quick search) is a minor bugfix release, and shouldn’t cause any compatibility issues with scripts or packages written for prior versions of R in the 3.4.x series.

Read on to see the change list.

Window Functions In SQL

Eleni Markou explains what window functions are:

What we want is a table with an extra column which will represent the average price of all products belonging to the same category as the one on the current line.

One approach to solve this problem is to calculate the average price per category using an aggregate function and then join the result with the initial table over the Product Type column in order to get a new table looking at which you can easily find out if a product is more expensive than the average of its category.

Although this would definitely do the job, the query would be quite complicated and lengthy and may lack readability. To avoid this, an alternative approach is to make use of a window function, where there is no need to mess with subqueries and joins. When using a window function, you can retrieve both aggregated and non-aggregated values at the same time, while with GROUP BY you can get only the results grouped into a single output row.
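
To make that concrete, here is a minimal sketch of both approaches; the Products table and its columns are invented for illustration:

  -- Subquery-and-join approach: aggregate per category, then join back.
  SELECT p.ProductName, p.ProductType, p.Price, a.AvgCategoryPrice
  FROM Products AS p
  JOIN (
      SELECT ProductType, AVG(Price) AS AvgCategoryPrice
      FROM Products
      GROUP BY ProductType
  ) AS a
      ON a.ProductType = p.ProductType;

  -- Window function approach: row-level and aggregated values in one pass.
  SELECT ProductName,
         ProductType,
         Price,
         AVG(Price) OVER (PARTITION BY ProductType) AS AvgCategoryPrice
  FROM Products;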

I ask questions about window (or windowing) functions whenever I interview someone for a job.  They are extremely useful things, and I highly recommend Itzik Ben-Gan’s windowing functions book for SQL Server 2012 if you want to learn a lot more.

Database Migration With dbatools

Jess Pomfret shows how easy it is to migrate databases from one SQL Server instance to another using dbatools:

Now that there are no connections we can move the database.  Depending on the situation it might be worth setting the database to read only or single user mode first. In my case, I had the application taken down so I felt confident no connections would be coming in.

With one line of code we can select the source and destination servers, the database name, specify that we want to use the backup and restore method, and then provide the path to a file share that both instance service accounts have access to.

The whole process is just five lines of code, so it could hardly be easier.

ORIGINAL_DB_NAME()

Kenneth Fisher explains a couple of database name functions in SQL Server:

I’d never seen ORIGINAL_DB_NAME until recently and I thought it would be interesting to point it out, and in particular the difference between it and DB_NAME. I use DB_NAME and DB_ID fairly frequently in support queries (for example, what database context a query is running from, or what database given DB files are from). So starting with DB_NAME.
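
As a quick illustration of the difference (my example, not Kenneth’s): DB_NAME() follows the current database context, while ORIGINAL_DB_NAME() reports the database specified in the connection string.

  -- Assume the connection string specified Initial Catalog = AdventureWorks.
  USE tempdb;
  SELECT DB_NAME()          AS current_context,     -- tempdb
         DB_ID()            AS current_db_id,       -- 2 (tempdb)
         ORIGINAL_DB_NAME() AS connection_database; -- AdventureWorks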

Click through to know when to use each.

Victimless Deadlocks And SSMS

Michael J. Swart shows a scenario where the deadlock graph fails to open in SQL Server Management Studio:

I recently got this error in Management Studio when trying to view a deadlock graph that was collected with an extended events session:

Failed to initialize deadlock control.
Key cannot be null.
Parameter name: key

I found this error in a session that included the xml_deadlock_report event.
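
For context, a minimal extended events session of that kind might look like the sketch below; the session name and target file name are my placeholders, not Michael’s. The error shows up when opening a deadlock graph captured by such a session in Management Studio.

  CREATE EVENT SESSION deadlock_capture ON SERVER
  ADD EVENT sqlserver.xml_deadlock_report
  ADD TARGET package0.event_file (SET filename = N'deadlock_capture.xel');
  GO

  ALTER EVENT SESSION deadlock_capture ON SERVER STATE = START;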

Read on for more information, and do check the comments where Lonny Niederstadt points out that even a victimless deadlocking scenario can have an ultimate victim:  performance.

Event Sourcing On Kafka

Adam Warski shows how you can use Apache Kafka as your event sourcing data source:

There’s a number of great introductory articles, so this is going to be a very brief introduction. With event sourcing, instead of storing the “current” state of the entities that are used in our system, we store a stream of events that relate to these entities. Each event is a fact: it describes a state change that occurred to the entity (past tense!). As we all know, facts are indisputable and immutable. For example, suppose we had an application that saved a customer’s details. If we took an event sourcing approach, we would store every change made to that customer’s information as a stream, with the current state derived from a composition of the changes, much like a version control system does. Each individual change record in that stream would be an immutable, indisputable fact.

Having a stream of such events, it’s possible to find out what the current state of an entity is by folding all events relating to that entity; note, however, that it’s not possible the other way round: when storing the current state only, we discard a lot of valuable historical information.

Event sourcing can peacefully co-exist with more traditional ways of storing state. A system typically handles a number of entity types (e.g. users, orders, products, …), and it’s quite possible that event sourcing is beneficial for only some of them. It’s important to remember that it’s not an all-or-nothing choice, but an additional possibility when it comes to choosing how state is managed in our application.
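
Adam’s examples are built around Kafka, but the core idea of folding events into current state can be sketched in plain T-SQL as well; the event table and its columns below are hypothetical:

  -- Hypothetical event store: one row per change to a customer attribute.
  -- CustomerEvents(CustomerId, EventSeq, Attribute, NewValue)
  WITH ranked AS (
      SELECT CustomerId, Attribute, NewValue,
             ROW_NUMBER() OVER (PARTITION BY CustomerId, Attribute
                                ORDER BY EventSeq DESC) AS rn
      FROM CustomerEvents
  )
  SELECT CustomerId, Attribute, NewValue AS CurrentValue
  FROM ranked
  WHERE rn = 1;   -- the latest event per attribute wins: the folded state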

It’s a helpful article and works hand in hand with a CQRS pattern.
