Press "Enter" to skip to content

Curated SQL Posts

Wanted: Sorting Stacked Charts In Power BI

Melissa Connors would like to control stack order for a stacked chart in Power BI:

Currently, it’s displaying the stacked column in alphabetical order from the bottom up (coyote/red on the bottom, fox/black in the middle, and wolf/gray on the top). I want the ability to display it by the count for each type, not the name. In this example, I want the order of coyote, wolf, and fox. I believed that there must be a way to control the order within a stacked column, so I clicked all over the UI and fired up the Google.

Read the whole thing and vote for this Power BI idea if it’s interesting.

Comments closed

Wanted: Forced Parallelism

James Anderson would like a query hint to force parallelism:

I would like to see a hint that causes the optimiser to consider a parallel plan no matter the cost of the query. It’s possible to get this behaviour with trace flag 8649 but it’s unsupported by Microsoft and therefore unfit for production use.

I only tend to use query hints as a very last resort. It’s almost always better to allow the optimiser to make these decisions and continue to reevaluate these decisions as your data changes but sometimes they can be a get out of jail free card.

I’ve written about this before and so has Paul White. Paul created this connect item for something similar. Please vote it up if you think it could help with your query tuning.

Click through for the full argument, and then hit the Connect item if you agree.

Comments closed

Parsing JSON In R

Tomaz Kastrun shows how to feed a JSON data set into R and turn that into a proper data frame:

JSON has very powerful statements for converting to and from JSON for storing into / from SQL Server engine (FOR JSON and JSON VALUE, etc).  And since it is gaining popularity for data exchange, I was curious to give it a try with R combination.

I will simply convert a system table into array using for json clause.

There’s an R library.  There’s always an R library.

Comments closed

String Or Binary Data

Lori Edwards wants to fix one of the most annoying error messages in SQL Server history:

The Connect item that I’m promoting today is Connect #339410 : Please fix the “String or binary data would be truncated” message to give the column name. This Connect item was opened in 2008 and has 1,328 upvotes. If we can create virtual realities, we should be able to add a column name to an error message. Help us Microsoft, you’re our only hope.

This error message is incredibly annoying, particularly because it seems the engine should know which record has an error and on which column that error is.  We finally got information on which row caused a primary or unique key constraint violation, and that was helpful; this would be at least as helpful.

Comments closed

Replication Support For Dropping Tables

Drew Furgiuele recovers from a case of the vapors after discovering a replication codebase update:

To enable this change though, there’s more to it than just “you need to be running this service pack level on your publisher.” It’s far, far more than that:

  • Your publisher needs to be running SQL Server 2016 SP1 or 2014 SP2. We’ve already seen that is a requirement based on the content of the KB article. Big whoop.
  • In addition, if you have the same article in more than one publication, you have to enable this feature on all of them or you’ll still get the error that a table is in replication and can’t be dropped.
  • Your subscribers don’t care about the SQL Server version (except when you have a data type that exists in the publisher that doesn’t at the subscriber. Then it cares. But for this change, it doesn’t).
  • Your distributor also needs to be running a version that supports this functionality.

I can’t make that last point any bolder; that’s not documented anywhere that I could find. I’ll explain why later, but for now, roll with it. In fact, let’s see what happens if we don’t upgrade our distributor as well. Time to go to the lab!

Read the whole thing, particularly the warning about how dropping a table on the publisher will not drop tables on the subscriber.

Comments closed

Reading Extended Event Data From Powershell

Dave Mason builds a Powershell script to parse Extended Events information:

Powershell takes center stage for this post. Previously, I showed how to handle a SQL Server Extended Event in C# by accessing the event_stream target. We can do the same thing in PowerShell. The code translates mostly line-for-line from C#. Check out the last post if you want the full back story. Otherwise, continue on for the script and some Posh-specific notes.

Read on for the code.

Comments closed

Connect Items Galore

Aaron Bertrand has a series of Connect items of interest:

There are so many active Connect items, it’s difficult to choose just one, but after spending some time looking at the generate_series function in PostgreSQL, the one at the top of my list right now is this suggestion from Erland Sommarskog (blog):

Connect #258733 : Add a built-in table of numbers

I think it would be cheap and easy for SQL Server to provide a defaut table of numbers, always in memory, just like any other catalog view or DMV. This thing can use compression now, regardless of edition, so even a table with 2 billion rows will only take 13 MB, and should be easy to populate either at start-up or on first use. I could list out all of the potential uses for a numbers table, but they’re fairly well documented in the following sources (and most use cases don’t need anywhere near 2 billion values, so maybe the range could be defined using a sys.configurations or database-level SCOPED CONFIGURATION setting)

Click through for additional Connect items, and please vote on any Connect items which catch your eye.

Comments closed

The Importance Of Auditing

Louis Davidson has a parable about database design and systems auditing:

This brings me to my data question. If an order is processed in a store, but the expected data is not created, did that order ever occur?

Very often, the staff of a business are very focused on pleasing the customer, making sure they get their product, but due to software limitations, may not end up not capturing information about every sale in a satisfactory manner. Most of the blame I have seen lies in software that doesn’t meet the requirements of a customer, making capturing desired details tedious to achieve when the process is in the norm. Very often the excuse programmers give is that too much work of the work to build a system would need to be done for the atypical cases, but requirements are requirements, and it is generally essential that every action that occurs in a business is captured as data.

Read on for more.  My conjoined twin case is, how much information do we have about why users give up?  For example, if you have a three-part form, how many users get through part one, part two, and part three?  There’s some natural level of attrition, but if you see an abnormally low follow-through rate, that might indicate a bug or major issue.  Auditing is hard work, as you have to hit both sides of the problem at the same time.

Comments closed

Finding Clusters Of Queries Using R

Tomaz Kastrun shows how to use R to find clusters of queries which behave similarly:

So the R code said that, there are three clusters generating And I used medians to generate data around it. In addition I have also tested the result with Partitioning around medoids (which is opposite to hierarchical clustering) and the results from both techniques yield clean clusters.

Clustering models can be powerful for discovering commonalities, and that might help you find a number of queries which all behave in some sub-optimal way without having to trawl through every procedure’s code.

Comments closed

Submitting A Spark Job On HDInsight

Bharath Venkatesh shows different ways to run a Spark job on HDInsight:

From HDI 3.5 onwards, our clusters come preinstalled with Zeppelin Notebooks. Much like Jupyter notebooks, Zeppelin is a web-based notebook that enables interactive data analytics. It provides built-in Spark intergration that allows for:

  • Automatic SparkContext and SQLContext injection
  • Runtime jar dependency loading from local filesystem or maven repository. Learn more about dependency loader.
  • Canceling job and displaying its progress

This MSDN article provides a quick easy-to-use onboarding guide to help get acclimatized to Zeppelin. You can also try several applications that come pre-installed on your cluster to get hands on experience of Zeppelin.

Zeppelin is probably my favorite method, but there are good reasons to use all of these.

Comments closed