Press "Enter" to skip to content

Author: Kevin Feasel

Unicode Escaping Across Various Languages

Solomon Rutzky shows how to perform Unicode character escaping in a dozen places:

The purpose of this post is to correct the overall lack of examples. Everything shown below are actual working examples of creating both a Unicode-only BMP character (meaning a non-Supplementary Character that would require Unicode) and a Supplementary Character. Most examples include a link to an online demo, either on db<>fiddle (for database demos) or IDE One (for non-database demos), both very cool and handy sites.

This includes T-SQL and MySQL as well as .NET languages, PHP, JavaScript, and even Excel. It’s a handy reference page.

Comments closed

Pre-Filtering Power BI Slicers

Matt Allington takes us through a new feature in Power BI:

Now back to the point of this post.  As of June 2019, it is possible to pre-filter slicers as well.  It may seem weird, but this previously wasn’t possible – it seemed weird to me, anyway.  This is now fixed and it is possible to use the side filter pane in the same way as other visuals.  I can think of quite a few useful scenarios, including:
– Hiding the dreaded (Blank) in a slicer.  [Actually, please don’t do that, but instead fix your data model].
– Filtering out items not relevant (eg category managers may only want to see their own products)
– Hiding items with no sales

Read on for a few examples of how to use this.

Comments closed

Random Forest on Small Numbers of Observations

Neil Saunders takes us through an interesting problem:

A recent question on Stack Overflow [r] asked why a random forest model was not working as expected. The questioner was working with data from an experiment in which yeast was grown under conditions where (a) the growth rate could be controlled and (b) one of 6 nutrients was limited. Their dataset consisted of 6 rows – one per nutrient – and several thousand columns, with values representing the activity (expression) of yeast genes. Could the expression values be used to predict the limiting nutrient?

The random forest was not working as expected: not one of the nutrients was correctly classified. I pointed out that with only one case for each outcome, this was to be expected – as the random forest algorithm samples a proportion of the rows, no correct predictions are likely in this case. As sometimes happens the question was promptly deleted, which was unfortunate as we could have further explored the problem.

Neil decided to explore the problem further regardless and came to some interesting conclusions.

Comments closed

Monitoring Big Data Clusters

Mohammad Darab continues a series on Big Data Clusters:

There are many ways to view the health of your Big Data Cluster. As of CTP 3.0, there are kubectl commands, mssqlctl commands as well as dashboards. For the sake of this series, I will focus on the dashboards. I will blog about some of the useful kubectl and mssqlctl commands in later posts.

The first dashboard is the Microsoft Cluster Administration portal (see below snapshot). This is a view into the Big Data Cluster Controller. As you can see from the image below, the Overview pane shows the Controller, Master Instance and all the pools. On the left hand side you can see more details. If you click on the “Service Endpoint” option, you will see a list of endpoints that you can bookmark.

Something I appreciate is that Microsoft thought ahead on what the monitoring story should look like rather than waiting until the end and slapping something together.

Comments closed

Simplifying Columnstore

Monica Rathbun takes us through concepts behind columnstore indexes:

Now, I admit when these first were introduced in SQL Server 2012 I found them very intimidating (additionally, you couldn’t update them directly). For me, anytime you say columnstore, my mind tends to set off alarms saying wait stay away, this is too complicated. So, in this post I am going to try and simplify the feature for you.

To do that first you need to understand some terminology and the difference between a columnstore index and a row store index (the normal kind we all use). Let’s start with the terminology.

There are some interesting complications around columnstore indexes but for analytical or warehousing queries, they’re excellent.

Comments closed

More Testing of Inline Scalar UDFs

Erik Darling makes a FROIDian slip:

The idea behind FROID is that it removes some restrictions around scalar valued functions.

1. They can be inlined into the query, not run per-row returned
2. They don’t force serial execution, so you can get a parallel plan

If your functions already run pretty quickly over a small  number of rows, and the calling query doesn’t qualify for parallelism, you may not see a remarkable speedup.

Even in that case, Erik argues that you can still get some benefits from SQL Server 2019 bringing those scalar UDFs inline.

Comments closed

SQL Server 2019 CTP 3.1 Released

Anshul Rampal announces CTP 3.1 of SQL Server 2019:

The big data clusters feature continues to add key capabilities for its initial release in SQL Server 2019. This month, the release extends the Apache Spark™ functionality for the feature by supporting the ability to read and write to data pool external tables directly as well as a mechanism to scale compute separately from storage for compute-intensive workloads. Both enhancements should make it easier to integrate Apache Spark™ workloads into your SQL Server environment and leverage each of their strengths. Beyond Apache Spark™, this month’s release also includes machine learning extensions with MLeap where you can train a model in Apache Spark™ and then deploy it for use in SQL Server through the recently released Java extensibility functionality in SQL Server CTP 3.0. This should make it easier for data scientists to write models in Apache Spark™ and then deploy them into production SQL Server environments for both periodic training and full production against the trained model in a single environment.

Click through to learn more about what has changed.

Comments closed

Shortest Path with SQL Server 2019

Shreya Verma shows off a new feature in SQL Server 2019 CTP 3.1:

We are expanding the graph database capabilities with several new features. In this blog we discuss one of these features that is now available for public preview in SQL Server 2019Shortest Path, which can be used to find a shortest path between two nodes in a graph. The shortest path function can also be used to compute a transitive closure or for arbitrary length traversals.
 
With CTP3.1, we are introducing a new function SHORTEST_PATH, which can be used inside MATCH to find a shortest path between any 2 nodes in a graph or to perform arbitrary length traversals. Users can specify a pattern they would like to search for in the graph using a regular expression style syntax.

I’d be interested in seeing how well it performs. But it’s good to see graph functionality fleshed out a bit more.

Comments closed

Best Practices for Query Store

Erin Stellato ties together a number of best practices around Query Store:

I’m a huge fan of Query Store, which regular readers may know, but there’s a need to write a bit more about Query Store best practices.  This isn’t a “you must use this feature” post, this is a “here is what you must know if you want to use this feature” post.

I have a lot of content about Query Store, but maybe what’s really important gets lost amongst everything else.  Glenn prompted me to write this, after I worked with two customers last week that ran into issues related to their configuration of Query Store.  Listed below are the things you must know before you enable Query Store.  If you already have Query Store enabled and running, I recommend reviewing the information to ensure you are following best practices.

Click through for the full set of practices and links to additional details.

Comments closed

Lasso and Ridge Regression in Python

Kristian Larsen shows off a few regression techniques using Python:

Variables with a regression coefficient equal to zero after the shrinkage process are excluded from the model. Variables with non-zero regression coefficients variables are most strongly associated with the response variable. Therefore, when you conduct a regression model it can be helpful to do a lasso regression in order to predict how many variables your model should contain. This secures that your model is not overly complex and prevents the model from over-fitting which can result in a biased and inefficient model.

Read on for demonstrations.

Comments closed