Press "Enter" to skip to content

Month: February 2019

ISNUMERIC And Unexpected Results

Jen Stirrup explains why ISNUMERIC isn’t all that great:

I noted that one of the columns failed to convert VARCHAR to DECIMAL.

The error message is below, and it’s usually fairly easy to sort:
Error converting data type varchar to numeric

Normally, I’d use ISNUMERIC to identify the rows that fail to have a value in that column that could be converted to a number. Then, I could identify the value, and then I could replace or exclude it, as required.

However, on this occasion, using ISNUMERIC failed to identify any columns as being non-numeric. 

Click through to see why Jen got this result.

Comments closed

Combining Stream Analytics And Azure ML With Power BI

Brad Llewellyn shows us how to feed Azure ML predictions into Power BI via Azure Stream Analytics:

Today, we’re going to talk about combining Stream Analytics with Azure Machine Learning Studio within Power BI.  If you haven’t read the earlier posts in this series, IntroductionGetting Started with R ScriptsClusteringTime Series DecompositionForecastingCorrelationsCustom R VisualsR Scripts in Query EditorPythonAzure Machine Learning Studio and Stream Analytics, they may provide some useful context.  You can find the files from this post in our GitHub Repository.  Let’s move on to the core of this post, Stream Analytics.

This post is going to build directly on what we created in the two previous posts, Azure Machine Learning Studio and Stream Analytics.  As such, we recommend that you read them before proceeding.

Read on for the demo.

Comments closed

Extracting Numerical Data Points From Images

Matt Allington visualizes changes in the Gartner magic quadrant for BI tools:

Today Gartner released the 2019 magic quadrant for Business Intelligence.  As expected (by me at least), Microsoft is continuing its trail blazing and now has a clear lead over Tableau in both ability to execute and completeness of vision.  I thought it would be interesting to see a trend over time for the last 5 years, as this is the time period that I have been a professional Power BI Consultant.  I needed some way to extract the numerical data points from the images I had collected.  This article shows you how to do that.  Here is the final output – a scatter chart with a play axis in Power BI of course.

I was just commenting the other day about how somebody should do this and Matt went and did it.

Comments closed

Clearing sysssislog

Eduardo Pivaral shows that you should clear out some of the bigger SSIS tables occasionally:

If you have SQL Server Integration Services installed on your server, and you left the default configurations a table named sysssislog is created on MSDB database, it contains logging entries for packages executed on that instance.

If you are not careful enough, this table can grow uncontrollably over time and can make subsequent insertions very slow.

This kind of batched delete works for more than just the Integration Services tables; it’s a good plan wherever you have a large table and need to delete a fairly significant number of records from it.

Comments closed

sp_executesql WITH RECOMPILE

Erik Darling points out that running sp_executesql with the WITH RECOMPILE setting doesn’t really change anything:

This’ll give us the key lookup plan you see above. If I re-run the query and use the 2010-12-30 date, we’ll re-use the key lookup plan.

That’s an example of how parameters are sniffed.

Sometimes, that’s not a good thing. Like, if I passed in 2008-12-30, we probably wouldn’t like a lookup too much.

One common “solution” to parameter sniffing is to tack a recompile hint somewhere.

Click through for Erik’s demonstration.

Comments closed

Azure Data Studio: Extension Installation

Grant Fritchey shows how easy it is to add an extension to Azure Data Studio:

If you’re even thinking about experimenting with, let alone actively using, Azure Data Studio, you need to plan on installing a few extensions. Buck Woody has a great list that you should look through in this blog post. If you’re just getting started with Azure Data Studio, I have an introduction here.

Depending on the extension, this could be a simple as a mouse click. However, not all the extensions are that easy. Let’s explore this just a little so when you do start using Azure Data Studio, things are easy.

You can reasonably install Management Studio and never think about adding extensions. Don’t do that with Azure Data Studio, though: a lot of the benefit comes from its extensibility. And Microsoft tends to add things as extensions before bringing them into the base product.

Comments closed

Working With Skewed Data In Pig

Dmitry Tolpeko explains how you can use the Weighted Range Partitioner in Apache Pig to work with highly skewed data:

The problem is that there are 3,000 map tasks are launched to read the daily data and there are 250 distinct event types, so the mappers will produce 3,000 * 250 = 750,000 files per day. That’s too much.

Reducing the number of map tasks is not always an option as it increases the execution time, and the number of files is still high (500 tasks will produce 125,000 files per day), while our goal is to have about 500-700 files daily for this single table.

A well-known solution for this problem is to force a reduce phase. But what if the data is highly skewed, for example, when there are few event types having hundreds of millions rows, while others just have a few million rows?

This is a combination of an interesting problem and an interesting solution.

Comments closed

Spark Streaming Using DStreams Or DataFrames?

Yaroslav Tkachenko contrasts the two methods for operating on data with Spark Streaming:

Spark Streaming went alpha with Spark 0.7.0. It’s based on the idea of discretized streams or DStreams. Each DStream is represented as a sequence of RDDs, so it’s easy to use if you’re coming from low-level RDD-backed batch workloads. DStreams underwent a lot of improvements over that period of time, but there were still various challenges, primarily because it’s a very low-level API.

As a solution to those challenges, Spark Structured Streaming was introduced in Spark 2.0 (and became stable in 2.2) as an extension built on top of Spark SQL. Because of that, it takes advantage of Spark SQL code and memory optimizations. Structured Streaming also gives very powerful abstractions like Dataset/DataFrame APIs as well as SQL. No more dealing with RDD directly!

For me, it’s DataFrames all day. But Yaroslav has a more nuanced answer which is worth reading. There are also a couple of good examples.

Comments closed

Dealing With Massive Database Counts

Edward Harrison describes some of the issues you run into when working on a SQL Server instance with thousands of databases:

One of the headaches we frequently have to deal with is the excessive amount of time it takes for SQL Servers with the discussed design to go through any planned patches. We have seen this while carrying out some planned maintenance for one of our customers and the work simply comprised failing over a SQL Server instance, from the active node, over onto the passive node so that the former active node could be patched. The passive node had had its SQL Server patched earlier in the day while the active node was still on the older patch level.

This SQL Server had 10,000 databases and all that we needed to do was to simply click the buttons to failover the SQL Server and then wait for it come online on the other node. In the end, this failover took nearly four hours to complete because upgrade scripts had to be applied to each of the 10,000 databases.

Even a couple thousand databases on an instance can slow things down considerably. Consider an operation which takes 10 milliseconds like laying out a database in the object explorer in SSMS. With 10K databases, that 10ms activity (note that I’m making up the number) would take 100 seconds. And these are often blocking tasks, meaning you’re sitting there spinning your wheels for 100 seconds before you can do the thing you wanted to do. The pain is real. And for server activities, most administrative functions (backups, CHECKDB, index rebuilding, etc.) is done one at a time, meaning you have a lot of time spent on this stuff.

Comments closed

Installing R From Powershell

Tomaz Kastrun shows us how to install R and RStudio via Powershell:

For the brevity of this post, I will only download couple of R packages from CRAN repository, but this list is indefinite.
There are ways many ways to retrieve the CRAN packages for particular R version using powershell. I will just demonstrate this by using Invoke-WebRequest cmdlet.
Pointing your cmdlet to URL: https://cran.r-project.org/bin/windows/contrib/3.5  where  list of all packages for this version is available. But first we need to extract the HTML tag where information is stored.

There’s quite a bit of code here, but the upside is that you get the ability to automate server installs.

Comments closed