
Month: October 2017

Sparklines In R

Robert Sheldon shows how to use SQL Server R Services to display sparklines for categories:

In this article, we continue our discussion on visualizations, but switch the focus to sparklines and other spark graphs. As with many aspects of the R language, there are multiple options for generating spark graphs. For this article, we’ll focus on using the sparkTable package, which allows us to create spark graphs and build tables that incorporate those graphs directly, a common use case when working with spark images.

In the examples to follow, we’ll import the sparkTable package and generate several graphs, based on data retrieved from the AdventureWorks2014 sample database. We’ll also build a table that incorporates the SQL Server data along with the spark graphs. Note, however, that this article focuses specifically on working with the sparkTable package. If you are not familiar with how to build R scripts that incorporate SQL Server data, refer to the previous articles in this series. You should understand how to use the sp_execute_external_script stored procedure to retrieve SQL Server data and run R scripts before diving into this article.

Sparklines and associated visuals have their place in the world.  Read on to see how you can build a report displaying them.
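To get a feel for the pattern, here is a minimal sketch of wrapping sparkTable in sp_execute_external_script. This is not Robert's exact code: the query, the output path, and the choice of newSparkLine() plus export() from the sparkTable package are illustrative assumptions.

    -- A minimal sketch, not Robert's exact code: run R via SQL Server R
    -- Services, build one sparkline with the sparkTable package, and
    -- write it to disk. The query and output path are placeholders.
    EXEC sp_execute_external_script
        @language = N'R',
        @script = N'
            library(sparkTable)
            # InputDataSet is the data frame populated from @input_data_1
            sl <- newSparkLine(values = InputDataSet$SalesTotal)
            export(sl, outputType = "pdf", filename = "C:/Temp/sales_sparkline")
        ',
        @input_data_1 = N'
            SELECT MONTH(soh.OrderDate) AS OrderMonth,
                   SUM(soh.SubTotal) AS SalesTotal
            FROM Sales.SalesOrderHeader AS soh
            GROUP BY MONTH(soh.OrderDate)
            ORDER BY OrderMonth;';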


Index Rebuilds Reset DMV Counters

Clive Strong notes that an index rebuild will reset certain DMV counters:

As it transpires, an index rebuild will reset the counters for that index within the sys.dm_db_index_usage_stats DMV, which is potentially very dangerous unless you are aware of it. Normally, we determine whether or not an index is in use by looking at this information, but if you perform regular maintenance on the table, you could be resetting the very data you rely on to make an accurate decision.

Read the whole thing.
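For context, the usual "is this index used?" check looks something like the sketch below (the table name is a placeholder); a rebuild, like an instance restart, can zero these numbers out from under you.

    -- A hedged sketch: query sys.dm_db_index_usage_stats for one table.
    -- An index rebuild (or instance restart) can reset these counters,
    -- so a zero may mean "recently rebuilt" rather than "unused".
    SELECT i.name AS IndexName,
           us.user_seeks,
           us.user_scans,
           us.user_lookups,
           us.user_updates,
           us.last_user_seek
    FROM sys.indexes AS i
        LEFT JOIN sys.dm_db_index_usage_stats AS us
            ON us.object_id = i.object_id
            AND us.index_id = i.index_id
            AND us.database_id = DB_ID()
    WHERE i.object_id = OBJECT_ID(N'dbo.YourTable');  -- placeholder table name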


Unboxing ISPACs

It’s an early Christmas for Richie Lee:

The first file that we’re going to look at is the [Content_Types].xml file, and this is the file that confirms that the ZipPackage class is used. There’s an article here that is ten years old but still valid (scroll down to “System.IO.Packaging Includes Zip Support” to read up on this). This is because we know that the content_types file is part of the output when using the ZipPackage class to zip up a bunch of files into a .zip. The content_types file lists both the extension and content type of the three other files that are included in the ispac:

  • dtsx
  • params
  • manifest

Note that the content_types file does not specify the files themselves, either in quantity or in content, beyond the fact that they will contain XML.
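Based on that description, a content_types file would look roughly like this (a reconstruction for illustration; the exact attribute values in a real ispac may differ):

    <?xml version="1.0" encoding="utf-8"?>
    <Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
      <Default Extension="dtsx" ContentType="text/xml" />
      <Default Extension="params" ContentType="text/xml" />
      <Default Extension="manifest" ContentType="text/xml" />
    </Types>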

Read on for a good amount of detail on what’s included in an Integration Services package.


No Hierarchies In SSAS DirectQuery

Thomas LeBlanc points out that Analysis Services Tabular’s DirectQuery functionality does not include hierarchies on dimensions:

You can go to this link from Microsoft to read about more limitations.

There is also a difference in labeling between designing a Tabular Model in Visual Studio (SQL Server Data Tools, or SSDT) and making a change to the Model from SQL Server Management Studio (SSMS).

In SSDT, the property for the Model is DirectQuery and the values are On and Off.

It’s certainly not the end of the world, but I can see it being a bit annoying to deal with.


I/O Read-Ahead In SQL Server

Kendra Little shows how read-ahead works, using the example of index seeks:

Looking at an actual execution plan, I dig into the index seek operator and it shows me information about the physical IO. Almost all of the requests were read-ahead reads.

Read-ahead is a mechanism that SQL Server can use when it’s pulling a lot of information from disk. Instead of pulling 8K pages onesy-twosy-threesy, SQL Server can suck up big chunks of pages from disk with a vacuum cleaner.

If you’re running Developer or Enterprise Edition, you may get a larger vacuum cleaner.

Read-ahead is a good piece of functionality, but those reads still have a cost associated, and the cheapest read is the read you don’t do.
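You can watch this happen yourself with a cold buffer cache. A minimal sketch (the table name is a placeholder, and DBCC DROPCLEANBUFFERS belongs on test instances only):

    -- A minimal sketch: empty the buffer pool (test instances only!),
    -- then scan a table and inspect the STATISTICS IO output.
    DBCC DROPCLEANBUFFERS;
    GO
    SET STATISTICS IO ON;
    SELECT COUNT(*) FROM dbo.SomeBigTable;  -- placeholder table
    -- The Messages output reports physical reads and read-ahead reads
    -- separately; with a cold cache, most pages arrive via read-ahead.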


Finding Progress On A Long-Running Statement

David Fowler shows us how to track how far we’ve gotten on a long-running data modification statement:

Well, it turns out that there is, and to find it we need to turn to our trusty friend, the transaction log.

As we know, the transaction log will squirrel away an entry each time that a row is modified.  We can count up all the LOP_MODIFY_ROW, LOP_INSERT_ROW and LOP_DELETE_ROWS entries for our transaction and that will tell us just how many rows our transaction has altered so far.

Click through for a script, as well as an important disclaimer.
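The shape of the idea is something like the sketch below, run from a second session while the modification is in flight. This is a rough sketch rather than David's script; the transaction-name lookup assumes a single-statement UPDATE, whose LOP_BEGIN_XACT record is named UPDATE.

    -- A rough sketch: count row-level log records for an active transaction
    -- to estimate how many rows it has modified so far.
    DECLARE @xactid nvarchar(14) =
        (SELECT TOP (1) [Transaction ID]
         FROM fn_dblog(NULL, NULL)
         WHERE [Transaction Name] = N'UPDATE'
         ORDER BY [Current LSN] DESC);

    SELECT COUNT(*) AS RowsModifiedSoFar
    FROM fn_dblog(NULL, NULL)
    WHERE [Transaction ID] = @xactid
          AND Operation IN (N'LOP_MODIFY_ROW', N'LOP_INSERT_ROW', N'LOP_DELETE_ROWS');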


Finding Broken Views

Bill Fellows has a script to test each view to see if it is broken:

Shh, shhhhhh, we’re being very very quiet, we’re hunting broken views. Recently, we were asked to migrate some code changes and after doing so, the requesting team told us we had broken all of their views, but they couldn’t tell us what was broken, just that everything was. After a quick rollback to snapshot, thank you Red Gate SQL Compare, I thought it’d be enlightening to see whether anything was broken before our code had been deployed.

You’ll never guess what we discovered.

Read on to see what they discovered (spoilers:  broken views) and how Bill fixed the problem.
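One simple way to go hunting (a minimal sketch, not necessarily Bill's approach) is to try selecting zero rows from every view and catch whatever throws:

    -- A minimal sketch: probe each view with a zero-row SELECT inside
    -- TRY/CATCH so one broken view does not stop the hunt.
    DECLARE @view nvarchar(776), @sql nvarchar(max);
    DECLARE view_cursor CURSOR FAST_FORWARD FOR
        SELECT QUOTENAME(SCHEMA_NAME(schema_id)) + N'.' + QUOTENAME(name)
        FROM sys.views;
    OPEN view_cursor;
    FETCH NEXT FROM view_cursor INTO @view;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        BEGIN TRY
            SET @sql = N'SELECT TOP (0) * FROM ' + @view + N';';
            EXEC sys.sp_executesql @sql;
        END TRY
        BEGIN CATCH
            PRINT @view + N' is broken: ' + ERROR_MESSAGE();
        END CATCH;
        FETCH NEXT FROM view_cursor INTO @view;
    END;
    CLOSE view_cursor;
    DEALLOCATE view_cursor;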


Using Akka In A Streaming Solution

Artem Rukavytsia shows us how you can easily integrate Akka into a solution with Kafka and Spark Streaming:

Akka gives you the opportunity to build logic for producing and consuming messages from Kafka with the Actor model. It’s very convenient if actors are widely used in your code, and it significantly simplifies building data pipelines with actors. For example, you might have an Akka Cluster, one part of which crawls web pages while another part indexes them and sends the indexed data to Kafka. The consumer can then aggregate this logic.

The Actor model, which Akka implements, is something I kind of understand, but have never spent much time trying to implement.  I can see how it’d make perfect sense communicating with Kafka, though, given the scale and independence of consumers within a consumer group that Kafka provides.


What’s New In Spark 2.2?

Geetika Gupta shows us some of the updates in Apache Spark 2.2:

The major addition to this release is Structured Streaming. It has been marked as production ready and its experimental tag has been removed.

Some of the high-level changes and improvements:

  • Production-ready Structured Streaming
  • Expanded SQL functionality
  • New distributed machine learning algorithms in R
  • Additional algorithms in MLlib and GraphX

Read on for more details.


Errors Using Native Prediction In SQL Server

Sacha Tomey walks us through a few potential issues when converting code which uses SQL Server Machine Learning Services' sp_execute_external_script procedure to native PREDICT calls:

Stumble One:

Error occurred during execution of the builtin function 'PREDICT' with HRESULT 0x80004001. 
Model type is unsupported.

Reason:

Not all models are supported. At the time of writing, only the following models are supported:

  • rxLinMod
  • rxLogit
  • rxBTrees
  • rxDTree
  • rxDForest

sp_rxPredict supports additional models, including those available in the MicrosoftML package for R (I was attempting to use rxFastTrees). I presume this limitation will ease over time. The list of supported models is referenced in the PREDICT function documentation.

sp_rxPredict does require CLR, but it’s a viable alternative if you need to use a model not currently supported—like rxNeuralNet.
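For reference, a native scoring call has this shape (a hedged sketch; the model table, input table, and output column are hypothetical):

    -- A hedged sketch of native scoring with PREDICT in SQL Server 2017.
    -- Assumes a serialized, supported model (say, rxLinMod) stored in a
    -- hypothetical dbo.Models table, scoring a hypothetical dbo.NewSales table.
    DECLARE @model varbinary(max) =
        (SELECT model_object FROM dbo.Models WHERE model_name = N'SalesLinMod');

    SELECT d.*, p.PredictedSales
    FROM PREDICT(MODEL = @model, DATA = dbo.NewSales AS d)
         WITH (PredictedSales float) AS p;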
