Press "Enter" to skip to content

Author: Kevin Feasel

Machine Learning At Build 2017

Adnan Masood looks at some of the new machine learning offerings in Azure:

Language Understanding Intelligent Service (LUIS) is one of the marquee offerings in cognitive services which contains an entire suite of NLU / NLP capabilities, teaching applications to understand entities, utterances, and genera; commands from user input. Other language services include Bing Spell Check API which detect and correct spelling mistakes, Web Language Model API which helps building knowledge graphs using predictive language models Text Analytics API to perform topic modeling and do sentiment analysis, as well as Translator Text API to perform automatic text translation. The Linguistic Analysis API is a new addition which parses and provide context around language concepts.

In the knowledge spectrum, the Recommendations API to help predict and recommend items, Knowledge Exploration Service to enable interactive search experiences over structured data via natural language inputs, Entity Linking Intelligence Service for NER / disambiguation, Academic Knowledge API (academic content in the Microsoft Academic Graph search), QnA Maker API, and the newly minted custom Decision Service which provides a contextual decision-making API with reinforcement learning features. Search APIs include Autosuggest, news, web, image, video and customized searches.

There are some nice products available on the Azure platform and Adnan does a good job of outlining them.

Comments closed

Building A Spinning Globe With R

James Cheshire shows how to use R to create an image of a spinning globe:

It has been a long held dream of mine to create a spinning globe using nothing but R (I wish I was joking, but I’m not). Thanks to the brilliant mapmate package created by Matt Leonawicz and shed loads of computing power, today that dream became a reality. The globe below took 19 hours and 30 processors to produce from a relatively low resolution NASA black marble data, and so I accept R is not the best software to be using for this – but it’s amazing that you can do this in R at all!

Now all that is missing is a giant TV and an evil lair.

Comments closed

Log Info DMF

Andrew Pruski looks at the new sys.dm_db_log_info dynamic management function in SQL Server 2017:

This new DMF is intended to replace the (not so) undocumented DBCC LOGINFO command. I say undocumented as I’ve seen tonnes of blog posts about it but never an official Microsoft page.

This is great imho, we should all be analyzing our database’s transaction log and this will help us to just that. Now there are other undocumented functions that allow us to review the log (fn_dblog and fn_dump_dblog, be careful with the last one).

I’m glad to see this make the cut, as replacing informative / idempotent DBCC commands with DMVs makes it easier to query this data, particularly in conjunction with other DMVs or system tables.

Comments closed

Stop Profiling

Wayne Sheffield concludes his two-part series on converting a trace to use Extended Events:

This query extracts the data from XE file target. It also calculates the end date, and displays both the internal and user-friendly names for the resource_type and mode columns – the internal values are what the trace was returning.

For a quick recap: in Part 1, you learned how to convert an existing trace into an XE session, identifying the columns and stepping through the GUI for creating the XE session. In part 2 you learned how to query both the trace and XE file target outputs, and then compared the two outputs and learned that all of the data in the trace output is available in the XE output.

Read the whole thing.

Comments closed

Why Restores Can Be Slow

Paul Randal explains why a database restoration tends to be slower than backing that database up:

Here’s a list of things you can do to make restoring a full backup go faster:

  • Ensure that instant file initialization is enabled on the SQL Server instance performing the restore operation, to avoid spending time zero-initializing any data files that must be created. This can save hours of downtime for very large data files.

  • If possible, restore over the existing database – don’t delete the existing files. This avoids having to create and potentially zero initialize the files completely, especially the log file. Be very careful when considering this step, as the existing database will be irretrievably destroyed once the restore starts to overwrite it.

  • Consider backup compression, which can speed up both backup and restore operations, and save disk space and storage costs.

It’s a straightforward explanation, and Paul provides a few more tips for speeding up restorations.

Comments closed

Getting A Date Is Hard

Nate Johnson has a fun rant about datetime ranges in SQL Server and date pickers:

I mean, I’m not that old, but spinning thru a few decades is still slower than just typing 4 digits on my keyboard — especially if your input-box is smart enough to flip my keyboard into “numeric only” mode.

Another seemingly popular date-picker UX is the “calendar control”.  Oh gawd.  It’s horrible!  Clicking thru pages and pages of months to find and click (tap?) on an itty bitty day box, only to realize “Oh crap, that was the wrong year… ok let me go back.. click, click, tap..” ad-nauseum.

Food for thought.

Comments closed

Multiple R Studio Users On HDInsight

Xiaoyong Zhu shows how to set up additional R Studio users in an HDInsight cluster:

Basically speaking, the “http user” will be used to authenticate through the HDInsight gateway, which is used to protect the HDInsight clusters you created. This user is used to access the Ambari UI, YARN UI, as well as many other UI components.

The “ssh user” will be used to access the cluster through secure shell. This user is actually a user in the Linux system in all the head nodes, worker nodes, edge nodes, etc., so you can use secure shell to access the remote clusters.

For Microsoft R Server on HDInsight type cluster, it’s a bit more complex, because we put R Studio Server Community version in HDInsight, which only accepts Linux user name and password as login mechanisms (it does not support passing tokens), so if you have created a new cluster and want to use R Studio, you need to first login using the http user’s credential and login through the HDInsight Gateway, and then use the ssh user’s credential to login to RStudio.

It’s a good read and also includes a sample Spark-R job.

Comments closed

SQL Server Query Options

John Morehouse shows how to set your query options to be the same as that problematic application which is generating nasty queries:

Have you every executed a query in SQL Server Management Studio, looked at the execution plan, and noticed that it was a different plan than what was generated on the server?

A potential reason for this could be a different option settings.  The options represent the SET values of the current session.  SET options can affect how the query is execute thus having a different execution plan.   You can find these options in two places within SSMS under Tools -> Options -> Query Execution -> SQL Server -> Advanced.

Click through for lots of information including a script John provided to see which options are currently on.

Comments closed

R Is Bad For You?

Bill Vorhies lays out a controversial argument:

I have been a practicing data scientist with an emphasis on predictive modeling for about 16 years.  I know enough R to be dangerous but when I want to build a model I reach for my SAS Enterprise Miner (could just as easily be SPSS, Rapid Miner or one of the other complete platforms).

The key issue is that I can clean, prep, transform, engineer features, select features, and run 10 or more model types simultaneously in less than 60 minutes (sometimes a lot less) and get back a nice display of the most accurate and robust model along with exportable code in my selection of languages.

The reason I can do that is because these advanced platforms now all have drag-and-drop visual workspaces into which I deploy and rapidly adjust each major element of the modeling process without ever touching a line of code.

I have almost exactly the opposite thought on the matter:  that drag-and-drop development is intolerably slow; I can drag and drop and connect and click and click and click for a while, or I can write a few lines of code.  Nevertheless, I think Bill’s post is well worth reading.

Comments closed

Pretty R Plots

Simon Jackson has a couple posts on how to use ggplot2 to make graphs prettier.  First, histograms:

Time to jazz it up with colour! The method I’ll present was motivated by my answer to this StackOverflow question.

We can add colour by exploiting the way that ggplot2 stacks colour for different groups. Specifically, we fill the bars with the same variable (x) but cut into multiple categories:

Then he follows up with scatter plots:

Shape and size

There are many ways to tweak the shape and size of the points. Here’s the combination I settled on for this post:

There are some nice tricks here around transparency, color scheme, and gradients, making it a great series.  As a quick note, this color scheme in the histogram headliner photo does not work at all for people with red-green color-blindness.  Using a URL color filter like Toptal’s is quite helpful in discovering these sorts of issues.

Comments closed