Press "Enter" to skip to content

Curated SQL Posts

More On Adaptive Joins

Erik Darling has a couple more posts on adaptive joins in SQL Server 2017.  First, he wonders what happens when you add scalar functions to the mix:

See, this isn’t SARGable either (and no, SCHEMABINDING doesn’t change this). When a predicate isn’t SARGable, you take away an index seek as an access choice. You don’t see too many Nested Loops with an index scan on the other end, do you?

No.

So there you go. It’s not the function itself that bops our Adaptive Join on the head, but the lack of SARGability.

Erik also looks at using APPLY and EXISTS to make sure you can use adaptive joins with that dynamic duo:

I do have to point out that Cross Apply used to only be implemented as a Nested Loops Join. I learned that many years ago from one of the best articles written about Cross Apply by Paul White. That changed recently — it’s possible to see it implemented with a Hash Join in at least 2016. I’ve seen it crop up in Cross Apply queries without a TOP operator.

The latter results are a bit surprising.

Comments closed

Kinesis Data Generation

Allan MacInnis shows off a new data generation tool for Amazon’s Kinesis:

Amazon Kinesis Streams and Amazon Kinesis Firehose enable you to continuously capture and store terabytes of data per hour from hundreds of thousands of sources. Amazon Kinesis Analytics gives you the ability to use standard SQL to analyze and aggregate this data in real-time. It’s easy to create an Amazon Kinesis stream or Firehose delivery stream with just a few clicks in the AWS Management Console (or a few commands using the AWS CLI or Amazon Kinesis API). However, to generate a continuous stream of test data, you must write a custom process or script that runs continuously, using the AWS SDK or CLI to send test records to Amazon Kinesis. Although this task is necessary to adequately test your solution, it means more complexity and longer development and testing times.

Wouldn’t it be great if there were a user-friendly tool to generate test data and send it to Amazon Kinesis? Well, now there is—the Amazon Kinesis Data Generator (KDG).

Check it out if you’re using Kinesis and need to do some load testing.

Comments closed

Getting Execution Plans In Spark

Anubhav Tarar shows how to get an execution plan for a Spark job:

There are three types of logical plans:

  1. Parsed logical plan.
  2. Analyzed logical plan.
  3. Optimized logical plan.

Analyzed logical plans go through a series of rules to resolve. Then, the optimized logical plan is produced. The optimized logical plan normally allows Spark to plug in a set of optimization rules. You can plug in your own rules for the optimized logical plan.

Click through for the details.

Comments closed

Machine Learning At Build 2017

Adnan Masood looks at some of the new machine learning offerings in Azure:

Language Understanding Intelligent Service (LUIS) is one of the marquee offerings in cognitive services which contains an entire suite of NLU / NLP capabilities, teaching applications to understand entities, utterances, and genera; commands from user input. Other language services include Bing Spell Check API which detect and correct spelling mistakes, Web Language Model API which helps building knowledge graphs using predictive language models Text Analytics API to perform topic modeling and do sentiment analysis, as well as Translator Text API to perform automatic text translation. The Linguistic Analysis API is a new addition which parses and provide context around language concepts.

In the knowledge spectrum, the Recommendations API to help predict and recommend items, Knowledge Exploration Service to enable interactive search experiences over structured data via natural language inputs, Entity Linking Intelligence Service for NER / disambiguation, Academic Knowledge API (academic content in the Microsoft Academic Graph search), QnA Maker API, and the newly minted custom Decision Service which provides a contextual decision-making API with reinforcement learning features. Search APIs include Autosuggest, news, web, image, video and customized searches.

There are some nice products available on the Azure platform and Adnan does a good job of outlining them.

Comments closed

Building A Spinning Globe With R

James Cheshire shows how to use R to create an image of a spinning globe:

It has been a long held dream of mine to create a spinning globe using nothing but R (I wish I was joking, but I’m not). Thanks to the brilliant mapmate package created by Matt Leonawicz and shed loads of computing power, today that dream became a reality. The globe below took 19 hours and 30 processors to produce from a relatively low resolution NASA black marble data, and so I accept R is not the best software to be using for this – but it’s amazing that you can do this in R at all!

Now all that is missing is a giant TV and an evil lair.

Comments closed

Log Info DMF

Andrew Pruski looks at the new sys.dm_db_log_info dynamic management function in SQL Server 2017:

This new DMF is intended to replace the (not so) undocumented DBCC LOGINFO command. I say undocumented as I’ve seen tonnes of blog posts about it but never an official Microsoft page.

This is great imho, we should all be analyzing our database’s transaction log and this will help us to just that. Now there are other undocumented functions that allow us to review the log (fn_dblog and fn_dump_dblog, be careful with the last one).

I’m glad to see this make the cut, as replacing informative / idempotent DBCC commands with DMVs makes it easier to query this data, particularly in conjunction with other DMVs or system tables.

Comments closed

Stop Profiling

Wayne Sheffield concludes his two-part series on converting a trace to use Extended Events:

This query extracts the data from XE file target. It also calculates the end date, and displays both the internal and user-friendly names for the resource_type and mode columns – the internal values are what the trace was returning.

For a quick recap: in Part 1, you learned how to convert an existing trace into an XE session, identifying the columns and stepping through the GUI for creating the XE session. In part 2 you learned how to query both the trace and XE file target outputs, and then compared the two outputs and learned that all of the data in the trace output is available in the XE output.

Read the whole thing.

Comments closed

Why Restores Can Be Slow

Paul Randal explains why a database restoration tends to be slower than backing that database up:

Here’s a list of things you can do to make restoring a full backup go faster:

  • Ensure that instant file initialization is enabled on the SQL Server instance performing the restore operation, to avoid spending time zero-initializing any data files that must be created. This can save hours of downtime for very large data files.

  • If possible, restore over the existing database – don’t delete the existing files. This avoids having to create and potentially zero initialize the files completely, especially the log file. Be very careful when considering this step, as the existing database will be irretrievably destroyed once the restore starts to overwrite it.

  • Consider backup compression, which can speed up both backup and restore operations, and save disk space and storage costs.

It’s a straightforward explanation, and Paul provides a few more tips for speeding up restorations.

Comments closed

Getting A Date Is Hard

Nate Johnson has a fun rant about datetime ranges in SQL Server and date pickers:

I mean, I’m not that old, but spinning thru a few decades is still slower than just typing 4 digits on my keyboard — especially if your input-box is smart enough to flip my keyboard into “numeric only” mode.

Another seemingly popular date-picker UX is the “calendar control”.  Oh gawd.  It’s horrible!  Clicking thru pages and pages of months to find and click (tap?) on an itty bitty day box, only to realize “Oh crap, that was the wrong year… ok let me go back.. click, click, tap..” ad-nauseum.

Food for thought.

Comments closed

Multiple R Studio Users On HDInsight

Xiaoyong Zhu shows how to set up additional R Studio users in an HDInsight cluster:

Basically speaking, the “http user” will be used to authenticate through the HDInsight gateway, which is used to protect the HDInsight clusters you created. This user is used to access the Ambari UI, YARN UI, as well as many other UI components.

The “ssh user” will be used to access the cluster through secure shell. This user is actually a user in the Linux system in all the head nodes, worker nodes, edge nodes, etc., so you can use secure shell to access the remote clusters.

For Microsoft R Server on HDInsight type cluster, it’s a bit more complex, because we put R Studio Server Community version in HDInsight, which only accepts Linux user name and password as login mechanisms (it does not support passing tokens), so if you have created a new cluster and want to use R Studio, you need to first login using the http user’s credential and login through the HDInsight Gateway, and then use the ssh user’s credential to login to RStudio.

It’s a good read and also includes a sample Spark-R job.

Comments closed