Press "Enter" to skip to content

Curated SQL Posts

Dynamic Data Masking For Lower Environments

Joey D’Antoni shows how to use Dynamic Data Masking to help prevent sensitive production data from getting to lower environments:

Well at PASS Summit, both in our booth and during my presentation on security in Azure DB, another idea came up—exporting data from production to development, while not releasing any sensitive data. This is a very common scenario—many DBAs have to export sensitive data from prod to dev, and frequently it is done in an insecure fashion.

Doing this requires a little bit of trickery, as dynamic data masking does not work for administrative users. So you will need a second user.

Read on for details.

Comments closed

Elasticsearch 5.0

Itamar Syn-hershko looks at the new functionality in the latest version of Elasticsearch:

One fundamental feature of Elasticsearch is scoring – or results ranking by relevance. The part that handles it is a Lucene component called Similarity. ES 5.0 now makes Okapi BM25 the default similarity and that’s quite an important change. The default has long been tf/idf, which is both simpler to understand but easier to be fooled by rogue results. BM25 is a probabalistic approach to ranking that almost always gives better results than the more vanilla tf/idf. I’ve been recommending customers to use BM25 over tf/idf for a long time now, and we also rely on it at Forter for doing quite a lot of interesting stuff. Overall, a good move by ES and I can finally archive a year’s long advise. Britta Weber has a great talk on explaining the difference, and BM25 in particular, definitely a recommended watch.

This is one of several search-related features in the latest version.  Looks like a solid release.

Comments closed

RStudio 1.0

RStudio has officially hit 1.0:

While RStudio has been an enormously useful IDE for R since day 1, it’s officially been in “beta” status all of this time. But last week, RStudio released the first official production version, RStudio 1.0. Check out that link for the release history of RStudio and all that’s been added to it over the last 6 years, but this release also adds major new functionality, including:

R Tools for Visual Studio is certainly making strides, but RStudio is the gold standard for R IDEs.

Comments closed

Installing Polybase

I have a post on installing Polybase:

Java gets updated due to security vulnerabilities approximately once every three days, so I won’t link to any particular version.  You only need to get the Java Runtime Environment (JRE), not the Java Development Kit (JDK).  Anyhow, once you have that installed, you can safely install SQL Server.

In the Polybase configuration section, you have the option of making this a standalone Polybase instance or enlisting it as part of a scale-out group.  In my case, I want to leave this as a standalone Polybase machine.  The reason that I want to leave it as a standalone machine is that I do not have this machine on a Windows domain, and you need domain accounts for Polybase scaleout to work correctly.  Later in the series, we’ll give multi-node Polybase a shot.

This is the easiest installation scenario, but it’s a start.

Comments closed

Discovering Deprecated Functionality

Jason Brimhall shows how to see if you’re using deprecated functionality in SQL Server:

Right here would be a good time to point out that your mileage may vary. Some items that are deprecated are more painful than others. Some may be deprecated and may have been on the list for 10+ years at this point. The point is, know your environment and then use good judgement to determine which items in your results need the most attention to fix and update. Why? Well, things really may break especially if you are looking to upgrade to a new version of SQL Server. Just because an item is still available in your current edition, that does not ensure it will still be available in a future release of SQL Server.

Now for the juicy stuff. As I mentioned, finding when and where a deprecated feature or syntax is employed, there are a couple of neat little events within Extended Events that can help to track each time a deprecated feature is accessed or employed. How do we do that? Use the deprecation_announcement and deprecation_final_support events. To help create sessions to track these events, I have the following script to create an “audit” session to do exactly that.

Click through for Extended Event scripts and demo code.

Comments closed

Globe Map Visual

Devin Knight has part 24 of his custom visuals series:

  • The Globe Map is a 3D globe visualization.

  • It looks similar to the technology Power Map in Excel but lacks a few features like animating the data across time.

  • The map can have multiple data visualization layers on top of the map like a bar chart and a heat map.

Under the right circumstances, this can be a useful visualization.  I think its benefit is mostly limited to the “wow, this looks cool” effect.

Comments closed

Machine Learning With R Q&A

Ginger Grant answers a series of questions about R and machine learning:

Question: Is it possible to run R processes in diffrent boxes other than SQL Server itself for scalability reasons?

You have the option of installing the R Server on another server. Just keep in mind that you do have to account for the additional overhead of moving all the data over the network, which needs to weigh in on your decision to move processing to a different server.

Click through for plenty more questions and answers.

Comments closed

Test Connection With HDInsight

I have a post trying to test a connection using HDInsight:

WebHCat is a web-based REST API for HCatalog, a management layer for dealing with files in HDFS.  If you’re looking for configuration settings for WebHCat, you’ll want generally to look for “templeton” in config files, as Templeton was the project name before WebHCat.  In Ambari, you can go to the Hive configs and look at webhcat-site.xml for configuration settings.  For WebHCat, the default port in HDInsight is 30111, which you should find in the templeton.port configuration setting.

I don’t like the fact that WebHDFS is blocked, but at least WebHCat is functional.

Comments closed