
Month: August 2017

Saving Statistics Sample Rates

Pedro Lopes shows off a new feature in the latest SQL Server 2016 CU:

When SQL Server creates or updates statistics and a sampling rate is not manually specified, SQL Server calculates a default sampling rate. Depending on the real distribution of data in the underlying table, the default sampling rate may not accurately represent the data distribution and then cause degradation of query plan efficiency.

To improve this scenario, a database administrator can choose to manually update statistics with a specific sampling rate that can better represent the distribution of data. However, a subsequent automatic update statistics operation will reset back to the default sampling rate, possibly reintroducing degradation of query plan efficiency.

With the most recent SQL Server 2016 SP1 CU4, we released an enhancement for the CREATE and UPDATE STATISTICS command – the ability to persist sampling rates between updates with a PERSIST_SAMPLE_PERCENT keyword.

This seems rather useful.
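If you want to play with it, a minimal sketch of the new syntax looks like this (the table and statistic names are placeholders, and this assumes you are on SP1 CU4 or later):

-- Update the statistic with a 70 percent sample and keep that rate
-- for future updates that do not specify a sample rate explicitly.
UPDATE STATISTICS dbo.SalesOrderDetail IX_SalesOrderDetail_ProductID
WITH SAMPLE 70 PERCENT, PERSIST_SAMPLE_PERCENT = ON;

Afterwards, sys.dm_db_stats_properties exposes a persisted_sample_percent column you can check to confirm the rate stuck.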


Substrings: Powershell Versus T-SQL

Shane O’Neill contrasts the SUBSTRING function in T-SQL with Powershell’s Substring method:

The main difference that I can see when using SUBSTRING() in SQL Server versus in PowerShell is that SQL Server is very forgiving.

If you have a string that is 20 characters long and you ask for everything from the 5th character to the 100th character, SQL Server is going to look at this, see that the string does not go to the 100th character, and just give you everything that it can.

It’s a small difference but an important one.
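A quick T-SQL illustration of that forgiveness (the string here is just a placeholder):

-- The string is only 20 characters, but asking for 96 characters
-- starting at position 5 simply returns everything that remains.
DECLARE @s varchar(20) = 'abcdefghijklmnopqrst';
SELECT SUBSTRING(@s, 5, 96);  -- 'efghijklmnopqrst', no error

The equivalent .NET call on a 20-character string, $s.Substring(4, 96), throws an ArgumentOutOfRangeException instead, which is the strictness Shane runs into in PowerShell.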


Anti-Virus On Your Database Server?

Steve Stedman gives you food for thought if you need to run anti-virus software on your SQL Server instance:

In a perfect world, your SQL Server would be so secure that you would not need antivirus software: it would be behind layers of firewalls, nobody would ever connect with remote desktop to install anything, and it would always have all of the latest security patches… But that is not the real world.

Given that your SQL Server oftentimes contains extremely valuable information, and that the damage that could be done by viruses, malware, and ransomware could be so great, it is strongly recommended that you run antivirus software on your SQL Server. There are some files that you will want to exclude from the virus check.

I’m not a big fan of running anti-virus software on database instances, but if you have to run it for whatever reason, be sure to check out Steve’s advice.


R Services 182 Error

Joey D’Antoni provides a solution to a tricky SQL Server R Services error:

Recently (and unfortunately I don’t have an exact date on when, though it was around service pack 1 install time), this started failing with the following error:

Error
Msg 39012, Level 16, State 1, Line 10
Unable to communicate with the runtime for ‘R’ script. Please check the requirements of ‘R’ runtime.
STDERR message(s) from external script:

DLL ‘C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER1601\MSSQL\Binn\sqlsatellite.dll’ cannot be loaded.
Error in eval(expr, envir, enclos) :
DLL ‘C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER1601\MSSQL\Binn\sqlsatellite.dll’ cannot be loaded.
Calls: source -> withVisible -> eval -> eval -> .Call
Execution halted
STDOUT message(s) from external script:

Failed to load dll ‘C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER1601\MSSQL\Binn\sqlsatellite.dll’ with 182 error.

Click through to see how to resolve this issue.


Putting Measures On Rows In Power BI

Meagan Longoria shifts our perspective by 90 degrees:

Back in January 2016, I wrote a blog post explaining a DAX workaround that allows you to put measures on rows in a matrix in a Power BI report. I’m happy to say that you no longer need my workaround because you can now natively put measures on rows in a matrix in both Power BI Desktop and PowerBI.com.

This is accomplished via a new formatting option for the matrix.

Click through to see how to pull this off.


Date Conversions In Oracle And SQL Server

Daniel Janik compares Oracle and SQL Server date conversion functions:

There are many ways to create a date from a string. First you’ve got the CONVERT() function and the CAST() function. Next you’ve got DATEFROMPARTS(), DATETIMEFROMPARTS(), DATETIME2FROMPARTS(), SMALLDATETIMEFROMPARTS(), TIMEFROMPARTS(), and DATETIMEOFFSETFROMPARTS().

That’s a lot of functions for one simple task, isn’t it? To be fair, it’s really more than one simple task. Each of these functions is meant to be paired with the matching data type so you get just what you want. To go along with these, you also have the ISDATE() function, which tests the value to be sure it’s a date.

I never liked the verbosity of the Oracle TO_DATE() function…but I am biased.
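For a feel of the T-SQL side, here is a small sketch covering a few of the functions mentioned (the dates are placeholders):

-- Several ways to produce a date or datetime value in T-SQL.
SELECT CONVERT(date, '08/31/2017', 101)             AS via_convert,        -- style 101 = mm/dd/yyyy
       CAST('20170831' AS date)                     AS via_cast,
       DATEFROMPARTS(2017, 8, 31)                   AS via_datefromparts,
       DATETIMEFROMPARTS(2017, 8, 31, 13, 30, 0, 0) AS via_datetimefromparts,
       ISDATE('20170831')                           AS looks_like_a_date;  -- returns 1

The rough Oracle equivalent of the string conversion is TO_DATE('08/31/2017', 'MM/DD/YYYY'), which spells out the format string every time, hence the verbosity complaint.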


Analyzing Twitter Data With Storm In HDInsight

Nischal S shows how to configure an HDInsight cluster to process tweets, followed by loading them into a Power BI dashboard:

When we need to process streams of real-time data, Storm is a great contender. Examples of streaming data are the number of consumer clicks and navigations on a website, IIS or user logs, IoT data, and social network information. In all these scenarios, we use real-time data processing. Apache Storm can process real-time unbounded streams of data.

The term “unbounded” defines streams of data with no start or end. Here, the processing of data is continuous and in real-time. Twitter is a good example. Twitter data is continuous, has no start or end time, and is provided in real-time by millions of Twitter users around the world.

Storm wouldn’t rank in my top three technologies for doing this, but it certainly does the job.


Diamond: Solving Generalized Linear Models Using Python

Tim Sweester and Aaron Bradley announce Diamond, a Python library which solves certain kinds of generalized linear models.  In a two-part series, they explain more.  Part 1 covers the mathematical principles behind it:

Many computational problems in data science and statistics can be cast as convex problems. There are many advantages to doing so:

  • Convex problems have a unique global solution, i.e. there is one best answer
  • There are well-known, efficient, and reliable algorithms for finding it

One ubiquitous example of a convex problem in data science is finding the coefficients of an L2-regularized logistic regression model using maximum likelihood. In this post, we’ll talk about some basic algorithms for convex optimization, and discuss our attempts to make them scale up to the size of our models. Unlike many applications, the “scale” challenge we faced was not the number of observations, but the number of features in our datasets. First, let’s review the model we want to fit.
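For reference, the convex objective being described is the familiar L2-regularized logistic regression negative log-likelihood (the notation below is mine, not the authors’): with labels y_i in {−1, +1}, feature vectors x_i, and regularization strength λ, one minimizes

\min_{\beta} \; \sum_{i=1}^{n} \log\left(1 + e^{-y_i x_i^{\top} \beta}\right) + \frac{\lambda}{2} \lVert \beta \rVert_2^2

which is convex in β, so there is a single global optimum for gradient-based or Newton-type solvers to find.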

Part 2 looks at one interesting use case:

In this example, GLMMs allow you to pool information across different brands, while still learning individual effects for each brand. It breaks the problem into sets of fixed and random effects. The fixed effects are similar to what you would find in a traditional logistic regression model, while the random effects allow the regression relationship to vary for each brand. One of the advantages of GLMMs is that they learn how different brands are from each other. Brands that are very similar to the overall average will have small random effect estimates. Because of the regularization of these models, brands with few observations will also have small random effect estimates, and be treated more like the overall average. In contrast, for brands that are very different from the average, with lots of data to support that, GLMMs will learn large random effect estimates.

Check it out.  Part 2 also contains a link to the GitHub repo if you want to try it on your own.


Generating U-SQL Extract Scripts From Visual Studio

Yanan Cai shows a GUI for creating U-SQL EXTRACT scripts via Azure Data Lake Tools for Visual Studio:

One of U-SQL’s core capabilities is to be able to schematize unstructured data on the fly without having to create a metadata object for it. This capability is provided by the EXTRACT expression that will invoke either a user-defined extractor or built-in extractor to process the input file or set of files specified in the FROM clause and produces a rowset whose schema is specified in the EXTRACT clause.

While using the built-in extractor to schematize semi-structured data, like data in a .csv file, writing the schema definition in U-SQL is slow and error-prone, especially when the .csv file contains hundreds of columns.

Recently, we released a new feature in the latest version of Azure Data Lake Tools for Visual Studio to help you generate this U-SQL EXTRACT statement automatically.

Click through for an example as well as a video showing the process.
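For anyone who has not written one by hand, the statement the tool generates looks roughly like this (the column names, types, and file path below are invented for illustration):

// Schematize a CSV file on the fly with the built-in extractor.
@orders =
    EXTRACT OrderId       int,
            CustomerName  string,
            OrderTotal    decimal
    FROM "/input/orders.csv"
    USING Extractors.Csv();

Multiply those column definitions by a few hundred and it is easy to see why having Visual Studio generate the EXTRACT clause for you is welcome.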


Contributing To Open Source: Understanding GitHub

Andy Levy has a great guide showing how to pull the dbatools repo from GitHub:

I’m putting this together here for my own reference and to hopefully write it up in a way that helps things “click” for some people who need that extra nudge to get into “aha!” territory. A number of the examples I’ve seen elsewhere have mixed the command-line and GUI clients, but the more I use git GUIs, the less I like them for the basic workflow. You only need to know a handful of commands to be productive and for that, the command line beats the GUI in my opinion.

So, here we go. My GitHub workflow for working on dbatools, with as much command-line work as possible. This walk-through assumes basic familiarity with source control concepts.

This is a great guide for people who are not familiar with Git.
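The handful of commands in that workflow, stripped to the bare minimum, look something like this (the account name, upstream owner, and branch name are placeholders; check the repo’s contributing guide for the preferred base branch):

# Clone your fork, wire up the main repository as "upstream", and branch.
git clone https://github.com/<your-account>/dbatools.git
cd dbatools
git remote add upstream https://github.com/<upstream-owner>/dbatools.git
git checkout -b my-fix
# ...make your changes...
git add .
git commit -m "Describe the change"
git push origin my-fix   # then open a pull request from that branch on GitHub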
