Kevin Feasel – Page 1474

Trailing Space Behavior

Published 2017-04-14 by Kevin Feasel

Brent Ozar shows how SQL Server deals with trailing spaces in columns:

The mind-blowing part to me is the <> operator – that seemed utterly crazy to me.

And if you add another table, and join to it, guess what happens:

The answer might be a bit surprising.

Comments closed

BimlExpress Keyboard Shortcuts

Published 2017-04-14 by Kevin Feasel

Cathrine Wilhelmsen shows how to create keyboard shortcuts for the BimlExpress extension for Visual Studio:

Are you tired of right-clicking on your Biml files to Check Biml for Errors or to Generate SSIS Packages? Did you know that you can create your own BimlExpress Keyboard Shortcuts? 🙂

Read on to see how to do this.

Comments closed

Cardinality Estimation On COUNT(*)

Published 2017-04-13 by Kevin Feasel

Paul White digs into how the cardinality estimator works with COUNT aggregations containing HAVING clauses:

The approach SQL Server takes is to assume that each group is most likely to contain the overall mean (average) number of rows. This is simply the cardinality divided by the number of unique values. For example, for 1000 rows with 20 unique values, SQL Server would assume that (1000 / 20) = 50 rows per group is the most likely value.

Turning back to our original example, this means that the computed count column is “most likely” to contain a value around (19614 / 575) ~= 34.1113. Since density is the reciprocal of the number of unique values, we can also express that as cardinality * density = (19614 * 0.00173913), giving a very similar result.

Definitely worth a careful read.

Comments closed

The Basics Of SparkR

Published 2017-04-13 by Kevin Feasel

Yanbo Liang has an introductory article on what SparkR is and why you might want to use it:

However, data analysis using R is limited by the amount of memory available on a single machine and further as R is single threaded it is often impractical to use R on large datasets. To address R’s scalability issue, the Spark community developed SparkR package which is based on a distributed data frame that enables structured data processing with a syntax familiar to R users. Spark provides distributed processing engine, data source, off-memory data structures. R provides a dynamic environment, interactivity, packages, visualization. SparkR combines the advantages of both Spark and R.

In the following section, we will illustrate how to integrate SparkR with R to solve some typical data science problems from a traditional R users’ perspective.

This is a fairly introductory article, but gives an idea of what SparkR can accomplish.

Comments closed

Be Wary Of Case Sensitivity

Published 2017-04-13 by Kevin Feasel

Slava Murygin points out that for SQL Server on Linux, case sensitivity matters a lot:

So, in that manner, SQL Server files will be located in “c:\var\opt\mssql\” directory!
Isn’t it simple?!!!
There will be no effort at all to refurbish old SQL SQL Server code for new Linux platforms!

Here I’ll give you an example of Database creation using different addressing methods:
1. I use Windows notation for data file and Linux notation for log file.
2. Because Linux systems are case sensitive it is very important to know it when you work with Linux. However, Microsoft allows you to be reluctant. You can use upper or lower case in the order you want.

Click through for an example.

Comments closed

Using h2o.ai On HDInsight

Published 2017-04-13 by Kevin Feasel

Xiaoyong Zhu shows how to set up h2o.ai on Azure HDInsight:

H2O Flow is an interactive web-based computational user interface where you can combine code execution, text, mathematics, plots and rich media into a single document, much like Jupyter Notebooks. With H2O Flow, you can capture, rerun, annotate, present, and share your workflow. H2O Flow allows you to use H2O interactively to import files, build models, and iteratively improve them. Based on your models, you can make predictions and add rich text to create vignettes of your work – all within Flow’s browser-based environment. In this blog, we will only focus on its visualization part.

H2O FLOW web service lives in the Spark driver and is routed through the HDInsight gateway, so it can only be accessed when the spark application/Notebook is running

You can click the available link in the Jupyter Notebook, or you can directly access this URL:

https://yourclustername-h2o.apps.azurehdinsight.net/flow/index.html

Setup is pretty easy.

Comments closed

Embedded Solr With Scala

Published 2017-04-13 by Kevin Feasel

Anurag Srivastava shows how to use Embedded Solr using an example written in Scala:

Embedded Solr has the same interface as Solr without requiring an HTTP connection. When we “embed” Solr into a Java an application, it provides the exact same API that you would use if you were connecting to a remote Solr instance. We can use embedded Solr for in-memory testing because when we implement test cases, it should not depend on any external resources.

Read on for the code sample.

Comments closed

Optimizing For Ad Hoc Workloads

Published 2017-04-13 by Kevin Feasel

Kendra Little has soured a bit on the Optimize for Adhoc Workloads setting:

Once upon a time, I was really excited about getting this configuration item in SQL Server 2008. Early versions of SQL Server 2005 weren’t all that great at managing the size of the execution plan cache: it could really balloon up and eat away at the buffer pool. But the SQL Server team did a good job at tuning those algorithms in later service packs for 2005 and future versions, and it became much less of an issue.

Personally, I’ve never had a case where enabling ‘Optimize for Adhoc Workloads’ improved performance in a way that I could measure. It may save you a small amount of memory, it may not.

I don’t mean this as a big insult. Trying to save a penny every time you go to the grocery store could add up, if you grocery shop very frequently. But hopefully that’s not one of your major revenue sources over time.

It’s an interesting counter-argument and worth reading.

Comments closed

SQL Server Backup To Azure Tool Causing Restore Errors

Published 2017-04-13 by Kevin Feasel

Jack Li diagnoses an issue in which the Microsoft SQL Server Backup to Microsoft Azure Tool causes errors when trying to restore a database on an Azure VM with SQL Server 2008 R2:

I worked on an interesting issue today where a user couldn’t restore a backup. Here is what this customer did:

backed up a database from an on-premises server (2008 R2)

copied the file to an Azure VM

tried to restore the backup on the Azure VM (2008 R2 with exact same build#)

But he got the following error:

Msg 3241, Level 16, State 0, Line 4
The media family on device ‘c:\temp\test.bak’ is incorrectly formed. SQL Server cannot process this media family.
Msg 3013, Level 16, State 1, Line 4
RESTORE HEADERONLY is terminating abnormally.

We verified that he could restore the same backup on the local machine (on-premises). Initially I thought the file must have been corrupt during transferring. We used different method to transfer file and zipped the file. The behavior is the same. When we backed up a database from the same Azure VM and tried to restore, it was successful.

Click through for Jack’s findings as well as a couple workarounds.

Comments closed

On-Prem Power BI Gateway

Published 2017-04-13 by Kevin Feasel

Steve Hughes shows how to set up a data gateway for Power BI:

First, I will not be discussing the personal gateway in this post. If you have chosen to use the personal gateway, you have limited functionality and should consider using the on-premises data gateway for corporate use.

The on-premises data gateway (referred to as gateway throughout this post) “acts as a bridge, providing quick and secure data transfer between on-premises data and the Power BI, Microsoft Flow, Logic Apps, and PowerApps services.” (ref) Much of what is discussed here will apply to all of the services referenced above, but our primary concern is related to Power BI. Please refer to references at the end of this post for details about data sources supported within the gateway.

Click through for more information.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Author: Kevin Feasel