Press "Enter" to skip to content

Month: May 2017

Storm In .Net

Ravi Peri explains how to use Apache Storm in .NET code on HDInsight:

Topology submissions can fail due to many reasons:

  • JDK is not installed or is not in the Path
  • Required java dependencies are not included
  • Incompatible java jar dependencies. Example: Storm-eventhub-spouts-9.jar is incompatible with Storm 1.0.1. If you submit a jar with that dependency, topolopgy submission will fail.
  • Duplicate names for topologies

/var/log/hdinsight-scpwebapi/hdinsight-scpwebapi.out file on active headnode will contain the error details.

At one point, I was big on Storm and really wanted a .NET client for Storm to take off.  Nowadays, I’d rather use Spark Streaming or Kafka Streams for the same kind of streaming data work.

Comments closed

Partitioning Nullable Columns

Kenneth Fisher looks at what happens when you use a nullable column as a partition key:

So to start with how does partitioning handle a NULL? If you look in the BOL for the CREATE PARTITION FUNCTION you’ll see the following:

Any rows whose partitioning column has null values are placed in the left-most partition unless NULL is specified as a boundary value and RIGHT is indicated. In this case, the left-most partition is an empty partition, and NULL values are placed in the following partition.

So basically NULLs are going to end up in the left most partition(#1) unless you specifically make a partition for NULL and are using a RIGHT partition. So let’s start with a quick example of where NULL values are going to end up in a partitioned table (a simple version).

Click through to see Kenneth’s proof and the repercussions of making that partitioning column nullable.

Comments closed

Power BI: Calculated Measures + SSAS Tabular

Shabnam Watson notes that the May updates to Power BI Desktop allow you to create new calculated measures on a report which connects live to a tabular model:

Ideally the SSAS database has all the measures you need but now you have the capability to add new ones if you need to.

You can control the folder (table/measure group) under which the new measure shows up by using the “Home Table” option from the Modeling tab. I really like this feature as you can create copies of the same calculation and send them to different folders for ease of use.

If you’re interested in getting this added to Multidimensional as well, there is a request you can vote on.

Comments closed

Golang And SQL Server

Mat Hayward-Hill gives us another language to think about:

Right now I spend most of my time in Management Studio writing TSQL. And I use PowerShell whenever I need to do something on more than one machine at a time. But now Microsoft is embracing open source should I be thinking the same and learn a new language which isn’t so Microsoft-centric.

After talking to some experts, I narrowed the choice down to two; Python and Go (also referred to as Golang). I picked Golang as it’s relatively new (open sourced in 2009 but for a language is leading-edge, whereas Python dates back to the late 1980s); nothing more complicated than that as this project is just for fun!

I’d see this as more of a “Cool, I can do this now” type of language rather than a “Hey, drop what you’re doing and learn this!” language.  That may change over the next few years.

Comments closed

Installing Anaconda

Max Trinidad shows how to install Anaconda in Windows and Ubuntu:

Installing Anaconda in Windows

In Windows the installation is simply done through the SQL Server 2017 setup process. During the SQL Server installation process, select the “Machine Learning Services (In-Database)” option and this will automatically install both “R” and *”Anaconda” on your system.

Max also shows some Python with SQL Server 2017 basics.

Comments closed

The Basics Of Always Encrypted

Josip Saban has an article on Always Encrypted in SQL Server 2016:

Always Encrypted is a client-side encryption technology that Microsoft introduced with SQL Server 2016. Always Encrypted keeps data automatically encrypted, not only when it is written, but also when it is read by an approved application. Unlike Transparent Data Encryption, which encrypts the data and log files on disk in real time but allows the data to be read by any application that queries the data, Always Encrypted requires your client application to use an Always Encrypted-enabled driver to communicate with the database. By using this driver, the application securely transfers encrypted data to the database that can then be decrypted later only by an application that has access to the encryption key. Any other application querying the data can also retrieve the encrypted values, but that application cannot use the data without the encryption key, thereby rendering the data useless. Because of this encryption architecture, the SQL Server instance never sees the unencrypted version of the data.

At this time, the only Always Encrypted-enabled drivers are the .NET Framework Data Provider for SQL Server, which requires installation of .NET Framework version 4.6 on the client computer, and the JDBC 6.0 driver. That will probably change in time, but these are the official Always Encrypted requirements as of April 2017.

This is a good intro to the topic if you aren’t familiar and are thinking of migrating to SQL Server 2016 or later.

Comments closed

Schema Modification Locks With CC Compliance

Lori Brown explains why you might see schema modification locks after enabling Common Criteria compliance:

We have a client who has no idea how or when Common Criteria was enabled on their production system. All they know is that performance has been slowly degrading. After collecting performance data, we found that there were high LCK_M_SCH_M waits which is a schema modification lock that prevents access to a table while a DDL operation occurs. We also found blocked process records where a LOGIN_STATS table in the master database was waiting a lot. This table is used to hold login statistics. When there are a lot of logins and outs there can be contention in this table.

When you enable Common Criteria compliance, something called Residual Information Protection (RIP) is enabled. RIP is an additional security measure for memory and it makes it so that in memory a specific bit pattern must be present before memory can be reallocated(overwritten) to a new resource or login. So with lots of logins and outs, there is a performance hit in memory because overwriting the memory allocation has to be done.

It’s worth reading the whole thing.

Comments closed

Stop Using Domain Admin!

Sean McCown has had it with people using Domain Admin accounts as service accounts:

If you paid close attention, you’ll notice the ‘DomainAdmin’ portion of that name.  Yep, you got it right… they were running SSRS under the domain admin account.  The Windows guy thought that it would be too much trouble to manage the permissions and get everything right on all the shares and DBs that it needed to access.

So this is when I pretty much lost it.  These guys were running  SSRS under a domain admin account because they were too lazy to do the right thing.  It’s unthinkable.  There may be some reasonable excuses why you’re not able to change your current security model to something better.  You may even be able to convince me that you’re not just being lazy.  But to actively be lazy about your security isn’t something I’m going to take lying down.  Hey, I know it’s your shop, and I know you can ultimately do whatever you like, but I’m going to make sure you know what you’re doing.

Your SQL Server (and related) service accounts should not be Domain Admin.  Period.  This isn’t one of those “Well, it depends…” types of scenarios; there is no reason ever to use an account with Domain Admin rights as a SQL Server service account, and it is security malpractice to do so.

Comments closed

Interleaved Execution

Arun Sirpal looks at how Interleaved Execution affects table cardinality estimates with multi-statement table-valued functions in SQL Server 2017:

Joe states in the article “MSTVFs have a fixed cardinality guess of “100” in SQL Server 2014 and SQL Server 2016, and “1” for earlier versions. Interleaved execution will help workload performance issues that are due to these fixed cardinality estimates associated with multi-statement table valued functions.”

This is exactly what I saw where the below is just a basic screen shot of 1 of many tests that I carried out.

Read the whole thing for more details.

Comments closed

Deeper Into Adaptive Join Optimization

Erik Darling has a couple blog posts getting deeper into Adaptive Join Optimizations in SQL Server 2017.  First, Erik discusses the basics:

You see, in every plan, you see both possible paths the optimizer could have chosen. Right now it’s only limited to two choices, Nested Loops and Hash Joins.

Just guessing that Merge Joins weren’t added because there would have been additional considerations around the potential cost of a Sort operation to get the data in order.

Be sure to read Brent’s comment that in the initial release, it will just support columnstore indexes.  Then, Erik talks about execution plan details:

Some points of interest:

  • Actual Join Type: doesn’t tell you whether it chose Hash or Nested Loops
  • Estimated Join Type: Probably does
  • Adaptive Threshold Rows: If the number of rows crosses this boundary, Join choice will change. Over will be Hash, under will be Nested Loops.

The rest is fairly self-explanatory and is the usual stuff in query plans.

Good stuff here.

Comments closed