New JDBC Driver

Kevin Feasel

2016-07-18

Hadoop

Microsoft has released a new version of their SQL Server JDBC driver:

Table-Valued Parameters (TVPs)

TVP support allows a client application to send parameterized data to the server more efficiently by sending multiple rows to the server with a single call. You can use the JDBC Driver 6.0 to encapsulate rows of data in a client application and send the data to the server in a single parameterized command.

There are a couple of interesting features in this driver which could help your Hadoop cluster play nice with SQL Server.

Limitations Of Statistics

Erik Darling discusses statistics and limitations of statistics outside of indexes:

Head rush moment: SQL may use information from histograms outside of the index it chooses for cardinality estimation.

Back to earth: If you just create a bunch of statistics instead of indexes, you’re (at best) using your Clustered Index for everything (which is still bad), or you’re using a HEAP for everything (which is usually worse). You’re still generally better off creating good indexes for your workload. They’ll get statistics objects created and associated with them, and if SQL thinks another column is interesting, it will create a single column statistics object for it, as long as you haven’t turned off auto create stats.

Sure, you can put on your black cloak and goat mask and create some multi-column or filtered statistics, but in the words of a wise man (Doug), you end up with more stats to maintain and understanding query behavior gets more difficult.

Erik argues that at the margin, keeping existing statistics up to date tends to be a better solution than creating new statistics.

Biml Object Model

Kevin Feasel

2016-07-18

Biml

Bill Fellows begins a series on exploring the Biml Object Model:

The most important precursor to doing this is you will need a licensed installation of Mist. Full stop. The assemblies we’re going to use have security built into them to tell whether they are licensed and you cannot use the assemblies shipped with BidsHelper or BimlExpress as they’ve hardwired to the specific apps.

We’re going to use two classes: AstRootNode and AstPackageNode.

Ast, what is that? Abstract Syntax Tree – it’s a compiler theory thing.

I’m interested in seeing where this series goes.

Tabular Caching

Bill Anton discusses using the Formula Engine cache with Analysis Services tabular models:

If tabular data is already in memory, what’s the point of having a cache at all? Memory is memory, right? Both are in main memory and access speed is the same, right?

Good question! Yes, access speed is the same. However, there are other benefits to a cache store.

For example, even though the data is already in memory, queries against the tabular model can still be slow… very slow even… usually in cases where the execution of the query is bound to the single threaded formula engine. To be sure, this is not a tabular specific problem… formula engine bound queries can be (and are commonly) found in both technologies and the issue (usually) speaks more to the deign of the model and/or the way the query is written (be that DAX for tabular or MDX in multidimensional). That said, performance problems related to FE-bound queries can be terribly difficult to resolve as it usually requires a redesign of the data model and rewrite of the query or measure(s) involved.

Bill points out the limitations of this solution, but within those limitations this looks like it could be a huge time-saver for end users.

Scaling Kafka Streams

Kevin Feasel

2016-07-15

Hadoop

Michael Noll discusses elastic scaling of Kafka Streams:

Third, how many instances can or should you run for your application?  Is there an upper limit for the number of instances and, similarly, for the parallelism of your application?  In a nutshell, the parallelism of a Kafka Streams application — similar to the parallelism of Kafka — is primarily determined by the number of partitions of the input topic(s) from which your application is reading. For example, if your application reads from a single topic that has 10 partitions, then you can run up to 10 instances of your applications (note that you can run further instances but these will be idle).  In summary, the number of topic partitions is the upper limit for the parallelism of your Kafka Streams application and thus for the number of running instances of your application.  Note: A scaling/parallelism caveat here is that the balance of the processing work between application instances depends on how well data messages are balanced between partitions.

Check it out.  Kafka Streams is a potential alternative to Spark Streaming and Storm for real-time (for some definition of “real-time”) distributed computing.

Projections And Confidence Intervals

Mirio De Rosa explains confidence intervals and sampling issues, using polls as an example:

Weighting is used to make sure samples reproduce the underlying characteristics of the population they are drawn from. For instance, in the UK 51.6% of voters are women, of these 22.5% are above 65 years of age, 23.1% have a higher education and so on. The people selected to make part of a sample may be recruited to ensure they match these proportions.

Within the context of weighting there are two major sampling procedures: Quota and stratified sampling. The relevant difference between them is the latter uses some sort of randomization device while with the former the interviewer decides whether or not to interview a person with certain characteristics. YouGov presumably applied Quota Sampling for the Brexit survey[4], and the gender, age and education weights they applied are shown in the following image.

Read the whole thing.

Single Slices And Chart Slicers

Matt Allington has a post up on improving your Power BI user experience:

The example on the right adds more value over the one on the left.  The example on the right uses a column chart instead of a slicer.  The benefit of this is you can communicate more information to the user than you can with the static slicer.  In this case I am displaying the total value of each of the scenarios which means it is easy to see the relative difference between them.  And this all happens while still providing slicer capabilities thanks to the powerful cross filter feature in Power BI.  This is what I mean by adding value by thinking outside your old paradigms.

Definitely read the comments on this one; they are full of great questions and suggestions.

Auditing Power Plan Settings

Drew Furgiuele writes a bit of Powershell to control power plan settings and expands upon this one-liner:

It’s a classic one-liner, but if you’re not used to reading it I’ll break it down for you. First, we useGet-ChildItem  to return a list of registered servers in our central management server (named PRO-CM-SQL in my example). This returns a series of objects that lists all the registered names on the central management server in each directory, so we need to filter out the directory names with a Where-Object  (objects that don’t have a “mode” value of “d” for directory). Once we have our list, we just select the names ( Select-Object ). Then we pipe the list of names over to a ForEach-Object  and execute the script each time. Finally, we tack on a Export-CSV  cmdlet to output the results to an easy to read file we can open in Excel (or notepad, or whatever).

Our script also doesn’t control output, so you leave that up to the user. They can put it on the screen or pipe it to a file. And that’s an important style point: never put your users on rails. You may like output one way, but someone else may not. Just think of the next person.Because some day you might be that next person.

This is a good post if you need to figure out how to find your servers’ current power settings, but a great post if you want to think about how to write helpful Powershell scripts.

Power BI Costs

Adam Saxton looks at pricing in Power BI:

You can get a quick look at the differences between free and Pro by looking at the Power BI Pricing page. You can also look at the official documentation for what is Pro content.

So, that’s it! Right?

No so fast! There are other factors that may come into play, or you may be wondering about. You may think they are part of Power BI, but they may be separate.

What I like about this post is that Adam goes into detail on some of the other potential costs involved aside from product licensing.

Missing Query Plans

Jack Li explains when you might not get a result back for query_plan in sys.dm_exec_cached_plans:

When you first execute a batch or stored procedure which may contain multiple statements.   Not every statement will be compiled when you first use the procedure.   If there are some code paths that result in the statement not being executed, SQL Server may choose not to compile that statement in a small set of scenarios. In other words, some statements will only be compiled upon first execution.  So far, I have found the following two scenarios that a statement will not be compiled (deferred compilation) if code path result in that statement being skipped.

  1. Statement involving temp table.   Data could have been changed when the statement is actually executed.  So it doesn’t make sense to compile right off the beginning.

  2. Statement that has option (recompile) attached to it   We need to recompile that statement every time anyway.  If we don’t use, it why compile in advance?

Read the whole thing.

Categories

February 2019
MTWTFSS
« Jan  
 123
45678910
11121314151617
18192021222324
25262728