Author: Kevin Feasel

Biml Relational Hierarchy

Published 2016-09-07 by Kevin Feasel

Andy Leonard introduces the Biml Relational Hierarchy:

There are two ways to populate the relational hierarchy:

Graphically

Programmatically

In this post, I am going to cover graphically populating the relational hierarchy. Scott Currie [Varigence] provided a remarkable example of programmatically populating the relational hierarchy in his BimlScript.com article Biml Basics for Relational DBs.

Populating the relational hierarchy graphically is simple; just follow the steps as they are listed from top to bottom in the Logical View’s Relational node in the Mist IDE. Begin with Connections.

Following along does require a copy of Mist/BimlStudio, but there’s a free trial. If your company gets seriously into Biml, Mist is a great purchase.

Azure SQL Database Supports JSON

Published 2016-09-07 by Kevin Feasel

Jovan Popovic reports that Azure SQL Database now has full JSON support:

JSON is available in all service tiers (Basic, Standard, and Premium), but only in the new SQL Database V12. You can see a quick introduction here or more details on the Getting Started page. You can also find code samples that use JSON functions in Azure SQL Database in the official SQL Server/Azure SQL Database GitHub repository.

Note that the OPENJSON function requires database compatibility level 130. If all functions work except OPENJSON, you need to set the database to the latest compatibility level.

It will be interesting to see adoption of JSON within Azure SQL Database. I could see it being a bit more likely due to DocumentDB.

XE Default Sessions

Published 2016-09-07 by Kevin Feasel

Jason Brimhall goes into the various Extended Events default sessions:

With the release of SQL Server 2016 there is another default session that gets installed. This session is probably the most difficult to find. This is a private hidden session when it is installed. In addition, it is not present by default with SQL 2016 RTM but it was present by default with various CTP versions. The session is specific to the new stretch feature and is called rdaxesession.

Breaking down this particular session reveals pretty easily what feature the session is designed to support. The name rdaxesession breaks down to the following: rda = remote data archive, and then the rest is self explanatory. If I investigate the metadata for the session I will discover that the following events are traced in the session:

Jason also has an earlier blog post about the default sessions I tend to use, like system health.

Date Dimension With DAX

Published 2016-09-07 by Kevin Feasel

Meagan Longoria shows how to create a date dimension using only DAX:

The fiscal calendar assumes calendar months with the fiscal calendar shifted by some number of months. It also assumes that the fiscal year is ahead of the calendar year. That is, fiscal year 2017 starts on the first day of some month in 2016. In my example script, the fiscal year starts in October. If you have some 4-4-5, 4-5-4, or other calendar, the fiscal calendar calculations in this script won’t help you.
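The month-shift logic described above can be sketched in a few lines. This is written in Python rather than DAX purely for illustration; the function names `fiscal_year` and `fiscal_month` and the `start_month` parameter are hypothetical and not taken from Meagan's script. It assumes a fiscal year that begins on the first day of a calendar month and is named for the calendar year in which it ends.

```python
from datetime import date


def fiscal_year(d: date, start_month: int = 10) -> int:
    """Return the fiscal year for d.

    Assumes the fiscal year begins on the first day of start_month and is
    named for the calendar year in which it ends (hypothetical helper).
    """
    if start_month == 1:
        # No shift: fiscal year and calendar year coincide.
        return d.year
    return d.year + 1 if d.month >= start_month else d.year


def fiscal_month(d: date, start_month: int = 10) -> int:
    """Return the 1-based month number within the fiscal year."""
    return (d.month - start_month) % 12 + 1


for d in (date(2016, 9, 30), date(2016, 10, 1)):
    print(d, fiscal_year(d), fiscal_month(d))
```

A 4-4-5 or 4-5-4 calendar cannot be expressed as a simple month shift like this, which is why the excerpt rules those out.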

To add this date dimension to your SSAS Tabular project:

Right click on the Model.bim file in the Solution Explorer and choose View Code.
If you have at least one table in the model already, locate the end of the definition of the previous table, inside the table array. Add a comma after the end curly brace for the previous table.
Paste the JSON/TMSL from my Gist.
Save and close the file.
Right click on Model.bim and choose View Designer.

Click through for the script and additional explanation.

Autocompleter For Hue

Published 2016-09-06 by Kevin Feasel

The Hue team shows off their new SQL editor’s autocomplete capabilities:

We’ve fine-tuned the live autocompletion for a better experience and we’ve introduced some options under the editor settings where you can turn off live autocompletion or disable the autocompleter altogether (if you’re adventurous). To access these settings open the editor and focus on the code area, press CTRL + , (or on Mac CMD + ,) and the settings will appear.

The autocompleter talks to the backend to get data for tables, databases, etc. By default it will time out after 5 seconds, but once the data has been fetched it’s cached for the next time around. The timeout can be adjusted in the Hue server configuration.

I haven’t used Hue in a while, but that’s a nice feature. Just don’t use ANSI-89 syntax like in that first example…

Deploying SSDT Scripts

Published 2016-09-06 by Kevin Feasel

Richie Lee has concerns with database deployments:

At any rate, the script is generated and maybe reviewed… so then what? In SSDT there is no way to create and deploy a script in one step; they are two distinct steps. And even if they were one step, this would still not resolve the issue that troubles me. So what is this issue?

The issue is that by creating a script, and then running the deploy, you cannot be sure that the database is in the exact same state that it was when the initial script was generated. If you don’t already know, SSDT runs a deploy entirely in memory, so as mentioned there is no script created. You have to explicitly create the script as part of the process. Or, if you have already created one, you have to re-create the script.

I’m on the fence here. In simpler environments, I think Richie has a good point. But in a complex environment, I wouldn’t even use auto-generated deployment scripts; when you’re changing hundreds of database objects (including adding and modifying columns, backfilling data, modifying indexes, etc.), that automated deployment script is almost guaranteed to fail. And if it does fail, it could leave you in a state of irreparable harm.

Regular Expressions Against Large Data Sets

Published 2016-09-06 by Kevin Feasel

Liz Bennett explains types of regular expressions which do not scale:

With recursive backtracking based regex engines, it is possible to craft regular expressions that match in exponential time with respect to the length of the input, whereas the Thompson NFA algorithm will always match in linear time. As the name would imply, the slower performance of the recursive backtracking algorithm is caused by the backtracking involved in processing input. This backtracking has serious consequences when working with regexes at a high scale because an inefficient regex can take orders of magnitude longer to match than an efficient regex. The standard regex engines in most modern languages, such as Java, Python, Perl, PHP, and JavaScript, use this recursive backtracking algorithm, so almost any modern solution involving regexes will be vulnerable to poorly performing regexes. Fortunately, though, in almost all cases, an inefficient regex can be optimized to be an efficient regex, potentially resulting in enormous savings in terms of CPU cycles.
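To make the exponential case concrete, here is a minimal sketch using Python's built-in `re` module, one of the backtracking engines named above. The two patterns and the input string are illustrative choices, not examples from Liz's post, and the helper `time_match` is hypothetical. Both patterns accept the same strings; only the nested quantifier differs.

```python
import re
import time


def time_match(pattern: str, text: str) -> float:
    """Return the seconds the backtracking engine spends on one match attempt."""
    compiled = re.compile(pattern)
    start = time.perf_counter()
    compiled.match(text)
    return time.perf_counter() - start


# Input that almost matches: a run of 'a' followed by one 'b', so both patterns fail.
text = "a" * 20 + "b"

# Nested quantifiers: the engine tries every way of splitting the run of 'a's.
slow = time_match(r"(a+)+$", text)
# Same language without nesting: each backtracking step fails immediately.
fast = time_match(r"a+$", text)

print(f"nested: {slow:.4f}s  flat: {fast:.6f}s")
```

Each extra `a` in the input roughly doubles the time for the nested pattern, so lengthen `text` cautiously when experimenting.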

There’s a significant performance difference, so if you work frequently with regular expressions, check this out.

Distributed File System Replication And Backups

Published 2016-09-06 by Kevin Feasel

James Anderson discusses an interesting setting within Distributed File System Replication:

Ideas of a cmd job step (after the backup step) that renamed the .bak files to .BTFU started to form, but a quick search showed that there is a default filter on DFSR folders.

Files starting with ~ (temporary files created by programs like Word)
Files with a .tmp extension
Files with a .bak extension
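As a rough illustration of how those three exclusions behave, the sketch below expresses them as wildcard patterns and checks a few file names. The pattern strings are inferred from the list above, the helper `is_filtered` and the sample file names are hypothetical, and the matching rules inside DFSR itself may differ in detail.

```python
from fnmatch import fnmatchcase

# Wildcard patterns for the default exclusions listed above (inferred from the list).
DEFAULT_FILTERS = ("~*", "*.tmp", "*.bak")


def is_filtered(file_name: str, filters=DEFAULT_FILTERS) -> bool:
    """Return True if file_name matches any exclusion pattern, ignoring case."""
    name = file_name.lower()
    return any(fnmatchcase(name, pattern.lower()) for pattern in filters)


# Made-up file names, for illustration only.
for name in ("AdventureWorks.bak", "AdventureWorks.BTFU", "~report.docx"):
    print(name, is_filtered(name))
```

This shows why renaming a backup away from the .bak extension would sidestep the filter, although changing the filter is the cleaner fix.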

Read on to learn what you can do to remove extension filters within DFSR.

HBase Performance Tips

Published 2016-09-06 by Kevin Feasel

Ashish Thapliyal has nine tips for optimizing HBase performance:

Do your RowKeys look like 1, 2, 3… or 00000001, 00000002, 00000003, or do you have a RowKey that starts with a date-time (starting with the year)? If you answered yes, the bad news is that HBase will not scale for you. You have many options to improve HBase performance, but nothing will compensate for bad rowkey design.

When rowkeys are in sorted order, all the writes go to the same region and the other regions sit idle doing nothing. You will see one of your nodes heavily stressed trying to cope with all the writes, whereas the other nodes are thanking you for not giving them enough work. So, always salt your keys by adding random numbers or characters to the row key prefix.

If you are using Phoenix on top of HBase, Phoenix provides a way to transparently salt the row key with a salting byte for a particular table. You need to specify this at table creation time with the table property “SALT_BUCKETS”. Typical practice is to set SALT_BUCKETS to the number of region servers.
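A minimal sketch of key salting, assuming a fixed number of buckets. Note one deliberate swap: instead of a random prefix, it derives the prefix from a stable hash of the key, so a reader can recompute the salted key for a point lookup. The function `salted_key`, the `|` separator, and the bucket count of 8 are all hypothetical choices for illustration, not part of HBase or Phoenix.

```python
import zlib


def salted_key(row_key: str, buckets: int = 8) -> str:
    """Prefix row_key with a bucket number derived from a stable hash of the key."""
    bucket = zlib.crc32(row_key.encode("utf-8")) % buckets
    return f"{bucket:02d}|{row_key}"


# Sequential keys, which would otherwise all be written to the same region.
keys = [f"{i:08d}" for i in range(1, 1001)]
salted = [salted_key(k) for k in keys]

# The distinct prefixes show how the writes spread across buckets.
used_buckets = {s.split("|", 1)[0] for s in salted}
print(sorted(used_buckets))
```

The trade-off is that a range scan over the original key order now has to fan out across every bucket and merge the results.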

I think the biggest one is to design your data structures correctly. This is particularly important if you’re coming at it from a relational background and are thinking in terms of what makes relational databases fast.

Pearson’s Correlation Coefficient

Published 2016-09-06 by Kevin Feasel

Mala Mahadevan explains correlation coefficients:

The statistical definition of Pearson’s R Coefficient, as it is called, can be found in detail here for those interested. A value of 1 indicates that there is a strong positive correlation (the two variables in question increase together), 0 indicates no correlation between them, and -1 indicates a strong negative correlation (one variable decreases as the other increases). But you rarely get a perfect -1, 0 or 1. Most values are fractional and interpreted as follows:
High correlation: 0.5 to 1.0 or -0.5 to -1.0.
Medium correlation: 0.3 to 0.5 or -0.3 to -0.5.
Low correlation: 0.1 to 0.3 or -0.1 to -0.3.
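For readers who want to see the calculation itself, here is a minimal sketch of Pearson's coefficient computed from its standard definition (covariance divided by the product of the standard deviations). It is written in Python rather than the R and T-SQL used in Mala's post, the function name `pearson_r` is hypothetical, and the sample values are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-deviations and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2, 4, 6, 8, 10]))  # y rises with x
print(pearson_r(xs, [10, 8, 6, 4, 2]))  # y falls as x rises
```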

Mala includes R and T-SQL code so you can follow along.
