2016-09-09 – Curated SQL

Counting Without Counts

Published 2016-09-09 by Kevin Feasel

Ewald Cress discusses the breferences member:

I’ll spare you my false starts, but I think I finally have it. The first observation is that, on the occasions breferences increments, it does not increment linearly, but instead has an exponential growth pattern. These increments take it through the sequence 0, 1, 3, 7, 15, 31, 127, 255 etc. Or in binary: 0, 1, 11, 111, 1111, 11111, 111111, 1111111, 11111111…

Those numbers can be seen as off-by-one variations of powers of two. Forget the offset, and think of the number as simply doubling on each increment if it keeps your head clearer – instead of accuracy, we have a order-of-magnitude reference count.

I’d never heard of an algorithm like this, although that could be due to my having dealt with relatively little low-level structural code. I’m glad Ewald sussed out the mechanics driving breferences.

Comments closed

Immutable Servers

Published 2016-09-09 by Kevin Feasel

Diana Tkachenko describes a pattern for reducing “prod doesn’t look like stage” types of errors:

Immutable server pattern makes use of disposable components for everything that makes up an application that is not data. This means that once the application is deployed, nothing changes on the server – no scripts are run on it, no configuration is done on it. The packaged code and any deploy scripts is essentially baked into the server. No outside process is able to modify the contents after the server has been deployed. For example, if you were using Docker containers to deploy your code, everything the application needs would be in the Docker image, which you then use to create and run a container. You cannot modify the image once it’s been created, and if any changes do need to take place, you would create a new image and work with that one.

In our case, we use AWS Amazon Machine Images (AMIs) to accomplish the same thing. We make heavy use of Amazon Linux machines, which are Redhat-based, and thus package the code into RPMs^[2]. The RPMs define all the dependencies for running the application, the code itself, and any startup scripts to run on bootup. The RPM is then installed on a clean base image of Amazon Linux, and an image is taken, resulting in an AMI. This AMI is synonymous with “immutable server” – it cannot be changed once it is created. The AMI is then deployed into an Auto Scaling Group(ASG) and attached to the Elastic Load Balancer (ELB). In this post, I’ll guide you through for a closer look at every step of this Immutable Server deploy pipeline. I’ll then go into how and why we embedded planned failures into this system. At the end, I’ll share the insights we’ve gained into the pros and cons of deploying in this way.

This is a very interesting concept. I’ve heard of no-patch servers (where, instead of patching live servers, you spin up a new VM with the operating system updates and spin down the old one), but this takes the idea one step further.

Comments closed

Data Masking And Row-Level Filtering In Hadoop

Published 2016-09-09 by Kevin Feasel

Syed Mahmood and Srikanth Venkat discuss two security features in Apache Ranger:

Dynamic data masking via Apache Ranger enables security administrators to ensure that only authorized users can see the data they are permitted to see, while for other users or groups the same data is masked or anonymized to protect sensitive content. The process of dynamic data masking does not physically alter the data, or make a copy of it. The original sensitive data also does not leave the data store, but rather the data is obfuscated when presenting to the user. Apache Ranger 0.6 included with HDP 2.5, introduces a new type of authorization policy called “Masking Policy” that can used to define which specific data fields are masked and what are the rules for how to anonymization or pseudonymize the specific data. For example, a security administrator may choose to mask credit card numbers when displayed to customer service personnel, such that only last four digits are rendered in the form of XXXX-XXXX-XXXX-0123. The same would be true of sensitive data such as social security numbers or email addresses that are masked to be rendered in a different formats based on data masking rules.

This is part one of a two-part series; part two will dig into the technical details. I have to wonder if Ranger’s dynamic data masking is as easy to circumvent as SQL Server’s.

Comments closed

Precision, Significance, and Accuracy

Published 2016-09-09 by Kevin Feasel

Emmanuelle Rieuf shares an image from Kirk Borne:

Comments closed

Collecting Registry Data

Published 2016-09-09 by Kevin Feasel

Slava Murygin shows how to use Powershell to read the Registry:

Do you use third party tools to document state of your SQL Server?
If not, that script is for you!

At first, you will know what is your SQL Server is up to.
At second, that might be your baseline document, to which you can compare a current SQL Server state over the time.
At third, that is a priceless piece of documentation!!! (I mean FREE!!!) which you can put in a folder and report to your boss.

Registry settings are a good part of a baseline, particularly a security baseline.

Comments closed

Finding Destinations In SSISDB

Published 2016-09-09 by Kevin Feasel

Bill Fellows has a script to figure out the name of that table throwing errors upon insertion:

There is a rich set of tables and views available in the SSISDB that operate as a flight recorder for SSIS packages as they execute. Markus Ehrenmüller (t) had a great question in Slack. In short, can you figure out what table is being used as a destination and I took a few minutes to slice through the tables to see if I could find it.

If it’s going to be anywhere, it looks like you can find it in catalog.event_message_context

If someone is using an OLE DB Destination and uses “Table or view” or “Table or View – fast load” settings, the name of the table will be the event message_context table. If they are using a variable name, then it’s going to be trickier.

Read on for the script.

Comments closed

OR Filters With Excel Cube Functions

Published 2016-09-09 by Kevin Feasel

Chris Webb shows how to build an OR filter in Power Pivot:

Now, imagine that you want a report with the Sales Amount measure on columns and Years on rows, and you want to filter the data so that you only see values for Mondays in July or Wednesdays in September. Using the Fields, Items and Sets functionality you could filter the data to only show the day/month combinations you need for each year, but since you can’t put a named set into the Filter area of a PivotTable you would have to use Excel formulas to sum up the combinations to get the totals you need

I’ve never used Excel’s Cube Functions. Looks like something to add to the to-learn pile.

Comments closed

ExecuteSQL In Biml

Published 2016-09-09 by Kevin Feasel

Andy Leonard shows how to create an Integration Services ExecuteSQL task using Biml:

You do not yet have an SSIS package. You have the Biml template of a package. I can hear you thinking, “So how does this become an SSIS package, Andy?” I’m glad you asked!

We now need to execute the BimlScript and then build the output.

Follow along with the images.

Comments closed

Plan Explorer Pro Is Now Free

Published 2016-09-09 by Kevin Feasel

Greg Gonzalez with the announcement of the week:

Which brings me to today, one of the most exciting days in the history of SQL Sentry. I would like to introduce to you the new unified Plan Explorer v3! This edition includes all features of Plan Explorer PRO, plus two new and incredibly powerful features: Index Analysis and Performance Profiling.

These new features were originally going to be part of yet a 3rd edition called Plan Explorer ULTIMATE, and at an even higher price point. However, we ultimately came to our senses and realized that this would be counter to the mission, and that the only way ahead was to make all features of Plan Explorer freely available to everyone.

Andy Mallon goes in-depth with two of the new features:

If you’ve used Live Query Profile in Plan Explorer (or Live Query Statistics in SSMS), then you’re familiar with seeing your query plan play live as it executes. Performance Profiling adds some special sauce to that feature. Its like a DVR for the live query profile, and also displays the performance metrics as a graph below the plan diagram. The graph gives performance metrics by time, rather than by operator. In complex plans, you might have many operators doing work at the same time, and you’ll be able to see that better by looking at the graphs. Adding a time axis to the plan is a pretty ingenious way to give a new way to look at performance.

You can then replay the “DVRed” query, and re-watch the profile without re-running the query. If you save the plan as a .pesession file, the DVR’ed query with performance profile is included in that file. You can even change the speed to watch in slow-mo or fast-forward.

If you don’t already use Plan Explorer, you’re missing out on the single best query tuning add-in.

Comments closed

Constraints Or Unique Indexes

Published 2016-09-09 by Kevin Feasel

Kendra Little discusses unique constraints versus unique indexes:

Unique constraints and unique nonclustered indexes have a lot in common: unique constraints are implemented with a unique index behind the scenes.

While unique nonclustered indexes have a couple of additional features, I can still think of a scenario where it might be best to use unique constraints.

The value I see in unique constraints over unique indexes is semantic: it says to everyone, “I’m supposed to be unique by virtue of my existence.” They’re part of the data model, whereas indexes are performance optimizations to handle common data paths.

Comments closed

M	T	W	T	F	S	S
			1	2	3	4
5	6	7	8	9	10	11
12	13	14	15	16	17	18
19	20	21	22	23	24	25
26	27	28	29	30

Day: September 9, 2016