Press "Enter" to skip to content

Author: Kevin Feasel

How To Handle A Data Breach

Ben Davis has some advice on handling data breaches:

1. Work backwards
Jot down when you were first notified of the breach and start to retrace the events that led to you being notified. This may mean investigating logs on databases, firewalls, routers, and everything else in between. It’s a massive job to sift through all of the log data trying to find that one event that led you to receive that dreaded call. Fortunately, data analytics has been used in recent years to speed up the investigative work by pulling together multiple log files and analyzing them for anomalies.

As a technical guy, I hadn’t thought of the “hire a PR firm” part.
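
For the database-side piece of that log triage, one handy starting point is xp_readerrorlog, which can search the SQL Server error log for suspicious events such as failed logins. This is a minimal sketch; the search string and date range are illustrative, not from the article:

-- Search the current SQL Server error log for failed logins
-- within a suspected breach window.
EXEC master.dbo.xp_readerrorlog
    0,               -- log file number (0 = current log)
    1,               -- log type (1 = SQL Server, 2 = SQL Agent)
    N'Login failed', -- first search string
    NULL,            -- second search string (ANDed with the first)
    '20160401',      -- start time
    '20160501',      -- end time
    N'DESC';         -- sort order, newest events first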

Query Store Or Plan Guide?

Grant Fritchey answers a big question:

While presenting at SQLDay in Wroclaw, Poland, on the Query Store, I was asked a pretty simple question: which takes precedence, the Query Store or a Plan Guide?

One of my favorite answers to questions is “I don’t know” because it gives me the opportunity to learn. Let’s figure this one out together.

I’ll post the code to recreate this experiment within AdventureWorks at the end of the article. I’m doing this because the code for forcing execution plans using Plan Guides can be pretty doggone long (you may need to generate your own XML from a plan on your own system, fair warning).

The answer is not quite as clear-cut as I would have expected, and I’ll be interested to see what others find.
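
For reference, the two competing mechanisms look roughly like this. This is a minimal sketch, not Grant's exact experiment; the query and plan IDs are placeholders, and the plan guide's XML plan is elided because (as Grant warns) you may need to generate it on your own system:

-- Option 1: force a plan through the Query Store.
-- Find the query_id and plan_id first, then force that pairing.
SELECT qsq.query_id, qsp.plan_id, qsqt.query_sql_text
FROM sys.query_store_query AS qsq
    JOIN sys.query_store_plan AS qsp
        ON qsp.query_id = qsq.query_id
    JOIN sys.query_store_query_text AS qsqt
        ON qsqt.query_text_id = qsq.query_text_id;

EXEC sys.sp_query_store_force_plan @query_id = 42, @plan_id = 7;  -- placeholder IDs

-- Option 2: force a plan through a plan guide.
EXEC sys.sp_create_plan_guide
    @name = N'CompetingPlanGuide',
    @stmt = N'SELECT ...',  -- the exact statement text being targeted
    @type = N'SQL',
    @module_or_batch = NULL,
    @params = NULL,
    @hints = N'OPTION (USE PLAN N''<ShowPlanXML ...>'')';  -- XML plan elided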

Lambda Architecture

Sebastiao Correia discusses Lambda architecture:

The batch layer stores all the data with no constraint on the schema. The schema-on-read is built in the batch views in the serving layer. Creating schema-on-read views requires algorithms to parse the data from the batch layer and convert it into a readable form. This allows input data to evolve freely, as there is no constraint on its structure. But then the algorithm that builds the view is responsible for managing the structural change in order to still deliver the same view as expected.

This shows a coupling between the data and the algorithms used for serving the data. Focusing on data quality is therefore not enough, and we may ask the question of algorithm quality. As the system lives and evolves, the algorithms may become more and more complex. These algorithms must not be regarded as black boxes; a clear understanding of what they are doing is important if we want good data governance. Moreover, during the batch view creation, data quality transformations could be done so as to provide data of better quality to the consumers of the views.

Lambda is an interesting architectural concept, as it tries to solve the age-old “fast or accurate?” problem with “both.”  Get your fast estimates streamed through a speed layer, but your accurate, slow calculations handled through the batch layer.  Definitely check out this article.

CDH Update

Cloudera reports that CDH 5.7 includes a large number of changes to Hue, the web-based Hadoop UI:

Single-page app: The initial page loads very quickly and asynchronously fetches the list of tables, table statistics, data sample, and partition list. Subsequent navigation clicks will trigger only 1 or 2 calls to the server, instead of reloading all the page resources again. As an added bonus, the browser history now works on all the pages.

These are some nice changes.  I still don’t think a web app replaces quality tooling (like Management Studio), but if a web app is what you have, it should at least be nice.

Cycle Your Logs

Andrea Allred shows how to cycle error logs to prevent them from growing out of hand:

What is a good size?  I usually try to get it to roll over around 10 MB.  I use a monitoring tool, and when the large error log alert is triggered, I have it run sp_cycle_errorlog for me so mine always stay a healthy size.  You don’t need fancy tools to do this, though.  If you know roughly how fast your logs grow, you can set up a SQL Agent job to run it on a schedule to keep your logs healthy.

How many logs should I keep? This is completely up to you, but since I keep my logs so small, I try to keep 15 of them.  Why so many? I do it so I can go back and see issues further back if needed. You can adjust the number you keep by right-clicking on SQL Server Logs in SSMS and selecting “Configure.”

I’d personally prefer to keep more logs—at least 32-45 days’ worth—but that’s going to depend upon the environment.
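
The cycling itself is a one-liner, and the retention setting behind that “Configure” dialog can also be changed in T-SQL. A quick sketch; the registry write below is what SSMS does under the covers, and 15 matches Andrea’s retention:

-- Close the current error log and start a fresh one.
EXEC sp_cycle_errorlog;

-- Keep 15 error logs before SQL Server starts recycling them.
EXEC xp_instance_regwrite
    N'HKEY_LOCAL_MACHINE',
    N'Software\Microsoft\MSSQLServer\MSSQLServer',
    N'NumErrorLogs',
    REG_DWORD,
    15;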

Max And Min Decimal Values

Robert Davis gives us the formula for the max and min decimal values given a scale and precision:

Unfortunately, the mathematical approach has flaws. First of all, Power(10, 38) exceeds the range of any numerical data type in SQL Server. There is no way to store or work with this value in calculations. Secondly, once you try to raise 10 to a power greater than 30, you start seeing floating point calculations (the values are approximate). For example, select Power(Cast(10 as decimal(38, 0)), 31) — casting as decimal(38, 0) because it exceeds int or bigint — yields 9999999999999999600000000000000. That’s clearly an approximated value and is not going to work for calculations where we’re expecting a precise value. So, that leaves the hacky version I didn’t want to do because I just cannot count on the mathematical approach working. Here is the base query using the hacky calculation:

Read on for the hacky version.
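
As a rough illustration of the string-based idea (my sketch, not necessarily Robert's exact query): instead of computing Power(10, precision) - 1, build the value out of repeated 9s, which sidesteps the overflow and floating-point issues entirely.

-- Build the max and min values for decimal(@precision, @scale) as strings of 9s.
DECLARE @precision tinyint = 38, @scale tinyint = 2;

DECLARE @max varchar(40) =
    REPLICATE('9', @precision - @scale)
    + CASE WHEN @scale > 0 THEN '.' + REPLICATE('9', @scale) ELSE '' END;

SELECT @max AS MaxDecimalValue, '-' + @max AS MinDecimalValue;
-- Yields 999999999999999999999999999999999999.99 and its negation
-- for decimal(38, 2).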

Bidirectional Cross-Filtering And M

Chris Webb shows how to create a report with a single slicer that lets users show data for the current day, week, month, and year:

The way to achieve this is not all that different from the calculated column approach, but it requires a separate table to model the many-to-many relationship between all the required relative period selections and the dates in them, as well as the use of bidirectional cross-filtering between tables (which I blogged about here). The data model I used for this report looks like this:

Be sure to read the comments so that you don’t get into a scenario in which a user can select multiple periods and get duplicated data.
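
To make the bridge-table idea concrete: each relative period maps to every date it contains, so a given date appears on several rows, which is what makes the relationship many-to-many. Here is a hypothetical SQL version (Chris actually builds his table in M, and these table and column names are invented):

-- One row per (relative period, date) pair; a single date can belong
-- to several periods.  Assumes a date dimension with a date-typed DateKey.
SELECT N'Current Day' AS PeriodName, d.DateKey
FROM dbo.DimDate AS d
WHERE d.DateKey = CAST(GETDATE() AS date)
UNION ALL
SELECT N'Current Month', d.DateKey
FROM dbo.DimDate AS d
WHERE YEAR(d.DateKey) = YEAR(GETDATE())
  AND MONTH(d.DateKey) = MONTH(GETDATE())
UNION ALL
SELECT N'Current Year', d.DateKey
FROM dbo.DimDate AS d
WHERE YEAR(d.DateKey) = YEAR(GETDATE());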

Undocked Query Windows

Michael Swart notes that undocked query windows now feel all grown up:

The March 2016 Refresh (13.0.13000.55 Changelog) updates SSMS to use the new Visual Studio 2015 shell. Part of that change means that undocked windows are now top-level windows.

Top-level windows are windows without parents, so the undocked window is not a child window of the main SSMS window (but it is part of the same process). And so it gets its own space in the taskbar and participates in alt+tab when you switch between windows.

Also, these undocked windows can be a collection of query windows.

One reason I rarely used child windows is that I’d undock something, switch to a browser tab underneath, and then switch back to SSMS and watch the undocked window pop over my browser tab.  This sounds like a good improvement.
