
Month: November 2018

SQL-to-Excel: A Tool For Automating Exports To Excel

Dave Mason has written a command line tool for Excel-based productivity:

In 2018, I’ve found myself frequently running a series of static SQL Server queries and copying/pasting the output of each query into a separate sheet in a Microsoft Excel Worksheet file. They say if you perform a manual task X number of times or more, you should automate it. Great advice, right? There are a number of ways to export SQL Server query results to Excel, and an internet search readily turns up many options. I didn’t find any that suited my needs, so I wrote my own utility, uncleverly named “SQL to Excel”.

SQL to Excel is a modest command line utility. It iterates through a folder of *.sql script files, executing each script, capturing the script query output, and writing the results to a Microsoft Excel Worksheet file (one worksheet per script file). And yes, it’s an actual Excel file, not a .csv file. The C# code for SQL to Excel is available as a GitHub repository.

Click through for more details.  This sounds a bit more robust than building Integration Services packages to do the same.


Corrupting Databases For Fun And Profit

Eric Blinn has started a new series on database corruption.  In part one, he shows us how to corrupt a database (probably by letting it listen to Alice Cooper and Frank Zappa):

I’m going to start a series of posts regarding corruption detection, prevention, and correction.  In order to detect corruption we must first have a corrupt database.  This post is going to show how to purposely corrupt a database.  I’ll refer back to this post often as it will be the basis of most of the future posts in this series which will pick up where this leaves off.

Step 1.  Create a new database.  You didn’t think I was going to ask you to corrupt a real database, did you?  We will create one table with some names in it.  The table is abnormally wide on purpose.  The goal is to get fewer rows on any one page of data so that we can get several pages of data without needing many hundreds of rows.
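
Eric’s post has the full script; as a rough sketch of the kind of throwaway setup he describes (database, table, and column names here are hypothetical), it might look like this:

-- Throwaway database with deliberately wide rows, so only a single row fits on each 8 KB data page.
CREATE DATABASE CorruptMe;
GO
USE CorruptMe;
GO
CREATE TABLE dbo.Names
(
    NameId    INT IDENTITY(1,1) PRIMARY KEY,
    FirstName CHAR(3000) NOT NULL,
    LastName  CHAR(3000) NOT NULL
);
GO
INSERT INTO dbo.Names (FirstName, LastName)
VALUES ('Alice', 'Cooper'), ('Frank', 'Zappa'), ('Some', 'Names');
GO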

Part two explains the concept of page verification:

Page verification is a process in SQL Server where the engine writes extra data to the header of the page while writing it to disk.  This extra data can be used to verify that the data later read from that page is what was expected.  There are 3 options for the setting.  They are NONE, CHECKSUM, and TORN_PAGE_DETECTION.  The torn page option is deprecated.  It should not be used and will not be covered in this series.

When set to CHECKSUM, SQL Server will quickly determine a checksum for the page of data and write it to the header of the data page.  Any time from that point forward when the page is read from disk, SQL Server will perform the same checksum calculation and compare that to the stored value in the header of the page.  If the value matches, that would indicate there is probably not corruption. If the values do not match, that would indicate there is almost certainly some kind of corruption.
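
The setting itself is plain T-SQL; a quick sketch of checking and changing it (continuing with the hypothetical database from the sketch above):

-- See which page verification option each database is using.
SELECT name, page_verify_option_desc
FROM sys.databases;

-- Switch the sample database to CHECKSUM if it is not already using it.
ALTER DATABASE CorruptMe SET PAGE_VERIFY CHECKSUM;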

Corruption is one of the scariest things that can happen to a database; knowing how to respond to an incident of corruption is critical, even though corruption is quite rare in practice (unless you’re running on awful hardware).


Using Kafka To Drive ML Predictions

Kai Waehner shows us a model architecture for using Apache Kafka to generate predictions from trained models:

Kafka applications are event-based, and leverage stream processing to continuously process input data. If you’re using Kafka, then you can embed an analytic model natively in a Kafka Streams or KSQL application. There are various examples of Kafka Streams microservices embedding models built with TensorFlow, H2O or Deeplearning4j natively.

It is not always possible or feasible to embed analytic models directly due to architectural, security or organizational reasons. You can also choose to use RPC to perform model inference from your Kafka application (bearing in mind the pros and cons discussed above). You can visit my project for an example of gRPC integration between a Kafka Streams microservice and a locally hosted TensorFlow Serving container for making predictions with a hosted TensorFlow model.

There are a couple of separate and interesting patterns here.


Strategies For Dealing With Failed Projects

Edwin Thoen gives us a few tips for dealing with failing data science projects:

At the beginning of a project, the levels of enthusiasm and optimism are always at their peak. Especially in data science projects. Isn’t data the new oil? This is the time we are finally going to dig into that well and leverage our data in unprecedented ways! No setbacks have been experienced yet. There is only one road ahead and it will lead us to success. Probably at this stage you, the data scientist, are already well aware of a number of project risks. You might want to keep these concerns to yourself, as you don’t want to come across as negative, or worse, someone who is not up to the job ahead. Please don’t: if you foresee possible problems at this stage and you don’t speak out, they can come back like a boomerang when the problems actually occur. Rather, invite all stakeholders to perform a risk analysis together.

This is good advice and applies outside of data science projects as well.  H/T R-bloggers


Kafka Analytics Patterns In HDP 3.1

George Vetticaden walks us through what’s coming with Apache Kafka in Hortonworks Data Platform 3.1:

A summary of these three new access patterns:

  • Stream Processing: Kafka Streams Support – With existing support for Spark Streaming and SAM/Storm, the addition of Kafka Streams gives developers more options for their stream processing and microservice needs.

  • SQL Analytics: New Hive Kafka Storage Handler – View Kafka topics as tables and execute SQL via Hive with full SQL Support for joins, windowing, aggregations, etc.

  • OLAP Analytics: New Druid Kafka Indexing Service – View Kafka topics as cubes and perform OLAP style analytics on streaming events in Kafka using Druid.

Click through for high-level explanations of each.  George promises more detailed explanations as well.
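
For the SQL analytics pattern, the new Hive Kafka storage handler exposes a topic as an external table that Hive can query. A rough sketch of the DDL, with the topic, broker, and column names made up for illustration:

-- Hive DDL (not T-SQL): expose a Kafka topic as a queryable external table.
CREATE EXTERNAL TABLE kafka_clickstream (
  `user_id`    BIGINT,
  `page`       STRING,
  `event_time` TIMESTAMP
)
STORED BY 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
TBLPROPERTIES (
  "kafka.topic" = "clickstream",
  "kafka.bootstrap.servers" = "broker1:9092"
);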


DATEDIFF

Randolph West continues a series on covering dates and times, looking at DATEDIFF and DATEDIFF_BIG:

The only functional difference between them is that DATEDIFF_BIG() returns values as a BIGINT, for results that exceed the boundary of an INT. Keep this in mind when deciding which one to use. For example, the maximum number of seconds an INT can hold is 68 years, while a BIGINT can comfortably store the number of seconds in 10,000 years. This becomes especially important when dealing with microseconds and nanoseconds.

The rest of the post will use DATEDIFF() to refer to both functions.

I think this might be the first time I’ve read about DATEDIFF_BIG() and I’m not aware of ever having used it.  But hey, it could make sense if you need to track more than 2 billion microseconds.
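
A quick illustration of the difference (the dates are arbitrary; any span longer than about 68 years in seconds will do):

-- 200 years' worth of seconds exceeds the INT range, so DATEDIFF() fails...
SELECT DATEDIFF(SECOND, '1900-01-01', '2100-01-01') AS SecondsInt;
-- ...while DATEDIFF_BIG() returns a BIGINT and handles it comfortably.
SELECT DATEDIFF_BIG(SECOND, '1900-01-01', '2100-01-01') AS SecondsBig;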


Optimizing SQL Server Workloads On VMware

Jeff Mlakar shares a few tips on hosting SQL Server via VMware virtual machine:

Why is Over-Allocating a VM Bad?

Why is it so important to allocate just the right amount of resources for your guest VM running SQL Server? We all know the problem with an undersized system; however, what about an oversized system? What problem could that be? Here are a few common issues to consider:

  • Over-allocating CPU causes poor resource utilization across all the guest VMs

  • Over-allocating memory unnecessarily increases memory contention and overhead on other guest VMs

  • Having more vCPUs assigned to the VM can have an impact on licensing.

There are some good points in the post, so check it out.


The Risk Of Shadow IT In The Cloud

Kenneth Fisher walks us through the risk of increased Shadow IT with migration to the cloud:

Shadow IT has been, well, maybe not the bane of the IT department, but certainly a pain in the neck. On the off chance you’ve never heard of shadow IT, do any of these sound familiar?

  • A user asks you to restore a corrupt database on a SQL Server you’ve never heard of and isn’t in your inventory. (And 50/50 odds there’s never been a backup taken.)

  • You do a licensing true-up and dozens of new SQL Servers suddenly show up.

  • You hear from a user: “We have this mission critical Access database that suddenly isn’t working. I know you don’t support Access, but you’re the database person, so we need you to fix it.”

It’s an interesting short essay and worth thinking about if you’re in the cloud or moving that way.


Casting Constants And POWER()

Steve Jones walks us through a case of an unexpected error:

I ran into an interesting problem while working with the POWER() function. I was trying to do some binary conversions and had a statement like this to process powers of 2.

SELECT POWER(2, n)

This was designed to take a value and return a power of 2. I then used a different value to determine if this was added to my conversion factor or not. In trying to work with some larger numbers, I ran into this error:

Msg 232, Level 16, State 3, Line 3
Arithmetic overflow error for type int, value = 2147483648.000000.

Click through for a description of the problem as well as Steve’s solution.
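
Steve explains his own fix in the post; as a hedged sketch, the usual workaround is to cast the constant base so that POWER() returns a BIGINT instead of an INT:

-- POWER() returns the same type as its first argument, so an INT base overflows at 2^31.
SELECT POWER(2, 31);                  -- Arithmetic overflow error for type int
-- Casting the constant to BIGINT gives the result room to grow.
SELECT POWER(CAST(2 AS BIGINT), 31);  -- 2147483648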


Extended Events In Azure Data Studio

Jason Brimhall walks us through installation of the SQL Server Profiler extension in Azure Data Studio:

Azure Data Studio (ADS) is getting all sorts of love and attention these days. So much so that they have finally gotten around to adding Extended Events (XE) to the tool – sort of. Now we have the power to run traces on SQL Server via ADS.

The presence of XE in ADS comes via an extension and comes with a few other caveats. I will explore the extension for XE available in ADS in this article and discuss some of the caveats. As you read the article, it might be helpful to go ahead and download ADS if you do not already have it.

Jason points out the name of SQL Server Profiler and I’d like to add my own bit of irritation here.  “Don’t use Profiler, except the one good Profiler but not the Profiler you think you’re using unless you know not to use Profiler and use Profiler instead.”  Yeah, that’s pretty clear.
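
As I understand it, the extension sits on top of ordinary Extended Events sessions, the kind you could define yourself in plain T-SQL. A minimal, hypothetical example (the session name and event choices are mine):

-- A tiny XE session capturing completed batches to an in-memory ring buffer.
CREATE EVENT SESSION [QuickBatchTrace] ON SERVER
ADD EVENT sqlserver.sql_batch_completed
ADD TARGET package0.ring_buffer;
GO
ALTER EVENT SESSION [QuickBatchTrace] ON SERVER STATE = START;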
