Author: Kevin Feasel

CI With SQL Server And Jenkins

Chris Adkin shows how to auto-deploy SQL Server Data Tools projects to a SQL Server instance using Jenkins:

The aim of this blog post is twofold; it is to explain how:

  • A “Self building pipeline” for the deployment of a SQL Server Data Tools project can be implemented using open source tools
  • A build pipeline can be augmented using PowerShell

What You Will Need

  • Jenkins automation server
  • cURL
  • SQL Server 2016 (any edition will suffice)
  • Visual Studio 2015 Community edition
  • A Windows server, physical or virtual, to install all of the above on; I will be using Windows Server 2012 R2 as the operating system

Automated deployment via CI is extremely helpful, and Chris makes it look easy in this post.

Benefits Of Deprecated Data Types

Raul Gonzalez shows how to get one of the benefits of the older, deprecated LOB data types when using the newer (MAX) data types:

We can see that our table is managed by two different allocation units, IN_ROW_DATA and LOB_DATA, which means that all data within columns of the data types above will end up in different pages by default, regardless of the size of the data.

This is the default behaviour for the old LOB types, which are stored separately; the new (MAX) LOB types, by contrast, will by default be stored In-Row if the values are small enough to fit.

Having some of those documents In-Row would result in a serious increase in the number of pages to scan, thereby affecting performance.

Note that for the table scan we have used only the IN_ROW_DATA pages, making it much lighter than if we had to scan the sum of all pages.

This can be helpful in situations where you rarely need to access the LOB data.
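
As a minimal illustration of getting that off-row behaviour on purpose, the “large value types out of row” table option does exactly this; the table and column names below are made up for the sketch:

    -- Hypothetical table with a (MAX) column; by default, values that fit
    -- in the record (up to roughly 8,000 bytes) are stored in-row.
    CREATE TABLE dbo.Documents
    (
        DocumentId int IDENTITY(1,1) PRIMARY KEY,
        Title      nvarchar(200) NOT NULL,
        Body       nvarchar(max) NULL
    );

    -- Mimic the old text/ntext/image behaviour: store (MAX) values off-row,
    -- leaving only a 16-byte pointer in the row.
    EXEC sys.sp_tableoption
        @TableNamePattern = N'dbo.Documents',
        @OptionName = 'large value types out of row',
        @OptionValue = 'true';

    -- The option affects only values written after it is set; an in-place
    -- update moves existing values off-row.
    UPDATE dbo.Documents SET Body = Body;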

Killing SPIDs

Garland MacNeill is all out of bubble gum:

Recently came across a situation where reporting logins were interfering with nightly jobs due to blocking. After a number of attempts to resolve the blocking, it was decided that a stored procedure that disabled the login and killed the user sessions was the most pragmatic solution. This is the code I came up with to resolve the issue.

Click through for the script.  This is definitely a last-ditch option, but it’s good to have in your bag of tricks.
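
Garland’s script is the one to click through for; the core pattern, with a hypothetical login name, looks something like this sketch:

    -- Disable the login so no new sessions can connect.
    ALTER LOGIN [ReportingUser] DISABLE;

    -- Build a KILL statement for each active session under that login...
    DECLARE @kill nvarchar(max) = N'';

    SELECT @kill += N'KILL ' + CAST(session_id AS nvarchar(10)) + N'; '
    FROM sys.dm_exec_sessions
    WHERE login_name = N'ReportingUser';

    -- ...and run them all.
    EXEC sys.sp_executesql @kill;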

Choosing A Hadoop Data Format

Silvia Oliveros has a set of considerations to help you choose a file format for your data in Hadoop:

What does your pipeline look like, and what steps are involved?

Some of the file formats were optimized to work in certain situations. For example, Sequence files were designed to easily share data between Map Reduce (MR) jobs, so if your pipeline involves MR jobs then Sequence files make an excellent option. In the same vein, columnar data formats such as Parquet and ORC were designed to optimize query times; if the final stage of your pipeline needs to be optimized, using a columnar file format will increase speed while querying data.

At first, I’d suggest just using delimited files, as it’s easiest that way.  Once you have developed a bit of Hadoop maturity, then it makes sense to think about whether rowstore formats (like Avro) or columnstore formats (like Parquet and ORC) make sense for a particular data set.

OCR With Tesseract

Amuda Adelou shows how to use Tesseract’s Java API to perform character recognition in images:

Extracting text from an image means that you are considering the flowchart imagery that’s processed to extract the text components and then extracting the geometrical shapes components. The text components are extracted with geometrical components, as well. The internal relationship between the components is set up by tracing the flow lines that connect different components. The extracted components are output to metadata (in XML format), which is machine-readable. This metadata can be archived, stored in a knowledge base, or shared with others.

Click through for a demo app and code.

Spark Deep Learning On AWS

Joseph Spisak, et al, show how to configure and use BigDL in Amazon Web Services’ Elastic MapReduce:

Classify text using BigDL

In this tutorial, we demonstrate how to solve a text classification problem based on the example found here. This example uses a convolutional neural network to classify posts in the 20 Newsgroup dataset into 20 categories.

We’ve provided a companion Jupyter notebook example on GitHub that you can open in the Jupyter dashboard to execute the code sections.

There’s a lot to this tutorial.

Troubleshooting AG Creation Failure

Anthony Nocentino digs into logs to troubleshoot a failure when trying to create an Availability Group:

Now we have some data to look through!

When we look at the contents of the cluster logs, we’re totally on the other side of the spectrum when it comes to information verbosity. The logs so far have been pretty terse and haven’t really told us what’s causing the failure…well, dig through this log and you’ll likely find your reason and a lot more information. Good stuff to look at to get an understanding of the internals of WSFCs. Now, the reason my Availability Group creation failed was permissions. Check out the log entries.

It’s a good troubleshooting guide.
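
If your cluster log points to the same permissions problem, a common fix (verify it against your own log entries first) is to grant the rights the WSFC resource DLL needs to the [NT AUTHORITY\SYSTEM] login on each replica:

    -- The cluster's resource DLL connects to SQL Server as NT AUTHORITY\SYSTEM
    -- and needs these permissions to create and monitor the Availability Group.
    GRANT ALTER ANY AVAILABILITY GROUP TO [NT AUTHORITY\SYSTEM];
    GRANT CONNECT SQL TO [NT AUTHORITY\SYSTEM];
    GRANT VIEW SERVER STATE TO [NT AUTHORITY\SYSTEM];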

Building A Recommendation System With Graph Data

Arvind Shyamsundar and Shreya Verma show how to implement a recommender system using SQL Server 2017’s new graph database functionality:

As you can see from the animation, the algorithm is quite simple:

  • First, we identify the user and ‘current’ song to start with (red line)
  • Next, we identify the other users who have also listened to this song (green line)
  • Then we find the other songs which those other users have also listened to (blue, dotted line)
  • Finally, we direct the current user to the top songs from those other songs, prioritized by the number of times they were listened to (this is represented by the thick violet line).

The algorithm above is quite simple, but as you will see it is quite effective in meeting our requirement. Now, let’s see how to actually implement this in SQL Server 2017.

Click through for animated images as well as an actual execution plan and recommendations for graph query optimization (spoilers:  columnstore all the things).  They also link to the GitHub project where you can try it out yourself.
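
The heart of an implementation like this is a multi-hop MATCH predicate across node and edge tables. Here is a simplified sketch of the traversal; the table and column names are illustrative rather than the article’s exact schema:

    DECLARE @CurrentPerson int = 1;
    DECLARE @CurrentSong   int = 42;

    -- Walk the pattern: me -> current song <- other listeners -> their songs.
    SELECT TOP (10) OtherSong.Title, COUNT(*) AS Listens
    FROM Person AS Me, Listens AS L1, Song AS CurrentSong,
         Listens AS L2, Person AS OtherPerson, Listens AS L3, Song AS OtherSong
    WHERE MATCH(Me-(L1)->CurrentSong<-(L2)-OtherPerson-(L3)->OtherSong)
      AND Me.PersonId = @CurrentPerson
      AND CurrentSong.SongId = @CurrentSong
      AND OtherPerson.PersonId <> Me.PersonId
      AND OtherSong.SongId <> CurrentSong.SongId
    GROUP BY OtherSong.Title
    ORDER BY COUNT(*) DESC;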

Team Maturity Levels

Ed Elliott has the best lists:

Maturity Levels

OK so this is pretty simple, we have these levels:

  • Low
  • Medium
  • High

Wow. Just WOW

That is an amazing list; how did you come up with it? Did it come from some PhD study on the effectiveness of lists in the internet age? No.

So a little more detail…

Read on for Ed’s take on database development maturity levels.  I might quibble with some of the specifics, but I agree with the principle.

Finding Query Plan Regressions

Jovan Popovic shows how to find query plan regressions in SQL Server 2017:

In the CTP 2.0 version, there is a new system view, sys.dm_db_tuning_recommendations, which returns recommendations that you can apply to fix potential problems in your database. This view contains all identified potential performance issues in SQL queries that are caused by SQL plan changes, along with the correction scripts that you can apply. Every row in this view contains one recommendation that you can apply to fix the issue. Some of the information shown in this view includes:

  • Id of the query, the plan that caused the regression, and the plan that might be used instead
  • Reason that describes what kind of regression is detected (e.g., CPU time for the query changed from 17ms to 189ms)
  • T-SQL script that can be used to force the plan
  • Information about the current plan, and the previous plan that had better performance

In the “surgical scalpel to chainsaw” range of query tuning options, this rates approximately guillotine.  I think it’ll be a very useful tool for finding issues, but it wouldn’t be wise to start lopping off all the heads just because the optimizer tells you to.  In this context, I imagine this DMV to be about as useful as the missing indexes DMV and for the same reasons.
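
To see what the optimizer is suggesting before forcing anything, you can pull the interesting fields out of the view’s JSON details column; this is a sketch along the lines of the documented usage:

    SELECT reason,
           score,
           JSON_VALUE(details, '$.planForceDetails.queryId')           AS query_id,
           JSON_VALUE(details, '$.planForceDetails.regressedPlanId')   AS regressed_plan_id,
           JSON_VALUE(details, '$.planForceDetails.recommendedPlanId') AS recommended_plan_id,
           -- The generated script you would run to force the better plan.
           JSON_VALUE(details, '$.implementationDetails.script')       AS force_plan_script
    FROM sys.dm_db_tuning_recommendations;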
