2018-07-16 – Curated SQL

Let’s take the model to the data and reproduce figures 2.1. and 2.2 of “Cities, Agglomeration, and Spatial Equilibrium”. The focus are two cities, Chicago and Boston. These cities are chosen because both differ in how easy is to access to their city centers. Chicago is fairly easy, Boston is more complicated. Our model then implies that gradients then should reflect the differential costs to access the city centers.

So let’s begin, the first step is to get some data. To do so I’m are going to use the “tidycensus” package. This package will allow me to get data from the census website using their API. We are also going to need the help of three other packages: “sf” to handle spatial data, “dplyr” my go-to package to wrangle data, and “ggplot2” to plot my results.
require("tidycensus", quietly=TRUE)
require("sf", quietly=TRUE)
require("dplyr", quietly=TRUE)
require("ggplot2", quietly=TRUE)
In order to get access to the Census API, I need to supply a key, which can be obtained from http://api.census.gov/data/key_signup.html.

Read on for theory and a test. H/T R-bloggers

Comments closed

Interacting With SQL Server From Pandas

Published 2018-07-16 by Kevin Feasel

Tomaz Kastrun shows how to use pyodbc to interact with a SQL Server database from Pandas:

In the SQL Server Management Studio (SSMS), the ease of using external procedure sp_execute_external_script has been (and still will be) discussed many times. But the reason for this short blog post is the fact that, changing Python environments using Conda package/module management within Microsoft SQL Server (Services), is literally impossible. Scenarios, where you want to build a larger set of modules (packages) but are impossible to be compatible with your SQL Server or Conda, then you would need to set up a new virtual environment and start using Python from there.

Communicating with database to load the data into different python environment should not be a problem. Python Pandas module is an easy way to store dataset in a table-like format, called dataframe. Pandas is very powerful python package for handling data structures and doing data analysis.

Click through for examples of reading and writing data.

Comments closed

HADR_DATABASE_WAIT_FOR_TRANSITION_TO_VERSIONING Wait Type

Published 2018-07-16 by Kevin Feasel

Chirag Shah explains what the HADR_DATABASE_WAIT_FOR_TRANSITION_TO_VERSIONING wait type really means:

Recently a customer reported an interesting issue, while querying against recently added readable replica, SELECT statement is shown as suspended and session is shown as waiting on HADR_DATABASE_WAIT_FOR_TRANSITION_TO_VERSIONING

[…]

Upon more investigation, it appeared to be waiting on with a wait type HADR_DATABASE_WAIT_FOR_TRANSITION_TO_VERSIONING

The behavior is by design as mention in the SQL Server product documentation and applicable to all version of SQL Server that supports availability group.

Read on for the explanation.

Comments closed

HASHBYTES On FOR JSON PATH Data

Published 2018-07-16 by Kevin Feasel

Greg Low walks us through a mechanism to check whether data has changed:

In a previous post, I wrote about how to determine if a set of incoming values for a row are different to all the existing values in the row, using T-SQL in SQL Server.

I later remembered that I’d seen a message by Adam Machanic a while back, talking about how FOR JSON PATH might be useful for this, so I did a little more playing around with it.

If you are using SQL Server 2016 or later, I suspect this is a really good option.

Click through to see an example using the WideWorldImporters database.

Comments closed

Generating Realistic-Looking Data With Markov Chains

Published 2018-07-16 by Kevin Feasel

Phil Factor shows how to use Markov chain generation in T-SQL to generate realistic-looking country names:

How did we do this? We started with a table that took each word, added two spaces at the beginning and a |, followed by two subsequent spaces, at the end. This allowed us to map the frequency of each three-letter combination in a collection of words. Any language is made up of common combinations of characters with a few wild exceptions. For words to look right, they must follow this distribution. This distribution will change in various parts of a word, so you need all this information.

So what would happen if, instead of feeding the name of countries into the batch, we do the names of people?

My favorite name from the list was Kuwatian Samoa.

Comments closed

Windows Server 2019: USB-Based File Share Witness Support

Published 2018-07-16 by Kevin Feasel

Dave Bermingham introduces us to a new feature coming in Windows Server 2019:

I’m very excited to hear that coming in Windows Server 2019 there will be a few new features in regards to the File Share Witness for the Failover Cluster Quorum. The feature that many of my customers have been asking for about for many years is finally arriving…File Share Witness on a USB stick!

Okay, they didn’t really ask for that specifically, but many of my customers wanted to deploy a simple 2-node cluster in each store location, branch office, etc., and they didn’t want the added expense of a SAN to leverage a Disk Witness and weren’t to keen, or just didn’t have the connectivity, to rely on a Cloud Witness in Azure. Many of these customers just decided to forgo clustering, or they used an alternative clustering solution like the SIOS Protection Suite.

Now they have a viable alternative coming in Windows Server 2019. By leveraging a supported router, a USB disk inserted into the router can be configured with a file share that can be used as the witness. This eliminates the need for a 3rd server or internet connectivity.

I can’t see this being extremely useful in most scenarios, though that could be a lack of imagination on my part.

Comments closed

Using Azure Blob Storage Archive Tier For Archival Data

Published 2018-07-16 by Kevin Feasel

Bob Pusateri shows us how to configure Azure Blob Storage Archive Tier:

Two of the products I use extensively for this purpose are Amazon Glacier and, more recently, Microsoft Azure Blob Storage Archive Tier. As happy as I’ve been with Amazon Glacier since its introduction in 2012, I always hoped Microsoft would offer a similar service. My wish came true in Fall of 2017 when an archive tier of Azure Blob Storage was announced. Rather than branding this capability as a new product, Microsoft decided to present it as a new tier of Azure Blob Storage, alongside the existing hot and cool storage tiers.

A noticeable difference from the hot and cool storage tiers is that the archive storage tier is only available on a per-blob basis. While a storage account can be configured to have all blobs placed in either the hot or cool tier by default once they are uploaded, the archive tier is not an option. Once a blob is uploaded, it must explicitly be moved into the archive tier. If one is using the Azure Portal to do this, there’s several clicks involved per blob. The free Azure Storage Explorer client is no better. While I found several third party tools that can upload files to the archive tier, none were free. At this point, I decided to write my own method using Powershell, which I am happy to share below.

Read on for the script. A good use for Azure Blob Storage Archive Tier would be storing old database backups which you have to keep around for compliance purposes but rarely use.

Comments closed

Diagnosing A SQL Server Error Using WinDbg

Published 2018-07-16 by Kevin Feasel

Dmitry Pilugin gives us a specific example of using WinDbg to track down a bug in SQL Server:

The important part is that the same stack with the same relative to a module addresses was in all the dumps.

To get an idea of what is going on, we may try to convert the relative address: Module(<sqlmodule>+hex), into a SQL Server’s method name. Probably the method’s name would be descriptive enough to give some information. This is the moment where WinDbg with public symbols may help.

You may download WinDbg from the here, and then download public symbols, you may use WinDbg documentation or here is an example of how to do it from Paul Randal, adopt it to your folders and version.

You should not debug a production server, because WinDbg will break any singe instruction execution, so make sure you have a test server with the exact same SQL Server version as production (in my case it was 13.0.4474.0).

I heartily second the comment that you never want to run the debugger in production.

Comments closed

M	T	W	T	F	S	S
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30	31

Day: July 16, 2018

Testing Spatial Equilibrium Concepts With tidycensus

Interacting With SQL Server From Pandas

HADR_DATABASE_WAIT_FOR_TRANSITION_TO_VERSIONING Wait Type

HASHBYTES On FOR JSON PATH Data

Generating Realistic-Looking Data With Markov Chains

Windows Server 2019: USB-Based File Share Witness Support

Using Azure Blob Storage Archive Tier For Archival Data

Diagnosing A SQL Server Error Using WinDbg