May 2020 – Page 12 – Curated SQL

Polygon-Based Spatial Searches with Cosmos DB

Published 2020-05-08 by Kevin Feasel

Hasan Savran continues a series on spatial data in Cosmos DB:

I want to continue to develop our new map application for Azure Cosmos DB. So far, we can run a custom spatial query in Cosmos DB and display the results on a map. I want my users to create a polygon on map and search for data under this polygon. If you are familiar with Zillow, that is how their application lets you look for houses to buy or rent. You select an area, and Zillow application displays all available houses or rental under the area you draw. It is extremely useful and user-friendly search.

Click through to see how Hasan does it, as well as getting around a coordinate ordering problem.

Comments closed

Using Computed Columns to Avoid Scans without Changing Queries

Published 2020-05-08 by Kevin Feasel

Andy Mallon shares a trick you don’t want to use too often, but can get you out of a pinch:

We’ve all been there. You’ve got a query where the JOIN or WHERE predicate is not SARGable. You’ve read about how this can be a problem, and how bad it is for performance.
Alas, you cannot change the query. Sometimes this reason is political, sometimes it’s because you’ve got a third-party app and simply don’t have access to the code. But you do have access to the database…

This is the type of thing you learn about and use maybe twice in your career, and then you get frustrated with the third-party vendor which won’t fix their code.

Comments closed

A History of Bad Habits

Published 2020-05-08 by Kevin Feasel

Aaron Bertrand has a compendium of bad habits, anti-patterns, and Festivus-quality grievances to air:

Going back more than a decade, I’ve been writing and presenting about what I call “bad habits” – typically shortcuts or sub-optimal ways to do things in SQL Server. Often users just don’t realize these things are bad or that there is a better way.
Here is an ongoing list of articles that I consider to be along these lines – eradicating bad habits or at least promoting best practices. Not all are explicitly framed as a “bad habit,” but they do all present things I wish I observed less often.

Click through for a disturbingly long list of items.

Comments closed

mssql-cli Generally Available on MacOS and Linux

Published 2020-05-08 by Kevin Feasel

Alan Yu announces the general availability of mssql-cli outside of Windows:

We are excited to announce that mssql-cli is now generally available on macOS and Linux.
mssql-cli is an open source and cross-platform command-line tool (CLI) to manage SQL Server on-prem and on the cloud. We are a proud member of the dbcli family of open source command line tools to manage relational databases.
If you are a user of sqlcmd, you will love the interactive and modern design components in mssql-cli. With this release, you will also be able to use mssql-cli in non-interactive scenarios such as scripting and automation. Read on to learn more about how mssql-cli will help improve your productivity through a modern CLI experience.

If you love the command line (or simply need to SSH into a box from time to time), give this product a try.

Comments closed

Removing Redundant Indexes

Published 2020-05-08 by Kevin Feasel

Guy Glantser helps find and remove redundant indexes in SQL Server:

For some reason, which I have never understood, SQL Server allows you to create duplicate indexes on the same object (table or view). You can create as many non-clustered indexes as you like with the exact index keys and included columns as well as the exact index properties. The only difference between the indexes would be the index ID and the index name. This is a very undesirable situation, because there is clearly no benefit from having the same index more than once, but on the other hand there is quite a lot of overhead that each index incurs. The overhead includes the storage consumed by the index, the on-going maintenance during DML statements, the periodical rebuild and/or reorganize operations, and the additional complexity that the optimizer has to deal with when evaluating possible access methods to the relevant table or view.

I don’t fully agree with Guy’s definition of redundancy, but it’s more a quibble than anything else—if I have an index on (A,B,C) and an index on (A,B), it might seem redundant, but there are cases when it isn’t. For example, suppose C is a large NVARCHAR column such that we barely fit (A,B,C) into the window for an index (1700 bytes in SQL Server 2016, 900 in prior versions), but A and B are INT types. If we have a lot of cases where we need (A,B) but not C, that second index is definitely not redundant.

Regardless, click through for Guy’s argument and a script to help you find potentially redundant indexes.

Comments closed

Columnstore and Memory-Optimized tempdb

Published 2020-05-08 by Kevin Feasel

Erik Darling has a bucket of cold water for us:

In SQL Server 2019:
– Exciting stuff: In memory tempdb!
– Exciting stuff: sp_estimate_data_compression_savings can evaluate columnstore compression!
– Disappointing stuff: If you use in memory tempdb, you can’t have any columnstore anything in tempdb
That means if you’re using sneaky tricks like clustered columnstore indexes on temp tables to induce batch mode, you’re gonna get a lot of errors.
Likewise, you won’t be able to evaluate if columnstore will help your tables.

Click through to understand the extent of this limitation. Hopefully this is something we see addressed in vNext and a CU for 2019.

Comments closed

Accessing Azure Queue Storage from R

Published 2020-05-07 by Kevin Feasel

Hong Ooi announces a new package for R:

This post is to announce that the AzureQstor package is now on GitHub. AzureQstor provides an R interface to Azure queue storage, building on the facilities provided by AzureStor.
Queue Storage is a service for storing large numbers of messages, for example from automated sensors, that can be accessed remotely via authenticated calls using HTTP or HTTPS. A single queue message can be up to 64 KB in size, and a queue can contain millions of messages, up to the total capacity limit of a storage account. Queue storage is often used to create a backlog of work to process asynchronously.

Hong includes a couple of demos as well, so check it out.

Comments closed

Security Practices for Azure Databricks

Published 2020-05-07 by Kevin Feasel

Abhinav Garg and Anna Shrestinian walk us through good security practices when using Azure Databricks:

Azure Databricks is a Unified Data Analytics Platform that is a part of the Microsoft Azure Cloud. Built upon the foundations of Delta Lake, MLflow, Koalas and Apache Spark^TM, Azure Databricks is a first party PaaS on Microsoft Azure cloud that provides one-click setup, native integrations with other Azure cloud services, interactive workspace, and enterprise-grade security to power Data & AI use cases for small to large global customers. The platform enables true collaboration between different data personas in any enterprise, like Data Engineers, Data Scientists, Business Analysts and SecOps / Cloud Engineering.
In this article, we will share a list of cloud security features and capabilities that an enterprise data team could utilize to bake their Azure Databricks environment as per their governance policy.

Much of this is fairly straightforward, but it is nice to have it all in one place.

Comments closed

Displaying Cosmos DB Spatial Data with .NET Core

Published 2020-05-07 by Kevin Feasel

Hasan Savran builds up a quick .NET Core app to retrieve spatial data from Cosmos DB and display it:

Cosmos DB stores geospatial data in GeoJSON format. You can not tell what raw GeoJSON represents because usually all it has is a type and bunch of coordinates. Azure Cosmos DB does not have any UI to help you what GeoJSON data looks like on a map either. Only option you have is a third party tool which might display data on a map or Azure Cosmos DB Jupyter Notebooks.
I want to run a query in Azure Cosmos DB and see the results on a map. I decided to create a simple UI which displays spatial data on a map. I will show you how to do this step by step. I will use LeafLetJs as a map. It is open source and free! Also, I need to create .NET Core 3.1 web application and use Azure Cosmos DB Emulator for data.

Hasan walks us through the demo and promises to put the code in GitHub later.

Comments closed

XML Storage in SQL Server

Published 2020-05-07 by Kevin Feasel

Vladimir Klimov walks us through what happens when we save data to an XML data type:

When working on the release of dbForge Transaction Log, among other tasks, our team had to puzzle out how to properly store typed XML data.
To start with, it is worth mentioning that SQL Server does not store XML in the format it was entered. An XML string is parsed, split to tags, and thus is stored in a compressed format. Description elements that the server considers unnecessary are discarded.

It also should be kept in mind that, if the data type of a column is specified as simple XML, the server will store this data as Unicode strings.

Click through for a couple of examples and their data storage requirements.

Comments closed

M	T	W	T	F	S	S
				1	2	3
4	5	6	7	8	9	10
11	12	13	14	15	16	17
18	19	20	21	22	23	24
25	26	27	28	29	30	31

Month: May 2020