Press "Enter" to skip to content

Curated SQL Posts

Connecting PolyBase to Spark

I have a blog post connecting PolyBase to a Spark cluster:

If you do define your Spark DataFrames well, you get a much happier result. Here’s me creating a better-looking DataFrame in Spark:

import org.apache.spark.sql.functions._
spark.sql("""
SELECT
INT(SUMLEV) AS SummaryLevel,
INT(COUNTY) AS CountyID,
INT(PLACE) AS PlaceID,
BOOLEAN(PRIMGEO_FLAG) AS IsPrimaryGeography,
NAME AS Name,
POPTYPE AS PopulationType,
INT(YEAR) AS Year,
INT(POPULATION) AS Population
FROM NorthCarolinaPopulation
WHERE
POPULATION <> 'A'
""")
.write.format("orc").saveAsTable("NorthCarolinaPopulationTyped")

It’s not all perfect, though: I also cover driver problems that I ran into here with Spark and Hive.

Comments closed

Finding High-Variance Memory Grants

Erin Stellato shows how you can use Query Store to track the variance of memory grant requests:

One of the more perplexing problems to troubleshoot in SQL Server can be those related to memory grants. Some queries need more memory than others to execute, based on what operations need to be performed (e.g. sort, hash). SQL Server’s optimizer estimates how much memory is needed, and the query must obtain the memory grant in order to start executing. It holds that grant for the duration of query execution – which means if the optimizer overestimates memory you can run into concurrency issues. If it underestimates memory, then you can see spills in tempdb. Neither is ideal, and when you simply have too many queries asking for more memory than is available to grant, you’ll see RESOURCE_SEMAPHORE waits. There are multiple ways to attack this issue, and one of my new favorite methods is to use Query Store.

Click through for a demonstration.

Comments closed

CAST and CONVERT Make Expressions Nullable

Daniel Hutmacher points out a side effect of using CAST() and CONVERT():

Suppose we want to set up a view in the new solution that mirrors the names and definitions of the old table, so the legacy integration can use that view going forward:

CREATE OR ALTER VIEW new.the_table_like_before
AS
SELECT CAST(id AS varchar(32)) AS id,
CAST([row] AS int) AS [row],
CAST(date_loaded AS datetime) AS dt
FROM new.the_table;

Now, if you check out the resulting datatypes of the view, you’ll notice that all the columns are marked nullable, even though they’re all based on non-nullable columns, so the values in the view could never be null.

Read on for a couple possible solutions.

Comments closed

Business Cases for SQL Server 2019

Kevin Chant gives you business cases for why you should upgrade to SQL Server 2019:

1. Polybase
Currently, you might import large amounts of data from another relational database system like Oracle or MongoDB into your SQL Server database. If so, you can use Polybase instead.

Polybase has certainly been improved in SQL Server 2019. For instance, more data sources are now available. Plus, you can even install Polybase on SQL Server 2019 on Linux as well.

Instead of importing the data from the source you can use Polybase to connect to it remotely. Which means that you can run your queries against the source directly instead.

Read on for more in this vein. Here it’s less about the technical capabilities and more about making life easier for other people in the business.

Comments closed

Wireshark and the SQL Server Browser Service

Joy George Kunjikkur takes a look at the SQL Server Browser service using Wireshark:

Normally on development environment the 1434 port would be open and SQL Server Browser running in SQL Server machine. If developers code based on that assumption, it may fail in higher environments. Most of the developers know what might have happened. But some may not and ends up in complete darkness. The symptom of this issue will be connection failure from application to SQL Server database.

This post is mainly to demonstrate how to check network failures on SQL Server Browser operations. The prerequisite is the availability of Wireshark in the machine where application runs and basic knowledge how to run the same. The above link has some screenshots which would help to run Wireshark for UDP connections.

This is usually the first service I disable post-installation.

Comments closed

Antivirus and SQL Server

Randolph West proffers advice should your IT team require installing antivirus software on a server with SQL Server running:

This is why it is documented that we should exclude SQL Server from any AV (anti-malware) detection products, so that it can get on with doing what it does best.

Yes, it’s formally documented. This is why we should read documentation when installing things. While it’s super-easy to click “Next,” “Next,” “Next,” that should not be the case with a complex product like SQL Server.

Read on for the list of exceptions you should add and processes to avoid scanning.

Comments closed

Matrix Operations in R

Kristian Larsen shows off some linear algebra concepts in R:

In this article, you learn how to do linear algebra in R. In particular, I will discuss how to create a matrix in R, Element-wise operations in R, Basic Matrix Operations in R, How to Combine Matrices in R, Creating Means and Sums in R and Advanced Matrix Operations in R.

The post is so chock-full of examples, the only block of multi-line text is the description.

Comments closed

Creating Models with ML.NET

I have a series on ML.NET; in this post, I look at building a model:

Okay, now that I have classes, I need to put in that lambda. I guess the lambda could change to qb => qb.Quarterback == "Josh Allen" ? "Josh Allen" : "Nate Barkerson" and that’d work except for one itsy-bitsy thing: if I do it the easy way, I can’t actually save and reload my model. Which makes it worthless for pretty much any real-world scenario.

So no easy lambda-based solution for us. Instead, we need a delegate. 

The experience so far has been a bit frustrating compared to doing similar work in R, but they’re actively working on the library, so I’m hopeful that there will be improvements. In the meantime, I’ve landed on the idea of doing all data cleanup work outside of ML.NET and just use the simplest transformations.

Comments closed

Cosmos DB Role-Based Access Control

Hasan Savran takes us through role-based access control in Cosmos DB:

Role-based access control (RBAC) is available for Azure CosmosDB. By using RBAC, you can manage who has access to CosmosDB resources. You need to have a profile in Azure Active Directory to assign RBAC roles to users, groups or other targets. There are 4 built-in roles you can use. Microsoft announced the CosmosDB Operator
 role recently.

Click through for that list.

Comments closed

Using CHOOSE() in SQL Server

Bert Wagner explains the CHOOSE() function:

While I know I don’t utilize most of the features available in SQL Server, I like to think I’m at least aware that those features exist.

This week I found a blind-spot in my assumption however. Even though it shipped in SQL Server 2012, the SQL Server CHOOSE function is a feature that I think I’m seeing for the first time this past week.

CHOOSE() and IIF() were functions ported over to make it easier for Access and Excel users to write code. I tend to avoid them because there are typically better idiomatic constructs (like CASE) in SQL Server.

Comments closed