Press "Enter" to skip to content

Month: October 2021

Data Type Casing in SQL Server

Aaron Bertrand finds a case where casing matters:

We all have coding conventions that we have learned and adopted over the years and, trust me, we can be stubborn about them once they’re part of our muscle memory. For a long time, I would always uppercase data type names, like INT, VARCHAR, and DATETIME. Then I came across a scenario where this wasn’t possible anymore: a case-sensitive instance. In a recent post, Solomon Rutzky suggested:

As long as you are working with SQL Server 2008 or newer, all data type names, including sysname, are always case-insensitive, regardless of instance-level or database-level collations.

I have a counter-example that has led me to be much more careful about always matching the case found in sys.types.

Click through for that scenario.
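
Aaron’s takeaway is to match the casing the catalog itself stores. As a quick, hedged illustration (my own sketch, not his counter-example), you can see that canonical casing by querying sys.types directly:

-- List the canonical casing of a few common type names as stored in the catalog.
-- On a case-sensitive instance, matching this casing exactly is the safe habit.
SELECT name, system_type_id
FROM sys.types
WHERE name IN (N'int', N'varchar', N'datetime', N'sysname')
ORDER BY name;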


pyspark.pandas in Apache Spark 3.2

Hyukjin Kwon and Xinrong Meng announce a built-in pandas API for Apache Spark 3.2:

We’re thrilled to announce the pandas API as part of the upcoming Apache Spark™ 3.2 release. pandas is a powerful, flexible library and has grown rapidly to become one of the standard data science libraries. Now pandas users can leverage the pandas API on their existing Spark clusters.

A few years ago, we launched Koalas, an open source project that implements the pandas DataFrame API on top of Spark, which became widely adopted among data scientists. Recently, Koalas was officially merged into PySpark by SPIP: Support pandas API layer on PySpark as part of Project Zen (see also Project Zen: Making Data Science Easier in PySpark from Data + AI Summit 2021).

pandas users can now scale their workloads with one simple line change in the upcoming Spark 3.2 release.

Click through to see more details on the change.


Write-Debug in PowerShell

Robert Cain goes from verbose to debug mode:

In my previous post, Fun With PowerShell Write-Verbose, I introduced the use of the built-in -Verbose switch. In this post we’ll dive into its counterpart, the -Debug switch, and its companion Write-Debug cmdlet.

In covering Write-Verbose, I mentioned verbose messages are typically targeted at the average user. The -Debug messages are meant to target the developer. These messages are meant to assist the PowerShell developer in troubleshooting and debugging their code.

Click through for examples of it in action.


SQL Server Express Memory Limitations

Steve Stedman notes that the memory limitations on SQL Server Express Edition are not quite as stringent as you may first believe:

Looking at the memory limits and other limits on the SQL Server versions over time, we have seen things increase, but one limit that is still very low is the memory limit for SQL Express. Specifically, the maximum memory for the buffer pool per instance of the SQL Server Database Engine for SQL Server 2019. The limit there is 1410 MB.

At first glance you may think that this limit is the total amount of memory that SQL Server will use, but let me show you a couple of screenshots from Database Health Monitor showing the memory utilization on two different SQL 2019 Express servers.

Read on to see what, exactly, the memory limitation is. Also, there are separate limits for things like In-Memory OLTP table sizes.
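
If you want to see what that cap does and does not cover on a given instance, one hedged check (my query, not one from Steve’s post) is to look at the buffer pool’s memory clerk, since the 1410 MB limit applies to the buffer pool rather than to everything the process allocates:

-- Current buffer pool allocation, in MB, for this instance.
-- On SQL Server 2019 Express this is what the ~1410 MB cap constrains;
-- plan cache, connection memory, and other clerks are tracked separately.
SELECT SUM(pages_kb) / 1024 AS buffer_pool_mb
FROM sys.dm_os_memory_clerks
WHERE type = N'MEMORYCLERK_SQLBUFFERPOOL';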


Top with Percent

Kevin Wilkie is on the top shelf:

In the last blog post, we went over the extreme basics of using the TOP operator in SQL. We showed how to grab things like the TOP 10 of a certain item.

That ability will get you through a number of criteria that you will be asked to perform. But what if you’re asked to grab the top five percent of performers in your company? Or in a region? It’s kinda hard to do that if you only have what we know so far, right?

Read on for the answer.
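
The syntax itself is brief; here is a hedged sketch (the table and column names are invented, not from Kevin’s post) of pulling that top five percent:

-- Hypothetical example: the top five percent of performers by total sales.
-- TOP ... PERCENT rounds up to a whole number of rows, so 5 percent of 101 rows returns 6.
SELECT TOP (5) PERCENT
       EmployeeName,
       TotalSales
FROM dbo.EmployeeSales
ORDER BY TotalSales DESC;

Adding WITH TIES after PERCENT also returns any rows that tie with the last one on the ORDER BY column.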


Operating System Error 995 on Adding a Database to an AG

Andrew Pruski troubleshoots a problem:

I was adding databases to an availability group (SQL Server 2017 CU20 instance) the other day and one database failed to automatically seed to the secondary.

When I looked in the SQL Server error log I saw this error message:

BackupIoRequest::ReportIoError: write failure on backup device ‘{GUID}’. Operating system error 995(The I/O operation has been aborted because of either a thread exit or an application request.).

Read on to see how Andrew solved the problem.


Combining Windows and SQL Server Time Zone Info

Andy Mallon splices together time zone information like some kind of time zone Dr. Frankenstein:

There was a recent post on Database Administrators, where there was some confusion over Daylight Saving Time (DST) in Australian timezones. Let’s be honest, though: timezones are confusing, especially when DST comes into play. Arizona Time doesn’t observe Daylight Saving Time, so sometimes it’s the same as Mountain Time and sometimes it’s Pacific Time.

Finding the right time zone can be hard. As of this writing, Windows recognizes 141 different timezones, many of which overlap.

In this case, the asker was running a query and was confused about why it wasn’t properly reflecting DST, and the other columns in the DMV weren’t helpful either.

This is why everybody should just use Eastern Standard Time: it’s the standard—says so in the name, even.
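
For reference, here is a hedged sketch (not the asker’s query, and not Andy’s registry join) of the system view at the heart of the question, plus the AT TIME ZONE syntax that consumes those Windows time zone names:

-- Every time zone the instance knows about (the ~141 that Windows recognizes),
-- with its current UTC offset and whether DST is in effect right now.
SELECT name, current_utc_offset, is_currently_dst
FROM sys.time_zone_info
ORDER BY name;

-- Convert the current instant into one of those zones.
SELECT SYSDATETIMEOFFSET() AT TIME ZONE 'AUS Eastern Standard Time' AS sydney_now;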


Replication Updates in SQL Server 2019 CU13

Kevin Farlee surprises us all with some changes to replication:

Starting with CU13 of SQL Server 2019, we are including new functionality which facilitates peer-to-peer replication with conflict detection & resolution using a last-writer-wins strategy. This enables many scenarios for multi-write configurations which may not have been possible or practical before.

Traditionally, customers configuring peer-to-peer replication are advised to implement a sharded solution which ensures that each row is updated on only one replica server (key values A-M are updated on server 1 while key values N-Z are updated on server 2, etc.). If it should happen that the same row is updated within a small time window by two servers in a replication environment, a conflict will be detected, which will either stop replication after raising an error or may optionally be resolved using the originator ID of the replica sending the update (i.e., if you have replicas with Originator IDs 1 and 2, then updates coming from the replica with originator ID 2 would always win and be persisted). See Conflict Detection in peer-to-peer replication – SQL Server | Microsoft Docs for a detailed explanation.

Read on for more details.


Working with Wide Data in R

Andrew Collier works with some wide data:

The concept of “wide data” is relative. In some domains 100 columns is considered “wide”, while in others that’s perfectly normal and you’d need to have thousands (or tens of thousands!) of columns for it to be considered even remotely “wide”. The data that we work with at Fathom Data generally lies in the first domain, but from time to time we do work on data that is considerably wider.

This post touches on a couple of approaches for dealing with that sort of data. We’ll be using some HCRIS (Healthcare Cost Report Information System) data, which are available for download here. Specifically, we’ll be working with an extract from the hcris2552_10_2017.csv file, which contains “select variables in flat shape”.

Click through for one example which has 1700 columns. H/T R-Bloggers.
