Author: Kevin Feasel

Adding Libraries in Databricks

Arun Sirpal has some third-party libraries to add:

It is a really common requirement to add specific libraries to Databricks. Libraries can be written in Python, Java, Scala, and R. You can upload Java, Scala, and Python libraries and point to external packages in PyPI, Maven, and CRAN repositories.

Libraries can be added in three scopes: workspace, notebook-scoped, and cluster. I want to show you how easy it is to search for and add a library to the cluster, so that all notebooks attached to the cluster can leverage the library.
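Cluster-scoped installs like the ones Arun walks through in the UI can also be scripted. A minimal sketch, assuming the Databricks Libraries API `install` endpoint: the helper below only builds the JSON request body, covering the PyPI, Maven, and CRAN sources mentioned above. The cluster ID and packages are hypothetical, and `build_install_request` is an illustrative helper, not part of any Databricks SDK.

```python
import json


def build_install_request(cluster_id, pypi=(), maven=(), cran=()):
    """Build a request body for the Databricks Libraries API install endpoint.

    Each library is a one-key dict naming its source: PyPI and CRAN packages
    by name, Maven artifacts by group:artifact:version coordinates.
    """
    libraries = [{"pypi": {"package": p}} for p in pypi]
    libraries += [{"maven": {"coordinates": c}} for c in maven]
    libraries += [{"cran": {"package": p}} for p in cran]
    return {"cluster_id": cluster_id, "libraries": libraries}


body = build_install_request(
    "0123-456789-abcde123",  # hypothetical cluster ID
    pypi=["simplejson==3.8.0"],
    maven=["org.jsoup:jsoup:1.7.2"],
    cran=["forecast"],
)

# This JSON would be POSTed to https://<workspace-url>/api/2.0/libraries/install
# with a personal access token; the HTTP call itself is left out of this sketch.
print(json.dumps(body, indent=2))
```

Because the libraries are installed on the cluster, every notebook attached to it can use them, which matches the cluster scope described above.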

I’m hoping that loading libraries in Azure Synapse Analytics will, at some point, be this convenient.

Connecting to Postgres with PolyBase

I clear one blog post off my backlog:

Now that we have some data, let’s go back to SQL Server. I assume you’ve already installed and configured PolyBase—if not, check out my presentation on PolyBase. Note that this requires SQL Server 2019 or later, as that’s the first version which supports PolyBase to ODBC. Here’s a script which assumes a database named Scratch and a master key <<SomeSecureKey>>.

Click through for step-by-step instructions to get started, though I will freely admit that I don’t have the Postgres knowledge to give you a full listing of sharp edges.

Backing Up the Service Master Key

William Assaf takes us through backing up important keys in SQL Server:

You should consider complementary backup solutions that back up or snapshot the entire server (or VM) for SQL Server, but sometimes these technologies are limited or have too much of an impact on the server. A whole-VM snapshot that relies on VSS, for example, could incur an unacceptably long I/O stun.

Regardless, in all cases, SQL Server backups of each database should be taken regularly. This is a conversation for another blog post, but a typical pattern is weekly full backups, nightly differential backups, and, for databases not in the SIMPLE recovery model, transaction log backups every 15 minutes.
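To see why that schedule works, it helps to list what a point-in-time restore would need: the most recent full backup, the latest differential taken after it (differentials are cumulative since the last full), and every log backup from there up to the one that contains the target time. The sketch below models this with a hypothetical one-week schedule. `build_schedule` and `restore_chain` are illustrative names, and the model matches backups by finish time, whereas SQL Server actually validates the chain by log sequence number (LSN).

```python
from datetime import datetime, timedelta


def build_schedule(start, days):
    """Hypothetical schedule: a full backup at `start`, a differential every
    midnight after that, and a log backup every 15 minutes."""
    backups = [("full", start)]
    backups += [("diff", start + timedelta(days=d)) for d in range(1, days)]
    t, end = start + timedelta(minutes=15), start + timedelta(days=days)
    while t < end:
        backups.append(("log", t))
        t += timedelta(minutes=15)
    return backups


def restore_chain(backups, target):
    """Return the backups to restore, in order, to reach `target` (simplified: by time, not LSN)."""
    ordered = sorted(backups, key=lambda b: b[1])
    fulls = [b for b in ordered if b[0] == "full" and b[1] <= target]
    if not fulls:
        raise ValueError("no full backup finished before the target time")
    chain = [fulls[-1]]
    diffs = [b for b in ordered if b[0] == "diff" and chain[0][1] < b[1] <= target]
    if diffs:
        chain.append(diffs[-1])  # differentials are cumulative: only the latest is needed
    base_time = chain[-1][1]
    if base_time == target:
        return chain
    for kind, finished in ordered:
        if kind == "log" and finished > base_time:
            chain.append((kind, finished))
            if finished >= target:  # this log backup contains the target time
                return chain
    raise ValueError("no log backup covers the target time yet")


start = datetime(2020, 11, 1)  # hypothetical weekly full backup
target = start + timedelta(days=3, hours=10, minutes=7)
chain = restore_chain(build_schedule(start, 7), target)
print(chain[0], chain[1], chain[-1], len(chain))
```

With this schedule, reaching mid-morning on day three takes one full, one differential, and 41 log backups, rather than every log backup since the full.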

Read the whole thing.

SSMS 18.7.1 Released

Glenn Berry takes us through the latest edition of SQL Server Management Studio:

One big change with SSMS 18.7 is described by Microsoft this way:

Beginning with SQL Server Management Studio (SSMS) 18.7, Azure Data Studio is automatically installed alongside SSMS. Users of SQL Server Management Studio are now able to benefit from the innovations and features in Azure Data Studio. Azure Data Studio is a cross-platform and open-source desktop tool for your environments, whether in the cloud, on-premises, or hybrid.

So far, this has been a pretty controversial change. Erik Darling created a User Voice suggestion on October 20th that has already gotten over 234 votes, and many comments.

I’m not going to weigh in too much here, though I would prefer this to be an optional installation. Do watch out for an annoyance, though, if you installed Azure Data Studio with the User installer instead of the System installer.

Stored Procedure Return Values and Entity Framework Core

Erik Ejlskov Jensen shows us how to retrieve the return value from a stored procedure using Entity Framework Core:

SQL Server stored procedures can return data in three different ways: via result sets, OUTPUT parameters, and RETURN values (see the docs here).

I have previously blogged about getting result sets with FromSqlRaw here and here.

I have blogged about using OUTPUT parameters with FromSqlRaw here.

In this post, let’s have a look at using RETURN values.

Click through for the process.

Dotnet-Spark UDFs and Missing Shared State

Ed Elliott uncovers a mystery:

To understand this, we need to take a look at how we can create a UDF in .NET that is called by the Java VM Apache Spark code, because that is, logically, what happens. In our application, we call into Apache Spark and ask it to do things like read from a file, run some transformation, and write files back out again. With UDFs, we ask Spark to run a UDF, and Spark comes back to our UDF, passes it some data, and asks the UDF to execute, but the Java VM does not understand how to execute .NET code.
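The practical consequence is that the code running the UDF works with its own copy of whatever was handed over, not with live objects in the driver program. As a rough Python analogy (not the .NET for Apache Spark implementation), the hypothetical `AddOffsetUdf` object below is serialized with `pickle` before it is "shipped" to a simulated worker, so a later change on the driver side never reaches that worker.

```python
import pickle


class AddOffsetUdf:
    """Hypothetical UDF that carries a piece of driver-side state."""

    def __init__(self, offset):
        self.offset = offset

    def __call__(self, value):
        return value + self.offset


def ship_to_worker(udf):
    # Only a serialized snapshot of the UDF leaves the driver.
    return pickle.dumps(udf)


def run_on_worker(payload, rows):
    # The simulated worker rebuilds its own copy from bytes; it never sees the driver's object.
    worker_udf = pickle.loads(payload)
    return [worker_udf(row) for row in rows]


udf = AddOffsetUdf(offset=10)
payload = ship_to_worker(udf)
udf.offset = 100  # a driver-side change made after the UDF was shipped
result = run_on_worker(payload, [1, 2, 3])
print(result)  # [11, 12, 13]: the worker still sees offset=10
```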

Read the whole thing.

Migrating SQL Server Container Images to GitHub

Andrew Pruski has moved some images around:

A couple of months ago Docker announced that they would be implementing a 6 month retention policy for unused images in the Docker Hub.

This was due to kick in on the 1st of November but has now been pushed back until mid-2021.

I’ve had multiple Windows SQL Server container images up on the Docker Hub for years now. It’s been a great platform and I’m very thankful to them for hosting my images.

That being said, I want to make sure that the images I’ve built will always be available for the community, so I have pushed my SQL Server images to the GitHub Container Registry.

I guess I should do the same.
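If you want to do the same, the copy can be scripted with the standard `docker pull`, `docker tag`, and `docker push` commands. This minimal sketch only generates those commands. The image names and GitHub owner are hypothetical, `to_ghcr` and `migration_commands` are illustrative helpers, and the sketch assumes tag-style Docker Hub references and that you have already run `docker login ghcr.io` with a GitHub personal access token.

```python
def to_ghcr(hub_ref, ghcr_owner):
    """Map a Docker Hub reference such as 'user/repo:tag' to a GHCR reference.

    Assumes tag-style references (no @sha256 digests); a missing tag means 'latest'.
    """
    name, sep, tag = hub_ref.rpartition(":")
    if not sep or "/" in tag:  # no tag present (or the colon belonged to a host:port)
        name, tag = hub_ref, "latest"
    repo = name.split("/")[-1]
    return f"ghcr.io/{ghcr_owner.lower()}/{repo.lower()}:{tag}"


def migration_commands(hub_refs, ghcr_owner):
    """Docker CLI commands that copy each image from Docker Hub to GHCR."""
    commands = []
    for ref in hub_refs:
        target = to_ghcr(ref, ghcr_owner)
        commands += [f"docker pull {ref}", f"docker tag {ref} {target}", f"docker push {target}"]
    return commands


# Hypothetical image names and owner.
for cmd in migration_commands(["exampleuser/sqlserver2019:latest", "exampleuser/sqlserver2017"], "ExampleUser"):
    print(cmd)
```

Generating the commands rather than running them keeps the sketch side-effect free; you can review the list before pasting it into a shell.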

Parallelism and Nested Loops Joins

Erik Darling talks about the intersection of two performance tuning topics:

Yesterday we saw a case where the Gather Streams operator was costed quite highly, and it prevented a parallel plan from being chosen, despite the parallel plan in this case being much faster.

It’s important to note that costing for plans is not a direct reflection of actual time or effort, nor is it accurate to your local configuration.

They’re estimates used to come up with a plan. When you get an actual plan, there are no added-in “Actual Cost” metrics.
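To make the point about estimates concrete, here is a deliberately simplified toy model of the serial-versus-parallel decision, not the optimizer's actual algorithm. It assumes SQL Server's default cost threshold for parallelism of 5 and uses made-up cost numbers to show how a high Gather Streams estimate alone can push the choice back to a serial plan, regardless of which plan would actually run faster.

```python
def choose_plan(serial_cost, parallel_cost, gather_streams_cost, cost_threshold=5.0):
    """Toy model of the serial-vs-parallel choice.

    Costs are unitless optimizer estimates, not seconds. A parallel plan is
    only considered when the serial estimate exceeds the cost threshold for
    parallelism, and it only wins if its total estimate, including the
    Gather Streams exchange, is cheaper than the serial estimate.
    """
    if serial_cost <= cost_threshold:
        return "serial"
    total_parallel = parallel_cost + gather_streams_cost
    return "parallel" if total_parallel < serial_cost else "serial"


# Hypothetical numbers: the same parallel plan loses once Gather Streams is costed high.
print(choose_plan(serial_cost=50.0, parallel_cost=20.0, gather_streams_cost=10.0))  # parallel
print(choose_plan(serial_cost=50.0, parallel_cost=20.0, gather_streams_cost=40.0))  # serial
```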

Read on to see how you can monkey’s paw your way through this problem by introducing exciting, new problems.

Avoid Backup-and-Restore of SSISDB for Deployment

Andy Leonard recommends against using backup-and-restore as an approach for moving SSIS packages around:

First, please do not misunderstand. You should back up SSISDB just like you back up all other databases – especially in Production. You should also conduct Disaster Recovery exercises in which you restore SSISDB from the latest backup, or avail yourself of Always On availability groups and / or Windows Server Failover Clustering.

With that caveat in mind, read on to see why.
