Press "Enter" to skip to content

Curated SQL Posts

Summarizing Improvements In Spark 2.4

Anmol Sarna summarizes Apache Spark 2.4 and pushes his meme game at the same time:

The next major enhancement was the addition of a lot of new built-in functions, including higher-order functions, to deal with complex data types easier.
Spark 2.4 introduced 24 new built-in functions, such as  array_unionarray_max/min, etc., and 5 higher-order functions, such as transformfilter, etc.
The entire list can be found here.
Earlier, for manipulating the complex types (e.g. array type) directly, there are two typical solutions:
1) exploding the nested structure into individual rows, and applying some functions, and then creating the structure again.
2) building a User Defined Function (UDF).
In contrast, the new built-in functions can directly manipulate complex types, and the higher-order functions can manipulate complex values with an anonymous lambda function similar to UDFs but with much better performance.

2.4 was a big release, so check this out for a great summary of the improvements it brings.

Comments closed

Building A Full-Stack App With F#

Shanglun Wang has a tutorial on building a full application using F#:

In the United States, there is a popular saying: “It’s five o’clock somewhere”.
In some parts of the world, 5:00 pm is the earliest time when it is socially acceptable to have a drink, or a traditional cup of tea.
Today, we will build an application based on this concept. We will build an application that, at any given time, searches through the various time zones, find out where it is five o’clock, and provide that information to the user.

It’s a neat app.  

Comments closed

Using Powershell As Your Default Prompt On Windows Server Core

Patrick Gruenauer shows us how to make Powershell the shell of choice when running on Windows Server Core:

Well, if you’re running a Server Core (I hope so, for domain controllers, dns, file services and more there’s no good reason to do not so), then it’s an ease to change that. The corresponding setting has to be configured in the registry. Regedit can be opened on Server Core (there are more graphical user interfaces that are shipped with core, for example notepad.exe and timedate.cpl). I’m talking about this key:

Click through for the instructions.

Comments closed

Service Account Names And SQL Agent Not Starting Up After Reboots

Hamish Watson walks us through a weird scenario:

Looking at the event logs I (eventually) found this:
Event ID 7000:
The MSSQLSERVER service failed to start due to the following error: 
The account name is invalid or does not exist, or the password is invalid for the account name specified.
Which is bizarre – as the service account had been used for months – but after each reboot the services had to be manually started. GPO and other things had been blamed but no one could actually find out why.

I’ve never used @ naming for service accounts, so until I read this I didn’t even know it was possible.  Now I’m going to forget again because apparently it’s not a good idea.

Comments closed

Mandatory And Default Powershell Parameters

Jess Pomfret shows us an example of creating mandatory parameters but also having a default set:

I came across a situation this week where I wanted to add the option of running an existing script,  for a specific server/database combination.  The script currently has no parameters and runs against all servers in the environment from a scheduled task.  I wanted to make sure that behavior didn’t change. The other requirement was that if I specified Server, Database should be a mandatory parameter and vice versa.
The final solution was to add the two parameters to a parameter set and make them both mandatory.  I also had to add a different DefaultParameterSet (thanks to Andrew for this idea), otherwise it defaulted to the defined parameter set, meaning the script always required both Server and Database parameters.

Click through for a demo.

Comments closed

Bulk Insertion With An Identity Column

Kenneth Fisher gives us a quick post on bulk insertion against tables with identity columns:

TL;DR; BULK INSERT doesn’t have an easy way to specify a column list so the columns in the insert file must match the columns in the table unless you use a format file or a staging table.
As simple as they appear on the surface identity columns are fairly complicated things. BULK INSERT, on the other hand, is pretty complicated even on the surface (at least as far as I’m concerned). Because of this, the combination can get even worse. When you do an insert into a table that has an identity column you usually just ignore that column and it’s automatically filled in. However, that’s not true with BULK INSERT. Unless you use a format file, the number of columns (and their order) in the file has to match the table.

Read the whole thing.

Comments closed

Availability Groups And Read-Only Filegroups

Allan Hirt walks us through a couple of scenarios involving databases with read-only filegroups using Always On Availability Groups:

A question came across my inbox this week which I decided to investigate: what happens if you have read only filegroups as part of your database, and you want to use Always On Availability Groups? Let’s find out.
First, I have a database with two filegroups: one read write (PRIMARY) and one read only (ROFG).

Click through for a demonstration of this, as well as a different scenario in which you might want only the read-write data on the secondary.

Comments closed

Using DISTINCT With XML Data

Dave Bland has a workaround for a limitation with processing XML in SQL Server:

Since I was working on an example for my next performance class, I decided to use an example from that session.  In the query below the qp.* part of the column list will return a column named query_plan which is an XML data type.
SELECT DISTINCT pa.,cp.,qp.* 
FROM
(SELECT plan_handle
FROM sys.dm_exec_cached_plans) cp 
CROSS APPLY sys.dm_exec_query_plan (cp.plan_handle ) qp
CROSS APPLY sys.dm_exec_plan_attributes(cp.plan_handle) As pa
WHERE is_cache_key = 1 AND pa.attribute = ‘set_options’
However, when I add the DISTINCT keyword I get the error below.

Read on for a workaround for this.

Comments closed

What’s New In Ambari 2.7

Paul Codding and Kat Petre share some of the new features in Ambari 2.7:

With this release, we wanted to make Ambari more enjoyable to use every day, simplify finding and using our API, and unblock teams managing very large clusters.  Here is a preview of a few features we’re excited to share with you:
Revamped UI – Our Hortonworks Product Design Team has created a great new UI design system, and we’ve taken the time to update Ambari to use the new styles, colors, and navigation concepts from it.

I’ll have to rebuild my home Hadoop cluster someday, so I’ll have to check this out when I do.

Comments closed

Deploying An R Service To Azure Kubernetes Service

Hong Ooi shows us how we can use Azure Container Registry and Azure Kubernetes Service to deploy an R model via Plumber:

If you run this code, you should see a lot of output indicating that R is downloading, compiling and installing randomForest, and finally that the image is being pushed to Azure. (You will see this output even if your machine already has the randomForest package installed. This is because the package is being installed to the R session inside the container, which is distinct from the one running the code shown here.)
All docker calls in AzureContainers, like the one to build the image, return the actual docker commandline as the cmdline attribute of the (invisible) returned value. In this case, the commandline is docker build -t bos_rf . Similarly, the push() method actually involves two Docker calls, one to retag the image, and the second to do the actual pushing; the returned value in this case will be a 2-component list with the command lines being docker tag bos_rf deployreg.azurecr.io/bos_rf and docker push deployreg.azurecr.io/bos_rf.

I love this confluence of technologies and at the same time get a “descent into madness” feeling from the sheer number of worlds colliding.

Comments closed