The next major enhancement was the addition of a lot of new built-in functions, including higher-order functions, to deal with complex data types easier.
Spark 2.4 introduced 24 new built-in functions, such as
array_max/min, etc., and 5 higher-order functions, such as
The entire list can be found here.
Earlier, for manipulating the complex types (e.g. array type) directly, there are two typical solutions:
1) exploding the nested structure into individual rows, and applying some functions, and then creating the structure again.
2) building a User Defined Function (UDF).
In contrast, the new built-in functions can directly manipulate complex types, and the higher-order functions can manipulate complex values with an anonymous lambda function similar to UDFs but with much better performance.
2.4 was a big release, so check this out for a great summary of the improvements it brings.
In the United States, there is a popular saying: “It’s five o’clock somewhere”.
In some parts of the world, 5:00 pm is the earliest time when it is socially acceptable to have a drink, or a traditional cup of tea.
Today, we will build an application based on this concept. We will build an application that, at any given time, searches through the various time zones, find out where it is five o’clock, and provide that information to the user.
It’s a neat app.
Well, if you’re running a Server Core (I hope so, for domain controllers, dns, file services and more there’s no good reason to do not so), then it’s an ease to change that. The corresponding setting has to be configured in the registry. Regedit can be opened on Server Core (there are more graphical user interfaces that are shipped with core, for example notepad.exe and timedate.cpl). I’m talking about this key:
Click through for the instructions.
Looking at the event logs I (eventually) found this:
Event ID 7000:
The MSSQLSERVER service failed to start due to the following error:
The account name is invalid or does not exist, or the password is invalid for the account name specified.
Which is bizarre – as the service account had been used for months – but after each reboot the services had to be manually started. GPO and other things had been blamed but no one could actually find out why.
I’ve never used @ naming for service accounts, so until I read this I didn’t even know it was possible. Now I’m going to forget again because apparently it’s not a good idea.
I came across a situation this week where I wanted to add the option of running an existing script, for a specific server/database combination. The script currently has no parameters and runs against all servers in the environment from a scheduled task. I wanted to make sure that behavior didn’t change. The other requirement was that if I specified Server, Database should be a mandatory parameter and vice versa.
The final solution was to add the two parameters to a parameter set and make them both mandatory. I also had to add a different DefaultParameterSet (thanks to Andrew for this idea), otherwise it defaulted to the defined parameter set, meaning the script always required both Server and Database parameters.
Click through for a demo.
TL;DR; BULK INSERT doesn’t have an easy way to specify a column list so the columns in the insert file must match the columns in the table unless you use a format file or a staging table.
As simple as they appear on the surface identity columns are fairly complicated things. BULK INSERT, on the other hand, is pretty complicated even on the surface (at least as far as I’m concerned). Because of this, the combination can get even worse. When you do an insert into a table that has an identity column you usually just ignore that column and it’s automatically filled in. However, that’s not true with BULK INSERT. Unless you use a format file, the number of columns (and their order) in the file has to match the table.
Read the whole thing.
A question came across my inbox this week which I decided to investigate: what happens if you have read only filegroups as part of your database, and you want to use Always On Availability Groups? Let’s find out.
First, I have a database with two filegroups: one read write (PRIMARY) and one read only (ROFG).
Click through for a demonstration of this, as well as a different scenario in which you might want only the read-write data on the secondary.
Since I was working on an example for my next performance class, I decided to use an example from that session. In the query below the qp.* part of the column list will return a column named query_plan which is an XML data type.
SELECT DISTINCT pa.,cp.,qp.*
FROM sys.dm_exec_cached_plans) cp
CROSS APPLY sys.dm_exec_query_plan (cp.plan_handle ) qp
CROSS APPLY sys.dm_exec_plan_attributes(cp.plan_handle) As pa
WHERE is_cache_key = 1 AND pa.attribute = ‘set_options’
However, when I add the DISTINCT keyword I get the error below.
Read on for a workaround for this.