Curated SQL – Page 322 – A Fine Slice Of SQL Server

Obscure Changes in SQL Server 2022

Published 2022-05-30 by Kevin Feasel

Aaron Bertrand has a three-parter on obscure changes in SQL Server 2022. First up we have some new information:

You can get the marketing blitz from just about anywhere, and the What’s New documentation for the bigger hitters from the technical side.
But what about the changes that aren’t on the highlight reel at Build and aren’t getting all the attention from the media blitz? I’m a details person, so I get a lot of insight looking around at the little, non-headline-generating things that have changed. I’ve shown before how to sneak a peek under the hood, and I’m going to do it again today:

Then we have feature selection changes:

You will notice some changes in the Feature Selection screen. Some of the options have been consolidated; for example, you now get R, Python, and Java with MLS, instead of picking. One item has been added: SQL Server Extension for Azure. Unfortunately, this option is checked by default in the CTP 2.0 version of setup (click to enlarge):

Finally, we have execution plan changes:

As a former technical product manager for Plan Explorer, I can’t help but snoop around in what has changed in the XSD for showplan. Even though I am not the best person to actually analyze what those changes mean, and even though changes in XSD don’t necessarily reflect changes the engine can produce right now – these usually lay the groundwork for engine changes that will happen later.

There’s quite a lot of information available for those willing to dig, as Aaron shows.

Comments closed

Bounding Box Queries in Azure Data Explorer

Published 2022-05-30 by Kevin Feasel

David Giard draws boxes:

For our current project, we are capturing into ADX the location of vehicles over time. Our customer asked us to create a function that would return all vehicles that are within a given bounding box in a given time period. This is useful information when they want to know when a vehicle returns to a building, a neighborhood, or a city.
In this article, I will show how this can be accomplished using built-in functions, the limitations of those functions, and ways to overcome those limitations.

Read on for the naive approach as well as a very interesting one using S2 cells.

Comments closed

Thoughts on Contained Availability Groups

Published 2022-05-30 by Kevin Feasel

Allan Hirt is quite happy with contained Availability Groups:

All SQL Server availability features except Always On Failover Cluster Instances (FCIs) have a “problem”: when a secondary replica/warm standby/mirror (the term is different for each feature …) takes over as the new boss, some items such as SQL Server Agent jobs, instance level logins, etc., are not there. Going back to the early days of SQL Server when log shipping was not even in the product, this was always a manual process. There are multiple ways to approach this challenge and I am not going to detail them. This “problem” is a longstanding pain point with those who are responsible for managing SQL Server.
Contained AGs solve this issue by having their own master and msdb databases synchronized as part of the AG mechanism.

Click through to see how.

Comments closed

The Benefits of Parquet

Published 2022-05-30 by Kevin Feasel

Maria Zakourdaev explains why the Parquet file format is so useful:

Parquet files organize data in columns, while CSV files organize data in rows.

Columnar storage allows much better compression so Parquet data files need less storage, 1 TB of CSV files can be converted into 100GB of parquet files – which can be a huge money saver when cloud storage is used. This also means that scanning parquet file is much faster than scanning CSV files – fewer data would be scanned and there is no need to load unneeded columns into memory and aggregations will run faster. Parquet files contain both data and metadata, information about data schema and structure. When you load the file, having metadata helps the querying tool define proper data types.

Click through for an example of when Parquet makes sense. It’s not the best format for everything—it’s a columnar file format, so writes are typically slower than row-store formats like CSV or Avro—but it and ORC are outstanding for analytical processing, not least because of the metadata these formats contain.

Comments closed

IF Branches and Dynamic SQL

Published 2022-05-30 by Kevin Feasel

Erik Darling takes us through the scenic route:

I’m going to use the example from yesterday’s post to show you what you can do to further optimize queries like this.
To make the code fit in the post a little better, I’m going to skip the IF branch for the Posts table and go straight to Votes. Using dynamic SQL here will get you the same behavior at stored procedures, though.

Read on for more detail and a wrap-up of Erik’s series on conditional branching logic and performance tuning.

Comments closed

Applying Forecasts with Known Seasonal Behavior in Power BI

Published 2022-05-30 by Kevin Feasel

Imke Feldmann begins a series on seasonality in Power BI:

Please note, that the monthly variation I am using in this example is taken from an existing table and not derived by statistical methods like the Excel FORECAST.ETS.SEASONALITY function. Here, we simply apply an existing distribution to expected future sales, but I will cover forecasts using those statistical methods in upcoming blogposts.
Todays starting point is a request to calculate how many sales to expect until the end of the year and then distribute the expected sales according to a list of monthly %, which are all different, because there is an expected seasonality in the sales:

Click through to see how this all works.

Comments closed

OCR and Character Extraction with R

Published 2022-05-27 by Kevin Feasel

Benjamin Smith analyzes a text:

Since the text that I’m using has with two columns per page, the text will need to be cropped by columns before OCR is applied. Prior to that, the .pdf files will need to be converted to .png format.

Read on to see the code for the entire process, using the tidyverse, magick, and tesseract packages.

Comments closed

The KQL Extend Operator

Published 2022-05-27 by Kevin Feasel

Robert Cain continues a series on learning KQL:

When dealing with data, it’s not at all uncommon to want to create a new column of data by performing a calculation with two other columns. A common example is taking two stored columns, the purchase price of an item, and its shipping cost, then adding them together to get a column which wasn’t stored in your dataset, the total amount of the sale.
The Kusto Query Language lets you accomplish this through the extend operator. This operator allows you to manifest new columns in your output data, based on calculations.

As always, Robert has plenty of examples available to view.

Comments closed

Server Roles in Azure SQL DB and SQL Server 2022

Published 2022-05-27 by Kevin Feasel

Andreas Wolter has some new server roles for us to use:

The new server-roles that can be assigned to server logins to enable customers to assign and delegate job functions for server-wide metadata access and access to certain management commands without requiring Server Admin or AAD Admin privileges. This helps comply with the Principle of Least Privilege and implement role separation (sometimes also interchangeably referred to as Separation of Duties).

There are seven such roles for SQL Server 2022, though we saw three of them already in Azure SQL DB. Andreas takes us through the four new roles in detail.

Comments closed

IF Branching, Local Variables, and Stored Procedures

Published 2022-05-27 by Kevin Feasel

Erik Darling continues a quest. Part 3 involves local variables:

What never seems to get a bad name, despite numerical supremacy in producing terrible results, are local variables.
In this particular scenario, I see developers use them to try to beat “parameter sniffing” to no avail.
A chorus of “it seemed to work at the time”, “I think it made things a little better”, “it worked on my machine”, and all that will ensue.
But we know the truth.

The next part is around stored procedures:

You know and I know and everyone knows that stored procedures are wonderful things that let you tune queries in magickal ways that stupid ORMs and ad hoc queries don’t really allow for.
Sorry about your incessant need to use lesser ways to manifest queries. They just don’t stack up.
But since we’re going to go high brow together, we need to learn how to make sure we don’t do anything to tarnish the image of our beloved stored procedures.

Erik notes that stored procedures are part of the solution but there’s a bit more that we need.

Comments closed

M	T	W	T	F	S	S
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30

Curated SQL Posts