Curated SQL – Page 506 – A Fine Slice Of SQL Server

The Benefits of Parquet

Published 2022-05-30 by Kevin Feasel

Maria Zakourdaev explains why the Parquet file format is so useful:

Parquet files organize data in columns, while CSV files organize data in rows.

Columnar storage allows much better compression, so Parquet data files need less storage: 1 TB of CSV files can be converted into 100 GB of Parquet files, which can be a huge money saver when cloud storage is used. This also means that scanning a Parquet file is much faster than scanning CSV files: less data is scanned, there is no need to load unneeded columns into memory, and aggregations run faster. Parquet files contain both data and metadata, that is, information about data schema and structure. When you load the file, having metadata helps the querying tool define proper data types.

Click through for an example of when Parquet makes sense. It’s not the best format for everything—it’s a columnar file format, so writes are typically slower than row-store formats like CSV or Avro—but it and ORC are outstanding for analytical processing, not least because of the metadata these formats contain.
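
To make the row-versus-column distinction concrete, here is a minimal sketch in plain Python. It is not Parquet itself and does not use any Parquet library: the sample data, the `schema` dictionary, and the `read_column` helper are all hypothetical, and zlib stands in for the encodings a real Parquet writer would apply. It only illustrates why a reader that needs one column can skip the rest, and how stored type metadata lets it return properly typed values.

```python
import zlib

# Hypothetical sample data: 1,000 rows with three columns.
rows = [(i, "US" if i % 2 == 0 else "CA", round(i * 1.5, 2)) for i in range(1000)]

# Row-oriented layout (CSV-like): the values of one record sit next to each other.
row_bytes = "\n".join(f"{i},{country},{amount}" for i, country, amount in rows).encode()

# Column-oriented layout: all values of one column sit next to each other.
columns = {
    "id": [r[0] for r in rows],
    "country": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}

# Metadata stored alongside the data: the type of each column.
schema = {"id": int, "country": str, "amount": float}

# Each column is serialized and compressed independently.
compressed = {
    name: zlib.compress("\n".join(map(str, values)).encode())
    for name, values in columns.items()
}

def read_column(name):
    """Decompress a single column and cast its values using the stored schema."""
    text = zlib.decompress(compressed[name]).decode()
    return [schema[name](value) for value in text.split("\n")]

# An aggregation over "amount" never touches the "id" or "country" bytes.
total = sum(read_column("amount"))
print(total, len(compressed["amount"]), len(row_bytes))
```

In this sketch the aggregation reads only the compressed `amount` column, whereas a CSV reader would have to scan every byte of `row_bytes` to find the same values.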

IF Branches and Dynamic SQL

Published 2022-05-30 by Kevin Feasel

Erik Darling takes us through the scenic route:

I’m going to use the example from yesterday’s post to show you what you can do to further optimize queries like this.
To make the code fit in the post a little better, I’m going to skip the IF branch for the Posts table and go straight to Votes. Using dynamic SQL here will get you the same behavior as stored procedures, though.

Read on for more detail and a wrap-up of Erik’s series on conditional branching logic and performance tuning.

Applying Forecasts with Known Seasonal Behavior in Power BI

Published 2022-05-30 by Kevin Feasel

Imke Feldmann begins a series on seasonality in Power BI:

Please note that the monthly variation I am using in this example is taken from an existing table and not derived by statistical methods like the Excel FORECAST.ETS.SEASONALITY function. Here, we simply apply an existing distribution to expected future sales, but I will cover forecasts using those statistical methods in upcoming blog posts.
Today’s starting point is a request to calculate how many sales to expect until the end of the year and then distribute the expected sales according to a list of monthly percentages, which are all different because there is an expected seasonality in the sales.

Click through to see how this all works.
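
The arithmetic behind that idea can be sketched in a few lines of Python. This is not Imke's Power BI solution: the percentages, the `distribute_remaining` function, and the choice to rescale the remaining months' weights so they sum to 100% are all assumptions made for illustration.

```python
# Hypothetical share of annual sales per month, in percent (sums to 100).
monthly_pct = {1: 6, 2: 6, 3: 7, 4: 8, 5: 8, 6: 9,
               7: 10, 8: 10, 9: 9, 10: 9, 11: 9, 12: 9}

def distribute_remaining(expected_sales, pct_by_month, first_open_month):
    """Spread expected_sales over the months from first_open_month through
    December, in proportion to each month's seasonal percentage."""
    remaining = {m: p for m, p in pct_by_month.items() if m >= first_open_month}
    total_pct = sum(remaining.values())
    # Rescale so the remaining months' weights add up to the full amount.
    return {m: expected_sales * p / total_pct for m, p in remaining.items()}

# Assumed example: 3,600 units expected from September to year end.
forecast = distribute_remaining(3600, monthly_pct, first_open_month=9)
print(forecast)
```

Because September through December carry equal weights in this made-up table, each month receives the same amount; with uneven weights, heavier months receive proportionally more.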

OCR and Character Extraction with R

Published 2022-05-27 by Kevin Feasel

Benjamin Smith analyzes a text:

Since the text that I’m using has two columns per page, the text will need to be cropped by columns before OCR is applied. Prior to that, the .pdf files will need to be converted to .png format.

Read on to see the code for the entire process, using the tidyverse, magick, and tesseract packages.
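
The cropping step comes down to computing one rectangle per column. Benjamin's code is in R; the sketch below is Python and only does the geometry, producing ImageMagick-style `WxH+X+Y` strings. The `column_geometries` function and the page dimensions are hypothetical, and it assumes the columns split the page evenly with no gutter or margins, which real scans rarely do.

```python
def column_geometries(page_width, page_height, columns=2):
    """Return one 'WxH+X+Y' crop geometry per text column, left to right.

    Assumes equal-width columns spanning the full page height.
    """
    column_width = page_width // columns
    return [
        f"{column_width}x{page_height}+{index * column_width}+0"
        for index in range(columns)
    ]

# Assumed page size in pixels after converting the PDF page to PNG.
print(column_geometries(1200, 1600))
```

Each geometry would then be applied to the page image before handing the cropped column to the OCR engine.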

The KQL Extend Operator

Published 2022-05-27 by Kevin Feasel

Robert Cain continues a series on learning KQL:

When dealing with data, it’s not at all uncommon to want to create a new column of data by performing a calculation with two other columns. A common example is taking two stored columns, the purchase price of an item, and its shipping cost, then adding them together to get a column which wasn’t stored in your dataset, the total amount of the sale.
The Kusto Query Language lets you accomplish this through the extend operator. This operator allows you to manifest new columns in your output data, based on calculations.

As always, Robert has plenty of examples available to view.
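
In KQL, the calculation described in the quote would look something like `| extend TotalAmount = PurchasePrice + ShippingCost`, where the column names are made up for this illustration. For readers who think in general-purpose code, here is a rough Python analogue of what the operator does: it adds computed columns to each row of the output without changing the stored data. The `extend` helper and the sample rows are hypothetical.

```python
# Hypothetical sales rows; the column names are illustrative only.
sales = [
    {"Item": "Widget", "PurchasePrice": 20.0, "ShippingCost": 5.0},
    {"Item": "Gadget", "PurchasePrice": 12.5, "ShippingCost": 2.5},
]

def extend(rows, **calculated):
    """Return new rows with extra columns computed per row.

    Each keyword argument maps a new column name to a function of the row.
    The input rows are left unmodified.
    """
    return [
        {**row, **{name: fn(row) for name, fn in calculated.items()}}
        for row in rows
    ]

with_total = extend(
    sales,
    TotalAmount=lambda r: r["PurchasePrice"] + r["ShippingCost"],
)
for row in with_total:
    print(row["Item"], row["TotalAmount"])
```

As with the KQL operator, the new column exists only in the query output; the source rows in `sales` still have just their stored columns.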

Server Roles in Azure SQL DB and SQL Server 2022

Published 2022-05-27 by Kevin Feasel

Andreas Wolter has some new server roles for us to use:

The new server roles can be assigned to server logins to enable customers to assign and delegate job functions for server-wide metadata access and access to certain management commands without requiring Server Admin or AAD Admin privileges. This helps comply with the Principle of Least Privilege and implement role separation (sometimes also interchangeably referred to as Separation of Duties).

There are seven such roles for SQL Server 2022, though we saw three of them already in Azure SQL DB. Andreas takes us through the four new roles in detail.

IF Branching, Local Variables, and Stored Procedures

Published 2022-05-27 by Kevin Feasel

Erik Darling continues a quest. Part 3 involves local variables:

What never seems to get a bad name, despite numerical supremacy in producing terrible results, are local variables.
In this particular scenario, I see developers use them to try to beat “parameter sniffing” to no avail.
A chorus of “it seemed to work at the time”, “I think it made things a little better”, “it worked on my machine”, and all that will ensue.
But we know the truth.

The next part is around stored procedures:

You know and I know and everyone knows that stored procedures are wonderful things that let you tune queries in magickal ways that stupid ORMs and ad hoc queries don’t really allow for.
Sorry about your incessant need to use lesser ways to manifest queries. They just don’t stack up.
But since we’re going to go high brow together, we need to learn how to make sure we don’t do anything to tarnish the image of our beloved stored procedures.

Erik notes that stored procedures are part of the solution but there’s a bit more that we need.

Database Filtering in SSMS

Published 2022-05-27 by Kevin Feasel

Chad Callihan doesn’t want to see all of those databases:

Life can be easier when you block out the noise and only see what you need to see. Did you know you can do that with the list of databases in SSMS? As of the SSMS 16.3 2016 release, database filtering is possible.
Let’s take a look at how to hide databases we don’t want to see.

Out of sight, out of mind.

End-to-End Analytics with Power BI Datamarts

Published 2022-05-27 by Kevin Feasel

Katy Young explains the value of Power BI datamarts:

A datamart can be thought of as a subject-specific data warehouse, often containing data related to one domain or line of business within an organization. Because datamarts are often aligned to a particular business use case, analysts don’t need to write complex queries over large volumes of data as they would against a more traditional data warehouse. This reduces time to insight for organizations and promotes self-service analytics by making structured data accessible to exactly the people who need it.

Read on for more information.

Seeding AG Replicas from Snapshots in SQL Server 2022

Published 2022-05-27 by Kevin Feasel

Anthony Nocentino is excited about using storage snapshots in SQL Server 2022:

But what if I told you that you could seed your Availability Group from a storage-based snapshot and that the re-seeding process can be nearly instantaneous?
In addition to saving you time, this process saves your database systems from the CPU, network, and disk consumption that comes with direct seeding and using backups and restores to seed.
The process described in this post is implemented on Pure Storage’s FlashArray and works in cloud scenarios on Pure’s Cloud Block Store.

Click through to see how.
