Curated SQL – Page 801 – A Fine Slice Of SQL Server

Thinking about Temporary Stats on Snapshots

Published 2020-09-14 by Kevin Feasel

Lonny Niederstadt shares an interesting scenario:

Consider a snapshot database which is created daily. The purpose is to provide analytics reporting access while maintenance or data loads take place in the source database. In the snapshot database, analytics reports have no locking concerns from the activity in the underlying source database. And the temporary statistics provided by SQL Server, combined with the statistics inherited from the source database, provide a lot of information to the optimizer for query plan selection.
But what if significant fact tables are queried in the snapshot and leave a situation like col2 in stats_test? A column which generates an auto-created stat in the snapshot, but never gets a statistic created in the source database. Each day, the cost of creating that statistic and every statistic like it will be paid as part of the workload. Even if the underlying table is a now-stable dimension.

Click through for the demonstration.

Comments closed

The Importance of Specifying Nullability

Published 2020-09-14 by Kevin Feasel

Josh Darnell points out a case where not specifying nullability on columns can lead to confusion:

I always indicate whether a column allows NULL or not in CREATE TABLE and ALTER TABLE statements.
Or, you know, I’m in the habit of doing that.
Like, I try to be in the habit.
I’m doing my best, okay?
Anyway, it’s good to be clear about nullability, because you can run into surprises when you aren’t.

Click through for a surprise.

Comments closed

Managing the SQL Server Error Log

Published 2020-09-14 by Kevin Feasel

Guy Glantser has some tips for managing the SQL Server error log:

SQL Server maintains its own log, also called “SQL Server Error Log”. This log contains messages describing informational and error events, similar to messages that you can find in Windows logs. In fact, many of the messages found in the SQL Server Error Log can also be found in the Windows Application Log. The SQL Server Error Log is a great place to find information about what’s happening on your database server.
SQL Server uses 7 log files to store these messages. One file serves as the current log file, and every new message is written to that file. The other 6 files are archived files, and they contain previous messages. Each time SQL Server is restarted, it recycles the files. What does it mean? First, it means that a new log file is created and becomes the new current log file. Second, the oldest log file (“Archive #6”) is deleted. And third, all the other log files are pushed back one step. The previous current log file becomes “Archive #1”, the previous “Archive #1” log file becomes “Archive #2”, and so on.

Read on to see how you can change this, manage the size of log files, and retain data for a longer time.

Comments closed

Performance Impacts of Computed Columns

Published 2020-09-14 by Kevin Feasel

Robert Sheldon takes us through a few scenarios:

In this article, I walk you through the process of applying these strategies so you have a better sense of the available options. For the examples, I created four similar tables and populated them with identical data, which comes from the WideWorldImporters sample database. Each table includes the same computed column, with the column persisted in two tables and indexed in two tables, resulting in the following mix:
– The Orders1 table includes a non-persisted computed column.
– The Orders2 table includes a persisted computed column.
– The Orders3 table includes an indexed, non-persisted computed column.
– The Orders4 table includes an indexed, persisted computed column.
For each table, I show you the execution plan that’s generated when querying the computed column. The column’s expression is a relatively simple one, and the data set very small. Even so, this should be enough to demonstrate the principles of creating persistent and indexed computed columns and how they can help address performance-related issues.

Click through for Robert’s results.

Comments closed

Finding the earliest and latest date across all Power BI Tables

Published 2020-09-14 by Kevin Feasel

Soheil Bakhshi has an interesting challenge:

Many of you may already thought that we can use CALENDARAUTO() in DAX and we are good to go. Well, that’s not quite right. In many cases there are some Date or DateTime columns that must not be considered in our Date dimension. Like Birth Date or Deceased Date. More on this later in this post.
In this post I share a piece of code I wrote for myself. I was in a situation to identify the Start Date and the End Date of the date dimension many times, so I thought it might help you as well.

Read on for a solution when not all date columns are relevant.

Comments closed

Passing an Array of Arrays as a Parameter in Azure Data Factory

Published 2020-09-14 by Kevin Feasel

Rayis Imayev has a list for us:

In my previous blog post – Setting default values for Array parameters/variables in Azure Data Factory, I had helped myself to remember that arrays could be passed as parameters to my Azure Data Factory (ADF) pipelines. This time I’m helping myself to remember that an array of other arrays can also exist as ADF pipeline parameters’ values.

Read on for the example.

Comments closed

Indexing S3 Data with CDP Data Hub

Published 2020-09-11 by Kevin Feasel

Eva Nahari, et al, show how to perform indexing and serving of S3 data in Cloudera Data Platform:

This blog post will present a simple “hello world” kind of example on how to get data that is stored in S3 indexed and served by an Apache Solr service hosted in a Data Discovery and Exploration cluster in CDP. For the curious: DDE is a pre-templeted Solr-optimized cluster deployment option in CDP, and recently released in tech preview. We will only cover AWS and S3 environments in this blog. Azure and ADLS deployment options are also available in tech preview, but will be covered in a future blog post.
We will depict the simplest scenario to make it easy to get started. There are of course more advanced data pipeline setups and more rich schemas possible, but this is a good starting point for a beginner.

Read on for the instructions.

Comments closed

A Review of KaDeck: Kafka Management Tool

Published 2020-09-11 by Kevin Feasel

Guy Shilo reviews KaDeck:

If you want the real web version then you need to register in their website and create a team first. Yes, KaDeck is aimed at teams of people working together. You create a team online and shortly you receive a mail with team id and a secret key that you will use when running KaDeck web..
It seems that there is no host installation of KaDeck right now (rpm package, binaries or so) but only a Docker image. This forces the user to run it in Docker or a Kubernetes cluster (or one of it’s commercial distributions such as OpenShift). Another thing is that checks your license online each time you start the container. If you want to use t offline you have to do a process of offline activation. You can reach it from the administration menu.

Click through for Guy’s thoughts on the product.

Comments closed

Building an HTML Report with Failed Agent Job Info

Published 2020-09-11 by Kevin Feasel

Garry Bargsley continues a series on alerting when SQL Agent jobs fail:

Welcome to Part 2 in the series about SQL Server Agent Job Failures. In this part you are going to learn how to build an HTML report with Failed Agent Job information and send the report via e-mail to the interested team(s).
Let’s get started putting the pieces together to build our report.

Click through for the details, as well as a full code sample in Powershell.

Comments closed

Spark SQL in Delta Lake

Published 2020-09-11 by Kevin Feasel

Kundan Kumarr walks us through some of the basic SQL operations you can perform with Delta Lake in Apache Spark:

Nowadays Delta lake is a buzz word in the Big Data world, especially among the spark developers because it relegates lots of issues found in the Big Data domain. Delta Lake is an open-source storage layer that brings reliability to data lakes. Delta Lake provides ACID transactions, scalable metadata handling, and unifies streaming and batch data processing. It is evolving day by day and adds cool features in its every release. On 19th June 2020, Delta lake version 0.7.0 was released and this is the first release on Spark 3.x. This release involves important key features that can make the spark developer’s work easy.
One of the interesting key features in this release is the support for metastore-defined tables and SQL DDLs. So now we can define Delta tables in the Hive metastore and use the table name in all SQL operations. We can perform SQL DDLs to create tables, insert into tables, explicitly alter the schema of the tables, and so on. So in this blog, we will learn how we can perform SQL DDLs/DMLS/DQL in Delta Lake 0.7.0.

Click through for the examples.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Curated SQL Posts